[dfdl-wg] Defuddle Questions and Pull-Parsing Thoughts

Tom Sugden tom at epcc.ed.ac.uk
Fri Jun 2 06:54:37 CDT 2006


Hi all,

Apologies for missing the telcon this week due to other work
pressures. I haven't made much progress with any implementation, but
have been taking a look at the Defuddle code. I have some questions
that someone may be able to answer, and a few thoughts for discussion.

The current Defuddle implementation is based upon JAXME, the Java/XML
binding implementation. Presumably JAXME is used to generate an object
model representing the data format described by the DFDL schema, and
the underlying data stream is then unmarshalled into an instance of
that object model. Is this understanding correct?

If my understanding is correct, I'm concerned that this approach may
not be suitable for large data streams, since the entire object model
instance would probably have to be assembled and stored in memory,
like a DOM tree. Has anybody considered using a streamed pull-parsing
approach instead, based upon or similar to StAX (Streaming API for
XML)?

I was thinking along the lines of parsing the DFDL schema into a DOM
or some other internal representation, and then pull-parsing the data
stream to produce a sequence of StAX-like events corresponding to the
data in the stream and its structure. During pull-parsing, the parser
would need to maintain its context and apply the appropriate
conversion algorithm to transform each part of the data stream into a
value of the correct type. These values would then be wrapped in
corresponding event objects.
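
To make the idea a little more concrete, here is a rough sketch of what
such a pull parser might look like. Everything in it is an assumption
made for illustration: the names (DataPullParserSketch, FieldDecl,
PullParser, Event) are hypothetical, the "schema" is reduced to a flat
list of fixed-width big-endian fields, and none of it is taken from
Defuddle or from the DFDL specification.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class DataPullParserSketch {
    /** Hypothetical field types; a real DFDL schema is far richer. */
    enum FieldType { INT32, FLOAT64 }

    /** Hypothetical stand-in for one element declaration from the schema. */
    static final class FieldDecl {
        final String name;
        final FieldType type;
        FieldDecl(String name, FieldType type) { this.name = name; this.type = type; }
    }

    enum EventKind { START_RECORD, FIELD, END_RECORD, END_STREAM }

    /** One pull event; name and value are null unless kind is FIELD. */
    static final class Event {
        final EventKind kind;
        final String name;
        final Object value;
        Event(EventKind kind, String name, Object value) {
            this.kind = kind; this.name = name; this.value = value;
        }
        @Override public String toString() {
            return kind == EventKind.FIELD ? kind + " " + name + "=" + value : kind.toString();
        }
    }

    static final class PullParser {
        private final DataInputStream in;
        private final FieldDecl[] layout;
        private int fieldIndex = -1;      // context: -1 means "between records"
        private boolean finished = false;

        PullParser(InputStream source, FieldDecl... layout) {
            if (layout.length == 0) throw new IllegalArgumentException("empty layout");
            this.in = new DataInputStream(new BufferedInputStream(source));
            this.layout = layout;
        }

        /** Decodes just enough of the stream to produce the next event. */
        Event next() throws IOException {
            if (finished) return new Event(EventKind.END_STREAM, null, null);
            if (fieldIndex == -1) {
                in.mark(1);               // peek one byte to detect end of input
                if (in.read() < 0) {
                    finished = true;
                    return new Event(EventKind.END_STREAM, null, null);
                }
                in.reset();
                fieldIndex = 0;
                return new Event(EventKind.START_RECORD, null, null);
            }
            if (fieldIndex == layout.length) {
                fieldIndex = -1;
                return new Event(EventKind.END_RECORD, null, null);
            }
            FieldDecl field = layout[fieldIndex++];
            Object value;                 // type conversion chosen by the declaration
            switch (field.type) {
                case INT32: value = in.readInt(); break;
                default:    value = in.readDouble(); break;
            }
            return new Event(EventKind.FIELD, field.name, value);
        }
    }

    static final FieldDecl[] SAMPLE_LAYOUT = {
        new FieldDecl("id", FieldType.INT32),
        new FieldDecl("temperature", FieldType.FLOAT64)
    };

    /** Two made-up records, encoded big-endian, purely for the demonstration. */
    static byte[] sampleData() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(7); out.writeDouble(21.5);
            out.writeInt(8); out.writeDouble(19.0);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException ex) { throw new UncheckedIOException(ex); }
    }

    /** Pulls events until END_STREAM and returns their string forms. */
    static List<String> parseAll(byte[] data, FieldDecl... layout) {
        List<String> seen = new ArrayList<>();
        try {
            PullParser parser = new PullParser(new ByteArrayInputStream(data), layout);
            Event event;
            do {
                event = parser.next();
                seen.add(event.toString());
            } while (event.kind != EventKind.END_STREAM);
        } catch (IOException ex) { throw new UncheckedIOException(ex); }
        return seen;
    }

    public static void main(String[] args) {
        for (String line : parseAll(sampleData(), SAMPLE_LAYOUT)) System.out.println(line);
    }
}
```

The point of the sketch is that the caller drives the parser by calling
next(), so only the current event and a small read buffer are held in
memory, however long the stream is.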

If this approach were viable, then these StAX-like APIs could be used
to implement higher-level applications or APIs. For instance, it would
be straightforward to produce an XML serialization of any data
described by a DFDL schema. One could also imagine binding any data
described by a DFDL schema to auto-generated Java beans, or to a DOM
object, when desirable. The process may even be reversible, so that
data could be written back to a data stream as well as being read from
one.
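
As an illustration of the XML serialization case, the sketch below
feeds a hand-written list of events to the standard StAX
XMLStreamWriter. The event model (Kind, Event, sampleEvents) and the
element names are hypothetical and invented for this example; only the
javax.xml.stream calls are real API. In practice the events would come
from the pull parser rather than from a fixed list.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class EventsToXmlSketch {
    enum Kind { START, VALUE, END }

    /** Hypothetical event: name is unused for END, text is set only for VALUE. */
    static final class Event {
        final Kind kind;
        final String name;
        final String text;
        Event(Kind kind, String name, String text) {
            this.kind = kind; this.name = name; this.text = text;
        }
    }

    /** Writes one XML element per START/END pair and per VALUE event. */
    static String toXml(Iterable<Event> events) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            for (Event e : events) {
                switch (e.kind) {
                    case START:
                        xml.writeStartElement(e.name);
                        break;
                    case VALUE:
                        xml.writeStartElement(e.name);
                        xml.writeCharacters(e.text);
                        xml.writeEndElement();
                        break;
                    default:
                        xml.writeEndElement();   // END closes the innermost open element
                        break;
                }
            }
            xml.flush();
            xml.close();
        } catch (XMLStreamException ex) {
            throw new IllegalStateException(ex);
        }
        return out.toString();
    }

    /** Made-up events describing a single record with two fields. */
    static List<Event> sampleEvents() {
        return Arrays.asList(
            new Event(Kind.START, "data", null),
            new Event(Kind.START, "record", null),
            new Event(Kind.VALUE, "id", "7"),
            new Event(Kind.VALUE, "temperature", "21.5"),
            new Event(Kind.END, null, null),
            new Event(Kind.END, null, null));
    }

    public static void main(String[] args) {
        System.out.println(toXml(sampleEvents()));
    }
}
```

Because the writer is itself streaming, the XML is emitted as each
event arrives, so this layer would not reintroduce the memory problem.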

I haven't thought this through very deeply yet and my understanding
of the issues is still quite naive, so I will be very interested to
hear any comments. Sorry if this avenue has already been explored, or
if I've misunderstood the mechanics of Defuddle or JAXME.

Cheers,
Tom




