[dfdl-wg] How to deal with variable length elements?

Robert E. McGrath mcgrath at ncsa.uiuc.edu
Wed Mar 15 10:06:40 CST 2006


Following Steve's sketch, how should this be represented as conversions?

This is a bit different than other examples because there is a "for each"
operation here.

Perhaps this could be abstractly viewed as:

==
  <<XML element with multiple occurs, as shown yesterday>>

  Iterator conversion:  relevant props: minOccurs="0", maxOccurs="<<setting>>", et al.

  Float conversion:     relevant props:  data description of element

  Data:  read as bytes
==
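
As a rough illustration of this layering (my own sketch, not part of any DFDL
draft), the three steps might look like the following in Python. The names
iterator_conversion and float_conversion are hypothetical, and the sketch
assumes 4-byte little-endian IEEE 754 single-precision floats stored
contiguously.

```python
import struct
from typing import Iterator


def float_conversion(raw: bytes) -> float:
    """Convert one 4-byte little-endian IEEE 754 single to a Python float."""
    return struct.unpack("<f", raw)[0]


def iterator_conversion(data: bytes, max_occurs: int, width: int = 4) -> Iterator[bytes]:
    """The "for each" step: yield up to max_occurs consecutive width-byte slices."""
    for i in range(max_occurs):
        chunk = data[i * width:(i + 1) * width]
        if len(chunk) < width:
            return  # ran out of data before reaching max_occurs
        yield chunk


# Data: read as bytes (here built in memory instead of read from a file).
data = struct.pack("<3f", 1.5, 2.5, -4.0)

# Iterator conversion feeds each slice to the float conversion.
values = [float_conversion(chunk) for chunk in iterator_conversion(data, max_occurs=10)]
```

Here the iterator knows nothing about floats, and the float conversion knows
nothing about how many elements there are or where they live.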

Here is why I want to break out the "for each" as a separate operation.

We will need to deal with the case where the stored data is not necessarily
a memory image of the desired XML element.  I.e., the numbers might be in a
different order, might have implied values that are not in the data, or
might not be contiguous in storage.

In these cases, we want to substitute an alternative "Iterator" that
understands where to find (or how to compute) the 'nth' element, while
still using the same float conversion for each element.

If we can support this, then users can handle whatever clever storage
scheme is in use and still produce a 1D array of elements in a known
order. That array can then be used for further conversions or by XSL
in a portable and generic way.
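
A minimal sketch of the substitution idea, in Python, under my own
assumptions: each "Iterator" is modeled as a generator of byte offsets, and
all names (contiguous_offsets, strided_offsets, reversed_offsets, read_array)
are hypothetical. The floats are assumed to be 4-byte little-endian IEEE 754
singles. Only the iterator changes between layouts; the float conversion is
shared.

```python
import struct
from typing import Iterable, Iterator, List

FLOAT_WIDTH = 4  # assumed: 4-byte IEEE 754 single precision


def float_conversion(raw: bytes) -> float:
    """Convert one little-endian 4-byte float; identical for every layout."""
    return struct.unpack("<f", raw)[0]


def contiguous_offsets(count: int, start: int = 0) -> Iterator[int]:
    """Elements stored back to back (the plain memory-image case)."""
    for n in range(count):
        yield start + n * FLOAT_WIDTH


def strided_offsets(count: int, stride: int, start: int = 0) -> Iterator[int]:
    """Elements separated by other data, one element every stride bytes."""
    for n in range(count):
        yield start + n * stride


def reversed_offsets(count: int, start: int = 0) -> Iterator[int]:
    """Elements stored contiguously but in the opposite order."""
    for n in range(count - 1, -1, -1):
        yield start + n * FLOAT_WIDTH


def read_array(data: bytes, offsets: Iterable[int]) -> List[float]:
    """Apply the shared float conversion at each offset the iterator supplies."""
    return [float_conversion(data[o:o + FLOAT_WIDTH]) for o in offsets]


packed = struct.pack("<3f", 1.0, 2.0, 3.0)
interleaved = struct.pack("<fIfIfI", 1.0, 7, 2.0, 8, 3.0, 9)  # float, tag, float, tag, ...
backwards = struct.pack("<3f", 3.0, 2.0, 1.0)

a = read_array(packed, contiguous_offsets(3))
b = read_array(interleaved, strided_offsets(3, stride=8))
c = read_array(backwards, reversed_offsets(3))
```

All three calls yield the same 1D array in the same logical order, which is
the property that later conversions or XSL would rely on.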

On Tuesday 14 March 2006 10:31, Steve Hanson wrote:
> Hi Bob, quite timely as I've been looking at DFDL array properties but
> limiting myself to 1-dim only, just to establish the basics. I've annotated
> your XML below with DFDL annotations to describe the array. I'm assuming
> that there is no markup (separators etc) in your format, ie, it is just an
> integer followed by floats. I fully realise that my scheme does not handle
> multi-dim or sparse arrays.
>
> Your maxOccurs I've corrected to "unbounded". Remember that XML can always
> tell the number of items in an array, by using the tags, hence there is
> never any need to include a count in XML.
>
> +++
>
> <!-- global type -->
> <xs:element type="float" name="floatType" />
>
> <!-- use global type to read  -->
> <xs:complexType name="floatArray">
>     <xs:sequence>
>         <xs:element name="nelems" type="int">
>             <xs:annotation><xs:appinfo source="http://dataformat.org">
>                   <dfdl:element repType="binaryInteger" signed="false"
> lengthKind="fixed" length="4" />
>             </xs:appinfo></xs:annotation>
>         </xs:element>
>         <xs:element name="x" ref="floatType" minOccurs="0"
> maxOccurs="unbounded">
>             <xs:annotation><xs:appinfo source="http://dataformat.org">
>                   <dfdl:element repType="binaryFloat"
> floatType="IEEEExtendedIntel" lengthKind="fixed" length="4"
>                                 occursDeterminedBy="xpath"
> occursPath="./nelems" />
>             </xs:appinfo></xs:annotation>
>         </xs:element>
>     </xs:sequence>
> </xs:complexType>
>
> +++
>
> Here's a second variation where the number of occurrences is fixed. If so
> we assume maxOccurs holds the actual number. (That's up for debate, maybe
> we need a separate DFDL occurs count independent of min/maxOccurs?).
>
> +++
>
> <!-- use global type to read  -->
> <xs:complexType name="floatArray2">
>     <xs:sequence>
>         <xs:element name="x" ref="floatType" minOccurs="0" maxOccurs="10">
>             <xs:annotation><xs:appinfo source="http://dataformat.org">
>                   <dfdl:element repType="binaryFloat"
> floatType="IEEEExtendedIntel" lengthKind="fixed" length="4"
>                                 occursDeterminedBy="maxOccurs" />
>             </xs:appinfo></xs:annotation>
>         </xs:element>
>     </xs:sequence>
> </xs:complexType>
>
> +++
>
> Here's a third where the number of occurrences is given by a terminating
> value.
>
> +++
>
> <!-- use global type to read  -->
> <xs:complexType name="floatArray3">
>     <xs:sequence>
>         <xs:element name="x" ref="floatType" minOccurs="0"
> maxOccurs="unbounded">
>             <xs:annotation><xs:appinfo source="http://dataformat.org">
>                   <dfdl:element repType="binaryFloat"
> floatType="IEEEExtendedIntel" lengthKind="fixed" length="4"
>                                 occursDeterminedBy="value"
> occursTerminatingValueKind="logical"
>                                 occursTerminatingValue="-99999"/>
>             </xs:appinfo></xs:annotation>
>         </xs:element>
>     </xs:sequence>
> </xs:complexType>
>
> +++
>
> Here's the definition of DFDL occursDeterminedBy, which seems to me to
> capture all the possibilities for establishing the number:
>
> "Enum. Valid values ‘maxOccurs’, ‘xpath’, ‘value’, ‘markup’.
> Specifies how the actual number of occurrences is to be established.
> ‘maxOccurs’ means use the value of maxOccurs, ‘xpath’ means use the value
> of a named field earlier in the data, ‘value’ means there is a special
> terminating value, ‘markup’ means that separators and/or initiators dictate
> the number."
>
>
> Regards, Steve
>
> Steve Hanson
> WebSphere Message Brokers,
> IBM United Kingdom Ltd, Hursley, UK
> Internet: smh at uk.ibm.com
> Phone (+44)/(0) 1962-815848
>
>
>
> From: "Robert E. McGrath" <mcgrath at ncsa.uiuc.edu>
> Sent by: owner-dfdl-wg at ggf.org
> To: dfdl-wg at ggf.org
> Date: 14/03/2006 15:25
> Subject: [dfdl-wg] How to deal with variable length elements?
>
> Folks,
>
> I'm trying to build up a story about how to handle arrays using DFDL.
>
> But first, I need to check if I understand the basics.
>
> Here is an example of a 1D array in XML, modeled as two elements,
> an integer indicating how many elements, followed by an array of zero
> or more floats.
>
> Looking at my XML textbooks, the following seems like the correct XML
> schema for this notion.
>
> +++
>
> <!-- global type -->
> <xs:element type="float" name="floatType" />
>
> <!-- use global type to read  -->
> <xs:complexType name="floatArray">
>     <xs:sequence>
>         <xs:element name="nelems" type="int" />
>         <xs:element name="x" ref="floatType" minOccurs="0"
> maxOccurs="./nelems" />
>     </xs:sequence>
> </xs:complexType>
>
> +++
>
> Do I have this correct? (I'm pretty sure the 'maxOccurs' isn't correct,
> so I hope someone will tell me the right way to do this.)
>
> If so, the next question will be "how do I annotate this with DFDL?",
> e.g., when the data is precisely one binary (or text encoded) int, followed
> by some binary (or text encoded) floats.

-- 
---
Robert E. McGrath, Ph.D.
National Center for Supercomputing Applications
University of Illinois, Urbana-Champaign
1205 West Clark
Urbana, Illinois 61801
(217)-333-6549

mcgrath at ncsa.uiuc.edu




