[dfdl-wg] How to deal with variable length elements?

Steve Hanson smh at uk.ibm.com
Tue Mar 14 10:31:26 CST 2006


Hi Bob, quite timely as I've been looking at DFDL array properties but
limiting myself to 1-dim only, just to establish the basics. I've annotated
your XML below with DFDL annotations to describe the array. I'm assuming
that there is no markup (separators etc.) in your format, i.e. it is just an
integer followed by floats. I fully realise that my scheme does not handle
multi-dim or sparse arrays.

I've corrected your maxOccurs to "unbounded", because XML Schema only allows
a non-negative integer or "unbounded" there, not a path. Remember that XML
can always tell the number of items in an array by counting the tags, hence
there is never any need to include a count in XML.
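
To illustrate that last point, here is a small Python sketch (not part of
the DFDL proposal) that recovers the array length purely from the tags. The
instance document and the element names floatArray and x are made up for the
example.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML instance: no count element is needed, the tags say it all.
doc = "<floatArray><x>1.0</x><x>2.5</x><x>-4.0</x></floatArray>"

root = ET.fromstring(doc)
values = [float(x.text) for x in root.findall("x")]

# The number of occurrences is simply the number of <x> elements found.
count = len(values)
print(count, values)
```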

+++

<!-- global type -->
<xs:simpleType name="floatType">
    <xs:restriction base="xs:float" />
</xs:simpleType>

<!-- use global type to read  -->
<xs:complexType name="floatArray">
    <xs:sequence>
        <xs:element name="nelems" type="xs:int">
            <xs:annotation><xs:appinfo source="http://dataformat.org">
                  <dfdl:element repType="binaryInteger" signed="false"
                                lengthKind="fixed" length="4" />
            </xs:appinfo></xs:annotation>
        </xs:element>
        <xs:element name="x" type="floatType" minOccurs="0"
                    maxOccurs="unbounded">
            <xs:annotation><xs:appinfo source="http://dataformat.org">
                  <dfdl:element repType="binaryFloat"
                                floatType="IEEEExtendedIntel"
                                lengthKind="fixed" length="4"
                                occursDeterminedBy="xpath"
                                occursPath="./nelems" />
            </xs:appinfo></xs:annotation>
        </xs:element>
    </xs:sequence>
</xs:complexType>

+++
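
To make the intent of occursDeterminedBy="xpath" concrete, here is a rough
Python sketch of what a parser driven by the annotations above might do. It
is only an illustration under assumptions that are mine and not part of the
schema: little-endian byte order, 4-byte IEEE single-precision floats, and
the function name read_counted_floats.

```python
import struct

def read_counted_floats(data: bytes) -> list[float]:
    """Read a 4-byte unsigned count (nelems), then that many 4-byte floats.

    Assumes little-endian data; the schema above does not state a byte order.
    """
    (nelems,) = struct.unpack_from("<I", data, 0)
    # The floats start right after the 4-byte count, with no markup between.
    return list(struct.unpack_from(f"<{nelems}f", data, 4))

# Build a sample stream: count = 3, followed by three floats.
sample = struct.pack("<I3f", 3, 1.0, 2.5, -4.0)
print(read_counted_floats(sample))
```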

Here's a second variation where the number of occurrences is fixed. In that
case we assume maxOccurs holds the actual number. (That's up for debate;
maybe we need a separate DFDL occurs count, independent of min/maxOccurs?)

+++

<!-- use global type to read  -->
<xs:complexType name="floatArray2">
    <xs:sequence>
        <xs:element name="x" type="floatType" minOccurs="0" maxOccurs="10">
            <xs:annotation><xs:appinfo source="http://dataformat.org">
                  <dfdl:element repType="binaryFloat"
                                floatType="IEEEExtendedIntel"
                                lengthKind="fixed" length="4"
                                occursDeterminedBy="maxOccurs" />
            </xs:appinfo></xs:annotation>
        </xs:element>
    </xs:sequence>
</xs:complexType>

+++
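
For comparison, the fixed-count case might be read as in the Python sketch
below. Again this is only an illustration: little-endian 4-byte floats and
the names MAX_OCCURS and read_fixed_floats are my assumptions, with the
count of 10 taken from maxOccurs in the schema above.

```python
import struct

MAX_OCCURS = 10  # mirrors maxOccurs="10" in the schema

def read_fixed_floats(data: bytes, count: int = MAX_OCCURS) -> list[float]:
    """Read exactly `count` little-endian 4-byte floats from the start of data."""
    # No count field in the data: the number comes from the schema alone.
    return list(struct.unpack_from(f"<{count}f", data, 0))

expected = [float(i) for i in range(10)]
sample = struct.pack("<10f", *expected)
print(read_fixed_floats(sample))
```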

Here's a third variation, where the number of occurrences is given by a
terminating value.

+++

<!-- use global type to read  -->
<xs:complexType name="floatArray3">
    <xs:sequence>
        <xs:element name="x" type="floatType" minOccurs="0"
                    maxOccurs="unbounded">
            <xs:annotation><xs:appinfo source="http://dataformat.org">
                  <dfdl:element repType="binaryFloat"
                                floatType="IEEEExtendedIntel"
                                lengthKind="fixed" length="4"
                                occursDeterminedBy="value"
                                occursTerminatingValueKind="logical"
                                occursTerminatingValue="-99999" />
            </xs:appinfo></xs:annotation>
        </xs:element>
    </xs:sequence>
</xs:complexType>

+++
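
A terminating-value reader could look roughly like the Python sketch below.
The assumptions are mine: little-endian 4-byte floats, the terminator is
compared as a logical float value (as occursTerminatingValueKind="logical"
suggests), and the terminator itself is not returned as an array item. The
function name read_until_sentinel is invented for the example.

```python
import struct

def read_until_sentinel(data: bytes, sentinel: float = -99999.0) -> list[float]:
    """Collect 4-byte floats until the terminating value is seen.

    len(data) must be a multiple of 4, otherwise struct.iter_unpack raises.
    If the sentinel never appears, every float in data is returned.
    """
    values = []
    for (value,) in struct.iter_unpack("<f", data):
        if value == sentinel:  # logical comparison, not a raw byte match
            break
        values.append(value)
    return values

# Two real values, the terminator, then data that belongs to something else.
sample = struct.pack("<4f", 1.5, 2.0, -99999.0, 7.0)
print(read_until_sentinel(sample))
```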

Here's the definition of DFDL occursDeterminedBy, which seems to me to
capture all the possibilities for establishing the number:

"Enum. Valid values ‘maxOccurs’, ‘xpath’, ‘value’, ‘markup’.
Specifies how the actual number of occurrences is to be established.
‘maxOccurs’ means use the value of maxOccurs, ‘xpath’ means use the value
of a named field earlier in the data, ‘value’ means there is a special
terminating value, ‘markup’ means that separators and/or initiators dictate
the number."
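
The 'markup' case is the only one I haven't shown a schema for. As a rough
idea of what it means, here is a Python sketch for text data where a
separator alone dictates the number of items. The comma separator and the
function name read_separated_floats are assumptions for the example, not
anything defined by DFDL.

```python
def read_separated_floats(text: str, separator: str = ",") -> list[float]:
    """Split text on the separator; the item count falls out of the markup."""
    if not text.strip():
        return []  # no data at all means zero occurrences
    return [float(item) for item in text.split(separator)]

print(read_separated_floats("1.0,2.5,-4"))
```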


Regards, Steve

Steve Hanson
WebSphere Message Brokers,
IBM United Kingdom Ltd, Hursley, UK
Internet: smh at uk.ibm.com
Phone (+44)/(0) 1962-815848


"Robert E. McGrath" <mcgrath at ncsa.uiuc.edu>
Sent by: owner-dfdl-wg at ggf.org
14/03/2006 15:25

To: dfdl-wg at ggf.org
cc:
Subject: [dfdl-wg] How to deal with variable length elements?




Folks,

I'm trying to build up a story about how to handle arrays using DFDL.

But first, I need to check if I understand the basics.

Here is an example of a 1D array in XML, modeled as two elements,
an integer indicating how many elements, followed by an array of zero
or more floats.

Looking at my XML textbooks, the following seems like the correct XML
schema for this notion.

+++

<!-- global type -->
<xs:simpleType name="floatType">
    <xs:restriction base="xs:float" />
</xs:simpleType>

<!-- use global type to read  -->
<xs:complexType name="floatArray">
    <xs:sequence>
        <xs:element name="nelems" type="xs:int" />
        <xs:element name="x" type="floatType" minOccurs="0"
                    maxOccurs="./nelems" />
    </xs:sequence>
</xs:complexType>

+++

Do I have this correct? (I'm pretty sure the 'maxOccurs' isn't correct,
so I hope someone will tell me the right way to do this.)

If so, the next question will be "how do I annotate this with DFDL?",
e.g., when the data is precisely one binary (or text encoded) int, followed
by some binary (or text encoded) floats.


