[dfdl-wg] How to deal with variable length elements?

Fri Mar 17 13:14:10 CST 2006

Hmmm.

I think layered value calculation formulas which allow for a magic 
"myIndex" variable are perhaps an important device to make this class of 
layering possible. This makes the iteration over the elements implicit.

I have one example which is the one where there is a vector of strings 
where the lengths of all the strings are stored first, separately from all 
the character data.

A formula involving "myIndex" is used to glue the two pieces together.
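
To make the idea concrete, here is a rough Python sketch of that gluing step. It is not the prototype's DFDL schema. The byte layout (one unsigned length byte per string, then all the ASCII characters run together) and the function name are assumptions made only for illustration. The loop variable stands in for the implicit "myIndex".

```python
def parse_strings_lengths_first(data: bytes, count: int) -> list[str]:
    """Parse `count` strings whose lengths are all stored before the characters.

    Assumed layout (hypothetical): `count` one-byte unsigned lengths,
    followed by the concatenated ASCII character data.
    """
    lengths = list(data[:count])   # the vector of lengths, stored first
    chars = data[count:]           # all character data, stored afterwards
    strings = []
    offset = 0
    for my_index in range(count):  # plays the role of the implicit "myIndex"
        n = lengths[my_index]      # "formula": length of this string is lengths[myIndex]
        strings.append(chars[offset:offset + n].decode("ascii"))
        offset += n                # the next string starts where this one ended
    return strings


example = bytes([3, 5, 2]) + b"foohellook"
print(parse_strings_lengths_first(example, 3))  # ['foo', 'hello', 'ok']
```

In a declarative description the loop would not be written out; only the per-element formula in terms of "myIndex" would appear.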

Here's the example as per our prototype from last summer:

...mikeb

"Robert E. McGrath" <mcgrath at ncsa.uiuc.edu> 
Sent by: owner-dfdl-wg at ggf.org
03/17/2006 10:54 AM

To
dfdl-wg at ggf.org
cc

Subject
Re: [dfdl-wg] How to deal with variable length elements?

Following up on my email from earlier this week:

I think there was a major flaw in what I wrote, and it is quite an
"interesting" challenge.

Let me review:

I am thinking about how to describe reading data into a 1D array. Steve
provided a markup for the XML element. 

The challenge I'm looking at is that the data need not be an image of
the memory layout.  To give one example, a very sparse array might be
stored as a series of (index, value) pairs for the non-empty places,
all others implied to be zero or fill or whatever.

The goal is to have the XML array be fully populated from this sparse
form--or whatever layout--on disk.  (Please assume for now that this is a 
reasonable goal!)
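
As a small illustration of that goal, the following Python sketch expands a sparse stored form into a fully populated array. The on-disk layout (big-endian uint32 index followed by a float32 value, repeated) and the name expand_sparse are assumptions for this example only; the real encoding could be anything.

```python
import struct


def expand_sparse(data: bytes, extent: int, fill: float = 0.0) -> list[float]:
    """Build a dense array of `extent` floats from (index, value) pairs.

    Assumed layout (hypothetical): repeated records of a big-endian
    uint32 index followed by a big-endian float32 value.
    """
    dense = [fill] * extent                      # every place starts as fill
    for index, value in struct.iter_unpack(">If", data):
        dense[index] = value                     # overwrite the non-empty places
    return dense


stored = struct.pack(">If", 1, 2.5) + struct.pack(">If", 4, -1.0)
print(expand_sparse(stored, 6))  # [0.0, 2.5, 0.0, 0.0, -1.0, 0.0]
```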

The XML and DFDL will tell us the data type, and presumably we know
the extent of the data on disk.  But we need to decode the storage
to generate all the element values and fills.

In my earlier email, I offered a description that included an 'Iterator'
conversion.  I now think this is inadequate.  In fact you need two
cooperating 'Iterators'!  Ick!

Here is my revised pipeline.  Data is read from bottom to top. I
sketch what each conversion is tasked to do.  I think the 'Decoder'
needs to know info from both the 'Iterator' (it asks for each element
in the order it wants them) and 'Float' (it tells the size of
the 'value' to get).

==
  <<XML element with multiple occurs: 1D array >>

        ^
        |

  Iterator conversion:  relevant props: minOccurs="0" maxOccurs="<<setting>>", et al
             Get 'maxOccurs' elements of type datatype.

        ^
        |

  Float conversion:     relevant props:  data description of element
             Decode bytes

        ^
        |

  Decoder conversion:  produces the bytes of the 'nth' _value_ in the array.
          Input: what position is needed.
                  may need separators and other props: depends on encoding
          Output: sizeof datatype bytes, the _value_
          Side effect: after whole array is read, consumes all the
               storage.  Difficult to characterize the intermediate
               state.

        ^
        |

  Data:  read as bytes
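
A rough Python sketch of how these layers might cooperate is below. All class and function names are hypothetical, and the stored form is assumed to be sparse records of a big-endian uint32 index plus the value bytes. The sketch also sidesteps the intermediate-state problem by indexing the whole storage up front, which a real implementation may not be able to do.

```python
import struct


class SparseDecoder:
    """Decoder conversion: hands back the bytes of the nth value on request."""

    def __init__(self, data: bytes, value_size: int, fill: bytes):
        # value_size comes from the Float layer: it says how big a 'value' is.
        self.fill = fill
        self.stored = {}
        record = 4 + value_size                  # uint32 index + value bytes
        for off in range(0, len(data), record):  # consumes all the storage
            (index,) = struct.unpack_from(">I", data, off)
            self.stored[index] = data[off + 4:off + record]

    def value_bytes(self, position: int) -> bytes:
        # position comes from the Iterator layer: which element is wanted.
        return self.stored.get(position, self.fill)


class FloatConversion:
    """Float conversion: decodes the bytes of one element."""

    SIZE = 4  # bytes in a float32

    def __init__(self, decoder: SparseDecoder):
        self.decoder = decoder

    def value(self, position: int) -> float:
        return struct.unpack(">f", self.decoder.value_bytes(position))[0]


def iterate(conversion: FloatConversion, max_occurs: int) -> list[float]:
    """Iterator conversion: get 'maxOccurs' elements, in the order it wants."""
    return [conversion.value(i) for i in range(max_occurs)]


data = struct.pack(">If", 0, 1.5) + struct.pack(">If", 3, 4.0)
decoder = SparseDecoder(data, FloatConversion.SIZE, struct.pack(">f", 0.0))
print(iterate(FloatConversion(decoder), 5))  # [1.5, 0.0, 0.0, 4.0, 0.0]
```

Note that the decoder receives the value size from the Float layer and the position from the Iterator layer, which is the two-sided dependency described above.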

-------------- next part --------------
A non-text attachment was scrubbed...
Name: testArrayOfStringWithAllLengthsFirst.dfdl.xsd
Type: application/octet-stream
Size: 4234 bytes
Desc: not available
Url : http://www.ogf.org/pipermail/dfdl-wg/attachments/20060317/299e2947/attachment.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: testArrayOfStringWithAllLengthsFirst.xml
Type: application/octet-stream
Size: 770 bytes
Desc: not available
Url : http://www.ogf.org/pipermail/dfdl-wg/attachments/20060317/299e2947/attachment-0001.obj