[dfdl-wg] How to handle multi-dimensional arrays - version 2

Myers, James D jim.myers at pnl.gov
Thu Mar 3 09:38:07 CST 2005


> Here's a slightly different formulation of the multi-dimension stuff.
> 
> 1) no longer dictates the XSD for representing the array. 
> This cuts both ways since you no longer really have an XSD 
> model for multi-dimensional arrays. That is, it is up to the 
> author of the DFDL Schema to ensure the needed information 
> about the array (coordinates of each element) makes it into 
> the logical model in a useful way.

I didn't realize we were proposing to extend the XML schema to have a
multidimensional array type, versus providing a way for DFDL to read and
internally represent a multidimensional array. The latter seems
descriptive and the former prescriptive.

> 
> 2) I added in the complexity of calculating the array size, 
> actually the lower and upper bounds of each dimension, 
> dynamically based on data. This makes the example more real.
> 
> This still works out pretty well. I'm still pondering whether 
> I like this better or not. I'm thinking about perhaps some 
> sort of pseudo attributes which are guaranteed to be put into 
> XML if you actually render to XML, but where a DFDL API-based 
> implementation can choose not to realize them. 

I think this example removes some of the prescriptive nature of the
first one, but I'd like to be able to format my array however I want,
e.g. as

<row><elem>3</elem><elem>2</elem></row>
<row><elem>5</elem><elem>6</elem></row>
...

Or even

<states><state>Alabama</state><state>Alaska</state></states>
<population><pop>34.2</pop><pop>10.6</pop></population>
....

(an array containing state names, population and other data, perhaps
serialized in the file as all info for each state together).
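To make the second layout concrete, here is a rough sketch in Python (not DFDL) of regrouping per-state records into parallel vectors. The function name to_vectors and the in-memory record list are assumptions for illustration only; the element names and values are the ones from the fragment above.

```python
from xml.etree import ElementTree as ET

# Data as it might sit in the file: all info for each state together.
# (Hypothetical in-memory form; values taken from the fragment above.)
records = [("Alabama", 34.2), ("Alaska", 10.6)]

def to_vectors(records):
    """Regroup (state, population) records into two parallel element vectors."""
    states = ET.Element("states")
    population = ET.Element("population")
    for name, pop in records:
        ET.SubElement(states, "state").text = name
        ET.SubElement(population, "pop").text = str(pop)
    return states, population

states, population = to_vectors(records)
states_xml = ET.tostring(states, encoding="unicode")
population_xml = ET.tostring(population, encoding="unicode")
```

The point is only that the file order (record by record) and the output order (vector by vector) are decided in two different places.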

If DFDL could separate the reading of such an array from how it is
output in the schema, I could do any of this. Having multiple layers is
a start - DFDL reads the array into something that is addressable along
the lines Mike proposes, and then the contents of that layer are
referenced via XPath to provide values in some structure I define in
XSD. The only piece missing (I think) is that we haven't yet defined how
to access iterators, i.e. if I have an element <elem minOccurs="1"
maxOccurs="5">, how can I say that element n (n = 1...5) has
dfdl:runtimevalue <a x="n" y="1">, which would put just the first column
of a into the element sequence? If, in Mike's example, I could define
the x and y dimensions independent of an array-reading context, just so
I can use them in value references for dfdl:runtimevalue elements, I
think we'd be all set.
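A minimal sketch of the two layers I have in mind, written in Python rather than DFDL. Everything here is hypothetical: read_array stands in for the array-reading layer, column_as_elems for the XSD-side structure, and the loop variable n plays the role of the iterator. I assume 1-based coordinates and that x changes first in the file.

```python
from xml.etree import ElementTree as ET

def read_array(values, x_size, y_size):
    """Layer 1: turn a flat sequence (x changes first) into {(x, y): value}."""
    if len(values) != x_size * y_size:
        raise ValueError("value count does not match the declared dimensions")
    return {(i % x_size + 1, i // x_size + 1): v for i, v in enumerate(values)}

def column_as_elems(a, y, n_values):
    """Layer 2: for each iterator value n, emit an <elem> holding a[x=n, y]."""
    parent = ET.Element("column")
    for n in n_values:
        ET.SubElement(parent, "elem").text = str(a[(n, y)])
    return parent

a = read_array([3, 2, 5, 6], x_size=2, y_size=2)
# Equivalent of <a x="n" y="1"> for n = 1..2
first_column = column_as_elems(a, y=1, n_values=range(1, 3))
xml_text = ET.tostring(first_column, encoding="unicode")
```

Note that the iterator range is given to the second layer directly and does not come from the array-reading context, which is the independence asked for above.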

This type of capability would allow all sorts of useful things -
including the array-to-set-of-vectors conversions outlined here as well
as subsampling, expansion/contraction of sparse arrays (where the array
is stored as a sequence of x, y, value triples for only nonzero
elements), etc.
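For the sparse case, the expansion and contraction might look roughly like this in Python. The function names, the 1-based coordinates, and the row-per-y nesting of the dense form are my assumptions, not anything defined in DFDL.

```python
def expand(triples, x_size, y_size, fill=0):
    """Build a dense grid (a list of rows, one per y) from (x, y, value) triples."""
    grid = [[fill] * x_size for _ in range(y_size)]
    for x, y, value in triples:
        grid[y - 1][x - 1] = value
    return grid

def contract(grid, fill=0):
    """Return (x, y, value) triples for every element that differs from fill."""
    return [
        (x, y, value)
        for y, row in enumerate(grid, start=1)
        for x, value in enumerate(row, start=1)
        if value != fill
    ]

triples = [(1, 1, 3), (2, 2, 6)]
dense = expand(triples, x_size=2, y_size=2)
round_trip = contract(dense)
```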


One other minor point - if the order of x and y in the DFDL file is
important (as it is in the example), do we need a <dfdl:array
storageOrder="firstDimensionChangesFirst"> option? Or can we just list y
first and then x?
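To show what such an option would have to decide, here is a small Python sketch that maps a flat file position to coordinates under either ordering. The function coords and its flag are hypothetical; the flag just mirrors the storageOrder value suggested above.

```python
def coords(index, sizes, first_dimension_changes_first=True):
    """Map a 0-based flat position to 1-based coordinates, one per dimension."""
    dims = range(len(sizes))
    # Visit dimensions starting with the one that changes fastest.
    order = dims if first_dimension_changes_first else reversed(dims)
    result = [0] * len(sizes)
    for d in order:
        index, remainder = divmod(index, sizes[d])
        result[d] = remainder + 1
    return tuple(result)

sizes = (2, 3)  # x has 2 values, y has 3
second_item_x_first = coords(1, sizes, first_dimension_changes_first=True)
second_item_y_first = coords(1, sizes, first_dimension_changes_first=False)
```

With x listed first, the second item in the file is (x=2, y=1) under one ordering and (x=1, y=2) under the other, so either an explicit option or a rule tied to listing order is needed.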

  Jim
 




