[dfdl-wg] How to handle multi-dimensional arrays

Myers, James D jim.myers at pnl.gov
Wed Mar 2 15:43:30 CST 2005


Mike,

Some random quick comments - hopefully not so terse or
stream-of-consciousness as to be incomprehensible:

This approach seems to dictate how the user will represent the array in
XML (dictating their schema) rather than just describing how to pick up
the right content. Which I think we agreed is a bad thing. That's not to
say this isn't a reasonable way to represent a multidim array in XML,
just that having DFDL go look at the attributes outside the annotation
and requiring this formatting in the output XML doesn't seem right. 

In
                            <dfdl:dataFormat arrayStorageOrder="@y @x">

it seems that DFDL only needs the dimension sizes for its own purposes
and we could probably use our referencing mechanism to get them (i.e. if
I read the two dimension ints earlier and want to reference them for the
array sizes) - maybe some kind of <dfdl:runtimeoccurs> elements for the
n dimensions?
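
As a rough sketch of what "read the two dimension ints earlier and
reference them for the array sizes" means on the parsing side, here is a
small Python illustration. The byte layout (two big-endian int32 sizes
followed by row-major float32 values) and the name read_2d are
assumptions made up for this example, not anything DFDL defines.

```python
import struct

def read_2d(buf):
    # Assumed layout (hypothetical): two big-endian int32 sizes (ny, nx),
    # followed by ny * nx big-endian float32 values in row-major order.
    ny, nx = struct.unpack_from(">ii", buf, 0)
    # The sizes read above now drive how many values are picked up.
    flat = struct.unpack_from(">%df" % (ny * nx), buf, 8)
    return [list(flat[r * nx:(r + 1) * nx]) for r in range(ny)]

# Build a 2 x 3 sample stream and parse it back.
sample = struct.pack(">ii6f", 2, 3, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
rows = read_2d(sample)
```

The point is only that the sizes come from earlier content in the stream
rather than from attributes in the output XML.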
  
To allow the kind of output XML you propose, we probably need something
new to allow you to loop. If the only place we need looping is for
multidimensional arrays (and the special case of one dim), perhaps we
can do something similar to what you propose and essentially have the
array mechanism define some loop variables that can be referenced (a
dfdl layer?). I don't have a full proposal thought out, but imagine
defining an array as a stream that can be referenced using multiple
dimensions rather than a single cursor, and having a mechanism so that
the current values of the cursor(s) are available to the user.
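
To make the "multiple dimensions rather than a single cursor" idea a bit
more concrete, here is a hedged Python sketch that walks a flat stream
position and exposes the matching index tuple at each step. The function
name cursors and the fastest_last flag are invented for illustration,
and indices are zero-based here.

```python
def cursors(shape, fastest_last=True):
    """Yield (flat_position, index_tuple) for each element of a stream.

    fastest_last=True  -> last dimension varies fastest (row-major).
    fastest_last=False -> first dimension varies fastest (column-major).
    """
    total = 1
    for n in shape:
        total *= n
    if fastest_last:
        dims = list(range(len(shape) - 1, -1, -1))
    else:
        dims = list(range(len(shape)))
    for pos in range(total):
        rem = pos
        idx = [0] * len(shape)
        # Peel off one dimension at a time, fastest-varying first.
        for d in dims:
            rem, idx[d] = divmod(rem, shape[d])
        yield pos, tuple(idx)

row_major = [idx for _, idx in cursors((2, 3))]
col_major = [idx for _, idx in cursors((2, 3), fastest_last=False)]
```

A real multidim construct would hand the user the index tuple directly
instead of making them redo this arithmetic from the flat position.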

So, from the Reference.xsd example, we might want to have an attribute
that shows the x value of the xdata elements analogous to the multidim
example:

                <xs:element name="xdata" type="xs:float" maxOccurs="unbounded">
                    <xs:annotation>
                        <xs:appinfo>
                            <dfdl:runtimeoccurs>../x</dfdl:runtimeoccurs>
                        </xs:appinfo>
                    </xs:annotation>
                    <xs:attribute name="x">
                       <dfdl:runtimevalue>#currentcursorvalue#</dfdl:runtimevalue>
                    </xs:attribute>
                </xs:element>

where #currentcursorvalue# is something we have not yet made available
for output (or is this available via xpath - the position of the current
element in a sequence?). This would change the Reference.xml example
output to have elements like

  <xdata x="1">2.78</xdata> 
  <xdata x="2">3.14</xdata>
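
For what it's worth, the output side of this is easy to mimic outside
DFDL. The Python sketch below just uses the one-based position of each
value as the x attribute, which is the same number XPath's position()
would report for the element within its sequence. The helper name
xdata_elements is made up for the example.

```python
import xml.etree.ElementTree as ET

def xdata_elements(values):
    """Serialize each value as an xdata element carrying its cursor."""
    out = []
    # start=1 gives a one-based cursor, matching the output shown above.
    for x, value in enumerate(values, start=1):
        el = ET.Element("xdata", {"x": str(x)})
        el.text = repr(value)
        out.append(ET.tostring(el, encoding="unicode"))
    return out

lines = xdata_elements([2.78, 3.14])
```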

   
So, if I can summarize/rephrase, I think we should keep the mechanism
for single or multidim arrays separate from how the output is displayed,
but I like the idea of making the current cursor(s) available for use,
which I don't think we've done yet. And having a real multidimensional
construct, rather than calculating the indices from a flat cursor, is
probably a requirement for scientific use, so some multidim analog of
dfdl:runtimeoccurs is needed.

  Jim
 
 -----Original Message-----
From: owner-dfdl-wg at ggf.org [mailto:owner-dfdl-wg at ggf.org] On Behalf Of
mike.beckerle at ascentialsoftware.com
Sent: Friday, February 18, 2005 4:22 PM
To: dfdl-wg at gridforum.org
Subject: [dfdl-wg] How to handle multi-dimensional arrays



We have come up with an approach to how to represent multi-dimensional
arrays within XSD-described XML. The attached test file (.xml) and DFDL
Schema (.dfdl.xsd) illustrate the proposed solution.

The proposal does not require any changes to XSD, XML or any other
special constructs outside of a single dfdl annotation to specify the
storage order of the representation.

I'm pretty happy with how this works out. We can handle arrays with
different storage orders, like Fortran-style column-major vs. more
common row-major, and it dovetails nicely with XPath expressions and the
XSD data model. Schema validation can really do something for you, like
tell you if you have all the elements of the array (if it's fixed size),
and that you don't have multiple elements occupying the same array
location. 
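
The two validation conditions mentioned here (every element of a
fixed-size array is present, and no two elements occupy the same
location) can be stated as a small check. The attached schema is not
reproduced in this message, so the following is plain Python rather than
XSD; the name check_array and the zero-based (y, x) tuples are
assumptions for illustration only.

```python
def check_array(cells, shape):
    """Return (missing, duplicates) for a list of (y, x) coordinates."""
    seen = set()
    duplicates = []
    for cell in cells:
        if cell in seen:
            # A second element claims an already occupied location.
            duplicates.append(cell)
        seen.add(cell)
    expected = {(y, x) for y in range(shape[0]) for x in range(shape[1])}
    missing = sorted(expected - seen)
    return missing, duplicates

missing, duplicates = check_array([(0, 0), (0, 1), (1, 0), (1, 0)], (2, 2))
```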

Those interested in multi-dimensional array support please give this
some consideration. 

That said, I'm departing on vacation for a week, so I'll toss this out
there for people to look at, but I won't be able to interact with you
all on it until I get back. 

...mikeb




