[dfdl-wg] How to handle multi-dimensional arrays - version 2

Mon Mar 7 09:12:35 CST 2005

Ok, so I guess your point I can summarize like this:

MD arrays shouldn't need special treatment. We should be able to compose
what we need from lower levels that have only a small number of primitives.
Rich functionality must be able to be built up from this. If we can't do
this, that is, if the bottom level has to have lots of elaborate constructs,
then something is wrong with the composition model.

I'll give this some more thought.

...mikeb

> -----Original Message-----
> From: owner-dfdl-wg at ggf.org [mailto:owner-dfdl-wg at ggf.org] On 
> Behalf Of Myers, James D
> Sent: Thursday, March 03, 2005 5:21 PM
> To: dfdl-wg at gridforum.org
> Subject: RE: [dfdl-wg] How to handle multi-dimensional arrays 
> - version 2
> 
> I sort-of agree :-) I think the distinctions I'm making are 
> subtle, but important with respect to composability/layers, 
> but don't shift what you can do in the multidimensional array 
> case from where you're trying to go. And if that doesn't make 
> your eyes cross and cause fits, read on...
> 
> Why haven't you felt it necessary to define an XSD 
> for vectors beyond putting dfdl:runtimeoccurs limits on how 
> many to pull from a stream? In this case, the runtimeoccurs 
> param is a param of the reader that populates a 'normal' XSD 
> sequence with 'normal' XSD elements. For multidimensional 
> arrays, the runtimeoccurs parameters for each dimension are 
> now becoming part of the model rather than parameters of the reader.
> I don't know if I like that, but, if we do it, why not do it 
> everywhere and make, for example,  dfdl:byteorder an 
> attribute on all ints and floats that are read? Of course, a 
> byteorder attribute would only be available if you actually 
> came from a binary stream, which may be defined elsewhere 
> (some enclosing node, another layer). To me, any of this 
> starts to mix the model and the method used to read the 
> model, which gets back to the issue of how independent 
> readers are, whether creating a new reader implies creating 
> a new subtype, etc.
> 
> I guess I'd rather see the concept of multidimensional arrays as
> follows: there is not, in fact, a multidimensional array on 
> disk/stream, just a serialized sequence of 
> ints/floats/whatever. But, to assist the user in interpreting 
> this flat data as a multidimensional array, we want DFDL to 
> make index info available and, rather than just making a 
> single cursor count available and requiring users to do math 
> to have indexes that don't start at 0 (or 1 - whatever) or 
> support multiple dimensions, we provide some convenience 
> mechanisms that can report an index or indexes that cycle 
> from user defined mins and maxes as elements are read, which 
> can be used to decorate elements with attributes or be used 
> in conditional logic, etc. This would preserve the separation 
> of reader from model, at the expense of saying that indexes 
> like this are different from all the other dfdl reader 
> parameters that might be in the current context.
> 
> 
> > What makes all this confusing for DFDL is that we have some 
> > representations that are complex enough to need layered multi-step 
> > descriptions, and once you have that, there's no stopping you from 
> > using it to do all sorts of transformation from one format to another. 
> > So it feels like you can have your cake and eat it too, which is to 
> > say you can pick your XML Schema and populate it from quite 
> > differently structured data. And that is probably true, but at the 
> > bottom level of the stack of layers you have to have a vocabulary and 
> > model for directly describing the structure of the data so as to get 
> > the whole ball rolling. And at this bottom layer, the needs of 
> > describing the data format completely dictate what the schema is like.
> 
> I would solve this by just saying that, at the bottom layer, 
> there are no single or multidimensional arrays, just 
> sequences of base types, and that any concept of dimensions 
> is a fabrication created by the user (a very common and 
> convenient one we might want special support for...).
> 
> The only reason I think we would need a multidimensional 
> array type in DFDL is if we wanted to directly read m*n bytes 
> and create a single XML element representing the entire array 
> that would then have some accessor methods to get a value for 
> a particular x,y offset pair. I'm not sure what kind of 
> analogy will make sense to people here, but I see a similar 
> argument for floats from strings: if you want to create a 
> float from a sequence of characters, you need a float type. 
> If you just want to prescribe a standard way to model a 
> sequence of characters representing a float so that we can 
> consistently label the mantissa and exponent chars regardless 
> of storage order, you're not really defining a new float type 
> in XSD. Instead, you're exposing the semantics created/inferred 
> by the reader as standardized annotations of the existing char (or
> string) type (with the annotations being potential or 
> required depending on whether you let me shut them off or not).
> 
>   Jim
> 
>
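
The index-reporting idea in the quoted message (a flat sequence on disk, with indexes that cycle between user-defined mins and maxes as elements are read) might be sketched roughly as follows. This is Python rather than DFDL, and it is only an illustration: the function name `indexed`, the `bounds` parameter, and the choice that the last dimension varies fastest (row-major order) are all assumptions, not anything proposed in the thread.

```python
from itertools import product


def indexed(flat, bounds):
    """Pair each value of a flat sequence with a tuple of cycling indexes.

    `bounds` holds one inclusive (min, max) pair per dimension.
    Assumption: the last dimension varies fastest (row-major order).
    """
    ranges = [range(lo, hi + 1) for lo, hi in bounds]

    # The flat data must contain exactly one value per index tuple.
    expected = 1
    for r in ranges:
        expected *= len(r)
    values = list(flat)
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")

    # product() cycles the rightmost range fastest, like nested loops.
    return list(zip(product(*ranges), values))


# Six values read in order, viewed as 2 rows by 3 columns, with row
# indexes starting at 1 and column indexes starting at 0.
cells = indexed([10, 11, 12, 20, 21, 22], [(1, 2), (0, 2)])
for index, value in cells:
    print(index, value)
```

The data stays a plain sequence here; the index tuples are decoration supplied by the reader, which is the separation of reader from model that the message argues for.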