[dfdl-wg] Issues: additional data types

Mon Sep 5 08:51:05 CDT 2005

Hi Robert,

As I said in my other reply to you, one of the features that we decided 
upon with DFDL was that the model that we were translating into was the 
XML data model. So any file described by DFDL should be translatable 
into a well-formed XML document.

The XML data model may not be ideal but it is the standard data model.

The reason that the types you describe are not in DFDL is that they are 
not in XML.

That is not to say they are not important or should not be addressed.

My opinion on these is that they can be built out of the existing 
DFDL/XML components and that this is the correct way to handle them. The 
standard should provide a document that describes one or more ways in 
which these types can be achieved.

More inline...

Robert E. McGrath wrote:
..snip...

> *1. Enum*
> 
> This type has a set of <name, value> pairs, e.g., <"Red", 0>, <"Blue", 
> 1>, etc. The values are stored in the data, with the name-value pairs 
> stored in metadata.
> 
>  
> 
> Note: one use is for localization, using different maps to give 
> localized strings.
> 
>  
> 
> *Difficulty*: Low
> 
> *Priority:* Low

I'm confused about what you want to achieve. If you only store the value 
(an integer), what is the function of the name? Is it just for human 
beings reading the file, or is it used programmatically in some way?

One way to approach this type would be a choice over a series of tags 
with appropriate attribute types constrained to hardwired values:

<enum name="Red" value="0"/>

(Complex type with two attributes: one a string constrained to the 
single value "Red", the other an integer constrained to the value 0).

You could then use the DFDL annotations to ensure that this tag gets 
picked when zero occurs in the file.
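
A rough sketch of that selection step, in Python rather than DFDL, just 
to make the idea concrete. The table and function names (ENUM_TABLE, 
enum_tag) are hypothetical, and the entries are the ones from your 
example:

```python
from xml.sax.saxutils import quoteattr

# Hypothetical metadata: the <name, value> pairs from the example above.
ENUM_TABLE = {0: "Red", 1: "Blue"}


def enum_tag(stored_value: int) -> str:
    """Return the <enum> tag selected by an integer read from the data."""
    try:
        name = ENUM_TABLE[stored_value]
    except KeyError:
        raise ValueError(f"no enum member for stored value {stored_value}")
    return f"<enum name={quoteattr(name)} value={quoteattr(str(stored_value))}/>"


print(enum_tag(0))
```

Only the integer lives in the file; the name appears only in the XML 
view, which is why the table belongs in the metadata.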

> *2. Opaque (tagged)*
>
> This is some kind of non-numeric bit string, with a length and some kind 
> of tag. 
> 
> This might be used, for example, for 1024-bit encryption keys. The type 
> means "just pass through the bits". 
> 
> Generally, can be used to store any kind of "blob", which can be objects 
> that are meaningful to specific software.
> 
>  
> 
> This can be simulated with unsigned integers, but it may be useful to 
> know that it is not really an integer, or whatever.
> 
>  
> 
> *Difficulty*: Low
> 
> *Priority:* Low

This is just a sequence of bytes. It may need to be hidden: the 
layering requires that we can be explicit about what is visible at a 
particular layer. (I don't know what the current favourite way to do 
this is.)
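
For the pass-through case, one option is to carry the bytes as hex text, 
in the style of the XML Schema hexBinary type. A minimal sketch, assuming 
a hypothetical <opaque> element with tag and length attributes (none of 
these names come from DFDL):

```python
from xml.sax.saxutils import quoteattr


def opaque_element(tag: str, blob: bytes) -> str:
    """Render a tagged blob as hex text, in the style of xs:hexBinary."""
    return (
        f"<opaque tag={quoteattr(tag)} length={quoteattr(str(len(blob)))}>"
        f"{blob.hex().upper()}</opaque>"
    )


print(opaque_element("key", bytes([0xDE, 0xAD, 0xBE, 0xEF])))
```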

> *3. "Code"*
> 
>  
> 
> How should "code" be marked up? It is usually stored in blobs, but it 
> needs a tag so you know how to interpret it.
> 
>  
> 
> This is actually a special case of "opaque".
> 
>  
> 
> *Difficulty*: Low
> 
> *Priority:* Low

I don't understand what you mean.

> 
> *4. Bitfield / packed*
> 
>  
> 
> This type is bits packed into bytes.
> 
>  
> 
> *Difficulty*: Low
> 
> *Priority:* Low

Agreed. I think we can do this but it should be easier.
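
To be concrete about what "bits packed into bytes" asks of a reader, 
here is a small Python sketch. The function name, the field widths and 
the most-significant-bit-first ordering are all assumptions; a real 
description would have to state the bit order explicitly.

```python
def unpack_bitfields(data: bytes, widths: list[int]) -> list[int]:
    """Split packed bytes into unsigned fields, most significant bit first."""
    total_bits = len(data) * 8
    if sum(widths) > total_bits:
        raise ValueError("field widths exceed the available bits")
    packed = int.from_bytes(data, "big")
    fields = []
    consumed = 0
    for width in widths:
        consumed += width
        # Shift the field down to bit 0, then mask off everything above it.
        shift = total_bits - consumed
        fields.append((packed >> shift) & ((1 << width) - 1))
    return fields


# One byte, 0b10100110, read as a 3-bit, a 2-bit and a 3-bit field.
print(unpack_bitfields(bytes([0b10100110]), [3, 2, 3]))
```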

> *5. Pointer*
> 
>  
> 
> Many times there will be pointers within the data, e.g., to offsets in 
> the file, or to indexes in an array. This will be critical for storing 
> objects such as lists or trees.
> 
>  
> 
> URLs and XPaths are not especially well suited for this.
> 
>  
> 
> This can be simulated with unsigned integers, but they need to be 
> "swizzled" when translating, so they need to be tagged.
> 
>  
> 
> Note that there might be several types of addressing within the data:
> 
> - Offset from zero
> 
> - Offset relative to "foo"
> 
> The offsets might be in different increments: bits, bytes, words, 
> elements, etc.
> 
>  
> 
> There could be multi-part addresses, e.g., page + offset in page.
> 
>  
> 
> *Difficulty*: Medium
> 
> *Priority:* High

I spent a long time thinking about pointers at one time. I was unable to 
come up with anything I felt covered the bases. Perhaps you have enough 
experience with pointer representations to help us out here.

An interesting problem that comes to mind, though, is what the XML 
representation of the pointer value should be.

If it is a tag like:

<pointer offset="20" offsetType="bytes" index="5" indexType="float32"/>

then all that is needed is to define the metadata conventions that allow 
that to be correctly interpreted.

This is a little unsatisfactory though...
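
For illustration only, here is one possible reading of that tag, 
sketched in Python: the target is "offset" units of offsetType from the 
start of the file, plus "index" elements of indexType. That 
interpretation, the UNIT_SIZE table and the function name are all my 
assumptions, and bit-granular offsets are not handled.

```python
import xml.etree.ElementTree as ET

# Assumed sizes, in bytes, for the unit names used in the tag.
UNIT_SIZE = {"bytes": 1, "float32": 4}


def resolve_pointer(tag: str) -> int:
    """Return the absolute byte position a <pointer> tag refers to."""
    elem = ET.fromstring(tag)
    base = int(elem.get("offset")) * UNIT_SIZE[elem.get("offsetType")]
    step = int(elem.get("index")) * UNIT_SIZE[elem.get("indexType")]
    return base + step


tag = '<pointer offset="20" offsetType="bytes" index="5" indexType="float32"/>'
print(resolve_pointer(tag))
```

A translator that moves or resizes the data would have to recompute 
these values, which is the "swizzling" you mention.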

> *6. Array*
> 
>  
> 
> This is a critical type, must be supported.
> 
>  
> 
> There are a lot of issues.
> 
>  
> 
> I am preparing a separate memo.
> 
>  
> 
> *Difficulty*: High
> 
> *Priority:* Very High

We have talked a lot about arrays. A big issue is that there are several 
ways you may want to represent an array within your XML data model. IMO 
the right way will depend on how you want to use the data.

I think the right way to do this is to have a series of recipes for 
users to capture array semantics in their DFDL files.
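
As a small illustration of why more than one recipe is needed, the 
Python sketch below renders the same values in two XML shapes: one child 
element per entry, and a single element holding a whitespace-separated 
list. The element and attribute names (array, item, length) are 
hypothetical and not part of any DFDL proposal.

```python
import xml.etree.ElementTree as ET


def as_item_elements(values: list[float]) -> str:
    """One child element per array entry."""
    root = ET.Element("array")
    for value in values:
        ET.SubElement(root, "item").text = repr(value)
    return ET.tostring(root, encoding="unicode")


def as_list_text(values: list[float]) -> str:
    """A single element whose text is a whitespace-separated list."""
    root = ET.Element("array", length=str(len(values)))
    root.text = " ".join(repr(value) for value in values)
    return ET.tostring(root, encoding="unicode")


data = [1.5, 2.0, 3.25]
print(as_item_elements(data))
print(as_list_text(data))
```

The first form lets each entry be addressed individually; the second is 
far more compact for large numeric arrays.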

Cheers,

Martin