[DFDL-WG] lengthKind='prefixed' clarification needed

Mike Beckerle mbeckerle.dfdl at gmail.com
Sun Oct 23 20:15:25 CDT 2011


Definitely need an agenda slot to discuss this matter.

I think we should redefine PrefixLength to allow it to have framing.

There is a significant issue which is that some prefixLengthTypes will be
multi-byte binary integers (typically 2 or 4 bytes), and these commonly
require alignment to a 2 or 4 byte boundary, as that's how the data
structures they live in would have been laid out.

The spec currently doesn't allow prefixLengthTypes to be aligned themselves,
because the grammar has them as SimpleContent, without the surrounding
ElementLeftFraming and RightFraming. This is also why they cannot have
lengthKind='delimited': there are no initiator or terminator
regions surrounding them. So the only way they can be aligned is if the
elements that have these prefixLengths are themselves aligned properly.

However, if you specify alignment on a simple integer type and use it as a
prefixLengthType, having that alignment annotation silently *ignored* would
seem strange, and buggy/hard-to-diagnose.

However, scoping rules for properties don't provide any way for this
alignment to get "into the scope chain", and I'd hate to start messing with
the scoping rules because of the corner case of prefixLength. We'd need to
put another scoping rule in just to handle this. I'd rather not go there.
Lots of our examples in the spec would have to change as they use alignment
as the example property...

But, the spec is not self-consistent, as the dfdl:alignment property can be
placed on a simple type definition, as well as on an element. So it would
seem a prefixLengthType could reference an aligned simple integer type, but
neither the grammar nor the scoping rules allow for using this alignment
property to control anything.  Similarly, you can put an initiator on a
simple integer type, use it as a prefixLengthType, and have the initiator be
ignored.... because there is no initiator region for a PrefixLength.

We need to fix this inconsistency.

I think prefixLengthType needs to be alignable, and one should be able to
specify alignment on a type definition, not just on an element.

I also think we're better off with a uniform general fix here, than a
handful of special case rules around prefix lengths. (E.g., the
prefixLengthType cannot have alignment, cannot have initiator/terminator or
lengthKind 'delimited', with a warning or SDE if it does, etc.)

So I think the grammar is wrong. I think

PrefixLength = SimpleElement

(where SimpleElement = ElementLeftFraming SimpleContent RightFraming )

is the right definition.

In working through examples, I'm convinced the current spec is problematic.
In the current spec one must model a 4-byte aligned binary integer prefix
length as a separate element (so that you can align it), and use
lengthKind='explicit' on the thing it controls. This is a lot of hassle for
a very common situation. The whole point of dfdl:lengthKind='prefixed' is to
provide an easier way to model the common cases.

For the same reason there is no alignment, the definition of
dfdl:prefixLengthType says the named type cannot have
lengthKind='delimited'. That is because the DFDL grammar defines the
prefixLength region to be SimpleContent which is without any of the
surrounding framing regions where delimiters are found.

So, one cannot, for example, put an initiator and terminator on the prefix
length type so as to have syntax separating it from the actual content. Even
if it is fixed length you can't do it. For instance, you cannot model this
data as 3 string elements using prefix length:

(11)9 Ocean Way(20)Southwest(SW) Harbor(02)ME

(Notice in the above the unescaped "(SW)", which is why this is not a
delimited format.)
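
To make the intended reading of that data concrete, here is a minimal Python
sketch of how a parser would consume it if the prefix length could carry its
own initiator and terminator. The function name and parameters are
hypothetical, and I am assuming the prefix is decimal text and that its value
counts only the characters of the string content.

```python
def parse_framed_prefixes(data, initiator="(", terminator=")"):
    """Parse strings whose text prefix length is wrapped in its own framing.

    Hypothetical helper: assumes the prefix is decimal text and counts only
    the characters of the content that follows it.
    """
    fields = []
    pos = 0
    while pos < len(data):
        # Initiator region of the prefix length itself.
        if not data.startswith(initiator, pos):
            raise ValueError(f"expected {initiator!r} at offset {pos}")
        pos += len(initiator)
        # Prefix length content: decimal digits up to the terminator.
        end = data.index(terminator, pos)
        length = int(data[pos:end])
        pos = end + len(terminator)
        # Element content: taken by length, never scanned for delimiters,
        # so an unescaped "(SW)" inside it is harmless.
        if pos + length > len(data):
            raise ValueError("content is shorter than its prefix length")
        fields.append(data[pos:pos + length])
        pos += length
    return fields


fields = parse_framed_prefixes("(11)9 Ocean Way(20)Southwest(SW) Harbor(02)ME")
print(fields)
```

The content is sliced by length rather than searched, which is exactly why
this is a prefixed format and not a delimited one.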

You also cannot do:

11(9 Ocean Way)20(Southwest(SW) Harbor)02(ME)

because that puts the initiator of the string element itself after its
prefix length region, which is backwards from the way we have it in the
grammar currently. Both of the examples above require use of a separate
element and lengthKind="explicit" to pull off, even though they seem like
fairly natural ways to textualize a binary format.
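
For comparison, a rough Python sketch of the second layout, where a
fixed-width text prefix comes first and the string's own initiator and
terminator wrap the content. The names are hypothetical, and I am assuming a
2-digit prefix whose value excludes the parentheses.

```python
def parse_prefix_then_framing(data, prefix_digits=2,
                              initiator="(", terminator=")"):
    """Parse <prefix><initiator><content><terminator> repeated.

    Hypothetical helper: assumes a fixed-width decimal prefix that counts
    only the content characters, not the surrounding framing.
    """
    fields = []
    pos = 0
    while pos < len(data):
        # The prefix length region comes BEFORE the element's initiator.
        length = int(data[pos:pos + prefix_digits])
        pos += prefix_digits
        if not data.startswith(initiator, pos):
            raise ValueError(f"expected {initiator!r} at offset {pos}")
        pos += len(initiator)
        content = data[pos:pos + length]
        if len(content) != length:
            raise ValueError("content is shorter than its prefix length")
        pos += length
        if not data.startswith(terminator, pos):
            raise ValueError(f"expected {terminator!r} at offset {pos}")
        pos += len(terminator)
        fields.append(content)
    return fields


fields2 = parse_prefix_then_framing(
    "11(9 Ocean Way)20(Southwest(SW) Harbor)02(ME)")
print(fields2)
```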

Now consider

xx9 Ocean WayxxSouthwest(SW) HarborxxME

where the "xx" is a 16 bit (2 byte) binary integer holding the lengths 11,
20, and 2 respectively.

This one can be modeled with lengthKind='prefixed', but only so long as the
"xx" doesn't need to be on a 2-byte alignment. In my example the first
element occupies 13 bytes including the prefix itself, so the next "xx"
starts on an odd boundary.  I could specify alignment on each of the 3
elements of my sequence here, but that is unmotivated/weird since they're
string elements and their type may be distant from where the elements are
declared, so the motivation for the alignment may not be clear. The
alignment constraint really wants to be expressed on the prefixLengthType,
and the DFDL annotation syntax lets you specify alignment there; it just
doesn't use it.
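
The arithmetic is easy to check. A small Python sketch (hypothetical helper
name, assuming one byte per character and no alignment fill) that computes
where each 2-byte prefix would start:

```python
strings = ["9 Ocean Way", "Southwest(SW) Harbor", "ME"]


def prefix_offsets(values, prefix_size=2):
    """Byte offset at which each prefix starts when nothing is aligned."""
    offsets = []
    pos = 0
    for value in values:
        offsets.append(pos)
        # Each element occupies its prefix plus its content bytes.
        pos += prefix_size + len(value.encode("ascii"))
    return offsets


offsets = prefix_offsets(strings)
misaligned = [off for off in offsets if off % 2 != 0]
print(offsets, misaligned)
```

Only the first prefix lands on an even offset; the second and third do not.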

If we just redefine PrefixLength as SimpleElement, now all the example
formats above are easily modeled in the obvious way, and even the
combinations of text and binary lengths can be done naturally, as a binary
prefixLengthType integer type can have all the usual constraints binary data
likes to have, like alignment.
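
As a rough illustration of the behavior I'm after, here is a Python sketch
that writes and reads the "xx" data with the alignment applied to the prefix
itself. The function names, the big-endian byte order, and the zero fill
byte are all assumptions made for the example, not anything the spec says.

```python
import struct


def pack_prefixed(values, alignment=2, fill=b"\x00"):
    """Write each string with an aligned 2-byte big-endian prefix length."""
    out = bytearray()
    for value in values:
        # Alignment fill goes before the prefix length, as left framing.
        out += fill * (-len(out) % alignment)
        payload = value.encode("ascii")
        out += struct.pack(">H", len(payload)) + payload
    return bytes(out)


def parse_prefixed(data, alignment=2):
    """Read strings back, skipping alignment fill before each prefix."""
    fields = []
    pos = 0
    while pos < len(data):
        pos += -pos % alignment          # skip fill up to the boundary
        (length,) = struct.unpack_from(">H", data, pos)
        pos += 2
        fields.append(data[pos:pos + length].decode("ascii"))
        pos += length
    return fields


strings = ["9 Ocean Way", "Southwest(SW) Harbor", "ME"]
packed = pack_prefixed(strings)
print(len(packed), parse_prefixed(packed))
```

The string elements carry no alignment at all here; the single fill byte
after the first string exists only because the prefix wants a 2-byte
boundary.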

Even the 2-level ASN.1 weird case "prefix-length of the prefix-length" (see
errata 2.13) works because ElementLeftFraming itself includes PrefixLength.
I believe we should put an explicit depth limit of 2 on this however.
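
To show what a depth limit of 2 would mean in practice, here is a Python
sketch over a made-up text layout (not a real ASN.1 encoding): one digit
gives the width of the length field, the length field gives the content
length. The spec tuples and names are hypothetical.

```python
MAX_PREFIX_DEPTH = 2  # assumed limit on nested prefix lengths


def read_length(data, pos, spec, depth=1):
    """Read one length field; return (value, position after it).

    spec is ("fixed", n) for an n-digit field, or ("prefixed", inner) when
    the field's own width is given by another length field.
    """
    if depth > MAX_PREFIX_DEPTH:
        raise ValueError("prefix lengths nested deeper than 2")
    kind, arg = spec
    if kind == "fixed":
        width = arg
    else:
        # The width of this length field comes from a nested prefix length.
        width, pos = read_length(data, pos, arg, depth + 1)
    return int(data[pos:pos + width]), pos + width


two_level = ("prefixed", ("fixed", 1))
length, pos = read_length("2119 Ocean Way", 0, two_level)
content = "2119 Ocean Way"[pos:pos + length]
print(length, content)
```

A three-level spec such as ("prefixed", ("prefixed", ("fixed", 1))) is
rejected with a ValueError.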

(Side note: I'd like to see an example of the ASN.1 format that supposedly
requires this nested prefix of the prefix situation.)

Changing the grammar in this way lets us drop the special case handling
around prefixLength, where it can't have lengthKind="delimited" and ignores
initiators, terminators, and alignment. That is a bunch fewer special cases
to have to implement and test, and create special warnings for (e.g.,
"Warning: prefixLengthType 'lenType' has alignment property which will be
ignored.").

If we want to be more minimal about the changes, just changing the rule to

PrefixLength = ElementLeftFraming SimpleContent RightFraming

is sufficient and achieves the fix of the real problem.

(This also eliminates the need for current errata 2.13 and 2.14, or rather
replaces those errata with this new stuff.)

...mikeb