[DFDL-WG] lengthKind='prefixed' clarification needed

Steve Hanson smh at uk.ibm.com
Wed Oct 26 05:15:19 CDT 2011


Mike

Length kind 'prefixed' was intended to handle the case where the length is 
tightly bound to the data, i.e., there is nothing between the length and the 
data. For example, a PL/1 var char or ASN.1 BER.  If the length causes the 
length/data to be aligned then that has to be taken into account on the 
element itself.   Length kind 'prefixed' was not intended to cover more 
complex cases where the length itself has independent alignment or there 
are delimiters involved. For those you use length kind 'explicit' and an 
expression. Otherwise the combinations become too complicated. If we wish 
to extend 'prefixed' to include the more complex cases, I think that is a 
post 1.0 thought and is best handled using a different length kind enum.

You say that ignoring the alignment property on the simple type used for 
the length is strange, but if you allow that, there is no way to align the 
element's actual data separately. I think that is even stranger. 
 
The ASN.1 BER description at 
http://en.wikipedia.org/wiki/Basic_Encoding_Rules describes how the length 
itself can have a prefix (see sub-section 'Length').
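
As an illustration of the "length of the length" idea in BER, here is a minimal sketch of decoding a definite-form BER length field. The function name `decode_ber_length` is hypothetical, and the indefinite form (first octet 0x80) is deliberately left unhandled.

```python
from typing import Tuple


def decode_ber_length(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Return (length, next_offset) for a definite-form BER length field."""
    first = data[offset]
    if first < 0x80:
        # Short form: the single octet is the length itself (0..127).
        return first, offset + 1
    # Long form: the low 7 bits say how many length octets follow.
    count = first & 0x7F
    if count == 0:
        raise ValueError("indefinite length form is not handled in this sketch")
    end = offset + 1 + count
    if end > len(data):
        raise ValueError("truncated length field")
    # The following octets hold the length as a big-endian unsigned integer.
    return int.from_bytes(data[offset + 1:end], "big"), end


short_form = decode_ber_length(bytes([0x0B]))
long_form = decode_ber_length(bytes([0x82, 0x01, 0x00]))
```

In the long form the first octet acts as a prefix length for the length octets, which is the two-level case discussed in this thread.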

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848



From:        Mike Beckerle <mbeckerle.dfdl at gmail.com>
To:        Tim Kimber/UK/IBM at IBMGB, dfdl-wg at ogf.org
Date:        24/10/2011 02:16
Subject:        Re: [DFDL-WG] lengthKind='prefixed' clarification needed
Sent by:        dfdl-wg-bounces at ogf.org


Definitely need an agenda slot to discuss this matter.

I think we should redefine PrefixLength to allow it to have framing.

There is a significant issue which is that some prefixLengthTypes will be 
multi-byte binary integers (typically 2 or 4 bytes), and these commonly 
require alignment to a 2 or 4 byte boundary, as that's how the data 
structures they live in would have been laid out.

The spec currently doesn't allow prefixLengthTypes to be aligned 
themselves, because the grammar has them as SimpleContent, without the 
surrounding ElementLeftFraming and RightFraming. This is also why they 
cannot have lengthKind='delimited': there are no initiator or 
terminator regions surrounding them. So the only way they can be aligned 
is if the elements that have these prefixLengths are themselves aligned 
properly. 

However, if you specify alignment on a simple integer type, use it as a 
prefixLengthType, and then that alignment annotation is *ignored* that 
would seem strange, and buggy/hard-to-diagnose.

However, scoping rules for properties don't provide any way for this 
alignment to get "into the scope chain", and I'd hate to start messing 
with the scoping rules because of the corner case of prefixLength. We'd 
need to put another scoping rule in just to handle this. I'd rather not go 
there. Lots of our examples in the spec would have to change as they use 
alignment as the example property...

But, the spec is not self-consistent, as the dfdl:alignment property can 
be placed on a simple type definition, as well as on an element. So it 
would seem a prefixLengthType could reference an aligned simple integer 
type, but neither the grammar nor the scoping rules allow for using this 
alignment property to control anything.  Similarly, you can put an 
initiator on a simple integer type, use it as a prefixLengthType, and have 
the initiator be ignored.... because there is no initiator region for a 
PrefixLength. 

We need to fix this inconsistency.

I think prefixLengthType needs to be alignable, and one should be able to 
specify alignment on a  type definition, not just on an element. 

I also think we're better off with a uniform general fix here, than a 
handful of special case rules around prefix lengths. (E.g., the 
prefixLengthType cannot have alignment, cannot have initiator/terminator 
or lengthKind 'delimited', with a warning or SDE if it does, etc.)

So I think the grammar is wrong. I think 

PrefixLength = SimpleElement

(where SimpleElement = ElementLeftFraming SimpleContent RightFraming )

is the right definition. 


In working through examples, I'm convinced the current spec is 
problematic. In the current spec one must model a 4-byte aligned binary 
integer prefix length as a separate element (so that you can align it), 
and use lengthKind='explicit' on the thing it controls. This is a lot of 
hassle for a very common situation. The whole point of 
dfdl:lengthKind='prefixed' is to provide an easier way to model the common 
cases.

For the same reason there is no alignment, the definition of 
dfdl:prefixLengthType says the named type cannot have 
lengthKind='delimited'. That is because the DFDL grammar defines the 
prefixLength region to be SimpleContent which is without any of the 
surrounding framing regions where delimiters are found.

So, one cannot for example, put an initiator and terminator on the prefix 
length type so as to have syntax separating it from the actual content. 
Even if it is fixed length you can't do it. For example, you cannot model 
this data as 3 string elements using prefix length:

(11)9 Ocean Way(20)Southwest(SW) Harbor(02)ME

(Notice in the above the unescaped "(SW)", which is why this is not a 
delimited format.)

You also cannot do:

11(9 Ocean Way)20(Southwest(SW) Harbor)02(ME)

because that puts the initiator of the string element itself after its 
prefix length region, which is backwards from the way we have it in the 
grammar currently. Both of the examples above require use of a separate 
element and lengthKind="explicit" to pull off, even though they seem like 
fairly natural ways to textualize a binary format.
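
To make the two layouts concrete, here is a rough sketch of how each could be read if such prefixes were allowed. This is plain Python, not DFDL; the function names and the fixed 2-digit width in the second form are assumptions for illustration only.

```python
from typing import List


def parse_framed_prefix(text: str) -> List[str]:
    """Read fields laid out as '(' length ')' content, e.g. '(11)9 Ocean Way'."""
    fields, pos = [], 0
    while pos < len(text):
        if text[pos] != "(":
            raise ValueError(f"expected '(' at offset {pos}")
        close = text.index(")", pos)           # terminator of the prefix length
        length = int(text[pos + 1:close])
        start, end = close + 1, close + 1 + length
        if end > len(text):
            raise ValueError("content shorter than prefix length")
        fields.append(text[start:end])         # content may itself contain '(' or ')'
        pos = end
    return fields


def parse_prefix_then_framed_content(text: str, digits: int = 2) -> List[str]:
    """Read fields laid out as length '(' content ')', e.g. '11(9 Ocean Way)'."""
    fields, pos = [], 0
    while pos < len(text):
        length = int(text[pos:pos + digits])   # assumed fixed-width text prefix
        pos += digits
        if text[pos:pos + 1] != "(":
            raise ValueError(f"expected '(' at offset {pos}")
        start, end = pos + 1, pos + 1 + length
        if text[end:end + 1] != ")":
            raise ValueError(f"expected ')' at offset {end}")
        fields.append(text[start:end])
        pos = end + 1
    return fields


first = parse_framed_prefix("(11)9 Ocean Way(20)Southwest(SW) Harbor(02)ME")
second = parse_prefix_then_framed_content(
    "11(9 Ocean Way)20(Southwest(SW) Harbor)02(ME)")
```

In both cases the length, not the parentheses, decides where the content ends, which is why the unescaped "(SW)" causes no trouble.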

Now consider

xx9 Ocean WayxxSouthwest(SW) HarborxxME

where the "xx" is a 16 bit (2 byte) binary integer holding the lengths 11, 
20, and 2 respectively.

That can be modeled with lengthKind='prefixed' today, but only so long as 
the "xx" doesn't need to be on a 2-byte alignment. In my example the first 
element occupies 13 bytes including the prefix itself, so the next "xx" 
starts on an odd boundary. I could specify alignment on each of the 3 
elements of my sequence here, but that is unmotivated/weird: they're string 
elements, and their type may be distant from where the elements are 
declared, so the motivation for the alignment may not be clear. The 
alignment constraint really wants to be expressed on the prefixLengthType, 
and the DFDL annotation syntax lets you specify alignment there; it just 
doesn't use it. 
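
The offsets can be checked with a small sketch. It assumes a big-endian unsigned 16-bit prefix and zero-valued fill bytes for alignment; neither detail is stated in this thread, and the helper names are hypothetical.

```python
import struct
from typing import List, Tuple


def build(strings: List[str], align: int = 1) -> Tuple[bytes, List[int]]:
    """Serialize strings with a 2-byte length prefix; return data and prefix offsets."""
    out = bytearray()
    offsets = []
    for s in strings:
        out += b"\x00" * (-len(out) % align)   # fill up to the prefix's alignment
        offsets.append(len(out))               # where this "xx" prefix starts
        body = s.encode("ascii")
        out += struct.pack(">H", len(body)) + body
    return bytes(out), offsets


def parse(data: bytes, align: int = 1) -> List[str]:
    """Inverse of build(): skip alignment fill, read the prefix, then the content."""
    fields, pos = [], 0
    while pos < len(data):
        pos += -pos % align
        (length,) = struct.unpack_from(">H", data, pos)
        pos += 2
        fields.append(data[pos:pos + length].decode("ascii"))
        pos += length
    return fields


address = ["9 Ocean Way", "Southwest(SW) Harbor", "ME"]
packed, packed_offsets = build(address)              # prefixes at 0, 13, 35
aligned, aligned_offsets = build(address, align=2)   # prefixes at 0, 14, 36
```

Without alignment the second and third prefixes land on odd offsets (13 and 35). With 2-byte alignment applied to the prefix itself, one fill byte moves the second prefix to offset 14, and the string content needs no alignment of its own.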

If we just redefine PrefixLength as SimpleElement, now all the example 
formats above are easily modeled in the obvious way, and even the 
combinations of text and binary lengths can be done naturally, as a binary 
prefixLengthType integer type can have all the usual constraints binary 
data likes to have, like alignment.

Even the 2-level ASN.1 weird case "prefix-length of the prefix-length" 
(see errata 2.13) works because ElementLeftFraming itself includes 
PrefixLength. I believe we should put an explicit depth limit of 2 on this 
however. 

(Side note: I'd like to see an example of the ASN.1 format that supposedly 
requires this nested prefix of the prefix situation.)

Changing the grammar in this way lets us drop the special case handling 
around prefixLength, where it can't have lengthKind="delimited" and ignores 
initiators, terminators and alignment. That is a bunch fewer special 
cases to have to implement and test, and create special warnings for 
(e.g., "Warning: prefixLengthType 'lenType' has alignment property which 
will be ignored.")

If we want to be more minimal about the changes, just changing

PrefixLength = ElementLeftFraming SimpleContent RightFraming

is sufficient and achieves the fix of the real problem. 

(This also eliminates the need for current errata 2.13 and 2.14, or rather 
replaces those errata with this new stuff.)

...mikeb

--
  dfdl-wg mailing list
  dfdl-wg at ogf.org
  http://www.ogf.org/mailman/listinfo/dfdl-wg


From:        Tim Kimber/UK/IBM at IBMGB
To:        Mike Beckerle <mbeckerle.dfdl at gmail.com>
Cc:        dfdl-wg at ogf.org, dfdl-wg-bounces at ogf.org
Date:        23/10/2011 21:12
Subject:        Re: [DFDL-WG] lengthKind='prefixed' clarification needed
Sent by:        dfdl-wg-bounces at ogf.org



Hi Mike, 

I have always assumed that it works like this: 
- The Prefix region includes leading alignment, leading skip and initiator. 
- The Content region contains the data, and the lengthKind property 
  describes how to determine the content length. 
- The Suffix region includes Terminator and trailing alignment. 
The lengthKind property describes the content region, and is not examined 
until the Content region is reached. So the element's initiator, if 
defined, is not included in the length described by the prefix length. 
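
A rough sketch of this reading, using the order given in the original post (initiator, then prefix length, then content, then terminator). The 2-digit text prefix, the "[[" / "]]" delimiters and the function name are assumptions for illustration, not something the spec prescribes.

```python
def parse_content_only_prefix(text: str, initiator: str = "[[",
                              terminator: str = "]]", digits: int = 2) -> str:
    """Prefix length counts only the content; framing sits outside it."""
    pos = 0
    if not text.startswith(initiator, pos):
        raise ValueError("initiator not found")
    pos += len(initiator)                       # initiator is not counted
    length = int(text[pos:pos + digits])        # prefix length is read next
    pos += digits
    content = text[pos:pos + length]
    if len(content) != length:
        raise ValueError("content shorter than prefix length")
    pos += length
    if not text.startswith(terminator, pos):    # terminator follows the content
        raise ValueError("terminator not found")
    return content


value = parse_content_only_prefix("[[06123456]]")
```

Here the prefix says 6, not 10: neither the initiator nor the terminator is included in the length.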

If you view the prefixed length as describing the length of the *element* 
(i.e. its entire representation) then this definition is not intuitive. 
But I have always viewed lengthKind='prefixed' as being like the other 
lengthKinds - it describes the length of the element's *content*. 
So it's a consistent definition, but is it useful? I think so. In my 
experience, prefixed lengths tend to be applied to complex elements 
(structures) rather than simple values. In such cases, the content of the 
complex element will always be either a sequence group or a choice group, 
and any initiator/terminator can be located on that group. 

regards,

Tim Kimber, Common Transformation Team,
Hursley, UK
Internet:  kimbert at uk.ibm.com
Tel. 01962-816742 
Internal tel. 246742




From:        Mike Beckerle <mbeckerle.dfdl at gmail.com> 
To:        dfdl-wg at ogf.org 
Date:        22/10/2011 19:12 
Subject:        [DFDL-WG] lengthKind='prefixed' clarification needed 
Sent by:        dfdl-wg-bounces at ogf.org 



For agenda/issues list

With respect to lengthKind='prefixed'. I'm concerned that there's a 
complex interaction with initiator/terminator.

Can we have a prefix length and an initiator and terminator as well? If so 
which comes first, and if it's the prefix does the prefix length include 
the length of the initiator and terminator? 

The grammar as written in the current draft of the spec has the initiator 
first, then the prefix, then the content, and then the terminator. I think 
this is wrong. I mean we can make it work, but it's neither a useful nor 
an intuitive behavior. 

If we're going to fix this, I think we should make prefixed an alternative 
to initiator and terminator, so that you can't have both on the element. 

The alternative is to change the order around. Because initiator and 
terminator can each be lists of alternative choices, the only sensible 
composition of prefixed with these has prefix length providing the length 
of a syntax which includes static initiator and terminator fields, which 
are sort of like static padding to be trimmed off the string before 
extracting the value.

E.g., prefix length of 10 preceding these characters: [[123456]]

<element name='x' type='int' dfdl:initiator="[[" dfdl:terminator="]]" 
dfdl:lengthKind='prefixed' .../>
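
A minimal sketch of that alternative ordering, where the prefix length covers the initiator and terminator and they are trimmed off before the value is extracted. The 2-digit text prefix and the function name are assumptions for illustration.

```python
def parse_prefix_including_framing(text: str, initiator: str = "[[",
                                   terminator: str = "]]", digits: int = 2) -> int:
    """Prefix length spans initiator + content + terminator."""
    length = int(text[:digits])
    framed = text[digits:digits + length]       # e.g. "[[123456]]" for length 10
    if len(framed) != length:
        raise ValueError("data shorter than prefix length")
    if len(framed) < len(initiator) + len(terminator):
        raise ValueError("prefix length too small to hold the framing")
    if not (framed.startswith(initiator) and framed.endswith(terminator)):
        raise ValueError("initiator/terminator not found inside the prefixed span")
    # Trim the static framing, like padding, before converting the value.
    return int(framed[len(initiator):len(framed) - len(terminator)])


x = parse_prefix_including_framing("10[[123456]]")
```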

But this is obscure enough that I'd rather make prefix length 
exclusive of initiator/terminator, i.e., Schema Definition Error if both 
are specified. 

Rationale: Even if such formats are possible, and even if they do exist 
somewhere, it's possible to model this format differently with hidden 
fields, lengthKind='explicit' etc., so it's not like removing this complex 
interaction of prefix with initiator/terminator reduces DFDL's expressive 
power in any way. 

Summary: To reduce complexity, suggest that lengthKind='prefixed' is 
exclusive of both initiator and terminator properties directly on the same 
element. Schema Definition Error if both are specified.


-- 
Mike Beckerle | OGF DFDL WG Co-Chair 
Tel:  781-330-0412





Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU 









