[DFDL-WG] lengthKind='prefixed' clarification needed

Mike Beckerle mbeckerle.dfdl at gmail.com
Wed Oct 26 10:02:00 CDT 2011


Oops, a nit: I missed dfdl:outputValueCalc which should also be in the list
of things that cause a schema def error.

On Wed, Oct 26, 2011 at 10:55 AM, Mike Beckerle <mbeckerle.dfdl at gmail.com> wrote:

> Ok, I have a smaller, simpler proposal to fix this, which is really a very
> small spec change. The total wording of this would be these additional
> errata:
>
> Errata N Section 12.3.4 dfdl:lengthKind 'prefixed'
>
> Replace phrase: "It is a schema definition error if the xs:simpleType
> specifies dfdl:lengthKind 'delimited' or 'endOfParent' or a
> dfdl:outputValueCalc" with "It is a schema definition error if the
> xs:simpleType specifies dfdl:lengthKind 'delimited', dfdl:lengthKind
> 'endOfParent', dfdl:initiator, dfdl:terminator, dfdl:leadingSkip, or
> dfdl:trailingSkip."
>
> Errata M Section 9.1 DFDL Data Syntax Grammar
>
> Replace "PrefixLength = SimpleContent" with "PrefixLength =
> LeadingAlignment SimpleContent"
>
> Today's errata on nested prefix length types would remain as errata.
>
> The first new errata is an important clarification, and the schema def error
> is the conservative thing to do for the future. We have followed this
> principle in many other places in the spec.
>
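To make the proposed schema definition error concrete, here is a minimal Python sketch of the check a processor might run on a prefixLengthType. It is only an illustration: the function name, the dict-of-properties input, and the choice to treat an empty or zero value as "not specified" are all assumptions, not anything the spec defines. dfdl:outputValueCalc is included per the follow-up note at the top of this message.

```python
# Hypothetical names; property keys mirror the DFDL property names
# (without the dfdl: prefix) listed in the proposed errata text.
FORBIDDEN_LENGTH_KINDS = ("delimited", "endOfParent")
FORBIDDEN_PROPERTIES = (
    "initiator",
    "terminator",
    "leadingSkip",
    "trailingSkip",
    "outputValueCalc",  # added per the follow-up nit
)


def prefix_length_type_errors(props):
    """Return schema definition error messages for a prefixLengthType.

    props maps property names to their values on the simple type.
    """
    errors = []
    if props.get("lengthKind") in FORBIDDEN_LENGTH_KINDS:
        errors.append("lengthKind '%s' is not allowed" % props["lengthKind"])
    for name in FORBIDDEN_PROPERTIES:
        value = props.get(name)
        # Assumption: a missing, empty, or zero value means "not specified".
        if value not in (None, "", 0, "0"):
            errors.append("%s is not allowed" % name)
    return errors
```

An aligned binary prefix type such as `{"lengthKind": "explicit", "alignment": 2}` passes, while one carrying an initiator or lengthKind 'delimited' is reported.
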
> The second new errata fixes the issue described below with alignment and
> information hiding/composition properties. It is, in some sense, a very small
> change to the spec to fix a rather glaring composition problem. Here's the
> rationale:
>
> To me lengthKind='prefixed' ought to handle the variable-length string case
> like this:
>
> In the schema for the specific application's data:
>
> ...
> <element name="lname" type="fmt:string"/>  <!-- note use of string, but from a different namespace -->
> <element name="addr1" type="fmt:string"/>
> <element name="addr2" type="fmt:string"/>
> <element name="postal" type="fmt:string"/>
> ...
>
>
> Then, over in a different schema file.... which defines the namespace bound
> to fmt above as its target namespace....
>
> <simpleType name="string" dfdl:lengthKind="prefixed"
> dfdl:prefixLengthType="stdStrPrefixType">
>    <restriction base="xs:string" >
>         <maxLength value='65535'/>
>     </restriction>
> </simpleType>
>
> <simpleType name="stdStrPrefixType" dfdl:alignment="2"
> dfdl:alignmentUnits="bytes" dfdl:representation="binary">
>    <restriction base="xs:unsignedShort"/>
> </simpleType>
>
> The highlight here is that the alignment restriction is most naturally
> expressed on the simpleType, the details of which are not near the string's
> own element declaration. The spec specifically allows this placement of
> the dfdl annotations on the simpleType right now. It's only the grammar
> that implies it would be ignored, just as initiator, terminator, and leading
> and trailing skip would all be ignored.
>
> To me the above pattern/idiom, where the format annotation cruft is
> isolated on type definitions, is generally to be encouraged. There's a
> composition property I'm trying to preserve here, which is that you can put
> two things side by side without having to worry about whether you are
> meeting one or the other's alignment requirements. The alignment goes in the
> package with the definition of the thing.
>
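The composition property can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not DFDL processor code: the prefix is taken to be a big-endian unsigned 16-bit integer, alignment is measured from the start of the data, fill bytes are zero, and the content is ASCII. The function names are made up. The point is that the alignment travels with the prefix, so the caller never mentions it per element.

```python
import struct


def unparse_prefixed_strings(values, alignment=2):
    """Write each string as an aligned 16-bit length prefix plus content."""
    out = bytearray()
    for value in values:
        out += b"\x00" * (-len(out) % alignment)  # fill up to the boundary
        raw = value.encode("ascii")
        out += struct.pack(">H", len(raw)) + raw
    return bytes(out)


def parse_prefixed_strings(data, count, alignment=2):
    """Read `count` strings, aligning each prefix before reading it."""
    pos = 0
    values = []
    for _ in range(count):
        pos += -pos % alignment  # the LeadingAlignment step for the prefix
        (length,) = struct.unpack_from(">H", data, pos)
        pos += 2
        values.append(data[pos:pos + length].decode("ascii"))
        pos += length
    return values
```

Round-tripping a few odd-length strings through these two functions shows the fill byte being inserted and skipped without the element declarations knowing anything about alignment.
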
> I don't think you should have to write:
>
> ...
> <element name="lname" type="fmt:string" dfdl:alignment='2' />
> <element name="addr1" type="fmt:string" dfdl:alignment='2'/>
> <element name="addr2" type="fmt:string" dfdl:alignment='2'/>
> <element name="postal" type="fmt:string" dfdl:alignment='2'/>
> ...
>
> To me that violates a valuable principle of information hiding. I can hide
> the type, but not hide its alignment requirements? Anywhere you can abstract
> and hide a SimpleContent item that can be binary numeric where alignment is
> a common requirement, you need also to be able to hide its alignment
> requirements alongside it so that using it won't be in the error-prone
> situation where the user can misalign it. PrefixLength is the one place
> (that I know of) where we're violating this principle. It's an easy
> omission, and an easy fix.
>
>
> ...mikeb
>
>
>
> On Wed, Oct 26, 2011 at 6:15 AM, Steve Hanson <smh at uk.ibm.com> wrote:
>
>>
>> Mike
>>
>> Length kind 'prefixed' was intended to handle the case where the length is
>> tightly bound to the data, i.e., there is nothing between the length and the
>> data. For example, a PL/1 var char or ASN.1 BER. If the length causes the
>> length/data to be aligned, then that has to be taken into account on the
>> element itself. Length kind 'prefixed' was not intended to cover more
>> complex cases where the length itself has independent alignment or there are
>> delimiters involved. For those you use length kind 'explicit' and an
>> expression. Otherwise the combinations become too complicated. If we wish to
>> extend 'prefixed' to include the more complex cases, I think that is a post
>> 1.0 thought and is best handled using a different length kind enum.
>>
>> You say that ignoring the alignment property on the simple type used for
>> the length is strange, but if you allow that, there is no way to align the
>> element's actual data separately. I think that is even stranger.
>>
>> The ASN.1 BER description at
>> http://en.wikipedia.org/wiki/Basic_Encoding_Rules describes how the
>> length itself can have a prefix (see sub-section 'Length').
>>
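For readers unfamiliar with the BER case mentioned here, the following Python sketch decodes BER definite-form length octets, where a long-form length is itself preceded by a count of length octets. The function name and error messages are illustrative assumptions; the encoding rules (short form below 0x80, 0x80 for indefinite form, 0xFF reserved) are those of BER.

```python
def decode_ber_length(data, pos=0):
    """Decode a definite-form BER length starting at data[pos].

    Returns (length, next_pos). Raises ValueError for the indefinite
    form (0x80), the reserved octet 0xFF, and truncated input.
    """
    first = data[pos]
    if first < 0x80:
        # Short form: the single octet is the length itself.
        return first, pos + 1
    count = first & 0x7F  # long form: number of length octets that follow
    if count == 0:
        raise ValueError("indefinite length form")
    if count == 0x7F:
        raise ValueError("reserved length octet 0xFF")
    end = pos + 1 + count
    if end > len(data):
        raise ValueError("truncated length octets")
    return int.from_bytes(data[pos + 1:end], "big"), end
```

In the long form, the first octet plays the role of a prefix length for the length field, which is the nesting that the "prefix length of the prefix length" discussion refers to.
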
>> Regards
>>
>> Steve Hanson
>> Architect, Data Format Description Language (DFDL)
>> Co-Chair, OGF DFDL Working Group <http://www.ogf.org/dfdl/>
>> IBM SWG, Hursley, UK
>> smh at uk.ibm.com
>> tel:+44-1962-815848
>>
>>
>> From: Mike Beckerle <mbeckerle.dfdl at gmail.com>
>> To: Tim Kimber/UK/IBM at IBMGB, dfdl-wg at ogf.org
>> Date: 24/10/2011 02:16
>> Subject: Re: [DFDL-WG] lengthKind='prefixed' clarification needed
>> Sent by: dfdl-wg-bounces at ogf.org
>> ------------------------------
>>
>>
>> Definitely need an agenda slot to discuss this matter.
>>
>> I think we should redefine PrefixLength to allow it to have framing.
>>
>> There is a significant issue which is that some prefixLengthTypes will be
>> multi-byte binary integers (typically 2 or 4 bytes), and these commonly
>> require alignment to a 2 or 4 byte boundary, as that's how the data
>> structures they live in would have been laid out.
>>
>> The spec currently doesn't allow prefixLengthTypes to be aligned
>> themselves, because the grammar has them as SimpleContent, without the
>> surrounding ElementLeftFraming and RightFraming. This is also why they
>> cannot have lengthKind='delimited': there are no initiator or
>> terminator regions surrounding them. So the only way they can be aligned is
>> if the elements that have these prefixLengths are themselves aligned
>> properly.
>>
>> However, if you specify alignment on a simple integer type, use it as a
>> prefixLengthType, and then that alignment annotation is *ignored*, that would
>> seem strange, and buggy/hard-to-diagnose.
>>
>> However, scoping rules for properties don't provide any way for this
>> alignment to get "into the scope chain", and I'd hate to start messing with
>> the scoping rules because of the corner case of prefixLength. We'd need to
>> put another scoping rule in just to handle this. I'd rather not go there.
>> Lots of our examples in the spec would have to change as they use alignment
>> as the example property...
>>
>> But, the spec is not self-consistent, as the dfdl:alignment property can
>> be placed on a simple type definition, as well as on an element. So it would
>> seem a prefixLengthType could reference an aligned simple integer type, but
>> neither the grammar nor the scoping rules allow for using this alignment
>> property to control anything.  Similarly, you can put an initiator on a
>> simple integer type, use it as a prefixLengthType, and have the initiator be
>> ignored.... because there is no initiator region for a PrefixLength.
>>
>> We need to fix this inconsistency.
>>
>> I think prefixLengthType needs to be alignable, and one should be able to
>> specify alignment on a  type definition, not just on an element.
>>
>> I also think we're better off with a uniform general fix here, than a
>> handful of special case rules around prefix lengths (e.g., the
>> prefixLengthType cannot have alignment, cannot have initiator/terminator or
>> lengthKind delimited, with a warning or SDE if it does, etc.).
>>
>> So I think the grammar is wrong. I think
>>
>> PrefixLength = SimpleElement
>>
>> (where SimpleElement = ElementLeftFraming SimpleContent RightFraming )
>>
>> is the right definition.
>>
>>
>> In working through examples, I'm convinced the current spec is
>> problematic. In the current spec one must model a 4-byte aligned binary
>> integer prefix length as a separate element (so that you can align it), and
>> use lengthKind='explicit' on the thing it controls. This is a lot of hassle
>> for a very common situation. The whole point of dfdl:lengthKind='prefixed'
>> is to provide an easier way to model the common cases.
>>
>> For the same reason there is no alignment, the definition of
>> dfdl:prefixLengthType says the named type cannot have
>> lengthKind='delimited'. That is because the DFDL grammar defines the
>> prefixLength region to be SimpleContent which is without any of the
>> surrounding framing regions where delimiters are found.
>>
>> So, one cannot, for example, put an initiator and terminator on the prefix
>> length type so as to have syntax separating it from the actual content. Even
>> if it is fixed length you can't do it. For example, you cannot model this
>> data as 3 string elements using prefix length:
>>
>> (11)9 Ocean Way(20)Southwest(SW) Harbor(02)ME
>>
>> (Notice in the above the unescaped "(SW)", which is why this is not a
>> delimited format.)
>>
>> You also cannot do:
>>
>> 11(9 Ocean Way)20(Southwest(SW) Harbor)02(ME)
>>
>> because that puts the initiator of the string element itself after its
>> prefix length region, which is backwards from the way we have it in the
>> grammar currently. Both of the examples above require use of a separate
>> element and lengthKind="explicit" to pull off, even though they seem like
>> fairly natural ways to textualize a binary format.
>>
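The two textual layouts above can be parsed with a few lines of Python each, which shows why neither is a delimited format: the content is taken by length, so the embedded "(SW)" is never scanned. This is only a sketch; the function names are made up, and the second form assumes a fixed two-digit prefix, which is an assumption drawn from the sample data.

```python
def parse_framed_prefix(text):
    """Form 1: (NN)content, with "(" and ")" framing the prefix length."""
    pos, values = 0, []
    while pos < len(text):
        if text[pos] != "(":
            raise ValueError("expected '(' at offset %d" % pos)
        close = text.index(")", pos)
        length = int(text[pos + 1:close])
        start = close + 1
        values.append(text[start:start + length])  # taken by length, not scanned
        pos = start + length
    return values


def parse_prefix_before_framing(text, prefix_digits=2):
    """Form 2: NN(content), with the prefix ahead of the element's framing."""
    pos, values = 0, []
    while pos < len(text):
        length = int(text[pos:pos + prefix_digits])
        pos += prefix_digits
        if text[pos:pos + 1] != "(":
            raise ValueError("expected '(' at offset %d" % pos)
        start, end = pos + 1, pos + 1 + length
        if text[end:end + 1] != ")":
            raise ValueError("expected ')' at offset %d" % end)
        values.append(text[start:end])
        pos = end + 1
    return values
```

Both functions return the same three strings for the two sample lines, with "Southwest(SW) Harbor" intact.
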
>> Now consider
>>
>> xx9 Ocean WayxxSouthwest(SW) HarborxxME
>>
>> where the "xx" is a 16 bit (2 byte) binary integer holding the lengths 11,
>> 20, and 2 respectively.
>>
>> Except... that is, this works only so long as the "xx" doesn't need to be on
>> a 2-byte alignment, because in my example the first element occupies 13 bytes
>> including the prefix itself, so the next "xx" starts on an odd boundary. I
>> could specify alignment on each of the 3 elements of my sequence here, which
>> is unmotivated/weird since they're string elements and their type may be
>> distant from where the elements are declared, so the motivation for the
>> alignment may not be clear. The alignment constraint really wants to
>> be expressed on the prefixLengthType, and the dfdl annotation syntax lets
>> you specify alignment there; it just doesn't use it.
>>
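The offset arithmetic behind that odd boundary can be checked with a short Python sketch. The helper name is hypothetical, and it assumes offsets are counted in bytes from the start of the first element.

```python
def prefix_offsets(lengths, prefix_size=2, alignment=1):
    """Return the byte offset at which each length prefix starts."""
    offsets, pos = [], 0
    for n in lengths:
        pos += -pos % alignment  # fill bytes needed to reach the boundary
        offsets.append(pos)
        pos += prefix_size + n   # prefix plus content
    return offsets


# Content lengths from the example: 11, 20, and 2.
unaligned = prefix_offsets([11, 20, 2])              # [0, 13, 35]
aligned = prefix_offsets([11, 20, 2], alignment=2)   # [0, 14, 36]
```

Packed back to back, the second and third prefixes land on odd offsets (13 and 35); with 2-byte alignment on the prefix, one fill byte moves each of them to an even offset.
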
>> If we just redefine PrefixLength as SimpleElement, now all the example
>> formats above are easily modeled in the obvious way, and even the
>> combinations of text and binary lengths can be done naturally, as a binary
>> prefixLengthType integer type can have all the usual constraints binary data
>> likes to have, like alignment.
>>
>> Even the 2-level ASN.1 weird case "prefix-length of the prefix-length"
>> (see errata 2.13) works because ElementLeftFraming itself includes
>> PrefixLength. I believe we should put an explicit depth limit of 2 on this
>> however.
>>
>> (Side note: I'd like to see an example of the ASN.1 format that supposedly
>> requires this nested prefix of the prefix situation.)
>>
>> Changing the grammar in this way lets us drop the special case handling
>> around prefixLength where it can't have lengthKind="delimited" and ignores
>> initiators and terminators and alignment, which is a bunch fewer special cases
>> to have to implement and test, and create special warnings for (e.g.,
>> "Warning: prefixLengthType 'lenType' has alignment property which will be
>> ignored.").
>>
>> If we want to be more minimal about the changes, just changing
>>
>> PrefixLength = ElementLeftFraming SimpleContent RightFraming
>>
>> is sufficient and achieves the fix of the real problem.
>>
>> (This also eliminates the need for current errata 2.13 and 2.14, or rather
>> replaces those errata with this new stuff.)
>>
>> ...mikeb
>>
>> --
>>  dfdl-wg mailing list
>>  dfdl-wg at ogf.org
>>  http://www.ogf.org/mailman/listinfo/dfdl-wg
>>
>>
>> From: Tim Kimber/UK/IBM at IBMGB
>> To: Mike Beckerle <mbeckerle.dfdl at gmail.com>
>> Cc: dfdl-wg at ogf.org, dfdl-wg-bounces at ogf.org
>> Date: 23/10/2011 21:12
>> Subject: Re: [DFDL-WG] lengthKind='prefixed' clarification needed
>> Sent by: dfdl-wg-bounces at ogf.org
>>
>> ------------------------------
>>
>>
>>
>> Hi Mike,
>>
>> I have always assumed that it works like this:
>> The Prefix region includes leading alignment, leading skip and initiator
>> The Content region contains the data, and the lengthKind property
>> describes how to determine the content length
>> The Suffix region includes Terminator and trailing alignment.
>> The lengthKind property describes the content region, and is not examined
>> until the Content region is reached. So the element's initiator, if defined,
>> is not included in the length described by the prefix length.
>>
>> If you view the prefixed length as describing the length of the *element*
>> (i.e. its entire representation) then this definition is not intuitive. But
>> I have always viewed lengthKind='prefixed' as being like the other
>> lengthKinds - it describes the length of the element's *content*.
>> So it's a consistent definition, but is it useful? I think so. In my
>> experience, prefixed lengths tend to be applied to complex elements
>> (structures) rather than simple values. In such cases, the content of the
>> complex element will always be either a sequence group or a choice group,
>> and any initiator/terminator can be located on that group.
>>
>> regards,
>>
>> Tim Kimber, Common Transformation Team,
>> Hursley, UK
>> Internet:  kimbert at uk.ibm.com
>> Tel. 01962-816742
>> Internal tel. 246742
>>
>>
>>
>>
>> From:        Mike Beckerle <mbeckerle.dfdl at gmail.com>
>> To:        dfdl-wg at ogf.org
>> Date:        22/10/2011 19:12
>> Subject:        [DFDL-WG] lengthKind='prefixed' clarification needed
>> Sent by:        dfdl-wg-bounces at ogf.org
>>  ------------------------------
>>
>>
>>
>> For agenda/issues list
>>
>> With respect to lengthKind='prefixed'. I'm concerned that there's a
>> complex interaction with initiator/terminator.
>>
>> Can we have a prefix length and an initiator and terminator as well? If so
>> which comes first, and if it's the prefix does the prefix length include the
>> length of the initiator and terminator?
>>
>> The grammar as written in current draft of spec has the initiator first,
>> then the prefix, then the content, and then the terminator. I think this is
>> wrong. I mean we can make it work, but it's neither a useful nor an
>> intuitive behavior.
>>
>> If we're going to fix this, I think we should make prefixed an alternative
>> to initiator and terminator, so that you can't have both on the element.
>>
>> The alternative is to change the order around. Because initiator and
>> terminator can each be lists of alternative choices, the only sensible
>> composition of prefixed with these has prefix length providing the length of
>> a syntax which includes static initiator and terminator fields, which are
>> sort of like static padding to be trimmed off the string before extracting
>> the value.
>>
>> E.g., prefix length of 10 preceding these characters: [[123456]]
>>
>> <element name='x' type='int' dfdl:initiator="[[" dfdl:terminator="]]"
>> dfdl:lengthKind='prefixed' .../>
>>
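The two orderings under discussion can be compared side by side in Python. This is a sketch with made-up function names, and it assumes a two-digit text prefix purely for illustration (the message does not say how the prefix is represented). The first function follows the order the current draft grammar is described as having; the second is the alternative where the prefix comes first and its value covers the initiator and terminator.

```python
def parse_grammar_order(text, initiator="[[", terminator="]]", prefix_digits=2):
    """Initiator, then prefix, then content, then terminator.

    The prefix value counts the content only.
    """
    if not text.startswith(initiator):
        raise ValueError("missing initiator")
    pos = len(initiator)
    length = int(text[pos:pos + prefix_digits])
    pos += prefix_digits
    content = text[pos:pos + length]
    if len(content) != length or not text.startswith(terminator, pos + length):
        raise ValueError("missing terminator")
    return content


def parse_prefix_outermost(text, initiator="[[", terminator="]]", prefix_digits=2):
    """Prefix first; its value covers initiator + content + terminator."""
    length = int(text[:prefix_digits])
    framed = text[prefix_digits:prefix_digits + length]
    if len(framed) != length:
        raise ValueError("truncated data")
    if not (framed.startswith(initiator) and framed.endswith(terminator)):
        raise ValueError("bad framing")
    # Trim the static framing off before extracting the value.
    return framed[len(initiator):len(framed) - len(terminator)]
```

With these assumptions, "[[06123456]]" is the grammar-order encoding and "10[[123456]]" is the prefix-outermost encoding of the same value 123456, which illustrates why mixing the two features invites confusion about what the length counts.
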
>> But,....this is obscure enough that I'd rather make prefix length
>> exclusive of initiator/terminator. I.e. Schema Def Error if both are
>> specified.
>>
>> Rationale: Even if such formats are possible, and even if they do exist
>> somewhere, it's possible to model this format differently with hidden
>> fields, lengthKind='explicit' etc., so it's not like removing this complex
>> interaction of prefix with initiator/terminator reduces DFDL's expressive
>> power in any way.
>>
>> Summary: To reduce complexity, suggest that lengthKind='prefixed' is
>> exclusive of both initiator and terminator properties directly on the same
>> element. Schema Definition Error if both are specified.
>>
>>
>> --
>> Mike Beckerle | OGF DFDL WG Co-Chair
>> Tel:  781-330-0412
>>
>>
>>
>>  ------------------------------
>>
>> Unless stated otherwise above:
>> IBM United Kingdom Limited - Registered in England and Wales with number
>> 741598.
>> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>>
>
>
> --
> Mike Beckerle | OGF DFDL WG Co-Chair
> Tel:  781-330-0412
>
>


-- 
Mike Beckerle | OGF DFDL WG Co-Chair
Tel:  781-330-0412