[DFDL-WG] lengthKind='prefixed' clarification needed

Mike Beckerle mbeckerle.dfdl at gmail.com
Thu Oct 27 07:36:45 CDT 2011


I would never have thought to align the string's type rather than the
integer's!  I could (and did) spend hours without realizing that.

This works in that the alignment region is after initiator, so that can't
foul the alignment.

So, then I think we just need the stipulations that bring the constraints on
the prefix length type in the text to match what the grammar will support.
Which I think is just expanding the existing list of restrictions in the
text like this:

*Errata N Section 12.3.4 dfdl:lengthKind 'prefixed'*

Replace phrase: "It is a schema definition error if the xs:simpleType
specifies dfdl:lengthKind 'delimited' or 'endOfParent' or a
dfdl:outputValueCalc" with "It is a schema definition error if the
xs:simpleType specifies dfdl:lengthKind 'delimited', or dfdl:lengthKind
'endOfParent', or specifies a value for any of these properties:
dfdl:outputValueCalc, dfdl:initiator, dfdl:terminator, dfdl:alignment,
dfdl:leadingSkip, or dfdl:trailingSkip."

I think we use the term "specifies" to mean that the property is either
specified directly, i.e., in local dfdl annotation syntax, or obtained via
composition, ref, or scope. That is what is intended here. Also, turning off
a property, as in dfdl:initiator="", ought to be allowed for those
properties which can be turned off in this way.
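
As a rough illustration of the rule above, here is a minimal Python sketch of
the check. The dict of "effective properties" (what is in force after local
annotations, refs and scoping are combined) and the function name are
assumptions made only for this sketch; they are not from the spec or any
implementation. It treats only the empty string as "turned off"; whether
values such as leadingSkip 0 or alignment 1 should also count is left open.

```python
# Properties the proposed erratum forbids on a prefix length type, copied
# from the wording above. The dict-based model is a hypothetical stand-in
# for however an implementation represents resolved properties.
FORBIDDEN = (
    "outputValueCalc", "initiator", "terminator",
    "alignment", "leadingSkip", "trailingSkip",
)
FORBIDDEN_LENGTH_KINDS = ("delimited", "endOfParent")


def prefix_type_errors(props):
    """Return schema definition error messages for a prefix length type.

    `props` maps property names (without the dfdl: prefix) to the values
    in effect after local annotations, refs and scoping are combined.
    """
    errors = []
    if props.get("lengthKind") in FORBIDDEN_LENGTH_KINDS:
        errors.append("lengthKind '%s' not allowed" % props["lengthKind"])
    for name in FORBIDDEN:
        # Assumption: an empty string means the property is turned off,
        # which the message above says ought to be allowed.
        if props.get(name, "") != "":
            errors.append("%s not allowed" % name)
    return errors
```

With this sketch, a type carrying dfdl:initiator="" produces no errors, while
one carrying dfdl:alignment="2" produces one.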

Do we need a terminology point on this issue, to shorten descriptions and
make them uniform in other places? Perhaps we already have this.


On Thu, Oct 27, 2011 at 6:29 AM, Steve Hanson <smh at uk.ibm.com> wrote:

>
> Hi Mike
>
> I agree that forcing someone to put dfdl:alignment on the element when they
> really want it on the simple type is not good - but the spec does not imply
> that today.  Unless I am missing something I can write:
>
> <simpleType name="string" dfdl:lengthKind="prefixed"
> dfdl:prefixLengthType="stdStrPrefixType" dfdl:alignment="2"
> dfdl:alignmentUnits="bytes" >
>
>    <restriction base="xs:string" >
>         <maxLength value='65535'/>
>     </restriction>
> </simpleType>
>
> <simpleType name="stdStrPrefixType" dfdl:representation="binary">
>
>    <restriction base="xs:unsignedShort"/>
> </simpleType>
>
> The properties of simple type "string" and the element are combined.  This
> alignment is applied *before* the prefix length is consumed, as per the
> grammar.  This seems fine to me. Taking a (hypothetical) example, if I was
> modelling a PL/1 var char then I know that all var chars are aligned on
> half-word boundaries, so I put the alignment on the type for the var char.
>
> Have I misunderstood the problem?
>
> Regards
>
> Steve Hanson
> Architect, Data Format Description Language (DFDL)
> Co-Chair, OGF DFDL Working Group <http://www.ogf.org/dfdl/>
> IBM SWG, Hursley, UK
> smh at uk.ibm.com
> tel:+44-1962-815848
>
>
> From: Mike Beckerle <mbeckerle.dfdl at gmail.com>
> To: Steve Hanson/UK/IBM at IBMGB
> Cc: dfdl-wg at ogf.org, Tim Kimber/UK/IBM at IBMGB
> Date: 26/10/2011 16:02
> Subject: Re: [DFDL-WG] lengthKind='prefixed' clarification needed
> ------------------------------
>
>
>
> Oops, a nit: I missed dfdl:outputValueCalc which should also be in the list
> of things that cause a schema def error.
>
> On Wed, Oct 26, 2011 at 10:55 AM, Mike Beckerle
> <mbeckerle.dfdl at gmail.com> wrote:
> OK, I have a smaller, simpler proposal to fix this, which is really a very
> small spec change. The total wording of this would be these additional
> errata:
>
> *Errata N Section 12.3.4 dfdl:lengthKind 'prefixed'*
>
> Replace phrase: "It is a schema definition error if the xs:simpleType
> specifies dfdl:lengthKind 'delimited' or 'endOfParent' or a
> dfdl:outputValueCalc" with "It is a schema definition error if the
> xs:simpleType specifies dfdl:lengthKind 'delimited', dfdl:lengthKind
> 'endOfParent', dfdl:initiator, dfdl:terminator, dfdl:leadingSkip, or
> dfdl:trailingSkip."
>
> *Errata M Section 9.1 DFDL Data Syntax Grammar*
>
> Replace "PrefixLength = SimpleContent" with "PrefixLength =
> LeadingAlignment SimpleContent"
>
> Today's errata on nested prefix length types would remain as errata.
>
> The first new errata is an important clarification, and the schema def
> error is the conservative thing to do for the future. We have followed
> this principle in many other places in the spec.
>
> The second new errata fixes the issue described below with alignment and
> information hiding/composition properties. It is, in some sense, a very
> small change to the spec to fix a rather glaring composition problem.
> Here's the rationale:
>
> To me lengthKind='prefixed' ought to handle the variable-length string case
> like this:
>
> In the schema for the specific application's data:
>
> ...
> <element name="lname" type="fmt:string"/>  <!-- note use of string, but
> from a different namespace -->
> <element name="addr1" type="fmt:string"/>
> <element name="addr2" type="fmt:string"/>
> <element name="postal" type="fmt:string"/>
> ...
>
>
> Then, over in a different schema file.... which defines the namespace bound
> to fmt above as its target namespace....
>
> <simpleType name="string" dfdl:lengthKind="prefixed"
> dfdl:prefixLengthType="stdStrPrefixType">
>    <restriction base="xs:string" >
>         <maxLength value='65535'/>
>     </restriction>
> </simpleType>
>
> <simpleType name="stdStrPrefixType" dfdl:alignment="2"
> dfdl:alignmentUnits="bytes" dfdl:representation="binary">
>    <restriction base="xs:unsignedShort"/>
> </simpleType>
>
> The highlight here is that the alignment restriction is most naturally
> expressed on the simpleType, the details of which are not near the
> element declaration of the string itself. The spec specifically allows
> this placement of the dfdl annotations on the simpleType right now. It's
> only the grammar that implies the alignment would be ignored, just as
> initiator, terminator, leading skip and trailing skip would all be ignored.
>
> To me the above pattern/idiom, where the format annotation cruft is
> isolated on type definitions, is generally to be encouraged. There's a
> composition property I'm trying to preserve here, which is that you can put
> two things side by side without having to worry about whether you are
> meeting one or the other's alignment requirements. The alignment goes in the
> package with the definition of the thing.
>
> I don't think you should have to write:
>
> ...
> <element name="lname" type="fmt:string" dfdl:alignment='2' />
> <element name="addr1" type="fmt:string" dfdl:alignment='2'/>
> <element name="addr2" type="fmt:string" dfdl:alignment='2'/>
> <element name="postal" type="fmt:string" dfdl:alignment='2'/>
> ...
>
> To me that violates a valuable principle of information hiding. I can hide
> the type, but not hide its alignment requirements? Anywhere you can abstract
> and hide a SimpleContent item that can be binary numeric, where alignment is
> a common requirement, you also need to be able to hide its alignment
> requirements alongside it, so that using it doesn't leave the user in an
> error-prone situation where they can misalign it. PrefixLength is the one
> place (that I know of) where we're violating this principle. It's an easy
> omission, and an easy fix.
>
>
> ...mikeb
>
>
>
> On Wed, Oct 26, 2011 at 6:15 AM, Steve Hanson <smh at uk.ibm.com> wrote:
>
> Mike
>
> Length kind 'prefixed' was intended to handle the case where the length is
> tightly bound to the data, i.e., there is nothing between the length and the
> data. For example a PL/1 var char or ASN.1 BER.  If the length causes the
> length/data to be aligned then that has to be taken into account on the
> element itself.   Length kind 'prefixed' was not intended to cover more
> complex cases where the length itself has independent alignment or there are
> delimiters involved. For those you use length kind 'explicit' and an
> expression. Otherwise the combinations become too complicated. If we wish to
> extend 'prefixed' to include the more complex cases, I think that is a post
> 1.0 thought and is best handled using a different length kind enum.
>
> You say that ignoring the alignment property on the simple type used for
> the length is strange, but if you allow that, there is no way to align the
> element's actual data separately. I think that is even stranger.
>
> The ASN.1 BER description at
> http://en.wikipedia.org/wiki/Basic_Encoding_Rules describes how the length
> itself can have a prefix (see sub-section 'Length').
>
> Regards
>
> Steve Hanson
>
> From: Mike Beckerle <mbeckerle.dfdl at gmail.com>
> To: Tim Kimber/UK/IBM at IBMGB, dfdl-wg at ogf.org
> Date: 24/10/2011 02:16
> Subject: Re: [DFDL-WG] lengthKind='prefixed' clarification needed
> Sent by: dfdl-wg-bounces at ogf.org
>
>  ------------------------------
>
>
> We definitely need an agenda slot to discuss this matter.
>
> I think we should redefine PrefixLength to allow it to have framing.
>
> There is a significant issue which is that some prefixLengthTypes will be
> multi-byte binary integers (typically 2 or 4 bytes), and these commonly
> require alignment to a 2 or 4 byte boundary, as that's how the data
> structures they live in would have been laid out.
>
> The spec currently doesn't allow prefixLengthTypes to be aligned
> themselves, because the grammar has them as SimpleContent, without the
> surrounding ElementLeftFraming and RightFraming. This is also why they
> cannot have lengthKind='delimited': there are no initiator or terminator
> regions surrounding them. So the only way they can be aligned is if the
> elements that have these prefixLengths are themselves aligned properly.
>
> However, if you specify alignment on a simple integer type, use it as a
> prefixLengthType, and then that alignment annotation is *ignored*, that
> would seem strange, and buggy/hard-to-diagnose.
>
> However, scoping rules for properties don't provide any way for this
> alignment to get "into the scope chain", and I'd hate to start messing with
> the scoping rules because of the corner case of prefixLength. We'd need to
> put another scoping rule in just to handle this. I'd rather not go there.
> Lots of our examples in the spec would have to change as they use alignment
> as the example property...
>
> But, the spec is not self-consistent, as the dfdl:alignment property can be
> placed on a simple type definition, as well as on an element. So it would
> seem a prefixLengthType could reference an aligned simple integer type, but
> neither the grammar nor the scoping rules allow for using this alignment
> property to control anything.  Similarly, you can put an initiator on a
> simple integer type, use it as a prefixLengthType, and have the initiator be
> ignored.... because there is no initiator region for a PrefixLength.
>
> We need to fix this inconsistency.
>
> I think prefixLengthType needs to be alignable, and one should be able to
> specify alignment on a  type definition, not just on an element.
>
> I also think we're better off with a uniform general fix here than with a
> handful of special case rules around prefix lengths. (E.g., the
> prefixLengthType cannot have alignment, cannot have initiator/terminator or
> lengthKind 'delimited', with a warning or SDE if it does, etc.)
>
> So I think the grammar is wrong. I think
>
> PrefixLength = SimpleElement
>
> (where SimpleElement = ElementLeftFraming SimpleContent RightFraming )
>
> is the right definition.
>
>
> In working through examples, I'm convinced the current spec is problematic.
> In the current spec one must model a 4-byte aligned binary integer prefix
> length as a separate element (so that you can align it), and use
> lengthKind='explicit' on the thing it controls. This is a lot of hassle for
> a very common situation. The whole point of dfdl:lengthKind='prefixed' is to
> provide an easier way to model the common cases.
>
> For the same reason there is no alignment, the definition of
> dfdl:prefixLengthType says the named type cannot have
> lengthKind='delimited'. That is because the DFDL grammar defines the
> prefixLength region to be SimpleContent which is without any of the
> surrounding framing regions where delimiters are found.
>
> So, one cannot, for example, put an initiator and terminator on the prefix
> length type so as to have syntax separating it from the actual content. Even
> if it is fixed length you can't do it. For instance, you cannot model this
> data as 3 string elements using prefix length:
>
> (11)9 Ocean Way(20)Southwest(SW) Harbor(02)ME
>
> (Notice in the above the unescaped "(SW)", which is why this is not a
> delimited format.)
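
To make the intended reading of that first example concrete, here is a
minimal Python sketch. It assumes each prefix is a 2-digit decimal text
length with "(" as its initiator and ")" as its terminator; the function
name is hypothetical, and this shows only the desired parse, not behavior
the current grammar provides.

```python
def parse_paren_prefixed(data):
    """Split data of the form (NN)content(NN)content... into strings.

    Assumption for this sketch: every prefix is exactly two decimal digits
    wrapped in "(" and ")".
    """
    pos, values = 0, []
    while pos < len(data):
        if data[pos] != "(" or data[pos + 3:pos + 4] != ")":
            raise ValueError("expected (NN) prefix at offset %d" % pos)
        length = int(data[pos + 1:pos + 3])
        pos += 4
        # The content is taken by length alone, so parentheses inside it
        # (such as the unescaped "(SW)") are never examined as delimiters.
        values.append(data[pos:pos + length])
        pos += length
    return values


record = "(11)9 Ocean Way(20)Southwest(SW) Harbor(02)ME"
parsed = parse_paren_prefixed(record)
```

Here `parsed` holds the three strings "9 Ocean Way", "Southwest(SW) Harbor"
and "ME".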
>
> You also cannot do:
>
> 11(9 Ocean Way)20(Southwest(SW) Harbor)02(ME)
>
> because that puts the initiator of the string element itself after its
> prefix length region, which is backwards from the way we have it in the
> grammar currently. Both of the examples above require use of a separate
> element and lengthKind="explicit" to pull off, even though they seem like
> fairly natural ways to textualize a binary format.
>
> Now consider
>
> xx9 Ocean WayxxSouthwest(SW) HarborxxME
>
> where the "xx" is a 16 bit (2 byte) binary integer holding the lengths 11,
> 20, and 2 respectively.
>
> That works today, but only so long as the "xx" doesn't need to be on a
> 2-byte alignment. In my example the first element occupies 13 bytes
> including the prefix itself, so the next "xx" starts on an odd boundary.  I
> could specify alignment on each of the 3 elements of my sequence here, which
> is unmotivated/weird since they're string elements and their type may be
> distant from where the elements are declared, so the motivation for the
> alignment may not be clear. The alignment constraint really wants to be
> expressed on the prefixLengthType, and the dfdl annotation syntax lets
> you specify alignment there; it just doesn't use it.
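
The offset arithmetic in that example can be checked with a small Python
sketch. Big-endian byte order, zero-valued fill bytes and the helper names
are assumptions made for illustration; the thread does not specify them.

```python
import struct


def align_up(offset, alignment):
    """Smallest multiple of `alignment` that is >= offset."""
    return -(-offset // alignment) * alignment


def encode(strings, alignment=1):
    """Write each string with a 2-byte length prefix.

    Zero bytes are inserted so that every prefix starts on a multiple of
    `alignment` (fill value and byte order are assumptions).
    """
    out = bytearray()
    for s in strings:
        out.extend(b"\x00" * (align_up(len(out), alignment) - len(out)))
        data = s.encode("ascii")
        out.extend(struct.pack(">H", len(data)))
        out.extend(data)
    return bytes(out)


def prefix_offsets(buf, count, alignment=1):
    """Parse `count` prefixed strings; return (prefix offsets, values)."""
    pos, offsets, values = 0, [], []
    for _ in range(count):
        pos = align_up(pos, alignment)  # skip fill before the prefix
        offsets.append(pos)
        (n,) = struct.unpack_from(">H", buf, pos)
        pos += 2
        values.append(buf[pos:pos + n].decode("ascii"))
        pos += n
    return offsets, values


fields = ["9 Ocean Way", "Southwest(SW) Harbor", "ME"]
packed_offsets, _ = prefix_offsets(encode(fields), 3)
aligned_offsets, aligned_values = prefix_offsets(encode(fields, 2), 3, 2)
```

Without alignment the prefixes start at offsets 0, 13 and 35, so the second
and third land on odd boundaries; with 2-byte alignment applied before each
prefix they start at 0, 14 and 36.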
>
> If we just redefine PrefixLength as SimpleElement, now all the example
> formats above are easily modeled in the obvious way, and even the
> combinations of text and binary lengths can be done naturally, as a binary
> prefixLengthType integer type can have all the usual constraints binary data
> likes to have, like alignment.
>
> Even the weird 2-level ASN.1 case, "prefix-length of the prefix-length" (see
> errata 2.13), works because ElementLeftFraming itself includes PrefixLength.
> I believe we should put an explicit depth limit of 2 on this, however.
>
> (Side note: I'd like to see an example of the ASN.1 format that supposedly
> requires this nested prefix of the prefix situation.)
>
> Changing the grammar in this way lets us drop the special case handling
> around prefixLength, where it can't have lengthKind="delimited" and ignores
> initiators, terminators and alignment. That is a bunch fewer special cases
> to implement and test, and to create special warnings for (e.g.,
> "Warning: prefixLengthType 'lenType' has alignment property which will be
> ignored.").
>
> If we want to be more minimal about the changes, just changing
>
> PrefixLength = ElementLeftFraming SimpleContent RightFraming
>
> is sufficient and achieves the fix of the real problem.
>
> (This also eliminates the need for current errata 2.13 and 2.14, or rather
> replaces those errata with this new stuff.)
>
> ...mikeb
>
> --
>  dfdl-wg mailing list
>  dfdl-wg at ogf.org
>  http://www.ogf.org/mailman/listinfo/dfdl-wg
>
> From: Tim Kimber/UK/IBM at IBMGB
> To: Mike Beckerle <mbeckerle.dfdl at gmail.com>
> Cc: dfdl-wg at ogf.org, dfdl-wg-bounces at ogf.org
> Date: 23/10/2011 21:12
> Subject: Re: [DFDL-WG] lengthKind='prefixed' clarification needed
> Sent by: dfdl-wg-bounces at ogf.org
>
>  ------------------------------
>
>
>
> Hi Mike,
>
> I have always assumed that it works like this:
> The Prefix region includes leading alignment, leading skip and initiator.
> The Content region contains the data, and the lengthKind property describes
> how to determine the content length.
> The Suffix region includes Terminator and trailing alignment.
> The lengthKind property describes the content region, and is not examined
> until the Content region is reached. So the element's initiator, if defined,
> is not included in the length described by the prefix length.
>
> If you view the prefixed length as describing the length of the *element*
> (i.e., its entire representation) then this definition is not intuitive. But
> I have always viewed lengthKind='prefixed' as being like the other
> lengthKinds - it describes the length of the element's *content*.
> So it's a consistent definition, but is it useful? I think so. In my
> experience, prefixed lengths tend to be applied to complex elements
> (structures) rather than simple values. In such cases, the content of the
> complex element will always be either a sequence group or a choice group,
> and any initiator/terminator can be located on that group.
>
> regards,
>
> Tim Kimber, Common Transformation Team,
> Hursley, UK
> Internet:  *kimbert at uk.ibm.com* <kimbert at uk.ibm.com>
> Tel. 01962-816742
> Internal tel. 246742
>
>
>
>
> From:        Mike Beckerle <mbeckerle.dfdl at gmail.com>
> To:        dfdl-wg at ogf.org
> Date:        22/10/2011 19:12
> Subject:        [DFDL-WG] lengthKind='prefixed' clarification needed
> Sent by:        dfdl-wg-bounces at ogf.org
>  ------------------------------
>
>
>
> For agenda/issues list
>
> With respect to lengthKind='prefixed'. I'm concerned that there's a complex
> interaction with initiator/terminator.
>
> Can we have a prefix length and an initiator and terminator as well? If so
> which comes first, and if it's the prefix does the prefix length include the
> length of the initiator and terminator?
>
> The grammar as written in current draft of spec has the initiator first,
> then the prefix, then the content, and then the terminator. I think this is
> wrong. I mean we can make it work, but it's not a useful, nor intuitive
> behavior.
>
> If we're going to fix this, I think we should make prefixed an alternative
> to initiator and terminator, so that you can't have both on the element.
>
> The alternative is to change the order around. Because initiator and
> terminator can each be lists of alternative choices, the only sensible
> composition of prefixed with these has prefix length providing the length of
> a syntax which includes static initiator and terminator fields, which are
> sort of like static padding to be trimmed off the string before extracting
> the value.
>
> E.g., prefix length of 10 preceding these characters: [[123456]]
>
> <element name='x' type='int' dfdl:initiator="[[" dfdl:terminator="]]"
> dfdl:lengthKind='prefixed' .../>
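
The two orderings discussed here can be contrasted with a minimal Python
sketch. It assumes a 2-digit decimal text prefix, which the message does not
specify, and both function names are hypothetical.

```python
INITIATOR, TERMINATOR = "[[", "]]"


def parse_grammar_order(data):
    """Initiator, then prefix length, then content, then terminator.

    The prefix length counts only the content.
    """
    if not data.startswith(INITIATOR):
        raise ValueError("missing initiator")
    pos = len(INITIATOR)
    length = int(data[pos:pos + 2])
    pos += 2
    content = data[pos:pos + length]
    if not data[pos + length:].startswith(TERMINATOR):
        raise ValueError("missing terminator")
    return int(content)


def parse_prefix_first(data):
    """Prefix length first; it counts initiator, content and terminator."""
    length = int(data[:2])
    framed = data[2:2 + length]
    if not (framed.startswith(INITIATOR) and framed.endswith(TERMINATOR)):
        raise ValueError("missing initiator or terminator")
    return int(framed[len(INITIATOR):-len(TERMINATOR)])


# Same logical value, two different byte layouts.
a = parse_grammar_order("[[06123456]]")
b = parse_prefix_first("10[[123456]]")
```

Both calls yield the integer 123456; only the position of the prefix and
what its length covers differ.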
>
> But this is obscure enough that I'd rather make prefix length exclusive
> of initiator/terminator, i.e., Schema Def Error if both are specified.
>
> Rationale: Even if such formats are possible, and even if they do exist
> somewhere, it's possible to model this format differently with hidden
> fields, lengthKind='explicit' etc., so it's not like removing this complex
> interaction of prefix with initiator/terminator reduces DFDL's expressive
> power in any way.
>
> Summary: To reduce complexity, suggest that lengthKind='prefixed' is
> exclusive of both initiator and terminator properties directly on the same
> element. Schema Definition Error if both are specified.
>
>
> --
> Mike Beckerle | OGF DFDL WG Co-Chair
> Tel:  781-330-0412
>
>
>
>  ------------------------------
>
> *Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
> *


-- 
Mike Beckerle | OGF DFDL WG Co-Chair
Tel:  781-330-0412