[DFDL-WG] Fw: Action 233 (deferred) - "byte order not sufficient..." - draft document on experience with binary format MIL-STD-2045

Mike Beckerle mbeckerle.dfdl at gmail.com
Fri Jul 11 13:24:22 EDT 2014


Thanks for this additional input.


> Some further thoughts from IBM on your recommendations, after more
> internal discussion here.
>
>    - Preferable to have dfdl:bitOrder as a separate property rather than
>    handle it via new dfdl:byteOrder enums. Although new properties pose
>    validation issues for existing schemas, this should not compromise the
>    language design. DFDL can choose what bitOrder/byteOrder combinations are
>    supported.
>    - OK with new dfdl:byteOrder enum for littleEndianAtomic16Bit
>    though can we improve the name?
>

I am absolutely open to suggestions on the name. I adapted this name from
the Wikipedia article terminology.

>    - dfdl:encoding has an architected system for extra encodings so
>    US-ASCII-7-Bit-Packed should be x-US-ASCII-7-Bit-Packed, and the spec
>    updated to remove specific mention of US-ASCII-7-Bit-Packed.
>

Thoughts: if there is no support for this 7-bit packed ASCII flavor, then
there is no point in having dfdl:bitOrder support. The two go together.

So in the section on optional DFDL features would we say this is the
optional feature:

dfdl:bitOrder="leastSignificantBitFirst" and
dfdl:encoding="x-dfdl-us-ascii-7-bit-packed"

Or is there no mention of the encoding?

I raise this because the two really go together. There is no point in
having one without the other, and there needs to be an agreed-upon standard
meaning for x-dfdl-us-ascii-7-bit-packed encoding.

So this x-dfdl-us-ascii-7-bit-packed is a DFDL standard, not an
implementation-defined standard.
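
To make concrete why the encoding and the bit order have to be specified together, here is a minimal Python sketch of one plausible reading of the packing. The bit layout is an assumption for illustration, not something this thread defines: each character contributes 7 bits, and bytes are filled starting from their least significant bit. The function names are hypothetical.

```python
def pack_7bit_lsb_first(text: str) -> bytes:
    """Pack 7-bit character codes into bytes, filling each byte from bit 0 upward.

    Assumed layout (not taken from the DFDL spec text): the first character
    occupies bits 0-6 of the first byte, the next character starts at bit 7.
    """
    acc = 0      # bit accumulator; bit 0 is the next bit to be written out
    nbits = 0    # number of valid bits currently held in acc
    out = bytearray()
    for ch in text:
        code = ord(ch)
        if code > 0x7F:
            raise ValueError(f"not a 7-bit character: {ch!r}")
        acc |= code << nbits
        nbits += 7
        while nbits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:
        out.append(acc & 0xFF)  # final partial byte; unused high bits stay zero
    return bytes(out)


def unpack_7bit_lsb_first(data: bytes, count: int) -> str:
    """Inverse of pack_7bit_lsb_first for a known character count."""
    acc = 0
    nbits = 0
    chars = []
    for byte in data:
        acc |= byte << nbits
        nbits += 8
        while nbits >= 7 and len(chars) < count:
            chars.append(chr(acc & 0x7F))
            acc >>= 7
            nbits -= 7
    if len(chars) < count:
        raise ValueError("not enough data for the requested character count")
    return "".join(chars)


packed = pack_7bit_lsb_first("AB")
print(packed.hex())                       # 4121
print(unpack_7bit_lsb_first(packed, 2))   # AB
```

Under this reading a character routinely straddles a byte boundary, so which end of the byte the bits come from changes the decoded text. That is the sense in which the encoding is meaningless without an agreed bit order.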

> We discussed the proposed new dfdl:lengthKind 'fixedLengthOrTerminated'. A new
> enum implies that it can be used in any scenario, so the following need to
> be specified.
>
>    - dfdl:terminator must be set and cannot be the empty string or contain
>    ES on its own
>    - If xs:string or xs:hexBinary, can maxLength facet be used instead of
>    dfdl:length? (Suggest no - this is variable length data so min/maxLength
>    are for validation only).
>    - Can dfdl:length be an expression? (Suggest no unless specific use
>    case identified)
>

My use case needs only constants as the maximum, hence the enum name contains
the "fixed" prefix, not "explicit".

>    - Any special rules for emptyValueDelimiterPolicy and
>    nilValueDelimiterPolicy ?
>

Since a terminator must be set, these cannot be "none" or
"initiator".

>    - Use on complex element. Presumably dfdl:length is first used to
>    extract a 'box' but within that box does parser immediately scan for the
>    dfdl:terminator or does it descend into the complex type and parse the
>    children, expecting to either consume all the box or to find the terminator
>    at the end? (Suggest the latter).
>

I have no use case that requires this for complex types at all.
Perhaps we can dodge this by naming it simpleFixedLengthOrTerminated
and restricting it to simple types only?


>
>    - Use on complex element. Last child cannot be dfdl:lengthKind
>    'endOfParent'.
>    - Scanning rules: Use of this new dfdl:lengthKind switches off any
>    in-scope stack of terminating markup in force at that point. Put another
>    way, when we are scanning for the dfdl:terminator, we are not looking for
>    any markup from an outer scope.
>
> So there's plenty to think about with this new dfdl:lengthKind. A good
> rule for deciding whether a new dfdl:lengthKind or dfdl:occursCountKind should
> be added is whether it bends some other part of the spec out of shape. The
> new dfdl:lengthKind looks ok so far.
>
> However we *think* we have come up with an alternative model which is
> simpler than the one you state in the document. Example for field 'varstr'
> with max length 100:
>
> <xs:sequence dfdl:terminator="{if (fn:string-length(varstr) eq 100) then '%ES;'
> else '%DEL;'}" ...>
>         <xs:element name="varstr" type="xs:string"
> dfdl:lengthKind="pattern" dfdl:lengthPattern="([^\x7F].\x7F)|(.{100})" ... />
> </xs:sequence>
>
> Can't put dfdl:terminator with a self-referencing expression on the
> element. Might need fn:exists in the dfdl:terminator expression to handle
> optionality. Does that work?
>

I don't think this will work as %ES; isn't allowed in terminators.

There is a proposal to allow it, but only when length kind is such that one
is not scanning for delimiters (same restriction as for WSP*). Let's assume
that we allow %ES for now.

One beauty of your idea here is that unparsing will "just work", so that's
nice.

But I think your pattern has a bug: I think it should be
dfdl:lengthPattern="[^\x7F]{0,99}(?=\x7F)|.{100}"
This will not capture more than 99 characters prior to the DEL, and will
not include the DEL as part of the string in the case where a DEL is found
(uses lookahead in regex). Hence, the DEL will be available to be picked
off as the terminator. Without this you end up with the DEL in the payload.

With that I think your approach would work. So thanks for that idea.
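
As a quick sanity check of the lookahead idea, here is a small Python sketch that runs the corrected pattern against two sample inputs. It uses Python's `re` module rather than a DFDL processor, so it only illustrates the regex behaviour. The helper name `field_length` and the `re.DOTALL` flag are assumptions made for this illustration.

```python
import re

DEL = "\x7f"

# At most 99 non-DEL characters followed by a DEL (not consumed, thanks to
# the lookahead), or else exactly 100 characters of any kind.
# re.DOTALL is an assumption so that '.' also accepts newlines in the payload.
PATTERN = re.compile(r"[^\x7F]{0,99}(?=\x7F)|.{100}", re.DOTALL)


def field_length(data: str) -> int:
    """Return how many characters the pattern claims from the start of data."""
    m = PATTERN.match(data)
    if m is None:
        raise ValueError("no DEL within range and fewer than 100 characters")
    return len(m.group(0))


short = "HELLO" + DEL + "rest"
full = "A" * 100 + "rest"
print(field_length(short))  # 5: the DEL is left over for the terminator
print(field_length(full))   # 100: maximum length reached, no terminator
```

In the first case the match stops just before the DEL, so the DEL is still in the data stream for the terminator to consume.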

Perhaps there is an even simpler way to model this, one that works today:
put the conditional logic in a choice.

<choice>
       <!-- length kind pattern is needed to bound length to max of 99 -->
       <element name="raw1" type="xs:string"
           dfdl:lengthKind='pattern'
           dfdl:lengthPattern="[^\x7F]{0,99}"
           dfdl:terminator="%DEL;"/>
       <element name="raw2" type="xs:string"
            dfdl:lengthKind="explicit"
            dfdl:length="100"/>
</choice>
<element name='value' type='xs:string'
     dfdl:inputValueCalc='{ if (fn:exists( ../raw1 )) then ../raw1 else
../raw2 }'/>

We still have to play the hidden group game though to hide raw1 and raw2.
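
For the parse direction only, the intended behaviour of the choice can be sketched in Python as follows. This is an illustration of the branch order, not of how any DFDL processor implements choices. The function name `parse_value` is hypothetical, and the maximum of 100 comes from the example above.

```python
from typing import Tuple

DEL = "\x7f"
MAX_LEN = 100  # maximum field length from the example in this thread


def parse_value(data: str) -> Tuple[str, int]:
    """Return (value, characters consumed), trying the branches in choice order."""
    # Branch raw1: up to 99 non-DEL characters, which must be followed by DEL.
    n = 0
    while n < MAX_LEN - 1 and n < len(data) and data[n] != DEL:
        n += 1
    if n < len(data) and data[n] == DEL:
        # The terminator is consumed but is not part of the value.
        return data[:n], n + 1
    # Branch raw2: exactly 100 characters and no terminator.
    if len(data) >= MAX_LEN:
        return data[:MAX_LEN], MAX_LEN
    raise ValueError("no branch of the choice matched")


print(parse_value("HELLO" + DEL + "rest"))   # ('HELLO', 6)
value, used = parse_value("A" * 100 + "rest")
print(len(value), used)                      # 100 100
```

The second branch is only reached when the first one fails, which mirrors the backtracking from raw1 to raw2.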

I have to think hard about how to handle a choice like this on unparsing
though. I'm uncertain about how a dfdl:outputValueCalc on raw1 would
conditionally fail, so that raw2 would be the selected output
representation. We can't use an assertion as those aren't evaluated for
unparsing.



> Steve Hanson
> Architect, *IBM DFDL*
> <http://www.ibm.com/developerworks/library/se-dfdl/index.html>
> Co-Chair, *OGF DFDL Working Group* <http://www.ogf.org/dfdl/>
> IBM SWG, Hursley, UK
> *smh at uk.ibm.com* <smh at uk.ibm.com>
> tel:+44-1962-815848
> ----- Forwarded by Steve Hanson/UK/IBM on 11/07/2014 13:09 -----
>
> From:        Steve Hanson/UK/IBM
> To:        Mike Beckerle <mbeckerle.dfdl at gmail.com>,
> Cc:        "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>
> Date:        08/07/2014 13:31
> Subject:        Re: [DFDL-WG] Action 233 (deferred) - "byte order not
> sufficient..." - draft document on experience with binary format
> MIL-STD-2045
> ------------------------------
>
>
> Mike
>
> Please find attached IBM's initial comments to your experience document,
> as Word comments.  We only got as far as the 3 x required extensions, not
> looked at the optional usability stuff in detail yet.
>
> We think we have our collective heads around the least significant bit
> ordering concept, but we think the explanation could be clearer and show
> the bits on-the-wire. Some debate as to whether this could be considered
> some variation of byteOrder but you've obviously thought this through and
> concluded a separate property is best. Also should bit order apply to text
> reps, given that byteOrder is binary rep only and any byte ordering
> variations in encodings are handled as separate encodings (eg, UTF-16LE and
> UTF-16BE).
>
> Regarding the US-ASCII-7-Bit-Packed encoding enum, this was added via
> erratum previously using the idea of DFDL-specific named encoding. But we
> are thinking that this could have been handled as an x- encoding, rather
> than specifically adding it to the spec.  And thinking further on that same
> thread, should byteOrder be made to work like encoding and allow x- enums,
> then the new byteOrder would become a x- enum.  The Wikipedia article you
> cite on Endianness mentions other byte orders (eg, Middle-Endian,
> PDP-Endian).
>
>
>
> Regards
>
> Steve Hanson
> Architect, *IBM DFDL*
> <http://www.ibm.com/developerworks/library/se-dfdl/index.html>
> Co-Chair, *OGF DFDL Working Group* <http://www.ogf.org/dfdl/>
> IBM SWG, Hursley, UK
> *smh at uk.ibm.com* <smh at uk.ibm.com>
> tel:+44-1962-815848
>
>
>
> From:        Mike Beckerle <mbeckerle.dfdl at gmail.com>
> To:        "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>,
> Date:        24/06/2014 20:27
> Subject:        [DFDL-WG] Action 233 (deferred) - "byte order not
> sufficient..." - draft document on experience with binary format
> MIL-STD-2045
> Sent by:        dfdl-wg-bounces at ogf.org
> ------------------------------
>
>
>
> I have created an experience document about the "bit order" issue, which
> was a deferred action 233, and the subject of a public comment.
>
> The document is here: *http://redmine.ogf.org/dmsf_files/13268*
> <http://redmine.ogf.org/dmsf_files/13268>. The public comment item is
> *http://redmine.ogf.org/boards/15/topics/43*
> <http://redmine.ogf.org/boards/15/topics/43>.
>
> It recommends a new dfdl:bitOrder property, and a new dfdl:byteOrder enum
> value, without which it is impossible to model these data formats. It also
> recommends  several other improvements to DFDL to facilitate handling these
> data formats.
>
> The formats in question are a variety of MIL-STD formats which are all
> densely packed binary data. These formats are in broad use. MIL-STD-2045 is
> one part of this family and this particular format specification is
> generally available without any restrictions from a US DoD web site (
> *http://assistdocs.com* <http://assistdocs.com/>) so I made this specific
> format the subject of the document as it illustrates all the problematic
> issues.
>
> We have implemented the dfdl:bitOrder property in Daffodil, and it works
> with some useful tests now passing.
>
> We have also enhanced our TDML implementation to enable creation of tests
> for this feature (and in the process actually found two bugs in the
> MIL-STD-2045 spec!).
>
> Both the property and this TDML enhancement are described in the document.
>
> The sponsors of the Daffodil project are extremely keen to get this needed
> binary support into the DFDL v1.0 standard so as to have multiple DFDL
> implementations support it.
>
> ...mikeb
>
> Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
> *www.tresys.com* <http://www.tresys.com/>
> Please note: Contributions to the DFDL Workgroup's email discussions are
> subject to the *OGF Intellectual Property Policy*
> <http://www.ogf.org/About/abt_policies.php>
> --
>  dfdl-wg mailing list
>  dfdl-wg at ogf.org
>  https://www.ogf.org/mailman/listinfo/dfdl-wg
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>