[DFDL-WG] Fw: Action 233 (deferred) - "byte order not sufficient..." - draft document on experience with binary format MIL-STD-2045

Steve Hanson smh at uk.ibm.com
Fri Jul 11 09:11:54 EDT 2014


Mike

Some further thoughts from IBM on your recommendations, after more 
internal discussion here.

Preferable to have dfdl:bitOrder as a separate property rather than to 
handle it via new dfdl:byteOrder enums. Although new properties pose 
validation issues for existing schemas, this should not compromise the 
language design. DFDL can choose what bitOrder/byteOrder combinations are 
supported.

OK with the new dfdl:byteOrder enum for littleEndianAtomic16Bit though 
can we improve the name?

dfdl:encoding has an architected system for extra encodings so 
US-ASCII-7-Bit-Packed should be x-US-ASCII-7-Bit-Packed, and the spec 
updated to remove specific mention of US-ASCII-7-Bit-Packed.

We discussed proposed new dfdl:lengthKind 'fixedLengthOrTerminated'.  A 
new enum implies that it can be used in any scenario, so the following 
need to be specified.

dfdl:terminator must be set and cannot be the empty string or contain ES 
on its own

If xs:string or xs:hexBinary, can maxLength facet be used instead of 
dfdl:length? (Suggest no - this is variable length data so min/maxLength 
are for validation only).

Can dfdl:length be an expression? (Suggest no unless specific use case 
identified)

Any special rules for emptyValueDelimiterPolicy and 
nilValueDelimiterPolicy ?

Use on complex element. Presumably dfdl:length is first used to extract a 
'box' but within that box does parser immediately scan for the 
dfdl:terminator or does it descend into the complex type and parse the 
children, expecting to either consume all the box or to find the 
terminator at the end? (Suggest the latter).

Use on complex element. Last child can not be dfdl:lengthKind 
'endOfParent'.

Scanning rules: Use of this new dfdl:lengthKind switches off any in-scope 
stack of terminating markup in force at that point. Put another way, when 
we are scanning for the dfdl:terminator, we are not looking for any markup 
from an outer scope.
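
As an illustration only, the intended behaviour for a simple element can 
be sketched in Python. This is a sketch under assumptions, not a proposed 
implementation: the function name and parameters are hypothetical, lengths 
are counted in bytes, the terminator is assumed to lie wholly inside the 
'box', and no outer-scope markup is scanned for.

```python
def parse_fixed_or_terminated(data: bytes, pos: int, length: int,
                              terminator: bytes):
    """Parse one field that is either exactly `length` bytes long, or
    shorter and ended by `terminator`. Returns (value, new_position)."""
    if not terminator:
        # Mirrors the rule that dfdl:terminator must be set and non-empty.
        raise ValueError("terminator must be a non-empty byte string")

    # Step 1: use the length to extract the 'box'.
    box = data[pos:pos + length]

    # Step 2: scan only inside the box, and only for this terminator.
    idx = box.find(terminator)
    if idx == -1:
        if len(box) < length:
            raise ValueError("data ended before length or terminator")
        # Whole box consumed, so no terminator is expected.
        return box, pos + length

    # Terminator found early: the value stops there, terminator is consumed.
    return box[:idx], pos + idx + len(terminator)


DEL = b"\x7f"
print(parse_fixed_or_terminated(b"abc\x7fXYZ", 0, 5, DEL))  # (b'abc', 4)
print(parse_fixed_or_terminated(b"abcdeXYZ", 0, 5, DEL))    # (b'abcde', 5)
```

The complex-element case would replace the scan in step 2 with a descent 
into the children, which this sketch does not attempt.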

So there's plenty to think about with this new dfdl:lengthKind. A good 
rule for deciding whether a new dfdl:length or dfdl:occursCountKind should 
be added is whether it bends some other part of the spec out of shape. The 
new dfdl:lengthKind looks ok so far. 

However we *think* we have come up with an alternative model which is 
simpler than the one you state in the document. Example for field 'varstr' 
with max length 100:

<xs:sequence dfdl:terminator="{if (fn:string-length(varstr) eq 100) then '%ES;' 
else '%DEL;'}" ...>
        <xs:element name="varstr" type="xs:string" 
dfdl:lengthKind="pattern" dfdl:lengthPattern="([^\x7F].\x7F)|(.{100})" ... />
</xs:sequence>

Can't put dfdl:terminator with a self-referencing expression on the 
element. Might need fn:exists in the dfdl:terminator expression to handle 
optionality. Does that work?
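
To check the logic of this alternative outside DFDL, here is a rough 
Python sketch. It is an interpretation, not the schema's exact semantics: 
the function name is hypothetical, and the regular expression is a 
rewritten form that assumes the element content is either up to max-1 
non-DEL characters followed by DEL, or exactly max non-DEL characters. 
The terminator choice mirrors the if/else expression on the sequence.

```python
import re


def split_varstr(text: str, max_len: int = 100):
    """Return (value, remaining_text) for one 'varstr' field."""
    # Content pattern: a short value that stops right before DEL,
    # or a value of exactly max_len characters.
    pattern = re.compile(
        r"[^\x7f]{0,%d}(?=\x7f)|[^\x7f]{%d}" % (max_len - 1, max_len)
    )
    m = pattern.match(text)
    if m is None:
        raise ValueError("no DEL found and fewer than max_len characters")
    value = m.group(0)

    # Same decision as the dfdl:terminator expression: nothing to consume
    # at full length, otherwise a DEL character.
    terminator = "" if len(value) == max_len else "\x7f"
    rest = text[m.end():]
    if not rest.startswith(terminator):
        raise ValueError("expected terminator not found")
    return value, rest[len(terminator):]


# max_len shortened to 5 to keep the examples readable.
print(split_varstr("abc\x7fXYZ", 5))  # ('abc', 'XYZ')
print(split_varstr("abcdeXYZ", 5))    # ('abcde', 'XYZ')
```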

Regards
 
Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 11/07/2014 13:09 -----

From:   Steve Hanson/UK/IBM
To:     Mike Beckerle <mbeckerle.dfdl at gmail.com>, 
Cc:     "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>
Date:   08/07/2014 13:31
Subject:        Re: [DFDL-WG] Action 233 (deferred) - "byte order not 
sufficient..." - draft document on experience with binary format 
MIL-STD-2045


Mike

Please find attached IBM's initial comments to your experience document, 
as Word comments.  We only got as far as the 3 x required extensions, and 
have not yet looked at the optional usability stuff in detail.

We think we have our collective heads around the least significant bit 
ordering concept, but we think the explanation could be clearer and show 
the bits on-the-wire. Some debate as to whether this could be considered 
some variation of byteOrder but you've obviously thought this through and 
concluded a separate property is best. Also should bit order apply to text 
reps, given that byteOrder is binary rep only and any byte ordering 
variations in encodings are handled as separate encodings (eg, UTF-16LE 
and UTF-16BE)?
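
As a small illustration of what "bits on-the-wire" means here, the Python 
sketch below reads the same byte as two fields under both bit orders. It 
is not taken from the experience document. The function name is 
hypothetical, and the convention assumed for least-significant-bit-first 
is that bit positions count up from the low-order bit of each byte, with 
the first bit read being the least significant bit of the field.

```python
def read_bits(data: bytes, bit_pos: int, n_bits: int, lsb_first: bool) -> int:
    """Read an unsigned n_bits-wide field starting at stream bit bit_pos."""
    value = 0
    for i in range(n_bits):
        k = bit_pos + i          # position of this bit in the stream
        byte = data[k // 8]
        if lsb_first:
            # Stream bit k is the bit of weight 2**(k % 8) in its byte,
            # and it supplies the field's bit of weight 2**i.
            bit = (byte >> (k % 8)) & 1
            value |= bit << i
        else:
            # Stream bit k is counted from the high-order end of its byte,
            # and bits arrive most significant first.
            bit = (byte >> (7 - k % 8)) & 1
            value = (value << 1) | bit
    return value


wire = bytes([0b10110100])  # 0xB4, split into a 3-bit and a 5-bit field

print(read_bits(wire, 0, 3, lsb_first=False))  # 5  (bits 101)
print(read_bits(wire, 3, 5, lsb_first=False))  # 20 (bits 10100)
print(read_bits(wire, 0, 3, lsb_first=True))   # 4  (0xB4 & 0b111)
print(read_bits(wire, 3, 5, lsb_first=True))   # 22 (0xB4 >> 3)
```

The same byte yields different field values, which is why the choice has 
to be stated by the schema rather than inferred from byteOrder alone.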

Regarding the US-ASCII-7-Bit-Packed encoding enum, this was added via 
erratum previously using the idea of DFDL-specific named encoding. But we 
are thinking that this could have been handled as an x- encoding, rather 
than specifically adding it to the spec.  And thinking further on that 
same thread, if byteOrder were made to work like encoding and allow x- 
enums, then the new byteOrder value would become an x- enum.  The Wikipedia 
article you cite on Endianness mentions other byte orders (eg, 
Middle-Endian, PDP-Endian).



Regards
 
Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848



From:   Mike Beckerle <mbeckerle.dfdl at gmail.com>
To:     "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>, 
Date:   24/06/2014 20:27
Subject:        [DFDL-WG] Action 233 (deferred) - "byte order not 
sufficient..." - draft document on experience with binary format 
MIL-STD-2045
Sent by:        dfdl-wg-bounces at ogf.org



I have created an experience document about the "bit order" issue, which 
was a deferred action 233, and the subject of a public comment.

The document is here: http://redmine.ogf.org/dmsf_files/13268. The public 
comment item is http://redmine.ogf.org/boards/15/topics/43.

It recommends a new dfdl:bitOrder property, and a new dfdl:byteOrder enum 
value, without which it is impossible to model these data formats. It also 
recommends several other improvements to DFDL to facilitate handling 
these data formats. 

The formats in question are a variety of MIL-STD formats which are all 
densely packed binary data. These formats are in broad use. MIL-STD-2045 
is one part of this family and this particular format specification is 
generally available without any restrictions from a US DoD web site (
http://assistdocs.com) so I made this specific format the subject of the 
document as it illustrates all the problematic issues.

We have implemented the dfdl:bitOrder property in Daffodil, and it works 
with some useful tests now passing. 

We have also enhanced our TDML implementation to enable creation of tests 
for this feature (and in the process actually found two bugs in the 
MIL-STD-2045 spec!). 

Both the property and this TDML enhancement are described in the document.

The sponsors of the Daffodil project are extremely keen to get this needed 
binary support into the DFDL v1.0 standard so as to have multiple DFDL 
implementations support it. 

...mikeb

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | 
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are 
subject to the OGF Intellectual Property Policy
--
  dfdl-wg mailing list
  dfdl-wg at ogf.org
  https://www.ogf.org/mailman/listinfo/dfdl-wg

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

-------------- next part --------------
A non-text attachment was scrubbed...
Name: draft-gwdi-mil-std-2045-v1-IBM.docx
Type: application/octet-stream
Size: 206196 bytes
Desc: not available
URL: <http://www.ogf.org/pipermail/dfdl-wg/attachments/20140711/45b53af4/attachment-0001.obj>

