[DFDL-WG] thread on bit order and byte order in mil-std-2045 and related formats

Fri Jul 25 12:23:22 EDT 2014

Do the TDL standards also include 'positional' bit indicators for optional 
elements, 'repeat' indicators for repeating elements, and character 
strings which are variable + <DEL> or fixed?

Regards

Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848

From:   Mike Beckerle <mbeckerle.dfdl at gmail.com>
To:     "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>, 
Date:   25/07/2014 16:16
Subject:        [DFDL-WG] thread on bit order and byte order in 
mil-std-2045 and related formats
Sent by:        dfdl-wg-bounces at ogf.org

I've included material here from a separate email thread discussing the 
bit order and byte order requirements in order to handle mil-std-2045 and 
related tactical data link (TDL) standards (one is known as link16). These 
TDL standards are not publicly available though limited information about 
them (their names, and purposes) is available on Wikipedia.

Much of the discussion thread revolves around link16 and the additional 
requirements it has above and beyond what the (publicly-available) 
mil-std-2045 headers require.  

Russ Hawkins (link16 SME) describes the required processing in the thread 
below as:

"first byte-level bit order reversal over the entire (element's data), 
then individual parsing of each field and performing a bit reversal on the 
post-parse field value; or the equivalent functionality of these two 
cascaded bit order operations (byte and field level bit order)."

This is in fact equivalent to what 
dfdl:bitOrder='leastSignificantBitFirst' means, and is how it is 
implemented inside Daffodil. This transformation is required for all the 
fields in MIL-STD-2045, as well as the code-points of the 7-bit characters 
and 6-bit characters. 
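
As a rough illustration of that equivalence (this is not Daffodil's actual 
code), the Python sketch below reads a field two ways: directly in 
least-significant-bit-first order, and via the cascaded byte-level and 
field-level reversals Russ describes. The function names and sample bytes 
are hypothetical, chosen only for this example.

```python
def reverse_bits(value: int, width: int) -> int:
    """Reverse the low `width` bits of `value`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def extract_lsb_first(data: bytes, bit_offset: int, width: int) -> int:
    """Read a field directly, least significant bit first.

    Bit position p lives in byte p // 8 at bit index p % 8 (0 = LSB),
    and the first bit read is the field's least significant bit.
    """
    value = 0
    for i in range(width):
        pos = bit_offset + i
        bit = (data[pos // 8] >> (pos % 8)) & 1
        value |= bit << i
    return value


def extract_by_double_reversal(data: bytes, bit_offset: int, width: int) -> int:
    """Read the same field using the two cascaded reversals."""
    # Reversal 1: reverse the bits within every byte of the data.
    reversed_bytes = bytes(reverse_bits(b, 8) for b in data)
    # Extract the field left to right (most significant bit first).
    raw = 0
    for i in range(width):
        pos = bit_offset + i
        bit = (reversed_bytes[pos // 8] >> (7 - pos % 8)) & 1
        raw = (raw << 1) | bit
    # Reversal 2: reverse the bits of the extracted field value.
    return reverse_bits(raw, width)
```

For example, with the two bytes 0xB4 0x03, both functions return 59 for 
the 6-bit field that starts at bit offset 4 and crosses the byte boundary.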

The primary conclusions from this thread:

(a) no new byteOrder enum is needed - the byte/word swapping isn't a 
byte-order thing, it's a pre-processing pass.
(b) 6-bit chars are needed.

Like all email threads this is in reverse chronological order, so you may 
need to peruse from the bottom. 

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | 
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are 
subject to the OGF Intellectual Property Policy
 ________________________________________
From: Mike Beckerle
Sent: Thursday, July 24, 2014 5:22 PM
To: Hawkins, Russ C
Subject: RE: Bit-order - Please review sections 1 to 5 ?

Thanks Russ,

I did have the word-swap notion wrong. However, as a preprocessing pass 
this doesn't need to be represented in a DFDL property quite yet.

And yes, if you have values that must be combined mathematically using 
small-ish calculations, that is supported in both the parsing and 
unparsing directions.

...mikeb
________________________________________
From: Hawkins, Russ C [russ.c.hawkins at lmco.com]
Sent: Thursday, July 24, 2014 5:10 PM
To: Mike Beckerle
Subject: RE: Bit-order - Please review sections 1 to 5 ?

Mike,
It is only a preprocessing step, optionally performed over the entire body 
content.  The reordering of bytes from byte order 2143 to byte order 1234 
is purely a pre-parse operation (or a post-processing operation after 
unparse).  Based on the wording in your email below, I wanted to be sure 
it is understood that this occasional need does NOT swap 16-bit word-pairs 
(byte order 3412 to byte order 1234); rather, the two bytes in each pair 
change places with each other (byte order 2143 to byte order 1234). Your 
wording below sounded like the former.
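
To make the distinction concrete, here is a minimal Python sketch of the 
two re-orderings. The function names are hypothetical; only the 2143 and 
3412 byte orders come from the description above.

```python
def swap_byte_pairs(data: bytes) -> bytes:
    """Swap the two bytes of each 16-bit unit (byte order 2143 -> 1234).

    This is the site-specific pre-processing step described above.
    """
    if len(data) % 2:
        raise ValueError("length must be a multiple of 2 bytes")
    out = bytearray(data)
    # Even positions take the odd-position bytes and vice versa.
    out[0::2], out[1::2] = data[1::2], data[0::2]
    return bytes(out)


def swap_word_pairs(data: bytes) -> bytes:
    """Swap the two 16-bit words of each 32-bit unit (3412 -> 1234).

    Shown only for contrast; this is NOT the required operation.
    """
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    out = bytearray()
    for i in range(0, len(data), 4):
        out += data[i + 2:i + 4] + data[i:i + 2]
    return bytes(out)
```

Both operations are their own inverse, so the same function can be applied 
after unparsing to restore the original byte order.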

I took a quick look through the ... spec and came up with several fields 
well over 32 bits.   None appear to require integer type.  Some spares are 
much greater than 32 bits, off-byte bounds, and must be zeroed for 
unparse.  I assume that is possible.  I don't think any fields are true 
integers and I found no float/double reals.  Many fields exist that exceed 
24 bits and many others must be concatenated ... from different places in 
the binary to exceed 24 bits.  I assume the latter can be accommodated 
with expressions that calculate the final value by multiplying and adding.
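
A small Python sketch of that multiply-and-add idea follows. The helper 
names and the field widths are made up for illustration, and treating the 
first part as the most significant one is an assumption; the real layout 
depends on the message specification.

```python
def combine_parts(high: int, low: int, low_width: int) -> int:
    """Join two separately parsed parts into one value (parse direction).

    `high` is scaled by 2**low_width and `low` is added to it.
    """
    if not 0 <= low < (1 << low_width):
        raise ValueError("low part does not fit in low_width bits")
    return high * (1 << low_width) + low


def split_value(value: int, low_width: int) -> tuple:
    """Split a value back into (high, low) parts (unparse direction)."""
    return divmod(value, 1 << low_width)
```

For example, combine_parts(0x3, 0xABCDEF, 24) gives 0x3ABCDEF, and 
split_value(0x3ABCDEF, 24) gives back (0x3, 0xABCDEF).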

Russ

-----Original Message-----
From: Mike Beckerle [mailto:mbeckerle at tresys.com]
Sent: Wednesday, July 23, 2014 8:33 PM
To: Hawkins, Russ C
Cc: Riechel, Jay P; Christensen, Craig M
Subject: EXTERNAL: RE: Bit-order - Please review sections 1 to 5 ?

One more question.

Assuming I have correctly pre-processed the msg payloads to swap the 
16-bit words, then I am now ready to extract fields one by one.

Are some of the fields now going to be 32-bit integers in this mixed 
endian byte order (I was calling it littleEndian16Bit) where the first 16 
bits are the least-significant 16 bits, and the second 16 bits are the 
most significant 16 bits ?

Or is the reordering that involves swapping the 16-bit words really ONLY 
needed by the pre-processing of the message payload?

Thanks again

...mike

________________________________________
From: Hawkins, Russ C [russ.c.hawkins at lmco.com]
Sent: Wednesday, July 23, 2014 12:58 PM
To: Mike Beckerle
Cc: Riechel, Jay P; Christensen, Craig M
Subject: RE: Bit-order - Please review sections 1 to 5 ?

Mike,
I think your email correctly characterizes the situation - as long as in 
Step 2 "using the required least-significant-bit first bit order" includes 
both: first byte-level bit order reversal over the entire body, then 
individual parsing of each field and performing a bit reversal on the 
post-parse field value; or the equivalent functionality of these two 
cascaded bit order operations (byte and field level bit order).

Since you are considering performing this using a multi-pass prototype, I 
will describe how our parser performs all these cascaded operations we've 
been talking about, but does so with a single parse or unparse pass:

(Here Russ describes a multi-pass scheme that they have implemented in 
some data parsing system.
This is interesting enough in its own right that I've excerpted it into a 
separate dfdl-wg email item.)

Instead of using multiple passes, ... 

-----Original Message-----
From: Mike Beckerle [mailto:mbeckerle at tresys.com]
Sent: Wednesday, July 23, 2014 8:41 AM
To: Hawkins, Russ C
Subject: EXTERNAL: RE: Bit-order - Please review sections 1 to 5 ?

So I would characterize this in this way:

Step 1: A source-site specific pre-processing pass that re-orders 16-bit 
words. Only done for some sites.

Step 2: Parsing the fields of the message using the required 
least-significant-bit first bit order & little-endian byte order.

If I have this right, then this requires a small application, not just the 
DFDL parser.  The application would call the DFDL parser to parse whatever 
header data, and get a byte array for the payload data that must be 
(sometimes) pre-processed. The application would then decide whether to 
pre-process, i.e., swap the 16-bit words of the byte array. The 
application would then call the DFDL parser again to parse the actual 
message fields.
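
The control flow of such an application might look like the Python sketch 
below. Everything here is hypothetical: parse_header and parse_body stand 
in for calls into a DFDL parser (they are not Daffodil API names), and 
preprocess is whatever site-specific re-ordering applies, if any.

```python
from typing import Callable, Optional, Tuple


def parse_message(
    raw: bytes,
    parse_header: Callable[[bytes], Tuple[dict, bytes]],
    parse_body: Callable[[bytes], dict],
    preprocess: Optional[Callable[[bytes], bytes]] = None,
) -> dict:
    """Drive a two-pass parse with an optional pre-processing step."""
    # Pass 1: parse the header; the payload comes back as a byte array.
    header, payload = parse_header(raw)
    # Application decision: only some sites need the payload re-ordered.
    if preprocess is not None:
        payload = preprocess(payload)
    # Pass 2: parse the actual message fields from the prepared payload.
    body = parse_body(payload)
    return {"header": header, "body": body}
```

Passing the parsers in as callables keeps the sketch independent of any 
particular DFDL implementation; the decision whether to pre-process stays 
in the application, as described above.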

This is typical of processing when data is encoded and must be decoded 
before being parsed.  E.g., we have the exact same issue with email 
messages, where the message may have sections encoded via the base64 
scheme. This must be decoded, and then the data parsed.  (Another format) 
has the same issue in that the mil-std-2045 header can say the body is 
gzipped. Ungzipping it isn't something we do (today) in the DFDL parser.

Some day we hope to add this sort of multi-pass capability into DFDL - 
probably DFDL v2.0, however. In the meantime, this sort of thing requires 
that the multiple passes are managed by a small application.  We plan to 
use the Daffodil implementation to experiment with features that allow 
this sort of multi-pass capability.

________________________________________
From: Hawkins, Russ C [russ.c.hawkins at lmco.com]
Sent: Tuesday, July 22, 2014 7:18 PM
To: Mike Beckerle
Subject: RE: Bit-order - Please review sections 1 to 5 ?

Mike,
All msg types exhibit the same behavior.   For a given site, if any msg 
types require the littleEndian16Bit byte-pair swapping, they all do.   It 
is not unique to any one msg type.   This particular byte-pair swap is 
unique to the site and its supplying feed source.

Then, after the above site-dependent byte-pair swap, every site requires 
the byte-level bitOrder reversal so that our parser can parse (extract) 
the fields left-to-right. This gives fields that cross byte boundaries a 
contiguous bit order in which all bits, including those of Character 
fields, are in the correct relative position, although bit-reversed.  
Then, because of that field-level reversal that we introduced, every 
field, including Characters (including 6-bit Characters), must have its 
bits reversed to get the intended value.

I have looked over our ...  msg type parse tables and verified that it 
exhibits this behavior in its entire body, including its 6-bit Character 
fields.

Russ Hawkins

-----Original Message-----
From: Mike Beckerle [mailto:mbeckerle at tresys.com]
Sent: Tuesday, July 22, 2014 4:26 PM
To: Hawkins, Russ C; Stephen Lawrence; Taylor Wise
Cc: Christensen, Craig M; Riechel, Jay P
Subject: EXTERNAL: RE: Bit-order - Please review sections 1 to 5 ?

Thanks Russ,

I looked over your comments.

I've got the 6-bit charset encoding. That we can add easily.

I understand from your comments that for some message types the whole 
message payload (but not header) has word and byte swapping that must be 
done as a pre-processing pass before the payload can be parsed into its 
multiple fields.

Is there an example specific message type that has this characteristic? If 
I correctly recall prior email threads, I think you said there is no clear 
indicator when this pre-processing is required, but I wanted to make sure 
I'm understanding this correctly.

...mikeb

____________________________
From: Hawkins, Russ C [russ.c.hawkins at lmco.com]
Sent: Tuesday, July 22, 2014 1:12 PM
To: Stephen Lawrence; Mike Beckerle; Taylor Wise
Cc: Hawkins, Russ C; Christensen, Craig M; Riechel, Jay P
Subject: RE: Bit-order - Please review sections 1 to 5 ?

Mike,
Some preliminary comments embedded with initials RH.

Obviously a lot of thought and work went into this, however some things 
were confusing to me.  I noted the need for 6-bit Packed ASCII encoding 
and a possible need for bitOrder and 16-bit byte order reversals for 6-bit 
ASCII and other non-numeric fields.

Russ Hawkins

-----Original Message-----
From: Stephen Lawrence [mailto:slawrence at tresys.com]
Sent: Thursday, July 17, 2014 9:24 AM
To: Mike Beckerle; Hawkins, Russ C; Taylor Wise
Subject: EXTERNAL: RE: Bit-order - Please review sections 1 to 5 ?

Made a couple of comments (attached). Some things in section 5 were a 
little unclear to me.

- Steve
________________________________________
From: Mike Beckerle
Sent: Wednesday, July 16, 2014 4:25 PM
To: 'russ.c.hawkins at lmco.com'; Stephen Lawrence; Taylor Wise
Subject: Bit-order - Please review sections 1 to 5 ?

Russ, Steve, Taylor,

Attached is an internal DFDL Workgroup working document about bit order. 
It is my personal internal draft as I'm the author.

I believe I now understand why bit order has been so hard to convey to 
others in the DFDL workgroup.

It is because there are two different formats both of which can be 
considered to be "LSB first". This document teases them apart.

All review comments gratefully accepted. If it is easiest, you can edit 
the document directly and put in MS Word Comment bubbles. Then I can merge 
them. Or however you prefer really.

...mike beckerle
