[DFDL-WG] 7-bit ascii packed together

Mike Beckerle mbeckerle.dfdl at gmail.com
Fri Oct 19 15:44:19 EDT 2012


I have a data format in front of me that has 64 7-bit ASCII characters, but
the format has them bit-packed, i.e., 448 = 7 * 64 bits (56 bytes), so the
character codes aren't octet/byte aligned.
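
To make the layout concrete, here is a minimal Python sketch of pulling 7-bit codes out of a packed byte string. It assumes the bits are read most-significant-bit first within each byte; the post does not state the bit order, so that is an assumption, and the name unpack_7bit is hypothetical.

```python
def unpack_7bit(data: bytes, count: int) -> list[int]:
    """Return `count` 7-bit codes from `data`.

    Assumption: bits are consumed most-significant-bit first.
    """
    total_bits = len(data) * 8
    if total_bits < 7 * count:
        raise ValueError("not enough data for the requested number of codes")
    # Treat the whole buffer as one big integer and slice 7 bits at a time.
    value = int.from_bytes(data, "big")
    codes = []
    for i in range(count):
        shift = total_bits - 7 * (i + 1)
        codes.append((value >> shift) & 0x7F)
    return codes


# 'A' = 1000001 and 'B' = 1000010 packed back to back, plus 2 padding bits.
packed = bytes([0b10000011, 0b00001000])
print(unpack_7bit(packed, 2))  # [65, 66]
```

Note that the second code straddles the byte boundary, which is exactly why a byte-aligned string model does not fit.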

Furthermore, the 'string' either uses up the entire 64-character maximum
length OR it ends with a terminating character, which is the 0x7F character
code.

I believe I was the advocate for a position that character codes should
always be 8-bit aligned. That would be because I had never seen anything
like this.

I am told there are also 6-bit ASCII variations, similarly packed together
to save space.

BTW: This occurs in a specific US MIL STD message header format, so it's
not like it's some obscure unused corner case.

Right now, the best I think I can do is to model this data not as a string
at all, but as an array of integers, each one having 7-bit length, and not
aligned (that is, aligned to 1-bit). Doing that I can use
occursCountKind='parsed', and an assertion to deal with the optional
termination by 0x7F value.
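
The array-of-integers workaround amounts to the following logic, sketched here in Python rather than DFDL. It assumes the 7-bit codes have already been extracted into a sequence of integers; the function name take_string_codes and its return shape are hypothetical.

```python
from itertools import islice


def take_string_codes(codes, max_chars=64, terminator=0x7F):
    """Collect codes until the terminator or until max_chars have been read.

    Returns (string_codes, elements_consumed). The terminator is consumed
    but is not part of the returned string codes.
    """
    out = []
    for code in islice(codes, max_chars):
        if code == terminator:
            return out, len(out) + 1
        out.append(code)
    if len(out) < max_chars:
        # Input ran out before a terminator or the full length was seen.
        raise ValueError("truncated: no terminator and fewer than max_chars codes")
    return out, len(out)


print(take_string_codes([72, 105, 0x7F, 65]))  # ([72, 105], 3)
```

A full-length string of 64 codes returns without ever looking for a terminator, matching the "either/or" rule described above.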

To handle this as a string, we'd need to be able to specify that the
character codes are not aligned, and the width of the bit-fields making up
each character code. Or I suppose we could just say this is a special kind
of character set encoding "ASCII-7-bit-packed" or something.
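
As an illustration of the two parameters involved (1-bit alignment plus a per-character bit width), here is a hedged Python sketch of the packing direction with the width as a parameter. It packs raw integer codes only and again assumes most-significant-bit-first order with zero padding to a whole byte; the name pack_codes is hypothetical, and no particular 6-bit character table is implied.

```python
def pack_codes(codes: list[int], width: int) -> bytes:
    """Pack integer codes back to back, `width` bits each, MSB first.

    Assumption: the final byte is padded with zero bits on the right.
    """
    value = 0
    for code in codes:
        if not 0 <= code < (1 << width):
            raise ValueError(f"code {code} does not fit in {width} bits")
        value = (value << width) | code
    total_bits = width * len(codes)
    pad = (-total_bits) % 8  # bits needed to reach a byte boundary
    return (value << pad).to_bytes((total_bits + pad) // 8, "big")


print(pack_codes([65, 66], 7).hex())   # 8308
print(len(pack_codes([65] * 64, 7)))   # 56
print(len(pack_codes([1] * 64, 6)))    # 48
```

With width 7, 64 codes fill exactly 56 bytes; with width 6, the same 64 codes fit in 48.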

Having that, I could deal with the termination by way of a choice of two
strings, each having lengthKind="pattern": one for the terminated flavor,
and one for the fixed length flavor (which excludes the terminator).
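
The two patterns might look roughly like the following, shown with Python's re module purely for illustration. This assumes the packed codes have already been decoded to one character each; DFDL's own pattern syntax may differ from Python's, and the names here are hypothetical.

```python
import re

# Terminated flavor: up to 63 non-terminator characters, followed by 0x7F.
# The lookahead keeps the terminator out of the matched string.
TERMINATED = re.compile(r"[^\x7f]{0,63}(?=\x7f)")
# Fixed length flavor: exactly 64 characters, none of them the terminator.
FIXED = re.compile(r"[^\x7f]{64}")


def match_string(decoded: str) -> str:
    """Try the terminated flavor first, then the fixed length flavor."""
    for pattern in (TERMINATED, FIXED):
        m = pattern.match(decoded)
        if m is not None:
            return m.group(0)
    raise ValueError("neither flavor matched")


print(repr(match_string("Hi\x7fjunk")))   # 'Hi'
print(len(match_string("A" * 64 + "B")))  # 64
```

Trying the terminated branch first mirrors an ordered choice: a string of 64 non-terminator characters fails that branch and falls through to the fixed length one.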

Comments?





-- 
Mike Beckerle | OGF DFDL WG Co-Chair
Tel:  781-330-0412