[DFDL-WG] 7-bit ascii packed together

Steve Hanson smh at uk.ibm.com
Mon Oct 22 06:32:03 EDT 2012


Hi Mike

I thought this would come up at some point, and my assumption has always 
been that we would handle it using special enum values of dfdl:encoding and, 
for fixed length, lengthUnits 'characters'. That means we can continue with 
our existing rules for when you use lengthUnits 'bits' and not have to 
extend them to xs:string. We would disallow lengthUnits 'bytes'.

I would suggest that a DFDL parser takes each 7-bit code and pads it to 8 
bits before calling ICU, and does the reverse after calling ICU when 
unparsing. That way we don't need to get ICU to handle this. (I'm assuming 
it doesn't; Tim is going to find out.)
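
A minimal sketch of that pad-then-decode step, for illustration only. It 
assumes the codes are packed most-significant-bit first (the thread does not 
state the bit order) and uses Python's built-in ASCII codec as a stand-in for 
ICU. All function names are hypothetical.

```python
def unpack_7bit(data: bytes, count: int) -> bytes:
    """Expand `count` packed 7-bit codes into one byte each (high bit 0)."""
    total_bits = len(data) * 8
    if 7 * count > total_bits:
        raise ValueError("not enough data for the requested character count")
    # Assumption: codes are packed most-significant-bit first.
    bits = int.from_bytes(data, "big")
    out = bytearray()
    for i in range(count):
        shift = total_bits - 7 * (i + 1)
        out.append((bits >> shift) & 0x7F)
    return bytes(out)


def pack_7bit(codes: bytes) -> bytes:
    """Reverse of unpack_7bit: drop the high bit of each byte and pack."""
    bits = 0
    for code in codes:
        if code > 0x7F:
            raise ValueError("not a 7-bit code")
        bits = (bits << 7) | code
    nbits = 7 * len(codes)
    pad = -nbits % 8  # zero bits appended to reach a byte boundary
    return (bits << pad).to_bytes((nbits + pad) // 8, "big")


def parse_string(data: bytes, count: int) -> str:
    # Parse direction: pad to 8 bits first, then hand off to the codec.
    return unpack_7bit(data, count).decode("ascii")


def unparse_string(text: str) -> bytes:
    # Unparse direction: encode first, then strip back down to 7 bits.
    return pack_7bit(text.encode("ascii"))
```

With this layout, 64 characters occupy exactly 56 bytes, and the character 
count has to be supplied separately, which fits lengthUnits 'characters'.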

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848



From:   Mike Beckerle <mbeckerle.dfdl at gmail.com>
To:     dfdl-wg at ogf.org, 
Date:   19/10/2012 20:44
Subject:        [DFDL-WG] 7-bit ascii packed together
Sent by:        dfdl-wg-bounces at ogf.org




I have a data format in front of me that has 64 7-bit ASCII characters, 
but the format has them bit-packed, i.e., 448 = 7 * 64 bits, so the 
character codes aren't octet/byte aligned.

Furthermore, the 'string' either uses up the entire 64 character maximum 
length OR it has a terminating character which is a 0x7F character code.

I believe I was the advocate for a position that character codes should 
always be 8-bit aligned. That would be because I had never seen anything 
like this.

I am told there are also 6-bit ASCII variations, similarly packed together 
to save space.

BTW: This occurs in a specific US MIL STD message header format, so it's 
not like it's some obscure unused corner case.

Right now, the best I think I can do is to model this data not as a string 
at all, but as an array of integers, each one having 7-bit length, and not 
aligned (that is, aligned to 1-bit). Doing that I can use 
occursCountKind='parsed', and an assertion to deal with the optional 
termination by 0x7F value. 
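
As a rough illustration of what that array model does at parse time, the 
sketch below reads unaligned 7-bit integers until it either sees 0x7F or has 
collected 64 values. It is not DFDL; the function name is hypothetical and 
the most-significant-bit-first packing is an assumption.

```python
MAX_CHARS = 64      # maximum string length from the format description
TERMINATOR = 0x7F   # optional terminating character code


def read_7bit_codes(data: bytes, max_count: int = MAX_CHARS) -> list:
    """Read unaligned 7-bit integers until 0x7F or max_count values."""
    bits = int.from_bytes(data, "big")  # assumption: MSB-first packing
    total_bits = len(data) * 8
    codes = []
    for i in range(max_count):
        shift = total_bits - 7 * (i + 1)
        if shift < 0:
            raise ValueError("ran out of data before terminator or max length")
        code = (bits >> shift) & 0x7F
        if code == TERMINATOR:
            break  # plays the role of the failing assertion that ends the array
        codes.append(code)
    return codes
```

The result is a list of integers rather than a string, which is the drawback 
of this workaround.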

To handle this as a string, we'd need to be able to specify that the 
character codes are not aligned, and the width of the bit-fields making up 
each character code. Or I suppose we could just say this is a special kind 
of character set encoding "ASCII-7-bit-packed" or something. 

Having that, I could deal with the termination via a choice of either the 
terminated flavor, or the fixed length flavor (which excludes the 
terminator) by way of a choice of two strings each having a 
lengthKind="pattern".
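
One possible reading of that choice, sketched with Python regular expressions 
over the already-decoded characters. The two patterns are assumptions made for 
illustration, not patterns taken from the actual schema, and the names are 
hypothetical.

```python
import re

# Fixed-length flavor: exactly 64 characters, none of them the terminator.
FIXED = re.compile(r"[^\x7f]{64}")
# Terminated flavor: up to 63 characters that are followed by 0x7F.
TERMINATED = re.compile(r"[^\x7f]{0,63}(?=\x7f)")


def match_string(chars: str):
    """Try each branch of the choice in order; return (branch, value)."""
    for name, pattern in (("fixed", FIXED), ("terminated", TERMINATED)):
        m = pattern.match(chars)
        if m:
            return name, m.group(0)
    raise ValueError("neither branch of the choice matched")
```

In both branches the matched value excludes the 0x7F terminator, so it would 
still need to be consumed separately in the terminated case.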

Comments?





-- 
Mike Beckerle | OGF DFDL WG Co-Chair 
Tel:  781-330-0412