[DFDL-WG] Action 292 - version 2 proposal for hexBinary with lengthUnits bits

Stephen Lawrence slawrence at tresys.com
Fri Nov 30 13:10:23 EST 2018


As an example of why I feel bitOrder and byteOrder apply when supporting
hexBinary with non-byte-size lengths or starting on non-byte boundaries,
let's say we had the following data:

  11011111 11010001 = 0xDFD1

And we want to model this as one 12-bit unsigned int followed by one
4-bit unsigned int, all with bitOrder=LSBF and byteOrder=LE. We would
have a schema like so:

  <dfdl:format
    lengthKind="explicit"
    lengthUnits="bits"
    bitOrder="leastSignificantBitFirst"
    byteOrder="littleEndian" />

  <xs:sequence>
    <xs:element name="foo" dfdl:length="12" type="xs:unsignedInt" />
    <xs:element name="bar" dfdl:length="4" type="xs:unsignedInt" />
  </xs:sequence>

The above data would parse as:

  <foo>479</foo> <!-- binary: 000111011111, hex 0x1DF -->
  <bar>13</bar> <!-- binary: 1101, hex 0xD -->

This is because, with this bit/byteOrder, "foo" is made up of the last
four bits of the second byte (0001) followed by all eight bits of the
first byte (11011111), resulting in a value of 479. The bitPosition
after "foo" is consumed is 12. Then "bar" consumes the remaining bits,
which are the first four of the second byte (1101), resulting in a value
of 13.

This all follows the specification as-is.
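
As a rough illustration of the parse above (not Daffodil code), here is a
minimal Python sketch. The helper name read_bits_lsbf_le and the 0-based
bit offset are my own assumptions for the example. It relies on the fact
that with bitOrder=LSBF and byteOrder=LE the whole buffer can be treated
as one little-endian integer whose bit 0 is the first bit consumed.

```python
def read_bits_lsbf_le(data: bytes, bit_offset: int, length: int) -> int:
    """Read `length` bits as an unsigned int, assuming LSBF + LE.

    `bit_offset` is a 0-based count of bits already consumed
    (hypothetical helper, for illustration only).
    """
    # Byte 0 is the least significant byte, and within each byte the
    # least significant bit comes first, so the stream is one LE integer.
    whole = int.from_bytes(data, "little")
    return (whole >> bit_offset) & ((1 << length) - 1)


data = bytes([0b11011111, 0b11010001])  # 0xDF 0xD1

foo = read_bits_lsbf_le(data, 0, 12)   # all of byte 0 plus low nibble of byte 1
bar = read_bits_lsbf_le(data, 12, 4)   # high nibble of byte 1

print(foo, hex(foo))  # 479 0x1df
print(bar, hex(bar))  # 13 0xd
```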


Now, let's assume we instead wanted to represent "foo" as xs:hexBinary
that has a non-byte size length, e.g.:

  <xs:sequence>
    <xs:element name="foo" dfdl:length="12" type="xs:hexBinary" />
    <xs:element name="bar" dfdl:length="4" type="xs:unsignedInt" />
  </xs:sequence>

If we ignored bitOrder/byteOrder when parsing "foo" and simply read the
first 12 bits (essentially BE MSBF), the result would be:

  <foo>0DFD</foo>

But just like before, the bitPosition after "foo" is consumed is 12. And
because the bit/byteOrder is LSBF LE, the bits that "bar" will consume
are again the first four of the second byte, with the result:

  <bar>13</bar>

But this means that the last four bits in the data (0001) were never
consumed, and the first four bits in the second byte (1101) were
consumed twice, which must be wrong (a similar issue occurs when
starting on a non-byte boundary). So bitOrder/byteOrder must be taken
into account somehow in order to support hexBinary with non-byte-size
lengths or starting on a non-byte boundary, primarily because of how
bitOrder=LSBF works (which I believe was the original use case for
hexBinary with non-byte-size lengths or non-byte boundaries).
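
To make the overlap concrete, here is a small Python sketch that tracks
which physical bits each read touches. The helper names and the
(byte index, bit significance) labelling are assumptions made for this
example only; significance 7 is the leftmost bit of a byte as written
above and 0 is the rightmost.

```python
def bits_msbf(start: int, length: int) -> set:
    """Physical bits touched by an MSBF read of `length` bits at 0-based `start`."""
    return {(p // 8, 7 - p % 8) for p in range(start, start + length)}


def bits_lsbf(start: int, length: int) -> set:
    """Physical bits touched by an LSBF read of `length` bits at 0-based `start`."""
    return {(p // 8, p % 8) for p in range(start, start + length)}


# "foo" as hexBinary, ignoring bit/byteOrder (read as if BE MSBF).
foo_bits = bits_msbf(0, 12)
# "bar" as unsignedInt, honouring bitOrder=LSBF, starting after 12 bits.
bar_bits = bits_lsbf(12, 4)

all_bits = {(byte, sig) for byte in range(2) for sig in range(8)}
consumed_twice = sorted(foo_bits & bar_bits)
never_consumed = sorted(all_bits - (foo_bits | bar_bits))

print(consumed_twice)  # high nibble of the second byte
print(never_consumed)  # low nibble of the second byte
```

If both elements are read with the LSBF helper instead, the two sets are
disjoint and together cover all 16 bits.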

If instead we do not ignore bit/byteOrder, there must be some way to
determine how to get those bits into a hexBinary representation. There
are probably a few different ways to handle this, but after some
discussions and interpretations of the XSD spec, we determined that the
best way to handle it was to just read the bits as if they were a
nonNegativeInteger (which does take into account bit/byteOrder) and then
convert that value to a hex representation. For BE MSBF the result is
exactly the same as reading the bits in order. For LE MSBF, it results
in the hexBinary being flipped, which is where the Daffodil
implementation is inconsistent with the spec.
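
A minimal Python sketch of the "read as nonNegativeInteger, then print in
base 16" approach, using the same two bytes as above. The function names
are hypothetical, only the BE MSBF and LE LSBF combinations are covered,
and padding the hex string to whole bytes with leading zeros is an
assumption chosen to match the <foo>0DFD</foo> example.

```python
def uint_be_msbf(data: bytes, length: int) -> int:
    """First `length` bits as an unsigned int, assuming BE + MSBF."""
    whole = int.from_bytes(data, "big")
    return whole >> (len(data) * 8 - length)


def uint_le_lsbf(data: bytes, length: int) -> int:
    """First `length` bits as an unsigned int, assuming LE + LSBF."""
    whole = int.from_bytes(data, "little")
    return whole & ((1 << length) - 1)


def as_hex(value: int, length: int) -> str:
    """Base-16 rendering, zero-padded to a whole number of bytes (assumption)."""
    digits = 2 * ((length + 7) // 8)
    return format(value, "0{}X".format(digits))


data = bytes([0xDF, 0xD1])

print(as_hex(uint_be_msbf(data, 12), 12))  # 0DFD, same as reading the bits in order
print(as_hex(uint_le_lsbf(data, 12), 12))  # 01DF, matches the unsignedInt value 479
print(as_hex(uint_be_msbf(data, 16), 16))  # DFD1
print(as_hex(uint_le_lsbf(data, 16), 16))  # D1DF, bytes flipped relative to the data
```

The last line shows the flip: for a whole-byte little-endian read, the
hex string comes out in the reverse byte order from the raw data.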




On 11/29/18 10:19 AM, Steve Hanson wrote:
> Mike
> 
> I'm a bit lost on this now.  The concept of applying lengthUnits='bits' to 
> xs:hexBinary is straightforward. It just counts bits. Bit order or byte order is 
> irrelevant, in the same way that it is irrelevant when counting bytes for a hex 
> binary. The only thing to note is that the fillByte needs to be used to make up 
> whole bytes.
> 
> I'm missing something here.
> 
> Regards
> 
> Steve Hanson
> 
> IBM Hybrid Integration, Hursley, UK
> Architect, _IBM DFDL_ <http://www.ibm.com/developerworks/library/se-dfdl/index.html>
> Co-Chair, _OGF DFDL Working Group_ <http://www.ogf.org/dfdl/>
> _smh at uk.ibm.com_ <mailto:smh at uk.ibm.com>
> tel:+44-1962-815848
> mob:+44-7717-378890
> Note: I work Tuesday to Friday
> 
> 
> 
> From: Mike Beckerle <mbeckerle.dfdl at gmail.com>
> To: DFDL-WG <dfdl-wg at ogf.org>
> Date: 20/11/2018 17:33
> Subject: [DFDL-WG] Action 292 - version 2 proposal for hexBinary with       
>   lengthUnits bits
> Sent by: "dfdl-wg" <dfdl-wg-bounces at ogf.org>
> 
> --------------------------------------------------------------------------------
> 
> 
> 
> Users want a way to express an arbitrary unaligned string of bits, with the 
> appearance in the infoset being hexadecimal, not base 10.
> 
> Right now the only way I can see to meet this requirement while retaining 
> backward compatibility would be a new DFDL property.
> 
> So here's the new idea:
> 
> Property dfdl:hexBinaryRep with values 'bytes' or 'bits'. New property, so 
> defaulting (with suppressible warning) to 'bytes' for backward compatibility in 
> schemas not having the property.
> 
> When set to 'bits', then type xs:hexBinary would behave just like 
> xs:nonNegativeInteger, and all properties relevant to that type would be 
> applicable, and any use of XSD length facets on such elements would be an SDE.  
> The hexBinary string would be exactly same as if you took the numeric value for 
> a nonNegativeInteger and instead of presenting it as base 10 digits, you use 
> base 16 digits.
> 
> 
> Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | 
> _www.tresys.com_ <http://www.tresys.com>
> Please note: Contributions to the DFDL Workgroup's email discussions are subject 
> to the _OGF Intellectual Property Policy_ 
> <http://www.ogf.org/About/abt_policies.php>
> --
>   dfdl-wg mailing list
>   dfdl-wg at ogf.org
> https://www.ogf.org/mailman/listinfo/dfdl-wg
> 
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
> 
> 
> 


