[DFDL-WG] Action 292 - version 2 proposal for hexBinary with lengthUnits bits

Steve Hanson smh at uk.ibm.com
Tue Dec 4 04:10:13 EST 2018


I agree that bitOrder is needed, not byteOrder.  If you want to parse the 
data as an integer, then fine, but that is not the case here; you are 
parsing the data as hexBinary. The analogy is with parsing text strings 
where the encoding's character size is not a multiple of 8 bits; you use 
bitOrder but not byteOrder.

Regards
 
Steve Hanson
IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh at uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
Note: I work Tuesday to Friday 



From:   Stephen Lawrence <slawrence at tresys.com>
To:     Steve Hanson <smh at uk.ibm.com>, "mbeckerle.dfdl at gmail.com" <mbeckerle.dfdl at gmail.com>
Cc:     DFDL-WG <dfdl-wg at ogf.org>
Date:   30/11/2018 18:10
Subject:        Re: [DFDL-WG] Action 292 - version 2 proposal for hexBinary with lengthUnits bits



As an example of why I feel bitOrder and byteOrder apply if supporting
hexBinary with non-byte-size lengths or starting on non-byte boundaries,
let's say we had the following data:

  11011111 11010001 = 0xDFD1

And we want to model this as one 12-bit unsigned int followed by one
4-bit unsigned int, all with bitOrder=LSBF and byteOrder=LE. We would
have a schema like so:

  <dfdl:format
    lengthKind="explicit"
    lengthUnits="bits"
    bitOrder="leastSignificantBitFirst"
    byteOrder="littleEndian" />

  <xs:sequence>
    <xs:element name="foo" dfdl:length="12" type="xs:unsignedInt" />
    <xs:element name="bar" dfdl:length="4" type="xs:unsignedInt" />
  </xs:sequence>

The above data would parse as:

  <foo>479</foo> <!-- binary: 000111011111, hex 0x1DF -->
  <bar>13</bar> <!-- binary: 1101, hex 0xD -->

This is because, with this bit/byteOrder, "foo" is made up of the last
four bits of the second byte (0001) as its most significant bits,
followed by all eight bits of the first byte (11011111), resulting in a
value of 479. The bitPosition after "foo" is consumed is 12. Then "bar"
consumes the remaining bits, which are the first four of the second byte
(1101), resulting in a value of 13.

This all follows the specification as-is.
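
A minimal Python sketch of the parse above, assuming the usual model in
which LSBF plus LE makes the buffer behave like one little-endian
integer. The helper name read_lsbf_le is hypothetical and is not taken
from any DFDL implementation.

```python
def read_lsbf_le(data: bytes, bit_pos: int, n_bits: int) -> int:
    """Read an unsigned n_bits value starting at 0-based bit_pos,
    assuming bitOrder=LSBF and byteOrder=LE (illustrative helper)."""
    # Under LSBF + LE, bit position 0 is the least significant bit of
    # the first byte, so the buffer can be treated as a single
    # little-endian integer and fields extracted by shift and mask.
    whole = int.from_bytes(data, "little")
    return (whole >> bit_pos) & ((1 << n_bits) - 1)


data = bytes([0b11011111, 0b11010001])  # 0xDF 0xD1

foo = read_lsbf_le(data, 0, 12)   # 12-bit field at bitPosition 0
bar = read_lsbf_le(data, 12, 4)   # 4-bit field at bitPosition 12

print(foo, hex(foo))  # 479 0x1df
print(bar, hex(bar))  # 13 0xd
```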


Now, let's assume we instead wanted to represent "foo" as xs:hexBinary
that has a non-byte size length, e.g.:

  <xs:sequence>
    <xs:element name="foo" dfdl:length="12" type="xs:hexBinary" />
    <xs:element name="bar" dfdl:length="4" type="xs:unsignedInt" />
  </xs:sequence>

If we ignored bitOrder/byteOrder when parsing "foo" and simply read the
first 12 bits (essentially BE MSBF), the result would be:

  <foo>0DFD</foo>

But just like before, the bitPosition after "foo" is consumed is 12. And
because the bit/byteOrder is LSBF LE, the bits that "bar" will consume
are again the first four of the second byte, with the result

  <bar>13</bar>

But this means that the last four bits in the data (0001) were never
consumed, and the first four bits in the second byte were consumed
twice, which must be wrong (a similar issue occurs when starting on a
non-byte boundary). So bitOrder/byteOrder must be taken into account
somehow in order to support hexBinary with non-byte-size lengths or
starting on a non-byte boundary, primarily because of how bitOrder=LSBF
works (which I believe was the original use case for non-byte-size,
non-byte-boundary hexBinary).
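
The overlap can be checked with a small Python sketch that tracks which
physical bits each element touches. The (byte index, bit significance)
coordinates and the helper names msbf_bits and lsbf_bits are assumptions
made for illustration only.

```python
def msbf_bits(bit_pos: int, n_bits: int) -> set:
    # MSBF: stream position i is byte i // 8, bit significance 7 - i % 8.
    return {(i // 8, 7 - i % 8) for i in range(bit_pos, bit_pos + n_bits)}


def lsbf_bits(bit_pos: int, n_bits: int) -> set:
    # LSBF: stream position i is byte i // 8, bit significance i % 8.
    return {(i // 8, i % 8) for i in range(bit_pos, bit_pos + n_bits)}


foo_bits = msbf_bits(0, 12)   # hexBinary read that ignores bit/byteOrder
bar_bits = lsbf_bits(12, 4)   # unsignedInt read that honors LSBF
all_bits = {(b, s) for b in range(2) for s in range(8)}

consumed_twice = sorted(foo_bits & bar_bits)
never_consumed = sorted(all_bits - (foo_bits | bar_bits))

print(consumed_twice)  # high nibble of the second byte
print(never_consumed)  # low nibble of the second byte
```

Bit significance 7 is the most significant bit of a byte, so the bits
consumed twice are the 1101 nibble and the bits never consumed are the
0001 nibble.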

If instead we do not ignore bit/byteOrder, there must be some way to
determine how to get those bits into a hexBinary representation. There
are probably a few different ways to handle this, but after some
discussions and interpretations of the XSD spec, we determined that the
best way to handle it was to just read the bits as if they were a
nonNegativeInteger (which does take into account bit/byteOrder) and then
convert those bits to a hex representation. For BE MSBF the result is
exactly the same. For LE MSBF, it results in the hexBinary being
flipped, which is where the Daffodil implementation is inconsistent with
the spec.
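
A short Python sketch of the "read as an integer, then print in hex"
idea for the whole-byte case, using the same two bytes. The function
name as_hex_via_integer and the zero-padding to two hex digits per byte
are assumptions for illustration, not Daffodil's actual code.

```python
data = bytes([0xDF, 0xD1])


def as_hex_via_integer(data: bytes, byte_order: str) -> str:
    """Interpret data as an unsigned integer with the given byte order
    ("big" or "little"), then render it as upper-case hex digits."""
    value = int.from_bytes(data, byte_order)
    # Assumed padding rule: two hex digits per input byte.
    return format(value, "0{}X".format(2 * len(data)))


raw = data.hex().upper()                            # bytes in data order
big = as_hex_via_integer(data, "big")               # BE: same as raw
little = as_hex_via_integer(data, "little")         # LE: bytes flipped

print(raw)     # DFD1
print(big)     # DFD1
print(little)  # D1DF
```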




On 11/29/18 10:19 AM, Steve Hanson wrote:
> Mike
> 
> I'm a bit lost on this now.  The concept of applying lengthUnits='bits' to
> xs:hexBinary is straightforward. It just counts bits. Bit order or byte
> order is irrelevant, in the same way that it is irrelevant when counting
> bytes for a hex binary. The only thing to note is that the fillByte needs
> to be used to make up whole bytes.
> 
> I'm missing something here.
> 
> Regards
> 
> Steve Hanson
> 
> 
> 
> From: Mike Beckerle <mbeckerle.dfdl at gmail.com>
> To: DFDL-WG <dfdl-wg at ogf.org>
> Date: 20/11/2018 17:33
> Subject: [DFDL-WG] Action 292 - version 2 proposal for hexBinary with lengthUnits bits
> Sent by: "dfdl-wg" <dfdl-wg-bounces at ogf.org>
> 
> 
> --------------------------------------------------------------------------------
>
> Users want a way to express an arbitrary unaligned string of bits, with the
> appearance in the infoset being hexadecimal, not base 10.
>
> Right now the only way I can see to meet this requirement while retaining
> backward compatibility would be a new DFDL property.
> 
> So here's the new idea:
> 
> Property dfdl:hexBinaryRep with values 'bytes' or 'bits'. New property, so
> defaulting (with suppressible warning) to 'bytes' for backward
> compatibility in schemas not having the property.
>
> When set to 'bits', then type xs:hexBinary would behave just like
> xs:nonNegativeInteger, and all properties relevant to that type would be
> applicable, and any use of XSD length facets on such elements would be an
> SDE. The hexBinary string would be exactly the same as if you took the
> numeric value for a nonNegativeInteger and, instead of presenting it as
> base 10 digits, used base 16 digits.
> 
> 
> Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
> www.tresys.com <http://www.tresys.com>
> Please note: Contributions to the DFDL Workgroup's email discussions are
> subject to the OGF Intellectual Property Policy
> <http://www.ogf.org/About/abt_policies.php>
> --
>   dfdl-wg mailing list
>   dfdl-wg at ogf.org
>   https://www.ogf.org/mailman/listinfo/dfdl-wg

> 
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
> 



