[DFDL-WG] Action 292 - Write up of hexBinary with lengthUnits 'bits'

Mike Beckerle mbeckerle.dfdl at gmail.com
Tue Nov 20 12:29:41 EST 2018


This subject came up in a Daffodil users discussion today, where we realized
that Daffodil's implementation of hexBinary with lengthUnits 'bits' is
quite problematic, because it makes the resulting hexBinary string depend on
both the bitOrder and byteOrder properties. E.g., the hex string is reversed
if the byte order is 'littleEndian' and the bit order is
'leastSignificantBitFirst'.

This is natural, in the sense that an xs:nonNegativeInteger also depends on
both of those properties.
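A minimal Python sketch of this dependency, restricted to whole bytes (where bitOrder does not change the value). It assumes a model in which the field is first read as an unsigned integer under dfdl:byteOrder and then printed as hex. This is an illustrative assumption, not Daffodil's code, and `hex_of_field` is a hypothetical name.

```python
def hex_of_field(data: bytes, byte_order: str) -> str:
    """Render a whole-byte field as hex by first reading it as an unsigned integer.

    Hypothetical model for illustration only; not taken from any DFDL implementation.
    """
    # Map DFDL byteOrder values onto Python's int.from_bytes byteorder argument.
    order = {"bigEndian": "big", "littleEndian": "little"}[byte_order]
    value = int.from_bytes(data, order)
    # Keep leading zeros: two hex digits per byte of the field.
    return format(value, "0{}X".format(2 * len(data)))


raw = bytes([0x12, 0x34, 0x56])
print(hex_of_field(raw, "bigEndian"))     # 123456
print(hex_of_field(raw, "littleEndian"))  # 563412
```

Under this model the same three bytes yield two different hex strings, which is the byte-order dependency that xs:hexBinary has not had until now.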

However, making xs:hexBinary depend on those properties would be backward
incompatible with existing schemas for DFDL implementations that have not
yet implemented this feature.

So I think this proposal, as implemented in Daffodil, is a non-starter at
this point, and I am willing to withdraw it.

Users do want a way to describe something tantamount to a fully unaligned
bit string, and have it show up in the infoset in hexadecimal.

I will send a new proposal.

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
<http://www.ogf.org/About/abt_policies.php>



On Wed, May 16, 2018 at 11:46 AM Steve Hanson <smh at uk.ibm.com> wrote:

> Mike
>
> Comments in-line.
>
> Regards
>
> Steve Hanson
>
> IBM Hybrid Integration, Hursley, UK
> Architect, *IBM DFDL*
> <http://www.ibm.com/developerworks/library/se-dfdl/index.html>
> Co-Chair, *OGF DFDL Working Group* <http://www.ogf.org/dfdl/>
> *smh at uk.ibm.com* <smh at uk.ibm.com>
> tel:+44-1962-815848
> mob:+44-7717-378890
> Note: I work Tuesday to Friday
>
>
>
> From:        Mike Beckerle <mbeckerle.dfdl at gmail.com>
> To:        dfdl-wg at ogf.org
> Date:        15/05/2018 22:10
> Subject:        [DFDL-WG] Action 292 - Write up of hexBinary with
> lengthUnits 'bits'
> Sent by:        "dfdl-wg" <dfdl-wg-bounces at ogf.org>
> ------------------------------
>
>
>
>
> The proposed change is to allow lengthUnits 'bits' for hexBinary data.
> This turned out to be more complex to describe than I originally suspected
> because of the need to deal with XSD minLength and maxLength facets, which
> are always measured in bytes. Those in conjunction with
> dfdl:lengthKind='explicit' and dfdl:lengthUnits='bits' create some minor
> complexities.
>
>
> The below changes match the Daffodil implementation of this proposed
> feature.
>
> These sentences in the description of dfdl:lengthUnits in Section 12.3
> must change.
>
>    - 'bits' may only be used for xs:boolean, xs:byte, xs:short, xs:int,
>    xs:long, xs:unsignedByte, xs:unsignedShort, xs:unsignedInt, and
>    xs:unsignedLong simple types with binary representation.
>    - 'bytes' must be used for type xs:hexBinary.
>
>
> The text should read:
>
>    - 'bits' may only be used for xs:hexBinary and for xs:boolean,
>    xs:byte, xs:short, xs:int, xs:long, xs:unsignedByte, xs:unsignedShort,
>    xs:unsignedInt, and xs:unsignedLong simple types with binary representation.
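The revised rule can be read as a simple schema check. The sketch below is hypothetical: the names `check_bits_allowed` and `SchemaDefinitionError` are illustrative, not from any DFDL implementation, and it assumes the caller already knows whether the element has binary or text representation.

```python
# Types for which the revised text allows dfdl:lengthUnits='bits'
# when the element has binary representation.
BITS_ALLOWED_BINARY = {
    "xs:boolean", "xs:byte", "xs:short", "xs:int", "xs:long",
    "xs:unsignedByte", "xs:unsignedShort", "xs:unsignedInt", "xs:unsignedLong",
}


class SchemaDefinitionError(Exception):
    """Hypothetical stand-in for a DFDL schema definition error."""


def check_bits_allowed(type_name: str, representation: str) -> None:
    """Raise SchemaDefinitionError if lengthUnits='bits' is not permitted."""
    if type_name == "xs:hexBinary":
        return  # newly permitted by the proposed text
    if type_name in BITS_ALLOWED_BINARY and representation == "binary":
        return
    raise SchemaDefinitionError(
        "lengthUnits 'bits' not allowed for %s with %s representation"
        % (type_name, representation))
```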
>
>
>
> Later in section 12.3.2, the paragraph:
>
> When unparsing a simple element with binary representation, then for
> hexBinary the length is the number of bytes in the infoset value padded to
> the XSD minLength facet value using dfdl:fillByte, and for the other types
> the length is the minimum number of bytes to represent the value and any
> sign.
>
> Must change to:
> When unparsing a simple element with binary representation, then for types
> other than hexBinary the length is the minimum number of bytes to represent
> the value and any sign.
> For type hexBinary when the dfdl:lengthUnits is 'bytes' then the length is
> the number of bytes in the infoset value padded to the XSD minLength facet
> value using dfdl:fillByte.
> For type hexBinary when the dfdl:lengthUnits is 'bits':
>
>    - First the data is padded to XSD minLength bytes as if the
>    dfdl:lengthUnits was 'bytes'.
>    - When dfdl:lengthKind is other than 'explicit', the length in bits is
>    the number of bytes times 8.
>    - When the dfdl:lengthKind is 'explicit' then the value is further
>    padded or truncated to fit the target length, in bits.
>       - if the data does not have sufficient bytes to supply the target
>       length in bits it is a processing error.  <--- no, see 12.3.7.2.7
>       and you just said you padded it
>       - if the data is longer than the minimum number of bytes needed to
>       supply the target length in bits, it is a processing error.
>       - If the explicit length in bits is not a multiple of 8, then the
>       final byte is only partially unparsed according to the current
>       dfdl:byteOrder  <--- you mean bitOrder
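A Python sketch of the proposed explicit-length rules, following the bullet text literally, including the two processing-error rules questioned in the review comments. Assumptions, labeled as such: fill bytes are appended after the value, and for a partial final byte, mostSignificantBitFirst keeps that byte's high-order bits while leastSignificantBitFirst keeps its low-order bits (per the review comment, this is governed by bitOrder, not byteOrder). All names are hypothetical.

```python
class ProcessingError(Exception):
    """Hypothetical stand-in for a DFDL processing error."""


def unparse_hexbinary_bits(value: bytes, explicit_bits: int, min_length: int = 0,
                           fill_byte: int = 0x00,
                           bit_order: str = "mostSignificantBitFirst"):
    """Sketch of lengthKind='explicit', lengthUnits='bits' unparsing for xs:hexBinary.

    Returns (whole_bytes, final_bits, final_bit_count): the bytes written in full,
    plus the value and width of any partial final byte (width 0 if there is none).
    """
    # First pad to XSD minLength bytes as if lengthUnits were 'bytes'
    # (assumption: fill bytes are appended after the value).
    data = value + bytes([fill_byte]) * max(0, min_length - len(value))
    needed = (explicit_bits + 7) // 8  # minimum bytes that supply explicit_bits
    # Both error rules exactly as the proposal states them.
    if len(data) < needed:
        raise ProcessingError("too few bytes for the target length in bits")
    if len(data) > needed:
        raise ProcessingError("more bytes than the target length in bits needs")
    whole, rem = divmod(explicit_bits, 8)
    if rem == 0:
        return data, 0, 0
    last = data[-1]
    # Assumed partial-byte convention, selected by bitOrder.
    if bit_order == "mostSignificantBitFirst":
        partial = last >> (8 - rem)          # keep the high-order rem bits
    else:
        partial = last & ((1 << rem) - 1)    # keep the low-order rem bits
    return data[:whole], partial, rem
```

For example, a one-byte value 0xAB with an explicit length of 4 bits yields a final nibble of 0xA under mostSignificantBitFirst and 0xB under leastSignificantBitFirst, under the convention assumed above.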
>
>
>
> >>SMH: Section 12.3.2 is about dfdl:lengthKind 'delimited', so discussion
> should be limited to delimited only, and detail for other length kinds
> moved to their sections or 12.3.7.2.7 for stuff common to specified
> lengths.
>
> >>SMH: Need to consider the impact of this for lengthKinds implicit,
> explicit, prefixed, as they all use lengthUnits. And for explicit, need to
> cover when dfdl:length is an expression (so variable length on output).
>
> In Section 12.3.7.2.7, Length of Binary Opaque Elements, the first sentence
> must be modified from:
>
> "The dfdl:lengthUnits property must be 'bytes'. It is a schema definition
> error otherwise."
>
> to
>
> "The dfdl:lengthUnits property must be 'bytes' or 'bits'. It is a schema
> definition error otherwise. Note that even when the dfdl:lengthUnits
> property is 'bits', the values of the XSD minLength and XSD maxLength
> facets are still always interpreted as constraining the length in units of
> bytes."
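In other words, facet checking does not change with dfdl:lengthUnits. A hypothetical sketch (`facets_ok` is an illustrative name): the check counts the bytes of the xs:hexBinary logical value, so an 8-byte value unparsed into 63 bits would still satisfy minLength = maxLength = 8.

```python
from typing import Optional


def facets_ok(value: bytes, min_length: Optional[int] = None,
              max_length: Optional[int] = None) -> bool:
    """Check XSD minLength/maxLength in bytes, whatever dfdl:lengthUnits is."""
    n = len(value)  # facets constrain the byte count of the logical value
    if min_length is not None and n < min_length:
        return False
    if max_length is not None and n > max_length:
        return False
    return True


# An 8-byte logical value passes minLength=maxLength=8,
# regardless of how many bits its representation occupies.
print(facets_ok(bytes(8), 8, 8))  # True
print(facets_ok(bytes(7), 8, 8))  # False
```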
>
> >>SMH: Earlier in 12.3.7.2 it says "The dfdl:lengthUnits can be 'bytes'
> or 'bits' unless otherwise stated. It is schema definition error if
> dfdl:lengthUnits is 'characters'. " so the first two sentences can just
> be removed.
>
> That's the end of the actual proposed language.
>
>
> Note about alternatives: I considered the alternative of making it a schema
> definition error when dfdl:lengthUnits is 'bits', the type is
> 'xs:hexBinary', and the XSD minLength or XSD maxLength facets are defined.
> I decided to go with the description above to support the use case where a
> data item is, in hex, some number of bytes long for a valid XML infoset,
> but the representation is explicitly a partial byte smaller: e.g., a
> hexBinary with 8 bytes, but fewer than 64 bits in the representation. For
> example, with 63 bits as the explicit target length, 1 bit of the
> xs:hexBinary logical value will be unused, but the above rules ensure that
> no more than 7 bits go unused from the final byte of the hexBinary logical
> value.
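The arithmetic behind this example can be checked directly. A short sketch; the function names are illustrative only:

```python
def bytes_required(explicit_bits: int) -> int:
    """Minimum whole bytes that can supply explicit_bits (ceiling division)."""
    return -(-explicit_bits // 8)


def unused_bits(logical_bytes: int, explicit_bits: int) -> int:
    """Bits of the xs:hexBinary logical value that the representation leaves out."""
    return 8 * logical_bytes - explicit_bits


# The example from the note: 8 logical bytes, 63-bit representation.
assert bytes_required(63) == 8
assert unused_bits(8, 63) == 1
# Because the logical value must be exactly bytes_required(bits) long,
# at most 7 bits of the final byte can ever go unused.
assert all(0 <= unused_bits(bytes_required(b), b) <= 7 for b in range(1, 1025))
```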
>
> This is trying to be consistent with the notion that we do not truncate
> data to fit into the available length except for xs:string when properties
> explicitly allow it.
> It is also trying to be consistent with our treatment of binary integers,
> where an xs:long value may be output into an element having room for any
> number of bits, and any extra bits in the logical value are ignored.
>
>
>
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>