[DFDL-WG] Action 292 - Write up of hexBinary with lengthUnits 'bits'

Mike Beckerle mbeckerle.dfdl at gmail.com
Tue May 15 17:10:29 EDT 2018


The proposed change is to allow lengthUnits 'bits' for hexBinary data. This
turned out to be more complex to describe than I originally suspected
because of the need to deal with XSD minLength and maxLength facets, which
are always measured in bytes. Those in conjunction with
dfdl:lengthKind='explicit' and dfdl:lengthUnits='bits' create some minor
complexities.

The changes below match the Daffodil implementation of this proposed
feature.

These sentences in the description of dfdl:lengthUnits in Section 12.3 must
change.

- 'bits' may only be used for xs:boolean, xs:byte, xs:short, xs:int,
xs:long, xs:unsignedByte, xs:unsignedShort, xs:unsignedInt, and
xs:unsignedLong simple types with binary representation.
- 'bytes' must be used for type xs:hexBinary.

The text should read

- 'bits' may only be used for xs:hexBinary and for xs:boolean, xs:byte,
xs:short, xs:int, xs:long, xs:unsignedByte, xs:unsignedShort,
xs:unsignedInt, and xs:unsignedLong simple types with binary representation.

Later in the section, the paragraph:

When unparsing a simple element with binary representation, then for
hexBinary the length is the number of bytes in the infoset value padded to
the XSD minLength facet value using dfdl:fillByte, and for the other types
the length is the minimum number of bytes to represent the value and any
sign.

Must change to:
When unparsing a simple element with binary representation, then for types
other than hexBinary the length is the minimum number of bytes to represent
the value and any sign.
For type hexBinary when the dfdl:lengthUnits is 'bytes' then the length is
the number of bytes in the infoset value padded to the XSD minLength facet
value using dfdl:fillByte.
For type hexBinary when the dfdl:lengthUnits is 'bits':

   - First the data is padded to XSD minLength bytes as if the
   dfdl:lengthUnits was 'bytes'.
   - When dfdl:lengthKind is other than 'explicit', the length is the
   number of bytes times 8.
   - When the dfdl:lengthKind is 'explicit' then the value is further
   padded or truncated to fit the target length, in bits.
      - If the data does not have sufficient bytes to supply the target
      length in bits, it is a processing error.
      - If the data is longer than the minimum number of bytes needed to
      supply the target length in bits, it is a processing error.
      - If the explicit length in bits is not a multiple of 8, then the
      final byte is only partially unparsed according to the current
      dfdl:byteOrder.
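
To make the rules above concrete, here is a minimal Python sketch of how I
read them for hexBinary with dfdl:lengthUnits 'bits'. It is not taken from
Daffodil or from the spec text; the function name, its parameters, and the
ProcessingError class are all hypothetical. It assumes the three nested
conditions apply only when dfdl:lengthKind is 'explicit', and it computes
only the length; it does not model which bits of a partial final byte are
written.

```python
from typing import Optional, Tuple


class ProcessingError(Exception):
    """Hypothetical stand-in for a DFDL processing error."""


def hexbinary_unparse_length(
    value: bytes,
    length_kind: str,
    min_length: int = 0,
    fill_byte: int = 0x00,
    explicit_bits: Optional[int] = None,
) -> Tuple[bytes, int]:
    """Return (padded value, representation length in bits)."""
    # Step 1: pad to XSD minLength, which is measured in bytes.
    padded = value + bytes([fill_byte]) * max(0, min_length - len(value))

    # Step 2: for any lengthKind other than 'explicit', length is bytes * 8.
    if length_kind != "explicit":
        return padded, len(padded) * 8

    if explicit_bits is None:
        raise ValueError("explicit_bits is required for lengthKind 'explicit'")

    # Step 3: the value must occupy exactly the minimum number of bytes
    # that can supply the explicit target length in bits.
    needed_bytes = (explicit_bits + 7) // 8
    if len(padded) < needed_bytes:
        raise ProcessingError("too few bytes for the target length in bits")
    if len(padded) > needed_bytes:
        raise ProcessingError("more bytes than the target length requires")

    # If explicit_bits is not a multiple of 8, the final byte is only
    # partially unparsed; that bit selection is not modeled here.
    return padded, explicit_bits


print(hexbinary_unparse_length(b"\x01\x02", "prefixed", min_length=4))
# prints (b'\x01\x02\x00\x00', 32)
print(hexbinary_unparse_length(bytes(8), "explicit", explicit_bits=63)[1])
# prints 63
```

With an explicit length of 63 bits, a 7-byte or a 9-byte value raises
ProcessingError in this sketch; only an 8-byte value is accepted.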


Section 12.3.7.2.7   Length of Binary Opaque Elements, the first sentence
must be modified from:

"The dfdl:lengthUnits property must be 'bytes'. It is a schema definition
error otherwise."

to

"The dfdl:lengthUnits property must be 'bytes' or 'bits'. It is a schema
definition error otherwise. Note that even when the dfdl:lengthUnits
property is 'bits', the values of the XSD minLength and XSD maxLength
facets are still always interpreted as constraining the length in units of
bytes."

That's the end of the actual proposed language.

Note about alternatives: I considered the alternative of making it a schema
definition error when dfdl:lengthUnits is 'bits', the type is xs:hexBinary,
and the XSD minLength or XSD maxLength facets are defined. I decided to go
with the description above to support the use case where a data item is, in
hex, some whole number of bytes long for a valid XML infoset, but the
representation is explicitly a partial byte smaller. E.g., a hexBinary
value of 8 bytes with fewer than 64 bits in the representation. Ex: with 63
bits as the explicit target length, 1 bit will be unused from the
xs:hexBinary logical value, and the above rules ensure no more than 7 bits
go unused from the final byte of the hexBinary logical value.
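
The arithmetic of that 63-bit example can be checked with a short Python
sketch. The helper names are hypothetical and the hex value is arbitrary;
the sketch assumes the facets count bytes of the logical value while the
explicit length counts bits of the representation.

```python
def satisfies_facets(value: bytes, min_length: int, max_length: int) -> bool:
    # XSD minLength/maxLength constrain the logical value in bytes,
    # regardless of dfdl:lengthUnits.
    return min_length <= len(value) <= max_length


def unused_trailing_bits(value: bytes, explicit_bits: int) -> int:
    # The value must be exactly the minimum number of bytes that can
    # supply explicit_bits, so at most 7 bits can go unused.
    needed_bytes = (explicit_bits + 7) // 8
    if len(value) != needed_bytes:
        raise ValueError("value length does not match the explicit length")
    return len(value) * 8 - explicit_bits


value = bytes.fromhex("0123456789abcdef")  # 8 bytes in the infoset
print(satisfies_facets(value, 8, 8))       # prints True
print(unused_trailing_bits(value, 63))     # prints 1
```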

This is trying to be consistent with the notion that we do not truncate
data to fit into the available length except for xs:string when properties
explicitly allow it.
It is also trying to be consistent with our treatment of binary integers
where an xs:long value may be output into an element having room for any
number of bits, and any extra bits in the logical value are ignored.


Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
<http://www.ogf.org/About/abt_policies.php>