[DFDL-WG] suggest: need hexBinary with lengthUnits 'bits' with length not a multiple of 8.

Fri Jan 27 11:07:25 EST 2017

I would suggest that, on unparsing, hexBinary with length in bits should
simply truncate, rather than trim zeros or trim bits that match fillByte.

Two details about the final partial byte.

Which bits of the final byte are lost during truncation is subject to
dfdl:bitOrder when it is a partial byte.
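Here is a minimal sketch of plain truncation on unparse. It assumes the infoset value is a hex string and that the output is a string of '0'/'1' characters in data-stream order. It also assumes dfdl:bitOrder only decides whether the first bit of each byte is its most or least significant bit. The function name `truncate_hexbinary` is hypothetical.

```python
VALID_BIT_ORDERS = ("mostSignificantBitFirst", "leastSignificantBitFirst")

def truncate_hexbinary(hex_value: str, length_bits: int,
                       bit_order: str = "mostSignificantBitFirst") -> str:
    """Return the first length_bits bits of hex_value, in data-stream order.

    Any bits beyond length_bits are dropped, whatever their values.
    """
    if bit_order not in VALID_BIT_ORDERS:
        raise ValueError(f"unknown bitOrder: {bit_order}")
    data = bytes.fromhex(hex_value)
    if length_bits > 8 * len(data):
        raise ValueError("infoset value is shorter than the specified length")
    msb_first = bit_order == "mostSignificantBitFirst"
    out = []
    for n in range(length_bits):
        i = n % 8                          # index of this bit within its byte, in stream order
        pos = 7 - i if msb_first else i    # bit number within the byte value (0 = least significant)
        out.append("1" if data[n // 8] & (1 << pos) else "0")
    return "".join(out)
```

With mostSignificantBitFirst, `truncate_hexbinary("FFFF80", 17)` yields seventeen 1 bits, and the low seven bits of 0x80 are discarded. With leastSignificantBitFirst, the same value yields sixteen 1 bits followed by a 0. That is because the bit kept from the last byte is now its least significant one.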

When parsing, which bits of the final byte are filled in from fillByte
would also be subject to dfdl:bitOrder.
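The corresponding parse-side padding can be sketched the same way. It assumes the data arrives as a string of '0'/'1' characters in stream order. Unused positions of the final byte take the corresponding bits of dfdl:fillByte (pass 0x00 to pad with zeros). `bits_to_hexbinary` is a hypothetical name.

```python
VALID_BIT_ORDERS = ("mostSignificantBitFirst", "leastSignificantBitFirst")

def bits_to_hexbinary(bits: str, fill_byte: int = 0x00,
                      bit_order: str = "mostSignificantBitFirst") -> str:
    """Build a hexBinary value from '0'/'1' characters given in stream order.

    Positions of the final byte not covered by data come from fill_byte.
    """
    if bit_order not in VALID_BIT_ORDERS:
        raise ValueError(f"unknown bitOrder: {bit_order}")
    msb_first = bit_order == "mostSignificantBitFirst"
    out = bytearray()
    for start in range(0, len(bits), 8):
        chunk = bits[start:start + 8]
        value = 0
        for i in range(8):
            pos = 7 - i if msb_first else i      # bit number within the byte value
            if i < len(chunk):
                if chunk[i] == "1":
                    value |= 1 << pos            # data bit
            else:
                value |= fill_byte & (1 << pos)  # pad from the matching fillByte bit
        out.append(value)
    return out.hex().upper()
```

Seventeen 1 bits give `FFFF80` under mostSignificantBitFirst but `FFFF01` under leastSignificantBitFirst, which is the bitOrder dependence described above.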

I will think further about the binary calendars. The 33-bit one I recall
seeing was an unsigned number, so might not be compatible with the current
definitions if those stipulate signed. In any case, this is less important
than the hexBinary bits blob issue.

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
<http://www.ogf.org/About/abt_policies.php>

On Thu, Jan 26, 2017 at 6:21 AM, Steve Hanson <smh at uk.ibm.com> wrote:

> If allowing lengthUnits 'bits' for a new logical/physical combination has
> no effect on the infoset then that should be ok.
>
> 'binarySeconds' & 'binaryMilliseconds'. These were designed to correspond
> to C data types and are always treated as signed. Allowing 'bits' should be
> ok as long as the same rules as for signed 'int' & 'long' respectively are
> used.
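Applying the signed 'int' and 'long' rules to an n-bit field amounts to two's-complement sign extension from bit n-1. This is a minimal sketch. It assumes the n bits have already been assembled into a non-negative Python integer. The function name is hypothetical.

```python
def signed_from_bits(raw: int, n_bits: int) -> int:
    """Interpret the low n_bits of raw as an n_bits-wide two's-complement integer."""
    if n_bits < 1:
        raise ValueError("n_bits must be at least 1")
    raw &= (1 << n_bits) - 1               # keep only the field's bits
    sign_bit = 1 << (n_bits - 1)
    return raw - (1 << n_bits) if raw & sign_bit else raw
```

For a 33-bit field, `0x1FFFFFFFF` decodes to -1 and `0x0FFFFFFFF` to 4294967295. A value that is unsigned in the data, as the 33-bit format in this thread was recalled to be, would therefore be misread whenever its top bit is set.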
>
> 'hexBinary', as you note, causes a problem because the XSD type must be a
> multiple of 8 bits. That's why it is restricted to 'bytes' only
> today. If we allow 'bits', then on parsing DFDL would have to pad either
> using 0 bits or the corresponding bits of dfdl:fillByte, and on unparsing
> DFDL would have to trim off the excess as long as it matched 0 bits or the
> corresponding bits of dfdl:fillByte. Today fillByte is never used for
> trimming.
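Steve's rule differs from plain truncation: excess bits that do not match the padding must be diagnosed. The sketch below uses the same stream-order assumptions. Several pieces of it are assumptions, because the thread does not say what kind of error would result:

- the name `trim_hexbinary`
- the `pad_byte` parameter (0x00 or dfdl:fillByte)
- the exact-byte-count check
- the use of ValueError for a mismatch

```python
VALID_BIT_ORDERS = ("mostSignificantBitFirst", "leastSignificantBitFirst")

def trim_hexbinary(hex_value: str, length_bits: int, pad_byte: int = 0x00,
                   bit_order: str = "mostSignificantBitFirst") -> str:
    """Return the first length_bits bits (stream order), checking the excess bits.

    pad_byte is 0x00 or dfdl:fillByte, whichever the parser used for padding.
    """
    if bit_order not in VALID_BIT_ORDERS:
        raise ValueError(f"unknown bitOrder: {bit_order}")
    data = bytes.fromhex(hex_value)
    if len(data) != (length_bits + 7) // 8:
        raise ValueError("infoset value does not have the expected number of bytes")
    msb_first = bit_order == "mostSignificantBitFirst"
    bits = []
    for n in range(8 * len(data)):
        i = n % 8
        pos = 7 - i if msb_first else i
        bit = (data[n // 8] >> pos) & 1
        if n < length_bits:
            bits.append(str(bit))          # content bit: keep it
        elif bit != (pad_byte >> pos) & 1:
            # excess bit that is not padding: cannot trim safely
            raise ValueError(f"excess bit {n} does not match the padding")
    return "".join(bits)
```

Under this rule `FFFF80` trims cleanly to seventeen 1 bits. `FFFF81` is rejected because its last bit is neither a 0 nor padding.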
>
> Regards
>
> Steve Hanson
>
> IBM Hybrid Integration, Hursley, UK
> Architect, IBM DFDL
> <http://www.ibm.com/developerworks/library/se-dfdl/index.html>
> Co-Chair, OGF DFDL Working Group <http://www.ogf.org/dfdl/>
> smh at uk.ibm.com
> tel:+44-1962-815848
> mob:+44-7717-378890
>
>
>
> From:        Mike Beckerle <mbeckerle.dfdl at gmail.com>
> To:        "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>
> Date:        26/01/2017 05:00
> Subject:        [DFDL-WG] suggest: need hexBinary with lengthUnits 'bits'
> with length not a multiple of 8.
> Sent by:        "dfdl-wg" <dfdl-wg-bounces at ogf.org>
> ------------------------------
>
>
>
> We have users who have binary blobs the size of which is given in bits,
> and these blobs are not a multiple of 8 long.
>
> Today the DFDL spec doesn't allow hexBinary to have lengthUnits 'bits'.
>
> I am wondering if this restriction should be lifted.
>
> XSD constrains hexBinary to always have an even number of hex digits, so
> we would have to do the same.
>
> So, for example, a 17-bit hexBinary containing all 1 bits would be
> FFFF80, with the final byte padded by seven 0 bits.
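The size arithmetic behind this example works as follows. An n-bit field occupies ceil(n/8) bytes, which means 2 * ceil(n/8) hex digits, always an even number. The sketch assumes mostSignificantBitFirst and zero padding. The function name is hypothetical.

```python
def hexbinary_all_ones(length_bits: int) -> str:
    """hexBinary text for length_bits 1 bits, zero-padded to whole bytes (MSB first)."""
    if length_bits <= 0:
        return ""
    n_bytes = (length_bits + 7) // 8           # round up to whole bytes
    pad = 8 * n_bytes - length_bits            # zero bits appended to the last byte
    value = ((1 << length_bits) - 1) << pad    # the 1 bits occupy the high-order positions
    return f"{value:0{2 * n_bytes}X}"          # two hex digits per byte, so always even
```

`hexbinary_all_ones(17)` gives `FFFF80`: three bytes and six hex digits, with the last seven bits padded with 0.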
>
> Erratum 5.15 extends the types that are allowed to have length in bits to
> include packed calendars. So there is precedent for opening this
> restriction up if need arises.
>
> I claim we need to
>
> (a) allow length units bits for all types
> (b) restrict the length to be exactly 32 bits or 64 bits, for types
> xs:float and xs:double when the representation is 'binary'
> (c) restrict packed decimal to have lengths be a multiple of 4 bits (when
> specified in units 'bits')
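Rules (a) to (c) could be checked roughly as follows. This is a hypothetical validator, not part of any DFDL implementation. The type and representation strings mirror DFDL property values. `packed_decimal` stands in for whichever packed binaryNumberRep a processor would test.

```python
def check_length_in_bits(type_name: str, representation: str, length_bits: int,
                         packed_decimal: bool = False) -> None:
    """Raise ValueError if the proposed rules (a)-(c) forbid this length in bits."""
    if length_bits < 0:
        raise ValueError("length must be non-negative")
    # (b) binary floats must be exactly 32 or 64 bits long
    if type_name in ("xs:float", "xs:double") and representation == "binary":
        if length_bits not in (32, 64):
            raise ValueError(f"{type_name} binary length must be 32 or 64 bits")
    # (c) packed decimal lengths must be a whole number of 4-bit nibbles
    if packed_decimal and length_bits % 4 != 0:
        raise ValueError("packed decimal length must be a multiple of 4 bits")
    # (a) otherwise any length in bits is allowed, for every type
```

Under these rules a 17-bit hexBinary and a 33-bit binary calendar pass, while a 33-bit binary float or a 10-bit packed decimal is rejected.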
>
> All other restrictions should be lifted as those restrictions just cause
> problems in some formats.
>
> For example, 12.3.7.2.5 specifies that binary calendars must be exactly
> 4 bytes or 8 bytes, and cannot be specified in units 'bits'. This is just a
> mistake in DFDL. I have even seen binary calendars 33 bits long
> (seconds since 1970-01-01, i.e., binarySeconds). That additional
> bit extends the end time substantially.
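To see how much one extra bit buys, compute the latest representable time for a seconds-since-1970 field of a given width. The sketch assumes a UTC epoch of 1970-01-01T00:00:00. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)  # assumed epoch for binarySeconds

def latest_time(n_bits: int, signed: bool) -> datetime:
    """Latest instant representable by an n_bits-wide seconds-since-epoch field."""
    max_seconds = (1 << (n_bits - 1)) - 1 if signed else (1 << n_bits) - 1
    return EPOCH + timedelta(seconds=max_seconds)
```

The three cases compare as follows:

- A signed 32-bit field ends at 2038-01-19T03:14:07Z.
- A signed 33-bit field reaches 2106-02-07T06:28:15Z.
- An unsigned 33-bit field reaches the year 2242.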
>
> These restrictions were put into DFDL because our experience of many
> bit-granularity formats was limited.
>
> What we've found is that there are plenty of data formats where the notion
> of a "byte" is simply absent. Nothing uses multiples of 8 bits for
> anything, and nothing is measured in those units. It's always measured in
> bits. Even for things like float and double, which have implicit lengths
> of 4 and 8 bytes respectively, many specifications express those as 32
> bits or 64 bits. Having to divide by 8 just makes the DFDL schema awkward.
> Similarly, in these formats strings are given lengths in bits: 448 bits' worth
> of 7-bit packed ASCII characters is 64 characters, occupying 56 bytes, but
> the spec uses 448.
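The arithmetic checks out: 448 / 7 = 64 characters and 448 / 8 = 56 bytes. A small packer makes this concrete. It assumes characters are packed into consecutive 7-bit fields, most significant bit first, with no gaps. The bit order of real 7-bit packed formats may differ, and the function names are hypothetical.

```python
def pack_7bit(text: str) -> bytes:
    """Pack ASCII characters into consecutive 7-bit fields, most significant bit first."""
    value = 0
    for ch in text:
        code = ord(ch)
        if code > 0x7F:
            raise ValueError(f"not a 7-bit ASCII character: {ch!r}")
        value = (value << 7) | code
    n_bits = 7 * len(text)
    pad = -n_bits % 8                      # zero bits needed to reach a whole byte
    return (value << pad).to_bytes((n_bits + pad) // 8, "big")

def unpack_7bit(data: bytes, length_bits: int) -> str:
    """Decode length_bits bits (a multiple of 7) of MSB-first 7-bit ASCII."""
    if length_bits % 7 != 0 or length_bits > 8 * len(data):
        raise ValueError("length must be a multiple of 7 and fit in the data")
    value = int.from_bytes(data, "big") >> (8 * len(data) - length_bits)  # drop padding
    n_chars = length_bits // 7
    return "".join(chr((value >> (7 * (n_chars - 1 - k))) & 0x7F) for k in range(n_chars))
```

Packing 64 characters produces exactly 56 bytes. Decoding those bytes with a length of 448 bits recovers the original text.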
>
> These changes are all backward-compatible. They make legal property
> settings that previously had no meaning and caused SDEs (Schema Definition
> Errors).
>
> Discussion?
>