[DFDL-WG] lengthUnits bits not allowed for strings, binary floats, hexBinary

Mike Beckerle mbeckerle.dfdl at gmail.com
Wed Jul 23 18:35:07 EDT 2014


The DFDL spec currently says this about the lengthUnits property:


   - 'bits' may only be used for xs:boolean, xs:byte, xs:short, xs:int,
   xs:long, xs:unsignedByte, xs:unsignedShort, xs:unsignedInt, and
   xs:unsignedLong simple types with binary representation.

This feels like a holdover from when we only had strings made up of 8-bit
byte code units. Now that we have 7-bit and 6-bit characters, this
restriction seems unnecessary. It is in fact awkward, because many specs
give the lengths of strings in bits (the unit these formats use for the
length of everything). This is a real concern: in many cases DFDL schemas
will be generated from other specifications by programs. Having to
conditionally convert the length as specified into different units for
strings is one more place that has to be tested, one more way the DFDL
schema doesn't obviously match the specification from which it was
derived, and so on.

Similarly:

   - 'bytes' must be used for type xs:hexBinary.
   - 'bytes' must be used for types xs:float and xs:double with binary
   representation.
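To make the current rules concrete, here is a minimal Python sketch that encodes only the restrictions quoted above. The function name length_units_allowed and the plain-string type and representation values are hypothetical; no DFDL processor is assumed to implement the check this way, and the spec's other lengthUnits rules are ignored.

```python
# Hypothetical checker for the lengthUnits restrictions quoted above.
# Types are plain strings such as "xs:int"; representation is "binary"
# or "text". Only these three rules are modeled.

# Types for which lengthUnits="bits" is currently permitted
# (and only with binary representation).
BITS_ALLOWED = {
    "xs:boolean", "xs:byte", "xs:short", "xs:int", "xs:long",
    "xs:unsignedByte", "xs:unsignedShort", "xs:unsignedInt",
    "xs:unsignedLong",
}


def length_units_allowed(xsd_type: str, representation: str,
                         length_units: str) -> bool:
    """Return True if the combination passes the quoted restrictions."""
    # 'bytes' must be used for xs:hexBinary.
    if xsd_type == "xs:hexBinary":
        return length_units == "bytes"
    # 'bytes' must be used for xs:float and xs:double with binary representation.
    if xsd_type in ("xs:float", "xs:double") and representation == "binary":
        return length_units == "bytes"
    # 'bits' only for the listed types with binary representation.
    if length_units == "bits":
        return xsd_type in BITS_ALLOWED and representation == "binary"
    # Anything else is outside the scope of this sketch.
    return True


print(length_units_allowed("xs:int", "binary", "bits"))        # True
print(length_units_allowed("xs:string", "text", "bits"))       # False
print(length_units_allowed("xs:hexBinary", "binary", "bits"))  # False
```

Under the change proposed here, the units checks for strings, hexBinary, float and double would go away; the whole-byte and exact-size limits would still apply, but to the length value rather than to the choice of units.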

These rules are meant to keep users from misunderstanding the limitations
of these types, i.e., that we don't support hexBinary that is not a
multiple of 8 bits in size, or float and double that are not exactly 4 and
8 bytes, respectively.

But now this restriction just seems annoying. If the specifications my
data formats are based on give all these values in bits, then it is
painful, when creating a DFDL schema, to have to transform the values for
just those element declarations that are of these types.
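The following Python sketch illustrates the extra conversion a schema generator has to carry under the current rules when its source specification gives every length in bits. The helper dfdl_length_for is hypothetical; it assumes binary representation for the numeric types, and a fixed-width character encoding (width passed in by the caller) for strings.

```python
from __future__ import annotations


def dfdl_length_for(xsd_type: str, bits: int,
                    char_width_bits: int = 8) -> tuple[int, str]:
    """Hypothetical helper: map a length given in bits by the source
    specification to (dfdl:length, dfdl:lengthUnits) under the current rules."""
    if xsd_type == "xs:string":
        # 'bits' is not allowed for strings; count characters instead,
        # assuming a fixed-width encoding of char_width_bits per character.
        chars, rest = divmod(bits, char_width_bits)
        if rest:
            raise ValueError(f"{bits} bits is not a whole number of characters")
        return chars, "characters"
    if xsd_type in ("xs:hexBinary", "xs:float", "xs:double"):
        # 'bytes' is required for these types.
        nbytes, rest = divmod(bits, 8)
        if rest:
            raise ValueError(f"{bits} bits is not a whole number of bytes")
        return nbytes, "bytes"
    # Binary integers and booleans may keep the length in bits.
    return bits, "bits"


print(dfdl_length_for("xs:string", 448, char_width_bits=7))  # (64, 'characters')
print(dfdl_length_for("xs:float", 32))                       # (4, 'bytes')
print(dfdl_length_for("xs:int", 12))                         # (12, 'bits')
```

If bits were allowed for these types, every branch would reduce to returning (bits, "bits"), with the whole-byte and exact-size restrictions still enforced against the length value.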

I'm not suggesting we lift the actual restrictions. I'm good with hexBinary
requiring whole bytes, and with float and double being exactly 32 bits and
64 bits, respectively. I just think requiring bytes as the length units is
arbitrary. We thought it would prevent people from making mistakes, but it
is in fact likely to have the opposite effect, forcing them to interpret
the length differently based on a type that might not even be defined in
the same file where they see the dfdl:length property. Consider:

<element name="x" type="foo:xType" dfdl:length="448"/>

Is that 448 correct? It depends on the definition of foo:xType. If it's a
simple type derived from string, then the length units have to be
characters or bytes, but in all the formats where I see these 448s, they
are measured in bits. 448 bits is 56 bytes, holding 64 7-bit characters.
But when I write out this element, I don't have the information right
there to know whether to divide by 8, by 7, or not at all, without
knowledge of the type.
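A small Python sketch of why the bare number is ambiguous: the total size that dfdl:length="448" denotes depends on lengthUnits, and therefore on the type. The table of unit widths is an illustrative assumption, with "characters" standing for a fixed-width 7-bit encoding as in the example above; length_in_bits is a hypothetical name.

```python
# Width of one length unit in bits. "characters" assumes a fixed-width
# 7-bit encoding (an illustrative assumption, matching the example above).
UNIT_WIDTH_BITS = {"bits": 1, "bytes": 8, "characters": 7}


def length_in_bits(dfdl_length: int, length_units: str) -> int:
    """Total size in bits denoted by dfdl:length in the given units."""
    return dfdl_length * UNIT_WIDTH_BITS[length_units]


# The same 448 denotes very different sizes depending on lengthUnits.
for units in ("bits", "bytes", "characters"):
    print(f'dfdl:length="448" lengthUnits="{units}" -> '
          f"{length_in_bits(448, units)} bits")

# Conversely, the 448-bit field from the format spec must be written as
# 448 bits, 56 bytes, or 64 characters depending on the type.
assert length_in_bits(448, "bits") == length_in_bits(56, "bytes") \
    == length_in_bits(64, "characters") == 448
```

The loop prints 448, 3584 and 3136 bits respectively, so writing 448 where 56 or 64 was required silently makes the field 8 or 7 times too long.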


Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
<http://www.ogf.org/About/abt_policies.php>