[DFDL-WG] binaryNumberRep limitations, xs:decimal and binaryDecimalVirtualPoint

Mike Beckerle mbeckerle at apache.org
Wed Aug 10 15:33:04 EDT 2022


We need the same extensibility thing for binaryCalendarRep.

The binarySeconds existing value is signed. We recently found we needed a
32-bit unsigned version, and it is quite hard to work around this without
adding lots of user-defined functions such as date/time/datetime
constructors that take integer arguments.
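
To make the signed versus unsigned difference concrete, here is a minimal
Python sketch. It assumes a big-endian 32-bit field and an epoch of
1970-01-01 UTC. Both are assumptions for illustration only (in DFDL the
epoch comes from dfdl:binaryCalendarEpoch), and decode_binary_seconds is a
hypothetical helper, not part of any DFDL implementation.

```python
from datetime import datetime, timedelta, timezone

# Assumed epoch for illustration; a real schema sets dfdl:binaryCalendarEpoch.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_binary_seconds(data: bytes, signed: bool) -> datetime:
    """Interpret a big-endian integer as seconds relative to EPOCH."""
    seconds = int.from_bytes(data, byteorder="big", signed=signed)
    return EPOCH + timedelta(seconds=seconds)


raw = bytes.fromhex("80000000")  # top bit set: the two readings diverge

print(decode_binary_seconds(raw, signed=True))   # 1901-12-13 20:45:52+00:00
print(decode_binary_seconds(raw, signed=False))  # 2038-01-19 03:14:08+00:00
```

The same four bytes land 136 years apart depending on signedness, which is
why the workaround needs constructors that take integer arguments.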

I've not yet seen a need to extend binaryFloatRep.

The binaryBooleanTrueRep and binaryBooleanFalseRep are not analogous as
they are values, not enums of kinds of representations.

On Thu, Jul 14, 2022 at 12:43 PM Steve Hanson <smh at uk.ibm.com> wrote:

> DFDL WG agree to
> - clarify the meaning of 'base simple type' for the property
> dfdl:binaryDecimalVirtualPoint (added to
> https://github.com/OpenGridForum/DFDL/issues/28)
> - extend the experimental feature syntax to allow additional enums on
> existing properties (action 328)
> - add extra enums to dfdl:binaryNumberRep as a DFDL 1.0 experimental
> feature (action 329)
>
>
> Regards
>
> Steve Hanson
>
> IBM Integration, Hursley, UK
> Architect, IBM DFDL
> Co-Chair, OGF DFDL Working Group
> smh at uk.ibm.com
> tel:+44-7717-378890
> Note: I work Tuesday to Friday
>
> -----Original Message-----
> *From*: Mike Beckerle <mbeckerle at apache.org>
> *Reply-To*: mbeckerle at apache.org
> *To*: DFDL-WG <dfdl-wg at ogf.org>
> *Subject*: [EXTERNAL] [DFDL-WG] binaryNumberRep limitations, xs:decimal
> and binaryDecimalVirtualPoint
> *Date*: Wed, 08 Jun 2022 13:59:16 -0400
>
> Rather long email, apologies in advance.
>
> We are encountering many issues where DFDL is too limited in its binary
> number representations.
>
> 6 points are made in this email.
>
> To not bury the lead, which is Point 6, I am proposing that we make
> dfdl:binaryNumberRep an extensible enum, allowing QNames as values so that
> implementations can extend the set of supported binary number
> representations. E.g., dfdl:binaryNumberRep='dfdlx:onesComplement', or
> dfdl:binaryNumberRep='daffodil:signedASN1BERVariableLengthInteger'
>
> Point 1: I think this is just a needed clarification. But Points 1 and 2
> drove the motivation for this whole discussion.
>
> So the description of dfdl:binaryDecimalVirtualPoint says it is allowed on
> types whose *base* is xs:decimal.
>
> That term 'base' is confusing. It can either mean allowed on types derived
> from xs:decimal, as all the signed and unsigned integer types are.
> Or it can mean literally <restriction base="xs:decimal"> ....
> </restriction>. That is, the base attribute of simple type restriction must
> be xs:decimal, or the type="xs:decimal" directly on the element.
>
> I believe the latter is the only thing that makes sense. If
> dfdl:binaryDecimalVirtualPoint is positive, then the type below is
> nonsense: the type says it is an unsigned integer, but the
> binaryDecimalVirtualPoint says to divide by 100, so that 36000 would become
> 360.00, a non-integer. The infoset <angle>360.00</angle> makes no sense for
> an integer type, and would not validate treating the DFDL schema as an XSD.
>
> <element name="angle" type="with2DecimalFractionDigits"/>
>
> <simpleType name="with2DecimalFractionDigits"
> dfdl:binaryDecimalVirtualPoint="2"> <!-- will divide by 100 -->
>    <restriction base="xs:unsignedShort"> <!-- makes no sense. Has to be
> xs:decimal -->
>       <maxInclusive value="360"/>
>    </restriction>
> </simpleType>
>
> So based on that reasoning, I think we need to improve the clarity and say
> that dfdl:binaryDecimalVirtualPoint applies only to type xs:decimal alone.
> (Removing the word 'base' which creates the subtype confusion.)
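
The scaling step in Point 1 can be sketched in Python as follows. This is
only an illustration of the arithmetic; apply_virtual_point is a
hypothetical name, and it assumes a positive virtual point simply means
"divide the raw integer by 10 to that power".

```python
from decimal import Decimal


def apply_virtual_point(raw: int, virtual_point: int) -> Decimal:
    """Scale a raw binary integer by 10**-virtual_point, keeping trailing zeros."""
    return Decimal(raw).scaleb(-virtual_point)


print(apply_virtual_point(36000, 2))  # 360.00
print(apply_virtual_point(36001, 2))  # 360.01
```

Neither result is a value an xs:unsignedShort element could carry as
written, which is the reason the property only makes sense on xs:decimal.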
>
> That brings me to the next related point.
>
> Point 2: Need for binaryNumberRep='unsignedBinary'
>
> We have a use case we cannot express.
>
> An angle from 0 to 360 degrees is represented by an unsigned 16-bit binary
> integer which, if dfdl:binaryDecimalVirtualPoint can be applied to it, can
> represent 000.00 to 360.00.
>
> <simpleType name="angle360" dfdl:lengthKind="explicit" dfdl:length="16"
>     dfdl:binaryNumberRep="binary" dfdl:binaryDecimalVirtualPoint="2">
>     <restriction base="xs:decimal">
>       <minInclusive value="0"/>
>       <maxInclusive value="360"/>
>     </restriction>
> </simpleType>
>
>
> However, we have no way to say that the above is to be unsigned binary
> integer representation.
> dfdl:binaryNumberRep='binary' means unsigned binary if the type is
> unsigned, and means signed twos-complement if the type is signed, which
> xs:decimal is.
>
> So my definition of angle360 above is no good, as the maximum positive
> value is 327.67, which is insufficient.
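
The 327.67 limit can be checked with a short Python sketch. It assumes a
big-endian 16-bit field and a virtual point of 2; decode_16bit is a
hypothetical helper used only to show the two interpretations side by side.

```python
from decimal import Decimal


def decode_16bit(data: bytes, signed: bool, virtual_point: int = 2) -> Decimal:
    """Decode a big-endian 16-bit integer, then apply the virtual point."""
    raw = int.from_bytes(data, byteorder="big", signed=signed)
    return Decimal(raw).scaleb(-virtual_point)


# Largest value each interpretation can reach.
print(decode_16bit(b"\x7f\xff", signed=True))    # 327.67
print(decode_16bit(b"\xff\xff", signed=False))   # 655.35

# 360.00 degrees is stored as 36000 = 0x8CA0, which has its top bit set.
full_circle = (36000).to_bytes(2, byteorder="big")
print(decode_16bit(full_circle, signed=False))   # 360.00
print(decode_16bit(full_circle, signed=True))    # -295.36
```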
>
> I think we need to revise dfdl:binaryNumberRep to allow for distinguishing
> binary unsigned from twosComplement signed types, as well as the packed
> types.
>
> Note that the packed types allow 'bcd' which is an unsigned
> representation, so there is sort of a precedent there for allowing the
> binaryNumberRep type to be unsigned even if the type is signed-capable.
>
> There is already a proposal to add 'offsetBinary' as a signed binary
> integer representation. https://github.com/OpenGridForum/DFDL/issues/7
> So I'm suggesting adding 'unsignedBinary' as well.
>
> So the complete set (so far) would be dfdl:binaryNumberRep:
>
>    - 'twosComplementBinary' (with legacy 1.0 name 'binary')
>    - 'unsignedBinary'
>    - 'offsetBinary'
>    - 'packed'
>    - 'bcd'
>    - 'ibm4690Packed'
>
> Point 3:  There are other binary integer representations that will be
> needed.
>
> This table comes from a format specification we use:
>
> [image: image.png]
> Ignore the 'Logical' column above, that's about enums. Ignore the "*",
> which just marks the suggested value to reserve when an in-band null
> indicator is needed.
>
> What is called 'Mod Twos Complement' here is what our existing proposed
> DFDL 2.0 feature calls 'offsetBinary'.
>
> So this table suggests the need for 'unsignedBinary' (already mentioned),
> but also two others: 'signPlusMagnitudeBinary', and 'onesComplementBinary'.
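
As a rough illustration of how these representations differ, the Python
sketch below decodes the same bit pattern five ways. The rep names follow
the enum names suggested in this thread. The function itself is
hypothetical, and the offsetBinary bias of 2**(bits-1) is an assumption,
since the format specification's table is not reproduced here.

```python
def decode_int(raw: int, bits: int, rep: str) -> int:
    """Interpret an unsigned bit pattern 'raw' of width 'bits' under 'rep'."""
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    if not 0 <= raw <= mask:
        raise ValueError("raw does not fit in the given width")
    if rep == "unsignedBinary":
        return raw
    if rep == "twosComplementBinary":
        return raw - (1 << bits) if raw & sign_bit else raw
    if rep == "onesComplementBinary":
        # Negative values are the bitwise inverse of their magnitude.
        return -(~raw & mask) if raw & sign_bit else raw
    if rep == "signPlusMagnitudeBinary":
        magnitude = raw & (sign_bit - 1)
        return -magnitude if raw & sign_bit else magnitude
    if rep == "offsetBinary":
        # Assumed bias: half the range, so all-zero bits is the minimum.
        return raw - sign_bit
    raise ValueError("unknown representation: " + rep)


for rep in ("unsignedBinary", "twosComplementBinary", "onesComplementBinary",
            "signPlusMagnitudeBinary", "offsetBinary"):
    print(rep, decode_int(0b11111110, 8, rep))
# unsignedBinary 254
# twosComplementBinary -2
# onesComplementBinary -1
# signPlusMagnitudeBinary -126
# offsetBinary 126
```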
>
> Point 4: Zig Zag Integer representation is getting popular
>
> There's one other representation I know of, more recent/modern, called
> zig-zag integers. It was popularized by Google Protocol Buffers, and it's a
> clever representation that is seemingly used in many places now.
>
> *Binary Value*  *Zig Zag*
> 000            0
> 001           -1
> 010            1
> 011           -2
> 100            2
> 101           -3
> 110            3
> 111           -4
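
The table above follows a simple rule, sketched here in Python. The
function names are illustrative only and are not taken from any library.

```python
def zigzag_decode(n: int) -> int:
    """Map an unsigned zig-zag code to its signed value."""
    # Even codes are non-negative values, odd codes are negative values.
    return (n >> 1) ^ -(n & 1)


def zigzag_encode(value: int) -> int:
    """Map a signed value to its unsigned zig-zag code."""
    return (value << 1) if value >= 0 else ((-value) << 1) - 1


for code in range(8):
    print(format(code, "03b"), zigzag_decode(code))
# 000 0
# 001 -1
# 010 1
# 011 -2
# 100 2
# 101 -3
# 110 3
# 111 -4
```

Small magnitudes of either sign get small codes, which is what makes the
scheme attractive in combination with variable-length integers.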
>
> Point 5: Variable Length Binary Integers
>
> There are also variable-length integer formats that are not just strings
> of bits. A common one I have seen is used by the ASN.1 BER representation,
> where each byte contributes 7 bits to the value and an MSB of 1 indicates
> that the integer extends to an additional byte. Unsigned integers are just
> the concatenation of these 7-bit groups.
>
> Signed integers are handled after the bits are concatenated together. If
> the first bit of the concatenation is 1, the value is a negative
> twos-complement value. Hence, if a positive value would have a first bit of
> 1, then an additional byte containing 10000000 must be used as the most
> significant byte so that the first bit will not be 1.
>
> There is no way in DFDL to represent such a variable length integer
> representation and get an integer in the infoset. You have to use a
> hexBinary byte array.
>
> There is a need for a variable-length integer like this to support not
> only explicit length (used by ASN.1 BER), but implicit length as well. In
> this case the last byte of the variable length integer does not have the
> MSB set. Hence, a single byte can represent signed -64 to +63, or unsigned
> 0 to 127. Outside that range multiple bytes must be used, each byte
> contributing 7 bits.
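
A minimal Python sketch of the implicit-length variant described in Point 5
follows. It assumes the most significant 7-bit group comes first and that
the final byte is the one with its MSB clear. decode_varint is a
hypothetical name and not a function of any ASN.1 or DFDL library.

```python
from typing import Tuple


def decode_varint(data: bytes, signed: bool) -> Tuple[int, int]:
    """Return (value, bytes_consumed) for a 7-bits-per-byte integer."""
    value = 0
    for count, byte in enumerate(data, start=1):
        value = (value << 7) | (byte & 0x7F)  # append the low 7 bits
        if not byte & 0x80:                   # MSB clear: this is the last byte
            nbits = 7 * count
            if signed and value & (1 << (nbits - 1)):
                value -= 1 << nbits           # twos-complement negative
            return value, count
    raise ValueError("input ended before a byte with MSB clear")


print(decode_varint(b"\x3f", signed=True))       # (63, 1)
print(decode_varint(b"\x40", signed=True))       # (-64, 1)
print(decode_varint(b"\x7f", signed=False))      # (127, 1)
print(decode_varint(b"\x80\x40", signed=True))   # (64, 2)
print(decode_varint(b"\x81\x00", signed=False))  # (128, 2)
```

The fourth call shows the padding rule: positive 64 needs a leading
10000000 byte so that the first bit of the concatenation is 0.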
>
> This suggests a need for several additional dfdl:binaryNumberRep enums.
>
> Point 6: Extensibility by implementations is needed here
>
> There are many other representations out there as well.
>
> I think we should have a convention where there is a core set that all
> DFDL implementations must provide, and a convention by which DFDL
> implementations can provide additional support.
>
> To me, a good way to do this is to allow the enum values for
> dfdl:binaryNumberRep to be not only regular enums (all of which are
> reserved) but QName syntax, where the prefix can be for a namespace
> recognized by an implementation for providing an extended set of binary
> number representations. (Perhaps the dfdlx: prefix and namespace, or maybe
> we just allow implementation specific namespaces?)
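
One way an implementation might tell reserved enums from QName extensions
is sketched below in Python. Everything here is hypothetical: the
namespace URI is a placeholder, the function name is invented, and the
sketch takes no position on which namespaces should be allowed. Only the
four unprefixed values are the existing DFDL 1.0 enums.

```python
# Existing DFDL 1.0 values; all unprefixed names are treated as reserved.
CORE_REPS = {"binary", "packed", "bcd", "ibm4690Packed"}


def classify_rep(value, prefixes, extensions):
    """Return 'core' or 'extension', or raise ValueError if unsupported.

    prefixes maps in-scope prefixes to namespace URIs.
    extensions is the set of (namespace, local_name) pairs supported.
    """
    if ":" not in value:
        if value in CORE_REPS:
            return "core"
        raise ValueError("unprefixed name is reserved but not defined: " + value)
    prefix, local = value.split(":", 1)
    namespace = prefixes.get(prefix)
    if namespace is None:
        raise ValueError("undeclared prefix: " + prefix)
    if (namespace, local) not in extensions:
        raise ValueError("unsupported extension: " + value)
    return "extension"


prefixes = {"dfdlx": "urn:example:dfdlx"}             # placeholder URI
extensions = {("urn:example:dfdlx", "onesComplement")}

print(classify_rep("binary", prefixes, extensions))                # core
print(classify_rep("dfdlx:onesComplement", prefixes, extensions))  # extension
```

Matching on the (namespace, local name) pair rather than on the prefix
text keeps the check independent of how a schema author binds the prefix.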
>
> This means of extending enums for existing properties is not part of our
> existing 'experimental features' conventions, but I propose that it should
> be added.
>
> To me, this is a good way to generally allow property enums to be extended
> with experimental features in DFDL implementations, and applies to other
> places such as dfdl:binaryCalendarRep, and numerous other properties where
> we are finding a need for additional enums and want to add them as
> experimental features.
>
> That was long. Thanks for your consideration.
>
> Mike Beckerle
> Apache Daffodil PMC | daffodil.apache.org
> OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
> Owl Cyber Defense | www.owlcyberdefense.com
>
>
> --
>
>   dfdl-wg mailing list
>
>   dfdl-wg at ogf.org
>
>   https://www.ogf.org/mailman/listinfo/dfdl-wg
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 18806 bytes
Desc: not available
URL: <http://www.ogf.org/pipermail/dfdl-wg/attachments/20220810/8a83ca9c/attachment-0001.png>

