[DFDL-WG] proposal: functions to allow reasonable syntax for hex constants in DFDL expression language

Wed Jul 24 10:17:59 EDT 2013

(Revised to fix A-Z typo, and remaining 0x prefix)

On Tue, Jul 23, 2013 at 12:08 PM, Mike Beckerle <mbeckerle.dfdl at gmail.com> wrote:

> Revised to eliminate 0x and also remove use of non-standard fn:error
> function in example:
>
> Revised proposal (version 4) based on feedback:
>
> Rationale: in writing DFDL schemas for binary data formats, there is a
> strong need to express binary data constants in hexadecimal.
>
> Proposed new functions:
>
> dfdl:byte
> dfdl:unsignedByte
> dfdl:short
> dfdl:unsignedShort
> dfdl:int
> dfdl:unsignedInt
> dfdl:long
> dfdl:unsignedLong
>
> These functions behave identically to the XPath constructor functions of
> the same names in the 'xs:' namespace, with one exception: when the
> argument is a string beginning with the letter 'x', the argument must
> contain, in addition, one or more hexadecimal digits.
>
> Besides the initial 'x', the argument string must contain only hexadecimal
> digits, that is, the characters 0-9a-fA-F. It is a schema definition error
> otherwise.
>
> The hex digits are the big-endian two's complement representation of a
> binary number.
> Each function has a limit on the number of hex digits: at most 2, 4, 8,
> or 16 digits for the byte, short, int, and long versions respectively.
> That is to say, for dfdl:byte and dfdl:unsignedByte there can be at most
> 2 hex digits; for dfdl:short and dfdl:unsignedShort there can be at most
> 4 hex digits; and so on. It is a schema definition error if more digits
> are encountered than are suitable for the type being created.
>
> Examples:
>
>    - dfdl:unsignedInt("xa1b2c3d4") is the value 2712847316. Note that in
>    the first byte 'a1' the most significant bit is set, but since this is an
>    unsigned type, this is not interpreted as a sign bit.
>    - dfdl:int("xFFFFFFFF") is the int value -1. The sign bit indicates
>    that the number is negative, and this twos complement value represents -1.
>    - dfdl:unsignedByte("xFF") is the unsigned byte value 255.
>    - dfdl:byte("xff") is the byte value -1.
>    - dfdl:byte("x7F") is the byte value 127.
>    - dfdl:byte("x80") is the byte value -128.
>    - dfdl:unsignedByte("x80") is the unsigned byte value 128.
>    - dfdl:byte("x0A3") is a schema definition error: it has 3 hex digits
>    and byte types allow at most 2, so even a leading zero is not allowed.
>    - dfdl:short("x0A3") is the short value 163. The leading zero causes no
>    issue here because up to 4 digits are allowed.
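A minimal Python sketch of the 'x' hex form described above, useful for checking the examples by hand. It assumes that a literal with fewer digits than the maximum is zero-extended on the left (the proposal does not state this explicitly). The names `LIMITS`, `SchemaDefinitionError`, and `dfdl_from_hex` are hypothetical, and the decimal path that defers to the xs: constructors is deliberately not modelled.

```python
import re

# Hypothetical table: DFDL type name -> (max hex digits, signed?)
LIMITS = {
    "byte": (2, True), "unsignedByte": (2, False),
    "short": (4, True), "unsignedShort": (4, False),
    "int": (8, True), "unsignedInt": (8, False),
    "long": (16, True), "unsignedLong": (16, False),
}

# After the leading 'x': one or more hex digits and nothing else.
HEX_LITERAL = re.compile(r"x([0-9a-fA-F]+)")


class SchemaDefinitionError(ValueError):
    """Stand-in for a DFDL schema definition error."""


def dfdl_from_hex(type_name: str, literal: str) -> int:
    """Interpret an 'x'-prefixed hex literal as the named DFDL integer type."""
    max_digits, signed = LIMITS[type_name]
    if not literal.startswith("x"):
        # Decimal strings would follow the xs: constructor rules; not sketched.
        raise NotImplementedError("only the 'x' hex form is sketched here")
    match = HEX_LITERAL.fullmatch(literal)
    if match is None:
        raise SchemaDefinitionError(f"invalid hex literal: {literal!r}")
    digits = match.group(1)
    if len(digits) > max_digits:
        raise SchemaDefinitionError(
            f"{type_name} allows at most {max_digits} hex digits, got {len(digits)}"
        )
    value = int(digits, 16)  # unsigned big-endian value of the digits
    width = max_digits * 4   # bit width of the type
    # Assumption: short literals are zero-extended, so only a set top bit of
    # the full-width value makes a signed result negative (two's complement).
    if signed and value >= 1 << (width - 1):
        value -= 1 << width
    return value


for type_name, literal in [("unsignedInt", "xa1b2c3d4"), ("int", "xFFFFFFFF"),
                           ("byte", "x80"), ("short", "x0A3")]:
    print(type_name, literal, dfdl_from_hex(type_name, literal))
```

Because the digit count is checked before the value is interpreted, this sketch would also reject `dfdl:unsignedByte("x0A3")` even though 163 fits in an unsigned byte, which matches the proposal's rule that the limit is on digits rather than on magnitude.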
>
>
> Example of usage in expressions:
>
> <xs:element name="magic_number" type="ex:uint32"
>  dfdl:byteOrder="bigEndian">
>  <xs:annotation>
>    <xs:appinfo source="http://www.ogf.org/dfdl/dfdl-1.0/">
>      <dfdl:setVariable ref="ex:bOrd">
>          {
>           if (xs:unsignedInt(.) eq dfdl:unsignedInt('xA1B2C3D4'))
>           then 'bigEndian'
>           else if (xs:unsignedInt(.) eq dfdl:unsignedInt('xD4C3B2A1'))
>           then 'littleEndian'
>           else "Magic number was not xA1B2C3D4 (for bigEndian) or xD4C3B2A1 (for littleEndian)."
>           }
>      </dfdl:setVariable>
>    </xs:appinfo>
>  </xs:annotation>
> </xs:element>
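The decision made by the `ex:bOrd` expression above can be mirrored outside DFDL in a rough Python sketch, assuming the first four bytes of the input are the magic number read big-endian (as `dfdl:byteOrder="bigEndian"` specifies). `detect_byte_order` is a hypothetical name, and raising an exception in the fallback case is this sketch's choice rather than anything the proposal defines.

```python
import struct

MAGIC_BIG = 0xA1B2C3D4     # same value as dfdl:unsignedInt('xA1B2C3D4')
MAGIC_LITTLE = 0xD4C3B2A1  # same value as dfdl:unsignedInt('xD4C3B2A1')


def detect_byte_order(data: bytes) -> str:
    """Return the byte order implied by a 4-byte magic number at the start of data."""
    # '>I' = big-endian unsigned 32-bit, matching dfdl:byteOrder="bigEndian".
    (magic,) = struct.unpack(">I", data[:4])
    if magic == MAGIC_BIG:
        return "bigEndian"
    if magic == MAGIC_LITTLE:
        return "littleEndian"
    raise ValueError(
        "Magic number was not xA1B2C3D4 (for bigEndian) "
        "or xD4C3B2A1 (for littleEndian)."
    )


print(detect_byte_order(bytes.fromhex("a1b2c3d4")))
print(detect_byte_order(bytes.fromhex("d4c3b2a1")))
```

Because the little-endian magic is the big-endian one with its bytes reversed, a single big-endian read is enough to tell the two cases apart.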
>
>
> On Tue, Jul 23, 2013 at 11:15 AM, Mike Beckerle <mbeckerle.dfdl at gmail.com> wrote:
>
>> Note: in the version below, the examples use the "0x" prefix. The
>> description says the prefix is just "x", so "0xFF" would be invalid syntax.
>>
>> Examples should reflect the "x" prefix.
>>
>> On Tue, Jul 23, 2013 at 10:07 AM, Mike Beckerle <mbeckerle.dfdl at gmail.com> wrote:
>>
>>> Reprising this to the mailing list as it has been hard to find.
>>>
>>> ...mikeb
>>>
>>> ---------- Forwarded message ----------
>>> From: Mike Beckerle <mbeckerle.dfdl at gmail.com>
>>> Date: Thu, Jun 20, 2013 at 12:39 PM
>>> Subject: Re: [DFDL-WG] proposal: functions to allow reasonable syntax
>>> for hex constants in DFDL expression language
>>> To: dfdl-wg at ogf.org
>>>
>>>
>>> Revised proposal (version 2) based on feedback:
>>>
>>> Rationale: in writing DFDL schemas for binary data formats, there is a
>>> strong need to express binary data constants in hexadecimal.
>>>
>>> Proposed new functions:
>>>
>>> dfdl:byte
>>> dfdl:unsignedByte
>>> dfdl:short
>>> dfdl:unsignedShort
>>> dfdl:int
>>> dfdl:unsignedInt
>>> dfdl:long
>>> dfdl:unsignedLong
>>>
>>> These functions behave identically to the XPath constructor functions of
>>> the same names in the 'xs:' namespace, with one exception: when the
>>> argument is a string beginning with the letter 'x', the argument must
>>> contain, in addition, one or more hexadecimal digits.
>>>
>>> Besides the initial 'x', the argument string must contain only
>>> hexadecimal digits, that is, the characters 0-9a-zA-Z. It is a schema
>>> definition error otherwise.
>>>
>>> The hex digits represent a big-endian two's complement representation of
>>> a binary number.
>>> Each function has a limit on the number of hex digits, with no more
>>> digits than 2, 4, 8, or 16 for the byte, short, int, and long versions.
>>> That is to say for dfdl:byte and dfdl:unsignedByte, there can be at most 2
>>> hex digits. For dfdl:short and dfdl:unsignedShort, there can be at most 4
>>> hex digits, and so on. It is a schema definition error if more digits are
>>> encountered than are suitable for the type being created.
>>>
>>> Examples:
>>>
>>> dfdl:unsignedInt("0xa1b2c3d4") is the value 2712847316. Note that in the
>>> first byte 'a1' the most significant bit is set, but since this is an
>>> unsigned type, this is not interpreted as a sign bit.
>>> dfdl:int("0xFFFFFFFF") is the int value -1. The sign bit indicates that
>>> the number is negative, and this twos complement value represents -1.
>>> dfdl:unsignedByte("0xFF") is the unsigned byte value 255
>>> dfdl:byte("0xff") is the byte value -1.
>>> dfdl:byte("0x7F") is the byte value 127
>>> dfdl:byte("0x80") is the byte value -128
>>> dfdl:unsignedByte("0x80") is the unsigned byte value 128
>>> dfdl:byte("0x0A3") is a schema definition error as the leading zero is
>>> not allowed because at most 2 digits are allowed for byte types.
>>> dfdl:short("0x0A3") is the short value 163. The leading zero causes no
>>> issue here because up to 4 digits are allowed.
>>>
>>> Example of usage in expressions:
>>>
>>> <xs:element name="magic_number" type="ex:uint32"
>>>  dfdl:byteOrder="bigEndian">
>>>  <xs:annotation>
>>>    <xs:appinfo source="http://www.ogf.org/dfdl/dfdl-1.0/">
>>>      <dfdl:setVariable ref="ex:bOrd">
>>>          {
>>>           if (xs:unsignedInt(.) eq dfdl:unsignedInt('0xa1b2c3d4'))
>>>           then 'bigEndian'
>>>           else if (xs:unsignedInt(.) eq dfdl:unsignedInt('0xd4c3b2a1'))
>>>           then 'littleEndian'
>>>           else fn:error(ex:magic_number, "Magic number was not 0xA1B2C3D4 (for bigEndian) or 0xD4C3B2A1 (for littleEndian).")
>>>           }
>>>      </dfdl:setVariable>
>>>    </xs:appinfo>
>>>  </xs:annotation>
>>> </xs:element>
>>>
>>>
>>>
>>> On Thu, Jun 20, 2013 at 12:04 PM, Steve Hanson <smh at uk.ibm.com> wrote:
>>>
>>>> Tim I think you are right - it should just be 'x'.
>>>>
>>>> Regards
>>>>
>>>> Steve Hanson
>>>> Architect, IBM Data Format Description Language (DFDL)
>>>> Co-Chair, OGF DFDL Working Group <http://www.ogf.org/dfdl/>
>>>> IBM SWG, Hursley, UK
>>>> smh at uk.ibm.com
>>>> tel:+44-1962-815848
>>>>
>>>>
>>>>
>>>> From:        Tim Kimber/UK/IBM at IBMGB
>>>> To:        dfdl-wg at ogf.org,
>>>> Date:        20/06/2013 14:19
>>>> Subject:        Re: [DFDL-WG] proposal: functions to allow reasonable
>>>> syntax for hex constants in DFDL expression language
>>>> Sent by:        dfdl-wg-bounces at ogf.org
>>>> ------------------------------
>>>>
>>>>
>>>>
>>>> Interesting... that never occurred to me. I have always considered the #
>>>> to be a qualifier on the XML character reference entity.
>>>> In XML, the carriage-return character can be specified in any of the
>>>> following ways:
>>>> &cr;     -  A named entity
>>>> &#13;    -  A numeric entity (starts with '#')
>>>> &#x0d;   -  A numeric entity in base 16 (the numeric part starts with 'x')
>>>>
>>>> So the '#' character is signalling that the entity is a numeric entity.
>>>> But there's no need for that to be signalled in this case - the parameter
>>>> for these integer constructors will always be a number.
>>>> On that basis, the prefix should be 'x'. I would prefer that to '#x' on
>>>> the grounds of readability.
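Tim's distinction can be checked with Python's standard `xml.etree.ElementTree` parser: in a character reference, '#' alone introduces a decimal number and '#x' a hexadecimal one, and both forms below should resolve to the same carriage-return character. The helper `resolve` is a hypothetical name; `&cr;` is left out because it is not one of XML's predefined entities and would need a DTD declaration.

```python
import xml.etree.ElementTree as ET


def resolve(ref: str) -> str:
    """Parse a character reference inside a throwaway element; return its text."""
    return ET.fromstring(f"<a>{ref}</a>").text


decimal_cr = resolve("&#13;")  # '#' alone: decimal numeric reference
hex_cr = resolve("&#x0d;")     # '#x': the number that follows is base 16
print(repr(decimal_cr), repr(hex_cr), decimal_cr == hex_cr == "\r")
```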
>>>>
>>>> regards,
>>>>
>>>> Tim Kimber, DFDL Team,
>>>> Hursley, UK
>>>> Internet:  kimbert at uk.ibm.com
>>>> Tel. 01962-816742
>>>> Internal tel. 37246742
>>>>
>>>>
>>>>
>>>>
>>>> From:        Steve Hanson/UK/IBM
>>>> To:        Tim Kimber/UK/IBM at IBMGB,
>>>> Cc:        dfdl-wg at ogf.org, dfdl-wg-bounces at ogf.org
>>>> Date:        20/06/2013 14:01
>>>> Subject:        Re: [DFDL-WG] proposal: functions to allow reasonable
>>>> syntax for hex constants in DFDL expression language
>>>>  ------------------------------
>>>>
>>>>
>>>> I was thinking along the same lines as Tim.
>>>>
>>>> I'd prefer the hex string to start with '#x' for consistency with DFDL
>>>> hex entity syntax.
>>>>
>>>> Regards
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> From:        Tim Kimber/UK/IBM at IBMGB
>>>> To:        dfdl-wg at ogf.org,
>>>> Date:        20/06/2013 09:44
>>>> Subject:        Re: [DFDL-WG] proposal: functions to allow reasonable
>>>> syntax for hex constants in DFDL expression language
>>>> Sent by:        dfdl-wg-bounces at ogf.org
>>>>  ------------------------------
>>>>
>>>>
>>>>
>>>> As an alternative, we could
>>>> - drop the 'h' prefix
>>>> - use the same names as the XPath integer constructors, including the
>>>> use of the full name for unsigned variants ( e.g. 'dfdl:unsignedInt' )
>>>> - behave exactly like xs:byte / xs:short etc when the string does not
>>>> begin with '0x'
>>>>
>>>> The only difference would be the namespace of the function. So the
>>>> modeller would use the dfdl variant if they needed the '0x' notation, and
>>>> they could choose to use the dfdl variant always when constructing integers
>>>> from string literals.
>>>>
>>>> regards,
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> From:        Mike Beckerle <mbeckerle.dfdl at gmail.com>
>>>> To:        dfdl-wg at ogf.org,
>>>> Date:        19/06/2013 23:28
>>>> Subject:        [DFDL-WG] proposal: functions to allow reasonable
>>>> syntax for hex constants in DFDL expression language
>>>> Sent by:        dfdl-wg-bounces at ogf.org
>>>>  ------------------------------
>>>>
>>>>
>>>>
>>>> Rationale: in writing DFDL schemas for binary data formats, there is a
>>>> strong need to express binary data constants in hexadecimal.
>>>>
>>>> Proposed new functions:
>>>>
>>>> dfdl:hByte
>>>> dfdl:hUByte
>>>> dfdl:hShort
>>>> dfdl:hUShort
>>>> dfdl:hInt
>>>> dfdl:hUInt
>>>> dfdl:hLong
>>>> dfdl:hULong
>>>>
>>>> The prefix 'h' is a reminder that the function converts from
>>>> hexadecimal. The prefix 'U' on the name denotes that the function creates
>>>> an unsigned type result. Normally one might like to write this out fully,
>>>> but in this case brevity is helpful given the expected usage of these
>>>> functions to construct literal constants in DFDL expressions.
>>>>
>>>> All the functions take a single string argument. The string must begin
>>>> with "0x" and contain at least one hex digit after that.
>>>> The string must contain only hexadecimal digits, that is, the
>>>> characters 0-9a-zA-Z. It is a schema definition error otherwise.
>>>>
>>>> The hex digits represent a big-endian two's complement representation of
>>>> a binary number.
>>>> Each function has a limit on the number of hex digits, with no more
>>>> digits than 2, 4, 8, or 16 for the byte, short, int, and long versions.
>>>> That is to say for dfdl:hByte and dfdl:hUByte, there can be at most 2 hex
>>>> digits. For dfdl:hShort and dfdl:hUShort, there can be at most 4 hex digits,
>>>> and so on. It is a schema definition error if more digits are encountered
>>>> than are suitable for the type being created.
>>>>
>>>> Examples:
>>>>
>>>> dfdl:hUInt("0xa1b2c3d4") is the value 2712847316. Note that in the
>>>> first byte 'a1' the most significant bit is set, but since this is an
>>>> unsigned type, this is not interpreted as a sign bit.
>>>> dfdl:hInt("0xFFFFFFFF") is the int value -1. The sign bit indicates
>>>> that the number is negative, and this twos complement value represents -1.
>>>> dfdl:hUByte("0xFF") is the unsigned byte value 255
>>>> dfdl:hByte("0xff") is the byte value -1.
>>>> dfdl:hByte("0x7F") is the byte value 127
>>>> dfdl:hByte("0x80") is the byte value -128
>>>> dfdl:hUByte("0x80") is the unsigned byte value 128
>>>> dfdl:hByte("0x0A3") is a schema definition error as the leading zero is
>>>> not allowed because at most 2 digits are allowed for byte types.
>>>> dfdl:hShort("0x0A3") is the short value 163. The leading zero causes no
>>>> issue here because up to 4 digits are allowed.
>>>>
>>>> Example of usage in expressions:
>>>>
>>>> <xs:element name="magic_number" type="ex:uint32"
>>>>  dfdl:byteOrder="bigEndian">
>>>>  <xs:annotation>
>>>>    <xs:appinfo source="http://www.ogf.org/dfdl/dfdl-1.0/">
>>>>      <dfdl:setVariable ref="ex:bOrd">
>>>>          {
>>>>           if (xs:unsignedInt(.) eq dfdl:hUInt('0xa1b2c3d4'))
>>>>           then 'bigEndian'
>>>>           else if (xs:unsignedInt(.) eq dfdl:hUInt('0xd4c3b2a1'))
>>>>           then 'littleEndian'
>>>>           else fn:error(ex:magic_number, "Magic number was not 0xA1B2C3D4 (for bigEndian) or 0xD4C3B2A1 (for littleEndian).")
>>>>           }
>>>>      </dfdl:setVariable>
>>>>    </xs:appinfo>
>>>>  </xs:annotation>
>>>> </xs:element>
>>>>
>>>> --
>>>> Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
>>>> www.tresys.com
>>>> --
>>>> dfdl-wg mailing list
>>>> dfdl-wg at ogf.org
>>>> https://www.ogf.org/mailman/listinfo/dfdl-wg
>>>>
>>>> Unless stated otherwise above:
>>>> IBM United Kingdom Limited - Registered in England and Wales with
>>>> number 741598.
>>>> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>>>>
>>>

-- 
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com