[DFDL-WG] proposal: functions to allow reasonable syntax for hex constants in DFDL expression language

Mike Beckerle mbeckerle.dfdl at gmail.com
Tue Jul 23 11:15:05 EDT 2013


Note: the examples below use a "0x" prefix, but the description says the
prefix is just "x", so "0xFF" would be invalid syntax.

The examples should be changed to use the "x" prefix.
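
For illustration only, here is a minimal Python sketch of the version 2
rules quoted below, with the examples rewritten to use the "x" prefix. The
names hex_constant, TYPES and SchemaDefinitionError are hypothetical and
not part of the proposal. Two things are my assumptions: a string with
fewer digits than the maximum is zero-extended to the full width of the
type, and the non-hex branch only roughly approximates the XPath
constructors (plain decimal digits plus a range check).

```python
import re

# (width in bits, signed?) for each proposed function name.
TYPES = {
    "byte": (8, True), "unsignedByte": (8, False),
    "short": (16, True), "unsignedShort": (16, False),
    "int": (32, True), "unsignedInt": (32, False),
    "long": (64, True), "unsignedLong": (64, False),
}


class SchemaDefinitionError(Exception):
    """Hypothetical stand-in for a DFDL schema definition error."""


def hex_constant(type_name, text):
    """Sketch of dfdl:<type_name>(text) as described in the proposal."""
    bits, signed = TYPES[type_name]
    if text.startswith("x"):
        digits = text[1:]
        # At least one digit, and only 0-9, a-f, A-F after the 'x'.
        if not re.fullmatch(r"[0-9a-fA-F]+", digits):
            raise SchemaDefinitionError(f"not hex digits: {text!r}")
        # At most 2, 4, 8 or 16 digits depending on the type.
        if len(digits) > bits // 4:
            raise SchemaDefinitionError(f"too many digits for {type_name}: {text!r}")
        value = int(digits, 16)
        # Assumption: shorter strings are zero-extended, so the sign bit
        # is the top bit of the full-width value.
        if signed and value >= 1 << (bits - 1):
            value -= 1 << bits
        return value
    # Anything else: rough approximation of the XPath constructor.
    if not re.fullmatch(r"[+-]?[0-9]+", text):
        raise ValueError(f"invalid lexical value: {text!r}")
    value = int(text)
    low = -(1 << (bits - 1)) if signed else 0
    high = (1 << (bits - 1)) - 1 if signed else (1 << bits) - 1
    if not low <= value <= high:
        raise ValueError(f"out of range for {type_name}: {text!r}")
    return value
```

Under these assumptions, hex_constant("byte", "x80") gives -128,
hex_constant("unsignedByte", "x80") gives 128, hex_constant("byte", "x0A3")
raises SchemaDefinitionError, and hex_constant("byte", "0xFF") is rejected
as an invalid lexical value.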

On Tue, Jul 23, 2013 at 10:07 AM, Mike Beckerle <mbeckerle.dfdl at gmail.com> wrote:

> Reprising this to the mailing list as it has been hard to find.
>
> ...mikeb
>
> ---------- Forwarded message ----------
> From: Mike Beckerle <mbeckerle.dfdl at gmail.com>
> Date: Thu, Jun 20, 2013 at 12:39 PM
> Subject: Re: [DFDL-WG] proposal: functions to allow reasonable syntax for
> hex constants in DFDL expression language
> To: dfdl-wg at ogf.org
>
>
> Revised proposal (version 2) based on feedback:
>
> Rationale: in writing DFDL schemas for binary data formats, there is a
> strong need to express binary data constants in hexadecimal.
>
> Proposed new functions:
>
> dfdl:byte
> dfdl:unsignedByte
> dfdl:short
> dfdl:unsignedShort
> dfdl:int
> dfdl:unsignedInt
> dfdl:long
> dfdl:unsignedLong
>
> These functions behave identically to the XPath constructor functions of
> the same names in the 'xs:' namespace, with one exception. When the
> argument is a string beginning with the letter 'x', the rest of the
> argument must consist of one or more hexadecimal digits.
>
> Besides the initial 'x', the argument string must contain only hexadecimal
> digits, that is, the characters 0-9, a-f, and A-F. It is a schema
> definition error otherwise.
>
> The hex digits represent a big endian twos complement representation of a
> binary number.
> Each function has a limit on the number of hex digits, with no more digits
> than 2, 4, 8, or 16 for the byte, short, int, and long versions. That is to
> say for dfdl:byte and dfdl:unsignedByte, there can be at most 2 hex digits.
> For dfdl:short and dfdl:unsignedShort, there can be at most 4 hex digits,
> and so on. It is a schema definition error if more digits are encountered
> than are suitable for the type being created.
>
> Examples:
>
> dfdl:unsignedInt("0xa1b2c3d4") is the value 2712847316. Note that in the
> first byte 'a1' the most significant bit is set, but since this is an
> unsigned type, this is not interpreted as a sign bit.
> dfdl:int("0xFFFFFFFF") is the int value -1. The sign bit indicates that
> the number is negative, and this twos complement value represents -1.
> dfdl:unsignedByte("0xFF") is the unsigned byte value 255
> dfdl:byte("0xff") is the byte value -1.
> dfdl:byte("0x7F") is the byte value 127
> dfdl:byte("0x80") is the byte value -128
> dfdl:unsignedByte("0x80") is the unsigned byte value 128
> dfdl:byte("0x0A3") is a schema definition error: even with the leading
> zero, three digits exceed the limit of 2 digits for byte types.
> dfdl:short("0x0A3") is the short value 163. The leading zero causes no
> issue here because up to 4 digits are allowed.
>
> Example of usage in expressions:
>
> <xs:element name="magic_number" type="ex:uint32"
>  dfdl:byteOrder="bigEndian">
>  <xs:annotation>
>    <xs:appinfo source="http://www.ogf.org/dfdl/dfdl-1.0/">
>      <dfdl:setVariable ref="ex:bOrd">
>          {
>           if (xs:unsignedInt(.) eq dfdl:unsignedInt('0xa1b2c3d4')) then 'bigEndian'
>           else if (xs:unsignedInt(.) eq dfdl:unsignedInt('0xd4c3b2a1')) then 'littleEndian'
>           else fn:error(ex:magic_number, "Magic number was not 0xA1B2C3D4 (for bigEndian) or 0xD4C3B2A1 (for littleEndian).")
>           }
>      </dfdl:setVariable>
>    </xs:appinfo>
>  </xs:annotation>
> </xs:element>
>
>
>
> On Thu, Jun 20, 2013 at 12:04 PM, Steve Hanson <smh at uk.ibm.com> wrote:
>
>> Tim I think you are right - it should just be 'x'.
>>
>> Regards
>>
>> Steve Hanson
>> Architect, IBM Data Format Description Language (DFDL)
>> Co-Chair, OGF DFDL Working Group <http://www.ogf.org/dfdl/>
>> IBM SWG, Hursley, UK
>> smh at uk.ibm.com
>> tel:+44-1962-815848
>>
>>
>>
>> From:        Tim Kimber/UK/IBM at IBMGB
>> To:        dfdl-wg at ogf.org,
>> Date:        20/06/2013 14:19
>> Subject:        Re: [DFDL-WG] proposal: functions to allow reasonable
>> syntax for hex constants in DFDL expression language
>> Sent by:        dfdl-wg-bounces at ogf.org
>> ------------------------------
>>
>>
>>
>> Interesting...that never occurred to me. I have always considered the #
>> to be a qualifier on the XML character reference entity.
>> In XML, the carriage-return character can be specified in any of the
>> following ways:
>> &cr;                -        A named entity
>> &#13;               -        A numeric entity ( starts with '#' )
>> &#x0d;              -        A numeric entity in base 16 ( the numeric part starts with 'x' )
>>
>> So the '#' character is signalling that the entity is a numeric entity.
>> But there's no need for that to be signalled in this case - the parameter
>> for these integer constructors will always be a number.
>> On that basis, the prefix should be 'x'. I would prefer that to '#x' on
>> the grounds of readability.
>>
>> regards,
>>
>> Tim Kimber, DFDL Team,
>> Hursley, UK
>> Internet:  kimbert at uk.ibm.com
>> Tel. 01962-816742
>> Internal tel. 37246742
>>
>>
>>
>>
>> From:        Steve Hanson/UK/IBM
>> To:        Tim Kimber/UK/IBM at IBMGB,
>> Cc:        dfdl-wg at ogf.org, dfdl-wg-bounces at ogf.org
>> Date:        20/06/2013 14:01
>> Subject:        Re: [DFDL-WG] proposal: functions to allow reasonable
>> syntax for hex constants in DFDL expression language
>>  ------------------------------
>>
>>
>> I was thinking along the same lines as Tim.
>>
>> I'd prefer the hex string to start with '#x' for consistency with DFDL
>> hex entity syntax.
>>
>> Regards
>>
>> Steve Hanson
>> Architect, IBM Data Format Description Language (DFDL)
>> Co-Chair, OGF DFDL Working Group <http://www.ogf.org/dfdl/>
>> IBM SWG, Hursley, UK
>> smh at uk.ibm.com
>> tel:+44-1962-815848
>>
>>
>>
>>
>> From:        Tim Kimber/UK/IBM at IBMGB
>> To:        dfdl-wg at ogf.org,
>> Date:        20/06/2013 09:44
>> Subject:        Re: [DFDL-WG] proposal: functions to allow reasonable
>> syntax for hex constants in DFDL expression language
>> Sent by:        dfdl-wg-bounces at ogf.org
>>  ------------------------------
>>
>>
>>
>> As an alternative, we could
>> - drop the 'h' prefix
>> - use the same names as the XPath integer constructors, including the use
>> of the full name for unsigned variants ( e.g. 'dfdl:unsignedInt' )
>> - behave exactly like xs:byte / xs:short etc when the string does not
>> begin with '0x'
>>
>> The only difference would be the namespace of the function. So the
>> modeller would use the dfdl variant if they needed the '0x' notation, and
>> they could choose to use the dfdl variant always when constructing integers
>> from string literals.
>>
>> regards,
>>
>> Tim Kimber, DFDL Team,
>> Hursley, UK
>> Internet:  kimbert at uk.ibm.com
>> Tel. 01962-816742
>> Internal tel. 37246742
>>
>>
>>
>>
>> From:        Mike Beckerle <mbeckerle.dfdl at gmail.com>
>> To:        dfdl-wg at ogf.org,
>> Date:        19/06/2013 23:28
>> Subject:        [DFDL-WG] proposal: functions to allow reasonable syntax
>> for hex constants in DFDL expression language
>> Sent by:        dfdl-wg-bounces at ogf.org
>>  ------------------------------
>>
>>
>>
>> Rationale: in writing DFDL schemas for binary data formats, there is a
>> strong need to express binary data constants in hexadecimal.
>>
>> Proposed new functions:
>>
>> dfdl:hByte
>> dfdl:hUByte
>> dfdl:hShort
>> dfdl:hUShort
>> dfdl:hInt
>> dfdl:hUInt
>> dfdl:hLong
>> dfdl:hULong
>>
>> The prefix 'h' is a reminder that the function converts from hexadecimal.
>> The letter 'U' in the name denotes that the function creates an unsigned
>> type result. Normally one might like to write this out fully, but in this
>> case brevity is helpful given the expected usage of these functions to
>> construct literal constants in DFDL expressions.
>>
>> All the functions take a single string argument. The string must begin
>> with "0x" and contain at least one hex digit after that.
>> After the prefix, the string must contain only hexadecimal digits, that
>> is, the characters 0-9, a-f, and A-F. It is a schema definition error
>> otherwise.
>>
>> The hex digits represent a big endian twos complement representation of a
>> binary number.
>> Each function has a limit on the number of hex digits, with no more
>> digits than 2, 4, 8, or 16 for the byte, short, int, and long versions.
>> That is to say for dfdl:hByte and dfdl:hUByte, there can be at most 2 hex
>> digits. For dfdl:hShort and dfdl:hUShort, there can be at most 4 hex digits,
>> and so on. It is a schema definition error if more digits are encountered
>> than are suitable for the type being created.
>>
>> Examples:
>>
>> dfdl:hUInt("0xa1b2c3d4") is the value 2712847316. Note that in the first
>> byte 'a1' the most significant bit is set, but since this is an unsigned
>> type, this is not interpreted as a sign bit.
>> dfdl:hInt("0xFFFFFFFF") is the int value -1. The sign bit indicates that
>> the number is negative, and this twos complement value represents -1.
>> dfdl:hUByte("0xFF") is the unsigned byte value 255
>> dfdl:hByte("0xff") is the byte value -1.
>> dfdl:hByte("0x7F") is the byte value 127
>> dfdl:hByte("0x80") is the byte value -128
>> dfdl:hUByte("0x80") is the unsigned byte value 128
>> dfdl:hByte("0x0A3") is a schema definition error: even with the leading
>> zero, three digits exceed the limit of 2 digits for byte types.
>> dfdl:hShort("0x0A3") is the short value 163. The leading zero causes no
>> issue here because up to 4 digits are allowed.
>>
>> Example of usage in expressions:
>>
>> <xs:element name="magic_number" type="ex:uint32"
>>  dfdl:byteOrder="bigEndian">
>>  <xs:annotation>
>>    <xs:appinfo source="http://www.ogf.org/dfdl/dfdl-1.0/">
>>      <dfdl:setVariable ref="ex:bOrd">
>>          {
>>           if (xs:unsignedInt(.) eq dfdl:hUInt('0xa1b2c3d4')) then 'bigEndian'
>>           else if (xs:unsignedInt(.) eq dfdl:hUInt('0xd4c3b2a1')) then 'littleEndian'
>>           else fn:error(ex:magic_number, "Magic number was not 0xA1B2C3D4 (for bigEndian) or 0xD4C3B2A1 (for littleEndian).")
>>           }
>>      </dfdl:setVariable>
>>    </xs:appinfo>
>>  </xs:annotation>
>> </xs:element>
>>
>> --
>> Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
>> www.tresys.com <http://www.tresys.com/>
>> --
>> dfdl-wg mailing list
>> dfdl-wg at ogf.org
>> https://www.ogf.org/mailman/listinfo/dfdl-wg
>>
>> Unless stated otherwise above:
>> IBM United Kingdom Limited - Registered in England and Wales with number
>> 741598.
>> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>>
>
>
>
> --
> Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
> www.tresys.com
>
>
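
The magic-number check in the quoted setVariable example can be sketched
outside DFDL as follows. This is only an illustration in Python under my
own assumptions: detect_byte_order is a hypothetical name, and the four
bytes are read big endian because the element is declared with
dfdl:byteOrder="bigEndian".

```python
def detect_byte_order(raw: bytes) -> str:
    """Return the byte order implied by a 4-byte magic number."""
    if len(raw) < 4:
        raise ValueError("need at least 4 bytes for the magic number")
    # The element is parsed as a big endian unsigned 32-bit integer.
    value = int.from_bytes(raw[:4], "big")
    if value == 0xA1B2C3D4:
        return "bigEndian"
    if value == 0xD4C3B2A1:
        return "littleEndian"
    raise ValueError(
        "Magic number was not 0xA1B2C3D4 (for bigEndian) "
        "or 0xD4C3B2A1 (for littleEndian)."
    )
```

For example, detect_byte_order(bytes.fromhex("d4c3b2a1")) returns
'littleEndian', because the bytes only match the expected constant when
reversed.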


-- 
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com