[DFDL-WG] proposal: functions to allow reasonable syntax for hex constants in DFDL expression language

Tim Kimber KIMBERT at uk.ibm.com
Thu Jun 20 04:44:09 EDT 2013


As an alternative, we could
- drop the 'h' prefix
- use the same names as the XPath integer constructors, including the use 
of the full name for unsigned variants ( e.g. 'dfdl:unsignedInt' )
- behave exactly like xs:byte / xs:short etc when the string does not 
begin with '0x'

The only difference would be the namespace of the function. So the 
modeller would use the dfdl variant if they needed the '0x' notation, and 
they could choose to use the dfdl variant always when constructing 
integers from string literals.
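
To make the alternative concrete, here is a rough Python sketch of how such a constructor might behave. It is only an illustration under assumptions: the name dfdl_integer and the TYPES table are hypothetical, the decimal branch only approximates the xs: constructors (optional sign, decimal digits, range check), and hex strings shorter than the full width are assumed to be zero-extended.

```python
import re

# (bit width, signed?) per constructor name; the names mirror the XPath ones.
TYPES = {
    "byte": (8, True), "unsignedByte": (8, False),
    "short": (16, True), "unsignedShort": (16, False),
    "int": (32, True), "unsignedInt": (32, False),
    "long": (64, True), "unsignedLong": (64, False),
}


def dfdl_integer(type_name, text):
    """Hypothetical dfdl:<type_name>(text): hex if '0x'-prefixed, else decimal."""
    bits, signed = TYPES[type_name]
    low = -(1 << (bits - 1)) if signed else 0
    high = (1 << (bits - 1)) - 1 if signed else (1 << bits) - 1

    if text.startswith("0x"):
        digits = text[2:]
        if not re.fullmatch(r"[0-9a-fA-F]+", digits) or len(digits) > bits // 4:
            raise ValueError(f"bad hex literal for {type_name}: {text!r}")
        value = int(digits, 16)
        # For signed types, a set top bit means a negative twos complement value.
        return value - (1 << bits) if value > high else value

    # No '0x' prefix: approximate the behaviour of xs:byte, xs:short, etc.
    literal = text.strip()
    if not re.fullmatch(r"[+-]?[0-9]+", literal):
        raise ValueError(f"bad decimal literal for {type_name}: {text!r}")
    value = int(literal)
    if not low <= value <= high:
        raise ValueError(f"{text!r} is out of range for {type_name}")
    return value
```

With this sketch, dfdl_integer("unsignedInt", "0xa1b2c3d4") and dfdl_integer("unsignedInt", "2712847316") give the same value, so a modeller could use the one function for both notations.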

regards,

Tim Kimber, DFDL Team,
Hursley, UK
Internet:  kimbert at uk.ibm.com
Tel. 01962-816742 
Internal tel. 37246742




From:   Mike Beckerle <mbeckerle.dfdl at gmail.com>
To:     dfdl-wg at ogf.org, 
Date:   19/06/2013 23:28
Subject:        [DFDL-WG] proposal: functions to allow reasonable syntax 
for hex constants in DFDL expression language
Sent by:        dfdl-wg-bounces at ogf.org



Rationale: in writing DFDL schemas for binary data formats, there is a 
strong need to express binary data constants in hexadecimal.

Proposed new functions:

dfdl:hByte
dfdl:hUByte
dfdl:hShort
dfdl:hUShort
dfdl:hInt
dfdl:hUInt
dfdl:hLong
dfdl:hULong

The prefix 'h' is a reminder that the function converts from hexadecimal. 
A 'U' in the name denotes that the function returns an unsigned type. 
Normally one might like to write "unsigned" out in full, but in this case 
brevity is helpful, given that these functions are expected to be used to 
construct literal constants in DFDL expressions. 

All the functions take a single string argument. The string must begin 
with "0x" and contain at least one hex digit after that.
After the "0x", the string must contain only hexadecimal digits, that is, 
the characters 0-9, a-f, and A-F. It is a schema definition error otherwise.

The hex digits represent a big endian twos complement representation of a 
binary number.
Each function has a limit on the number of hex digits: at most 2, 4, 8, or 
16 for the byte, short, int, and long versions respectively. That is to 
say, for dfdl:hByte and dfdl:hUByte there can be at most 2 hex digits. 
For dfdl:hShort and dfdl:hUShort, there can be at most 4 hex digits, and so 
on. It is a schema definition error if more digits are encountered than 
are suitable for the type being created.
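
The rules above can be sketched in a few lines of Python. This is only an illustration, not a reference implementation: parse_hex, SchemaDefinitionError, and the h_* names are hypothetical stand-ins, and the sketch assumes that a string with fewer than the maximum number of digits is zero-extended to the full width (the proposal does not say this explicitly).

```python
from functools import partial

HEX_DIGITS = frozenset("0123456789abcdefABCDEF")


class SchemaDefinitionError(ValueError):
    """Hypothetical stand-in for a DFDL schema definition error."""


def parse_hex(text, bits, signed):
    """Convert a '0x...' string to an integer of the given bit width."""
    if not text.startswith("0x"):
        raise SchemaDefinitionError(f"{text!r} does not begin with '0x'")
    digits = text[2:]
    if not digits:
        raise SchemaDefinitionError("at least one hex digit is required")
    if any(ch not in HEX_DIGITS for ch in digits):
        raise SchemaDefinitionError(f"{text!r} contains a non-hex character")
    if len(digits) > bits // 4:
        raise SchemaDefinitionError(
            f"{text!r} has more than {bits // 4} hex digits")
    value = int(digits, 16)
    # Assumption: shorter strings are zero-extended, so only a full-width
    # string can have the sign bit set.
    if signed and value >= 1 << (bits - 1):
        value -= 1 << bits  # reinterpret as twos complement
    return value


# One callable per proposed function (Python names are illustrative only).
h_byte = partial(parse_hex, bits=8, signed=True)
h_ubyte = partial(parse_hex, bits=8, signed=False)
h_short = partial(parse_hex, bits=16, signed=True)
h_ushort = partial(parse_hex, bits=16, signed=False)
h_int = partial(parse_hex, bits=32, signed=True)
h_uint = partial(parse_hex, bits=32, signed=False)
h_long = partial(parse_hex, bits=64, signed=True)
h_ulong = partial(parse_hex, bits=64, signed=False)
```

For instance, h_uint("0xa1b2c3d4") evaluates to 2712847316 and h_byte("0x80") to -128, while h_byte("0x0A3") raises the error because it has 3 digits.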

Examples:

dfdl:hUInt("0xa1b2c3d4") is the value 2712847316. Note that in the first 
byte 'a1' the most significant bit is set, but since this is an unsigned 
type, this is not interpreted as a sign bit.
dfdl:hInt("0xFFFFFFFF") is the int value -1. The sign bit indicates that 
the number is negative, and this twos complement value represents -1.
dfdl:hUByte("0xFF") is the unsigned byte value 255
dfdl:hByte("0xff") is the byte value -1.
dfdl:hByte("0x7F") is the byte value 127
dfdl:hByte("0x80") is the byte value -128
dfdl:hUByte("0x80") is the unsigned byte value 128
dfdl:hByte("0x0A3") is a schema definition error: even though the first 
digit is a zero, the string has 3 hex digits, and at most 2 digits are 
allowed for byte types.
dfdl:hShort("0x0A3") is the short value 163. The leading zero causes no 
issue here because up to 4 digits are allowed.

Example of usage in expressions:

 <xs:element name="magic_number" type="ex:uint32"
   dfdl:byteOrder="bigEndian">
   <xs:annotation>
     <xs:appinfo source="http://www.ogf.org/dfdl/dfdl-1.0/">
       <dfdl:setVariable ref="ex:bOrd">
           {
            if (xs:unsignedInt(.) eq dfdl:hUInt('0xa1b2c3d4')) then 'bigEndian'
            else if (xs:unsignedInt(.) eq dfdl:hUInt('0xd4c3b2a1')) then 'littleEndian'
            else fn:error(ex:magic_number, "Magic number was not 0xA1B2C3D4 (for bigEndian) or 0xD4C3B2A1 (for littleEndian).")
            }
       </dfdl:setVariable>
     </xs:appinfo>
   </xs:annotation>
</xs:element>
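
For readers less familiar with DFDL expressions, the same decision can be sketched outside DFDL in Python. This is only an analogy under assumptions: detect_byte_order and the constant names are hypothetical, and the 4 bytes are read as a big endian unsigned integer to mirror dfdl:byteOrder="bigEndian" on the element.

```python
MAGIC_BIG_ENDIAN = 0xA1B2C3D4     # bytes appear in the data as a1 b2 c3 d4
MAGIC_LITTLE_ENDIAN = 0xD4C3B2A1  # bytes appear in the data as d4 c3 b2 a1


def detect_byte_order(first_four_bytes):
    """Return the byte order implied by a 4-byte magic number."""
    if len(first_four_bytes) != 4:
        raise ValueError("expected exactly 4 bytes")
    # Always read big endian, as the magic_number element does.
    magic = int.from_bytes(first_four_bytes, "big")
    if magic == MAGIC_BIG_ENDIAN:
        return "bigEndian"
    if magic == MAGIC_LITTLE_ENDIAN:
        return "littleEndian"
    raise ValueError(
        "Magic number was not 0xA1B2C3D4 (for bigEndian) "
        "or 0xD4C3B2A1 (for littleEndian).")
```

Here the hex constants can be written directly because Python has hex literals; the proposed functions are meant to give DFDL expressions the same convenience.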

--
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | 
www.tresys.com