[DFDL-WG] DFDL hexBinary and base64Binary

Mike Beckerle beckerle at us.ibm.com
Fri Nov 16 17:09:26 CST 2007


I'm trying to wrap up the opaque/hexBinary/base64Binary topic.

I need opinions on this discussion.

Currently we have a property, dfdl:binaryType :

Properties Specific to Binary Types (hexBinary, base64Binary)

Property Name   Description
-------------   ---------------------------------------------------------
binaryType      Enum
                This specifies the encoding method for the binary.
                Valid values are ‘unspecified’, ‘hexBinary’,
                ‘base64Binary’, ‘uuencoded’
                Annotation: dfdl:element (simple type ‘binary’, ‘opaque’)


This property addresses the question: from what kinds of representations 
can we interpret and construct logical hexBinary values (and, similarly, 
base64Binary values)?

I believe the above is not clear, and causes issues with the xs:length 
facet of XSD.

I propose the 4 tables below, which describe the 4 cases (type - 
representation):

hexBinary - binary
hexBinary - text
base64Binary - binary
base64Binary - text

I have specified these so that the meaning of the xs:length facet is 
always interpreted exactly as in XSD. It always refers to the number of 
bytes of the unencoded binary data, and never to the number of characters 
in the encoded form.
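
To make the rule concrete, here is a minimal Python sketch of what the 
xs:length facet counts under this reading. The helper name facet_length 
is hypothetical and used only for illustration; it is not part of DFDL 
or XSD.

```python
import base64
import binascii


def facet_length(lexical: str, xsd_type: str) -> int:
    """Return the number of octets that xs:length would be compared against.

    Hypothetical helper: it decodes the lexical (encoded) form and counts
    the bytes of the unencoded binary value, not the encoded characters.
    """
    if xsd_type == "hexBinary":
        return len(binascii.unhexlify(lexical))
    if xsd_type == "base64Binary":
        return len(base64.b64decode(lexical))
    raise ValueError(f"unsupported type: {xsd_type}")


# Both strings are 8 characters long, but each encodes the same 4 octets.
print(facet_length("DEADBEEF", "hexBinary"))
print(facet_length("3q2+7w==", "base64Binary"))
```

In both calls the value compared with xs:length is the octet count of 
the decoded data, independent of how many characters the encoding uses.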


type          representation  lengthKind               resulting length
                                                       (in bytes)
------------  --------------  -----------------------  ----------------
xs:hexBinary  binary          implicit                 xs:length facet
                              explicit                 dfdl:length [1]
                              endOfData or delimited   variable
                              or nullTerminated

[1] other: Validation: xs:length facet must be equal to resulting length
    in bytes (TBD: similar range checks on xs:minLength, xs:maxLength)


type          representation  lengthKind               resulting length
                                                       (in characters)
------------  --------------  -----------------------  -------------------
xs:hexBinary  text            implicit                 2 * xs:length facet
                              explicit                 dfdl:length [1]
                              endOfData, delimited,    variable
                              nullTerminated

[1] other: Validation: xs:length facet * 2 must be equal to resulting
    character length (after removing all non-hex characters)
    (TBD: similar range checks on xs:minLength, xs:maxLength)
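
The validation check for the hexBinary text case with explicit length 
can be sketched in Python as follows. The function name and the decision 
to treat every character outside 0-9, a-f, A-F as removable are 
assumptions made for illustration only.

```python
import string

# Characters that count as hex digits: 0-9, a-f, A-F.
HEX_DIGITS = set(string.hexdigits)


def hex_text_length_ok(text: str, facet_length: int) -> bool:
    """Check that the hex text carries exactly facet_length octets.

    Hypothetical helper: non-hex characters (spaces, line breaks, ...)
    are dropped first, then the remaining character count must equal
    2 * xs:length, since each octet takes two hex digits.
    """
    hex_chars = [c for c in text if c in HEX_DIGITS]
    return len(hex_chars) == 2 * facet_length


print(hex_text_length_ok("DE AD BE EF", 4))  # 8 hex digits for 4 octets
print(hex_text_length_ok("DEADBEEF", 3))     # 8 hex digits, but 6 expected
```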



type             representation  dfdl:lengthKind          resulting length
                                                          (in bytes)
---------------  --------------  -----------------------  ----------------
xs:base64Binary  binary          implicit                 xs:length facet
                                 explicit                 dfdl:length [1]
                                 endOfData or delimited   variable
                                 or nullTerminated

[1] other: Validation: xs:length facet must be equal to resulting length
    in bytes (TBD: similar range checks on xs:minLength, xs:maxLength)


type             representation  lengthKind             resulting length
                                                        (in characters)
---------------  --------------  ---------------------  ---------------------
xs:base64Binary  text            implicit               8/6 * xs:length facet
                                 explicit               dfdl:length [1]
                                 endOfData, delimited,  variable
                                 nullTerminated

[1] other: Validation: xs:length facet * 8/6 must be equal to resulting
    character length (after removing all non-base64-encoding characters)
    (TBD: similar range checks on xs:minLength, xs:maxLength)
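
One detail worth checking for the base64 text case: 8/6 * n is a whole 
number only when the byte count n is a multiple of 3. The Python sketch 
below computes the character count for other values of n, both with and 
without '=' padding. The function name is hypothetical, and whether the 
padding characters count toward the length is an open assumption here, 
not something the tables settle.

```python
import base64


def base64_char_count(n_bytes: int, padded: bool = True) -> int:
    """Number of base64 characters needed to encode n_bytes octets.

    Hypothetical helper. With padding, output comes in 4-character groups
    per 3 input octets, so the count is 4 * ceil(n / 3). Without padding,
    it is ceil(8 * n / 6), i.e. one character per started 6-bit group.
    """
    if padded:
        return 4 * ((n_bytes + 2) // 3)
    return (4 * n_bytes + 2) // 3


for n in (3, 4, 5, 6):
    encoded = base64.b64encode(bytes(n))
    # Compare the formulas against what the standard library produces.
    print(n, len(encoded), base64_char_count(n),
          base64_char_count(n, padded=False))
```

For n = 4, for example, 8/6 * n is not an integer, so the implicit and 
explicit rows would need to say which of the two counts is meant.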


Looking at the above, one way to simplify things quite a bit is to 
disallow the xs:length, xs:minLength, and xs:maxLength facets on 
hexBinary and base64Binary types in DFDL schemas.
Then the implicit lengthKind goes away, and so does the complex 
validation check for the xs:length facet.  I recommend this.

Another simplification alternative is to disallow representation text 
altogether, but I am concerned that people with data that does contain hex 
or base64 data will naturally want to use these types to model it.  I 
don't recommend this.

...mikeb

Mike Beckerle
STSM, Architect, Scalable Computing
IBM Software Group
Information Platform and Solutions
Westborough, MA 01581
direct: voice and FAX 508-599-7148
assistant: Pam Riordan 
                  priordan at us.ibm.com 
                  508-599-7046

