[DFDL-WG] DFDL hexBinary and base64Binary

Simon Parker simon.parker at polarlake.com
Tue Nov 20 10:31:43 CST 2007


Good evening.
 
Here's a minor contribution on binary types.
 
1. In my experience the common and preferred type for uninterpreted data
is Base64Binary, not HexBinary. I see no reason to drop either, though.

2. A reference to a binary element using XPath on the infoset should
give the entire binary value in some implementation-dependent form. It
should not return text, so applying functions like substring() would be
way out of line.
 
3. Can XPath expressions and functions manipulate binary values? Since
XPath is based on XML's type system, I doubt it. I think we should
treat binary values as opaque.
 
4. Specifying a text encoding for binary data is a hint for XML text
generators. It doesn't change the external representation or the
internal infoset value, which are both binary.
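
To illustrate point 4, here is a minimal Python sketch. It is my own illustration and not anything taken from the DFDL draft; the sample bytes are arbitrary. It shows that hex and base64 are two text renderings of one and the same binary value, so choosing between them does not change the value itself.

```python
import base64
import binascii

# Arbitrary sample value; in DFDL terms this stands in for the infoset
# value, which is raw bytes.
value = bytes([0xDE, 0xAD, 0xBE, 0xEF])

# Two possible XML text renderings of the same bytes.
as_hex = binascii.hexlify(value).decode("ascii").upper()  # xs:hexBinary style
as_b64 = base64.b64encode(value).decode("ascii")          # xs:base64Binary style

# Decoding either rendering gives back the identical bytes.
assert binascii.unhexlify(as_hex) == value
assert base64.b64decode(as_b64) == value

print(as_hex)
print(as_b64)
```

Only the text form handed to an XML generator differs; the bytes on both sides are the same.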

Just my views.
Sorry I didn't have time to participate earlier, and might not have a
chance to elaborate before Wednesday's conference either.
I read everything, though.
 Simon
 


________________________________

	From: dfdl-wg-bounces at ogf.org [mailto:dfdl-wg-bounces at ogf.org]
On Behalf Of Steve Hanson
	Sent: 19 November 2007 18:20
	To: Mike Beckerle
	Cc: dfdl-wg at ogf.org
	Subject: Re: [DFDL-WG] DFDL hexBinary and base64Binary
	
	

	Mike 
	
	I also considered suggesting we drop xs:base64Binary support.
Clearly if we weren't using the XSD type system we would have a single
'binary' logical type, but we are re-using the XSD type system. So would
it be more confusing for a user familiar with the XSD type system to be
forced to model some binary base64 data as xs:hexBinary?
	
	Fyi MRM supports both xs:hexBinary and xs:base64Binary and
treats them the same. 
	
	Bottom line: I'm happy to go with majority view. 
	
	Regards, Steve
	
	Steve Hanson
	WebSphere Message Brokers
	Hursley, UK
	Internet: smh at uk.ibm.com
	Phone (+44)/(0) 1962-815848 
	
	
	
	From: Mike Beckerle <beckerle at us.ibm.com>
	Sent: 19/11/2007 16:34
	To: Steve Hanson/UK/IBM at IBMGB
	Cc: dfdl-wg at ogf.org
	Subject: Re: [DFDL-WG] DFDL hexBinary and base64Binary

	Steve, (& team) 
	
	What you are suggesting is the simplest of the simple: no 'text'
representation at all. Users who have actual hexadecimal strings in
their data can always model them as either strings or, if they're small
enough, integers in base 16 text.
	
	In this case the only difference between hexBinary and
base64Binary is what happens if you coerce the infoset value to a string
and this is into the API space which is outside the scope of DFDL. 
	
	To me this suggests that we leave out base64Binary entirely for
V1.0 to avoid confusion (it will be confusing to people to explain that
hexBinary and base64Binary are synonymous in DFDL) 
	
	So the net functionality for DFDL v1.0 would be this only: 
	
	type          representation  lengthKind              resulting length  other
	                                                      (in bytes)
	------------  --------------  ----------------------  ----------------  -----
	xs:hexBinary  binary [1]      implicit                xs:length facet
	                              explicit                dfdl:length       [2]
	                              endOfData or delimited  variable          [2]
	                              or nullTerminated

	[1] Required. If 'text' is specified it causes a schema definition
	    error. This reserves the 'text' behavior for possible future use.
	[2] Validation: xs:length facet must be equal to resulting length in
	    bytes. (TBD: similar range checks on xs:minLength, xs:maxLength)

	
	
	
	I'm very happy with this for V1.0. 
	
	Any further comments or should we go with this for V1.0? 
	
	...mikeb 
	
	Mike Beckerle
	STSM, Architect, Scalable Computing
	IBM Software Group
	Information Platform and Solutions
	Westborough, MA 01581
	direct: voice and FAX 508-599-7148
	assistant: Pam Riordan   
	                priordan at us.ibm.com 
	                508-599-7046
	
	
	
	
	From: Steve Hanson <smh at uk.ibm.com>
	Sent by: dfdl-wg-bounces at ogf.org
	Sent: 11/19/2007 10:23 AM
	To: dfdl-wg at ogf.org
	Subject: Re: [DFDL-WG] DFDL hexBinary and base64Binary

	My view: The logical type is binary, so the data in the
information item is binary, the length facets should always deal in
bytes, and validation checks the length of the binary data in bytes. 
	
	From the above, of the two simplifications below, I would rather
disallow the text representations of xs:hexBinary and xs:base64Binary.
Fyi MRM today 
	- does not support text reps for binary 
	- has not had such a request from users 
	- uses length/minLength/maxLength facets to validate binary
field length post-parse 
	- uses length/maxLength to populate the default for the physical
length. 
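
The post-parse facet check described above can be sketched as follows. This is an illustrative sketch only, not MRM's actual implementation; the function name `check_length_facets` and its parameters are hypothetical. The one rule it encodes comes from the discussion: the facets are compared against the number of bytes in the binary value.

```python
from typing import List, Optional


def check_length_facets(
    data: bytes,
    length: Optional[int] = None,
    min_length: Optional[int] = None,
    max_length: Optional[int] = None,
) -> List[str]:
    """Validate parsed binary data against XSD length facets.

    All facet values are counted in bytes of the binary data, never in
    characters of any text rendering. Returns a list of error messages;
    an empty list means the value passed. (Hypothetical helper.)
    """
    n = len(data)
    errors = []
    if length is not None and n != length:
        errors.append(f"xs:length is {length} but value has {n} bytes")
    if min_length is not None and n < min_length:
        errors.append(f"xs:minLength is {min_length} but value has {n} bytes")
    if max_length is not None and n > max_length:
        errors.append(f"xs:maxLength is {max_length} but value has {n} bytes")
    return errors


parsed = bytes.fromhex("0102030405")  # 5 bytes, as if just parsed
print(check_length_facets(parsed, length=5))
print(check_length_facets(parsed, min_length=6, max_length=8))
```

Returning a list of messages rather than raising on the first failure is just one possible design; it lets a validator report every violated facet at once.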
	
	Regards, Steve
	
	Steve Hanson
	WebSphere Message Brokers
	Hursley, UK
	Internet: smh at uk.ibm.com
	Phone (+44)/(0) 1962-815848 
	
	From: Mike Beckerle <beckerle at us.ibm.com>
	Sent by: dfdl-wg-bounces at ogf.org
	Sent: 16/11/2007 23:09
	To: dfdl-wg at ogf.org
	Subject: [DFDL-WG] DFDL hexBinary and base64Binary

	I'm trying to wrap up the opaque/hexBinary/base64Binary topic. 
	
	I need opinions on this discussion. 
	
	Currently we have a property, dfdl:binaryType : 
	
	Properties Specific to Binary Types (hexBinary, base64Binary)

	Property Name  Description
	-------------  -----------
	binaryType     Enum.
	               This specifies the encoding method for the binary.
	               Valid values are 'unspecified', 'hexBinary',
	               'base64Binary', 'uuencoded'.
	               Annotation: dfdl:element (simple type 'binary', 'opaque')

	
	
	This property speaks to the question: from what kinds of
representations can we interpret and construct logical hexBinary values
(and similarly base64Binary values)?
	
	I believe the above is not clear, and causes issues with the
xs:length facet of XSD. 
	
	I propose the 4 tables below which describe the 4 cases: 
	
	hexBinary - binary 
	hexBinary - text 
	base64Binary - binary 
	base64Binary - text 
	
	I have specified these so that the meaning of the xs:length
facet is always interpreted exactly as in XSD. It always refers to the
number of bytes of the unencoded binary data, and never to the number of
characters in the encoded form. 

	type          representation  lengthKind              resulting length  other
	                                                      (in bytes)
	------------  --------------  ----------------------  ----------------  -----
	xs:hexBinary  binary          implicit                xs:length facet
	                              explicit                dfdl:length       [1]
	                              endOfData or delimited  variable
	                              or nullTerminated

	[1] Validation: xs:length facet must be equal to resulting length in
	    bytes. (TBD: similar range checks on xs:minLength, xs:maxLength)


	type          representation  lengthKind              resulting length     other
	                                                      (in characters)
	------------  --------------  ----------------------  -------------------  -----
	xs:hexBinary  text            implicit                2 * xs:length facet
	                              explicit                dfdl:length          [1]
	                              endOfData, delimited,   variable
	                              nullTerminated

	[1] Validation: xs:length facet * 2 must be equal to resulting
	    character length (after removing all non-hex characters).
	    (TBD: similar range checks on xs:minLength, xs:maxLength)

	
	type             representation  dfdl:lengthKind         resulting length  other
	                                                         (in bytes)
	---------------  --------------  ----------------------  ----------------  -----
	xs:base64Binary  binary          implicit                xs:length facet
	                                 explicit                dfdl:length       [1]
	                                 endOfData or delimited  variable
	                                 or nullTerminated

	[1] Validation: xs:length facet must be equal to resulting length in
	    bytes. (TBD: similar range checks on xs:minLength, xs:maxLength)


	
	type             representation  lengthKind              resulting length       other
	                                                         (in characters)
	---------------  --------------  ----------------------  ---------------------  -----
	xs:base64Binary  text            implicit                8/6 * xs:length facet
	                                 explicit                dfdl:length            [1]
	                                 endOfData, delimited,   variable
	                                 nullTerminated

	[1] Validation: xs:length facet * 8/6 must be equal to resulting
	    character length (after removing all non-base64-encoding
	    characters). (TBD: similar range checks on xs:minLength,
	    xs:maxLength)
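
As a sanity check on the character-length factors for the two text representations, here is a small Python sketch. It is illustrative only and assumes standard padded base64 with no line breaks; the helper names `hex_text_length` and `base64_text_length` are hypothetical. Under that assumption the 8/6 factor is exact only when the byte length is a multiple of 3, because padding rounds the encoded form up to a multiple of 4 characters. The 2x factor for hex is always exact.

```python
import base64


def hex_text_length(n_bytes: int) -> int:
    """Characters needed for the hex text form: two hex digits per byte."""
    return 2 * n_bytes


def base64_text_length(n_bytes: int) -> int:
    """Characters needed for padded base64: 4 per (possibly partial) 3-byte group."""
    return 4 * ((n_bytes + 2) // 3)


for n in (3, 4, 5, 6):
    data = bytes(n)  # n zero bytes; the content does not affect the length
    assert len(data.hex()) == hex_text_length(n)
    assert len(base64.b64encode(data)) == base64_text_length(n)
    # Last column is the unrounded 8/6 estimate, for comparison.
    print(n, hex_text_length(n), base64_text_length(n), n * 8 / 6)
```

A validation rule for the base64 text case would therefore need to say whether padding characters count toward the resulting character length.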

	Looking at the above, one way to simplify things quite a bit is
to disallow the xs:length and xs:minLength and xs:maxLength  facet on
hexBinary and base64Binary types in DFDL schemas. 

	Then the implicit lengthKind goes away, and the complex
validation check for the xs:length facet goes away.  I recommend this. 

	Another simplification alternative is to disallow representation
text altogether, but I am concerned that people whose data does contain
hex or base64 text will naturally want to use these types to model it.
I don't recommend this. 

	...mikeb 

	Mike Beckerle
	STSM, Architect, Scalable Computing
	IBM Software Group
	Information Platform and Solutions
	Westborough, MA 01581
	direct: voice and FAX 508-599-7148
	assistant: Pam Riordan   
	              priordan at us.ibm.com 
	              508-599-7046
	--
	dfdl-wg mailing list
	dfdl-wg at ogf.org
	http://www.ogf.org/mailman/listinfo/dfdl-wg 
	
	
	
	
________________________________


	Unless stated otherwise above:
	IBM United Kingdom Limited - Registered in England and Wales
with number 741598. 
	Registered office: PO Box 41, North Harbour, Portsmouth,
Hampshire PO6 3AU 

	
	
	
	
	
	
	
	
	

	
	
	
	
	
	
