[DFDL-WG] DFDL hexBinary and base64Binary

Westhead, Martin (Martin) westhead at avaya.com
Mon Nov 19 13:13:27 CST 2007


I am way out of the loop here, but I felt motivated to throw in a few
cents on this discussion.

 

As far as scope goes, it seems to me a reasonable goal would be to
treat all the primitive types of XML Schema as in scope. That would
suggest that hexBinary and base64Binary should be included.

 

Regarding implementation, what concerns me about the discussion is the
confusion between data model and representation that I seem to be
hearing. (Perhaps I am bringing this to the discussion myself, in which
case please set me straight.)

 

The way it looks to me is that when you specify the XML Schema "type" in
the DFDL document you are specifying the data model, or another way to
put it is that you are specifying the form of the XML document that
would be output if your DFDL parser were producing a document. This
should be separated from the discussion of the representation of the
data that you are reading in.

 

So what I expect is that there are three different data models for this
kind of data:

1.	Sequence of bytes
2.	hexBinary
3.	base64

 

And there are three different underlying representations of the data
that could be read from:

1.	bytes
2.	bin hex
3.	base 64

 

And ideally you should be able to choose the model and the
representation separately (IMO).
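
To make the separation concrete, here is a minimal Python sketch. It is
illustrative only and not taken from any DFDL draft: the function names
decode and render, and the string labels for the representations and
models, are invented for this example. It reads bytes from any of the
three representations and then renders them in any of the three models,
with the two choices made independently.

```python
import base64
import binascii


def decode(raw: bytes, representation: str) -> bytes:
    """Turn the physical data into the logical sequence of bytes.

    The representation labels are hypothetical, not DFDL property values.
    """
    if representation == "bytes":
        return raw
    if representation == "hex":
        return binascii.unhexlify(raw)
    if representation == "base64":
        return base64.b64decode(raw)
    raise ValueError(f"unknown representation: {representation}")


def render(value: bytes, model: str):
    """Produce the form the value takes in the output document.

    The model labels are hypothetical as well.
    """
    if model == "bytes":
        return value
    if model == "hexBinary":
        return value.hex().upper()
    if model == "base64":
        return base64.b64encode(value).decode("ascii")
    raise ValueError(f"unknown model: {model}")


# Hex text on the wire, base64 in the output document.
print(render(decode(b"DEADBEEF", "hex"), "base64"))  # 3q2+7w==
```

Any of the three representations can be paired with any of the three
models because both functions meet at plain bytes in the middle.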

 

Am I making sense?

 

Martin

 

________________________________

From: dfdl-wg-bounces at ogf.org [mailto:dfdl-wg-bounces at ogf.org] On Behalf
Of Mike Beckerle
Sent: Monday, November 19, 2007 8:34 AM
To: Steve Hanson
Cc: dfdl-wg at ogf.org
Subject: Re: [DFDL-WG] DFDL hexBinary and base64Binary

 


Steve, (& team) 

What you are suggesting is the simplest of the simple: no 'text'
representation at all. Users who have actual hexadecimal strings in
their data can always model them as either strings or, if they're small
enough, integers in base 16 text. 

In this case the only difference between hexBinary and base64Binary is
what happens if you coerce the infoset value to a string, and that falls
into the API space, which is outside the scope of DFDL. 

To me this suggests that we leave out base64Binary entirely for V1.0 to
avoid confusion (it will be confusing to people to explain that
hexBinary and base64Binary are synonymous in DFDL). 

So the net functionality for DFDL v1.0 would be this only: 

type          representation  lengthKind              resulting length (in bytes)  other
------------  --------------  ----------------------  ---------------------------  --------------
xs:hexBinary  binary (1)      implicit                xs:length facet
                              explicit                dfdl:length                  Validation (2)
                              endOfData or delimited  variable                     Validation (2)
                              or nullTerminated

(1) Note: required. If 'text' is specified it causes a schema definition
    error. This reserves the 'text' behavior for possible future use.
(2) Validation: xs:length facet must be equal to resulting length in bytes.
    (TBD: similar range checks on xs:minLength, xs:maxLength)
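
As a rough illustration of the validation column in the table above,
here is a Python sketch. It is an assumption-laden example rather than a
specified algorithm: the function name check_length_facets and its
parameters are invented, and the minLength/maxLength checks are shown
only as one plausible reading of the TBD item.

```python
from typing import List, Optional


def check_length_facets(
    data: bytes,
    length: Optional[int] = None,
    min_length: Optional[int] = None,
    max_length: Optional[int] = None,
) -> List[str]:
    """Return validation errors for a parsed xs:hexBinary value.

    Every facet is compared against the number of bytes parsed.
    The name and signature are hypothetical.
    """
    errors = []
    n = len(data)
    if length is not None and n != length:
        errors.append(f"xs:length is {length} but parsed {n} bytes")
    # The range checks below are marked TBD in the table; this is one guess.
    if min_length is not None and n < min_length:
        errors.append(f"xs:minLength is {min_length} but parsed {n} bytes")
    if max_length is not None and n > max_length:
        errors.append(f"xs:maxLength is {max_length} but parsed {n} bytes")
    return errors


print(check_length_facets(b"\x00\x01\x02", length=4))
```

An empty list means the parsed value satisfies every facet that was
supplied.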




I'm very happy with this for V1.0. 

Any further comments or should we go with this for V1.0? 

...mikeb 

Mike Beckerle
STSM, Architect, Scalable Computing
IBM Software Group
Information Platform and Solutions
Westborough, MA 01581
direct: voice and FAX 508-599-7148
assistant: Pam Riordan   
                 priordan at us.ibm.com 
                 508-599-7046





Steve Hanson <smh at uk.ibm.com> 
Sent by: dfdl-wg-bounces at ogf.org 
11/19/2007 10:23 AM 

To: dfdl-wg at ogf.org 
cc: 
Subject: Re: [DFDL-WG] DFDL hexBinary and base64Binary

My view: The logical type is binary, so the data in the information item
is binary, the length facets should always deal in bytes, and validation
checks the length of the binary data in bytes. 

From the above, of the two simplifications below, I would rather
disallow the text representations of xs:hexBinary and xs:base64Binary.
Fyi MRM today 
- does not support text reps for binary 
- has not had such a request from users 
- uses length/minLength/maxLength facets to validate binary field length
post-parse 
- uses length/maxLength to populate the default for the physical length.


Regards, Steve

Steve Hanson
WebSphere Message Brokers
Hursley, UK
Internet: smh at uk.ibm.com
Phone (+44)/(0) 1962-815848 

Mike Beckerle <beckerle at us.ibm.com> 
Sent by: dfdl-wg-bounces at ogf.org 
16/11/2007 23:09 

To: dfdl-wg at ogf.org 
cc: 
Subject: [DFDL-WG] DFDL hexBinary and base64Binary

I'm trying to wrap up the opaque/hexBinary/base64Binary topic. 

I need opinions on this discussion. 

Currently we have a property, dfdl:binaryType : 

Properties Specific to Binary Types (hexBinary, base64Binary) 

Property Name  Description
-------------  ------------------------------------------------------------
binaryType     Enum
               This specifies the encoding method for the binary.
               Valid values are 'unspecified', 'hexBinary', 'base64Binary',
               'uuencoded'
               Annotation: dfdl:element (simple type 'binary', 'opaque')



This property addresses the question of which representations we can
interpret to construct logical hexBinary values (and similarly
base64Binary values). 

I believe the above is not clear, and causes issues with the xs:length
facet of XSD. 

I propose the 4 tables below which describe the 4 cases: 

hexbinary - binary 
hexbinary - text 
base64binary - binary 
base64binary - text 

I have specified these so that the meaning of the xs:length facet is
always interpreted exactly as in XSD. It always refers to the number of
bytes of the unencoded binary data, and never to the number of
characters in the encoded form. 
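
For illustration, a short Python sketch of that distinction (not part of
the proposal; the helper name xsd_length is hypothetical). The same four
bytes have an eight-character lexical form in both types, yet the length
that XSD assigns is measured on the decoded bytes in each case.

```python
import base64


def xsd_length(lexical: str, xsd_type: str) -> int:
    """Length as XSD measures it: bytes of the decoded binary value.

    Hypothetical helper, written only to show the counting rule.
    """
    if xsd_type == "hexBinary":
        return len(bytes.fromhex(lexical))
    if xsd_type == "base64Binary":
        return len(base64.b64decode(lexical))
    raise ValueError(f"not a binary type: {xsd_type}")


# Character count of the lexical form, then the xs:length of the value.
print(len("DEADBEEF"), xsd_length("DEADBEEF", "hexBinary"))      # 8 4
print(len("3q2+7w=="), xsd_length("3q2+7w==", "base64Binary"))   # 8 4
```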

type          representation  lengthKind              resulting length (in bytes)  other
------------  --------------  ----------------------  ---------------------------  --------------
xs:hexBinary  binary          implicit                xs:length facet
                              explicit                dfdl:length                  Validation (1)
                              endOfData or delimited  variable
                              or nullTerminated

(1) Validation: xs:length facet must be equal to resulting length in bytes.
    (TBD: similar range checks on xs:minLength, xs:maxLength)

 

 

type          representation  lengthKind             resulting length (in characters)  other
------------  --------------  ---------------------  --------------------------------  --------------
xs:hexBinary  text            implicit               2 * xs:length facet
                              explicit               dfdl:length                       Validation (1)
                              endOfData, delimited,  variable
                              nullTerminated

(1) Validation: xs:length facet * 2 must be equal to resulting character
    length (after removing all non-hex characters).
    (TBD: similar range checks on xs:minLength, xs:maxLength)
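
A minimal Python sketch of the validation rule in this table follows.
The function name is invented, and treating every character outside
0-9, A-F and a-f as ignorable is an assumption about what "non-hex
characters" would mean in practice.

```python
import string

# 0-9, a-f, A-F
HEX_DIGITS = set(string.hexdigits)


def hex_text_matches_length(text: str, xs_length: int) -> bool:
    """True if the hex characters in text number exactly 2 * xs_length.

    Hypothetical helper; non-hex characters are dropped before counting.
    """
    hex_chars = [c for c in text if c in HEX_DIGITS]
    return len(hex_chars) == 2 * xs_length


# Spaces and the newline are ignored, leaving 8 hex characters for 4 bytes.
print(hex_text_matches_length("DE AD\nBE EF", 4))  # True
```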

 

 

type             representation  dfdl:lengthKind         resulting length (in bytes)  other
---------------  --------------  ----------------------  ---------------------------  --------------
xs:base64Binary  binary          implicit                xs:length facet
                                 explicit                dfdl:length                  Validation (1)
                                 endOfData or delimited  variable
                                 or nullTerminated

(1) Validation: xs:length facet must be equal to resulting length in bytes.
    (TBD: similar range checks on xs:minLength, xs:maxLength)

 

 

type             representation  lengthKind             resulting length (in characters)  other
---------------  --------------  ---------------------  --------------------------------  --------------
xs:base64Binary  text            implicit               8/6 * xs:length facet
                                 explicit               dfdl:length                       Validation (1)
                                 endOfData, delimited,  variable
                                 nullTerminated

(1) Validation: xs:length facet * 8/6 must be equal to resulting character
    length (after removing all non-base64-encoding characters).
    (TBD: similar range checks on xs:minLength, xs:maxLength)
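
One detail the 8/6 ratio leaves open is padding: base64 emits characters
in groups of four, so the ratio is a whole number only when the byte
length is a multiple of 3. The Python sketch below shows both counts.
The helper name is hypothetical, and whether the '=' padding counts
toward the character length is an open question here, not something the
table settles.

```python
import base64


def base64_char_lengths(n_bytes: int):
    """Return (characters without padding, characters with '=' padding).

    Hypothetical helper; integer arithmetic is used to round up.
    """
    unpadded = (8 * n_bytes + 5) // 6    # 8/6 * n, rounded up
    padded = 4 * ((n_bytes + 2) // 3)    # whole groups of four characters
    return unpadded, padded


for n in (3, 4, 5):
    encoded = base64.b64encode(bytes(n)).decode("ascii")
    unpadded, padded = base64_char_lengths(n)
    # Compare the computed counts against what the encoder really produces.
    print(n, len(encoded.rstrip("=")) == unpadded, len(encoded) == padded)
```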

 

Looking at the above, one way to simplify things quite a bit is to
disallow the xs:length, xs:minLength, and xs:maxLength facets on
hexBinary and base64Binary types in DFDL schemas. 

Then the implicit lengthKind goes away, and the complex validation check
for the xs:length facet goes away.  I recommend this. 

Another simplification alternative is to disallow representation text
altogether, but I am concerned that people with data that does contain
hex or base64 data will naturally want to use these types to model it.
I don't recommend this. 

...mikeb 

Mike Beckerle
STSM, Architect, Scalable Computing
IBM Software Group
Information Platform and Solutions
Westborough, MA 01581
direct: voice and FAX 508-599-7148
assistant: Pam Riordan   
               priordan at us.ibm.com 
               508-599-7046
--
dfdl-wg mailing list
dfdl-wg at ogf.org
http://www.ogf.org/mailman/listinfo/dfdl-wg 




________________________________

 

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
3AU 














