[DFDL-WG] DFDL hexBinary and base64Binary

Mike Beckerle beckerle at us.ibm.com
Mon Nov 19 14:16:07 CST 2007


Well, your statement below exactly reflects the confusion I'd like to 
avoid by suggesting that we drop base64Binary.

See, you said the user is "forced to model some binary base64 data as 
hexBinary".  The data is binary bytes, not something stored in any 
textual encoding, just regular binary bytes. So the concept of a user 
having "binary base64 data" doesn't make sense. base64 is about a 'text' 
representation for binary data. It's fundamentally a text concept. 

By dropping base64Binary, we can just explain that hexBinary means 
"unknown format binary data" in DFDL. 

...mikeb


Mike Beckerle
STSM, Architect, Scalable Computing
IBM Software Group
Information Platform and Solutions
Westborough, MA 01581
direct: voice and FAX 508-599-7148
assistant: Pam Riordan 
                  priordan at us.ibm.com 
                  508-599-7046





Steve Hanson <smh at uk.ibm.com> 
11/19/2007 01:20 PM

To: Mike Beckerle/Worcester/IBM at IBMUS
cc: dfdl-wg at ogf.org
Subject: Re: [DFDL-WG] DFDL hexBinary and base64Binary







Mike 

I also considered suggesting we drop xs:base64Binary support. Clearly, if 
we weren't using the XSD type system we would have a single 'binary' 
logical type, but we are re-using the XSD type system. So would it be more 
confusing for a user familiar with the XSD type system to be forced to 
model some binary base64 data as xs:hexBinary?   

FYI, MRM supports both xs:hexBinary and xs:base64Binary and treats them the 
same. 

Bottom line: I'm happy to go with the majority view. 

Regards, Steve

Steve Hanson
WebSphere Message Brokers
Hursley, UK
Internet: smh at uk.ibm.com
Phone (+44)/(0) 1962-815848 


Mike Beckerle <beckerle at us.ibm.com> 
19/11/2007 16:34 

To: Steve Hanson/UK/IBM at IBMGB 
cc: dfdl-wg at ogf.org 
Subject: Re: [DFDL-WG] DFDL hexBinary and base64Binary









Steve, (& team) 

What you are suggesting is the simplest of the simple. No 'text' 
representation at all. Users who have actual hexadecimal strings in their 
data can always model them as either strings or, if they're small enough, 
integers in base 16 text. 

In this case the only difference between hexBinary and base64Binary is 
what happens if you coerce the infoset value to a string, and this gets 
into the API space, which is outside the scope of DFDL. 
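
To illustrate the point about string coercion, here is a minimal Python 
sketch (not part of the original discussion; the sample bytes are an 
arbitrary assumption). It shows one logical binary value rendered as 
two different strings, both of which decode back to the same bytes. 

```python
import base64
import binascii

# The logical value: four raw bytes, with no textual encoding involved.
# (Sample bytes chosen arbitrarily for illustration.)
value = bytes([0xDE, 0xAD, 0xBE, 0xEF])

# Two string renderings of the same logical value.
as_hex = binascii.hexlify(value).decode("ascii").upper()
as_base64 = base64.b64encode(value).decode("ascii")

print(as_hex)     # DEADBEEF
print(as_base64)  # 3q2+7w==

# Both strings decode back to the identical bytes, so the two XSD types
# differ only in their lexical (string) form, not in the value itself.
assert binascii.unhexlify(as_hex) == value
assert base64.b64decode(as_base64) == value
```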

To me this suggests that we leave out base64Binary entirely for V1.0 to 
avoid confusion (it will be confusing to people to explain that hexBinary 
and base64Binary are synonymous in DFDL) 

So the net functionality for DFDL v1.0 would be this only: 
type             representation  lengthKind       resulting length       other
                                                  (in bytes)
---------------  --------------  ---------------  ---------------------  -----
xs:hexBinary     binary [1]      implicit         xs:length facet
                                 explicit         dfdl:length            [2]
                                 endOfData or     variable               [2]
                                 delimited or
                                 nullTerminated

[1] Required. If 'text' is specified it causes a schema definition error. 
    This reserves the 'text' behavior for possible future use. 
[2] Validation: xs:length facet must be equal to resulting length in bytes 
    (TBD: similar range checks on xs:minLength, xs:maxLength) 
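
The validation rule in the table above (xs:length facet must be equal to 
the resulting length in bytes) could be sketched in Python as follows. 
This is only an illustration: the function name and parameters are 
hypothetical, and the minLength/maxLength range checks are marked TBD in 
the proposal, so their behavior here is an assumption. 

```python
from typing import List, Optional


def check_length_facets(data: bytes,
                        length: Optional[int] = None,
                        min_length: Optional[int] = None,
                        max_length: Optional[int] = None) -> List[str]:
    """Return validation error messages; an empty list means valid.

    Hypothetical helper: every facet is compared against the number of
    bytes in the parsed binary value, never against a character count.
    """
    n = len(data)
    errors = []
    if length is not None and n != length:
        errors.append(f"xs:length is {length} but value has {n} bytes")
    # The range checks below are TBD in the proposal; assumed behavior.
    if min_length is not None and n < min_length:
        errors.append(f"xs:minLength is {min_length} but value has {n} bytes")
    if max_length is not None and n > max_length:
        errors.append(f"xs:maxLength is {max_length} but value has {n} bytes")
    return errors


# A 4-byte value satisfies xs:length=4 and fails xs:length=8.
print(check_length_facets(b"\x01\x02\x03\x04", length=4))
print(check_length_facets(b"\x01\x02\x03\x04", length=8))
```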



I'm very happy with this for V1.0. 

Any further comments or should we go with this for V1.0? 

...mikeb 




Steve Hanson <smh at uk.ibm.com> 
Sent by: dfdl-wg-bounces at ogf.org 
11/19/2007 10:23 AM 

To: dfdl-wg at ogf.org 
cc: 
Subject: Re: [DFDL-WG] DFDL hexBinary and base64Binary











My view: The logical type is binary, so the data in the information item 
is binary, the length facets should always deal in bytes, and validation 
checks the length of the binary data in bytes. 

From the above, of the two simplifications below, I would rather disallow 
the text representations of xs:hexBinary and xs:base64Binary. Fyi MRM 
today 
- does not support text reps for binary 
- has not had such a request from users 
- uses length/minLength/maxLength facets to validate binary field length 
post-parse 
- uses length/maxLength to populate the default for the physical length. 

Regards, Steve

Mike Beckerle <beckerle at us.ibm.com> 
Sent by: dfdl-wg-bounces at ogf.org 
16/11/2007 23:09 

To: dfdl-wg at ogf.org 
cc: 
Subject: [DFDL-WG] DFDL hexBinary and base64Binary













I'm trying to wrap up the opaque/hexBinary/base64Binary topic. 

I need opinions on this discussion. 

Currently we have a property, dfdl:binaryType : 

Properties Specific to Binary Types (hexBinary, base64Binary) 

Property Name  Description
-------------  -----------
binaryType     Enum 
               This specifies the encoding method for the binary. 
               Valid values are ‘unspecified’, ‘hexBinary’, ‘base64Binary’, 
               ‘uuencoded’ 
               Annotation: dfdl:element (simple type ‘binary’, ‘opaque’)


This property addresses which kinds of representations we can interpret 
and construct logical hexBinary values from (and similarly for 
base64Binary). 

I believe the above is not clear, and causes issues with the xs:length 
facet of XSD. 

I propose the 4 tables below which describe the 4 cases: 

hexBinary - binary 
hexBinary - text 
base64Binary - binary 
base64Binary - text 

I have specified these so that the meaning of the xs:length facet is 
always interpreted exactly as in XSD. It always refers to the number of 
bytes of the unencoded binary data, and never to the number of characters 
in the encoded form. 

type             representation  lengthKind       resulting length       other
                                                  (in bytes)
---------------  --------------  ---------------  ---------------------  -----
xs:hexBinary     binary          implicit         xs:length facet
                                 explicit         dfdl:length            [1]
                                 endOfData or     variable
                                 delimited or
                                 nullTerminated

[1] Validation: xs:length facet must be equal to resulting length in bytes 
    (TBD: similar range checks on xs:minLength, xs:maxLength) 


type             representation  lengthKind       resulting length       other
                                                  (in characters)
---------------  --------------  ---------------  ---------------------  -----
xs:hexBinary     text            implicit         2 * xs:length facet
                                 explicit         dfdl:length            [1]
                                 endOfData,       variable
                                 delimited,
                                 nullTerminated

[1] Validation: xs:length facet * 2 must be equal to resulting character 
    length (after removing all non-hex characters) 
    (TBD: similar range checks on xs:minLength, xs:maxLength) 


type             representation  dfdl:lengthKind  resulting length       other
                                                  (in bytes)
---------------  --------------  ---------------  ---------------------  -----
xs:base64Binary  binary          implicit         xs:length facet
                                 explicit         dfdl:length            [1]
                                 endOfData or     variable
                                 delimited or
                                 nullTerminated

[1] Validation: xs:length facet must be equal to resulting length in bytes 
    (TBD: similar range checks on xs:minLength, xs:maxLength) 


type             representation  lengthKind       resulting length       other
                                                  (in characters)
---------------  --------------  ---------------  ---------------------  -----
xs:base64Binary  text            implicit         8/6 * xs:length facet
                                 explicit         dfdl:length            [1]
                                 endOfData,       variable
                                 delimited,
                                 nullTerminated

[1] Validation: xs:length facet * 8/6 must be equal to resulting character 
    length (after removing all non-base64-encoding characters) 
    (TBD: similar range checks on xs:minLength, xs:maxLength) 
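
As a rough check of the byte-to-character arithmetic in the two 'text' 
tables, the following Python sketch compares the multipliers against the 
standard library encoders. The helper names are hypothetical. One caveat 
worth noting: with standard padded base64 (as produced by Python's base64 
module), the character count is 4 * ceil(n / 3), so the 8/6 factor is 
exact only when the byte length n is a multiple of 3. 

```python
import base64
import binascii


def hex_text_length(n_bytes: int) -> int:
    """Characters needed to write n_bytes as hex text: two per byte."""
    return 2 * n_bytes


def base64_text_length(n_bytes: int) -> int:
    """Characters in padded base64: four per (possibly partial) 3-byte group."""
    return 4 * ((n_bytes + 2) // 3)


for n in range(10):
    data = bytes(n)  # n zero bytes; the content does not affect the length
    assert len(binascii.hexlify(data)) == hex_text_length(n)
    assert len(base64.b64encode(data)) == base64_text_length(n)
    # Compare the padded length with the unpadded 8/6 * n estimate.
    print(n, hex_text_length(n), base64_text_length(n), round(n * 8 / 6, 2))
```

Under this reading, a validation rule for base64 text would need to say 
whether padding characters count toward the resulting character length. 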


Looking at the above, one way to simplify things quite a bit is to 
disallow the xs:length, xs:minLength, and xs:maxLength facets on 
hexBinary and base64Binary types in DFDL schemas. Then the implicit 
lengthKind goes away, and the complex validation check for the xs:length 
facet goes away.  I recommend this. 

Another simplification alternative is to disallow representation text 
altogether, but I am concerned that people with data that does contain hex 
or base64 data will naturally want to use these types to model it.  I 
don't recommend this. 

...mikeb 
--
dfdl-wg mailing list
dfdl-wg at ogf.org
http://www.ogf.org/mailman/listinfo/dfdl-wg 




Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU 









