[DFDL-WG] OGF DFDL WG call today 2007-11-21

Mike Beckerle beckerle at us.ibm.com
Wed Nov 21 10:08:46 CST 2007


We may or may not achieve quorum today because of the US holiday tomorrow 
and big travel day today.

I will join the call at 12 noon US ET, and if we have enough people by 
12:05 then I'd like to discuss one or more of these topics:

* hexBinary and base64Binary  - I still find these confusing. 
* array prefix and suffix - just review the resolution of this issue: leave 
them out for now, to be conservative. They could be put back in fairly easily.
* choiceType and length properties on xs:choice

My latest musings on hexBinary and base64Binary ....

E.g., XSD allows pattern and enumeration facets on these

     <element name="aThing" type="base64Binary" length="3"
              pattern="AAAA|////" />

I think that pattern is a regexp for a base64 string matching 3 bytes of 
zeros or 3 bytes of all ones, but my regexp syntax is no doubt incorrectly 
escaped. ("A" encodes six zero bits and "/" encodes six one bits in base64.)
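
A quick way to check the base64 arithmetic above is the small Python sketch 
below. It only illustrates the lexical forms; it is not part of any DFDL or 
XSD implementation. It assumes the pattern is matched against the whole 
lexical value (as XSD pattern facets are), which is why fullmatch is used.

```python
import base64
import re

# The pattern from the example element, treated here as an ordinary regexp.
PATTERN = re.compile(r"AAAA|////")

zeros = bytes([0x00, 0x00, 0x00])   # 3 bytes, every bit 0
ones = bytes([0xFF, 0xFF, 0xFF])    # 3 bytes, every bit 1

# 3 bytes = 24 bits = four 6-bit base64 characters, no padding needed.
zeros_text = base64.b64encode(zeros).decode("ascii")
ones_text = base64.b64encode(ones).decode("ascii")

print(zeros_text, ones_text)

# Both lexical forms match the pattern; another 3-byte value does not.
assert PATTERN.fullmatch(zeros_text)
assert PATTERN.fullmatch(ones_text)
assert not PATTERN.fullmatch(base64.b64encode(b"\x00\x00\x01").decode("ascii"))
```

So the pattern constrains the base64 text, not the bytes directly, which is 
a large part of why I find it confusing.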

To me this is marvelously confusing. It feels downright silly to allow 
pattern and enumeration on these things. 

I'd like to adopt the strictest possible sensible thing. 
   - hexBinary only, binary representation only, no pattern or enumeration 
facets supported or allowed for this type. 

There is still the issue of default/fixed.

E.g., 

     <element name="unknownStuff" type="hexBinary" fixed="F41306C0" 
dfdl:lengthKind="implicit" />

The user specifies what the bytes are as a hex string, and since they said 
it's hexBinary, they use hex to express the literal content. This is a way 
to say "right here in the data there's this blob of stuff I don't 
understand, but it always contains these data bytes". I've certainly seen 
the need for this sort of thing. It's being ignored on input, but on 
output it generates the fixed bytes of data so as to create valid output 
even when you don't understand this part of the data format. In fact, the 
cases I've seen use this to skip over decimal numbers whose weird format 
isn't understood. The bytes above could be one or more decimal numbers in 
strange formats. 
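
To make the intended behavior concrete, here is a minimal Python sketch of 
"ignore on input, emit on output" for the fixed value above. The function 
names are hypothetical, and the sketch assumes that implicit length means 
"as many bytes as the fixed value has"; neither is taken from the spec.

```python
FIXED_HEX = "F41306C0"                  # the fixed="..." value from the example
FIXED_BYTES = bytes.fromhex(FIXED_HEX)  # 4 raw bytes: F4 13 06 C0


def parse_unknown_stuff(data: bytes, offset: int) -> int:
    """Skip the blob on input without inspecting it; return the new offset."""
    end = offset + len(FIXED_BYTES)
    if end > len(data):
        raise ValueError("not enough data for unknownStuff")
    return end


def unparse_unknown_stuff(out: bytearray) -> None:
    """Emit the fixed bytes on output so the result is well-formed."""
    out.extend(FIXED_BYTES)


# Usage: write a (made-up) 3-byte header followed by the blob, then skip it.
out = bytearray(b"HDR")
unparse_unknown_stuff(out)
next_offset = parse_unknown_stuff(bytes(out), 3)
```

Whether input should instead be checked against the fixed value is a 
separate question that this sketch does not try to answer.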

However, I can see the slippery slope from here to allowing pattern and 
enumeration facets, so I'd be happy ruling out use of default and fixed on 
hexBinary type also, just to make it even simpler.

Also, this element achieves the same end using our "%" escapes in strings.

   <element name="unknownStuff" type="string" dfdl:encoding="ascii" 
fixed="%F4%13%06%C0"  dfdl:lengthKind="implicit"/>

This works for any single-byte-wide character-set encoding. 
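
As a rough illustration of why the string form yields the same bytes, the 
Python sketch below expands "%XX" escapes into characters and then encodes 
them with a single-byte encoding. The helper name is hypothetical and the 
escape handling is simplified to two hex digits. The sketch uses latin-1 
because Python's own "ascii" codec is strictly 7-bit and would reject F4 and 
C0; that is a property of Python, not a statement about dfdl:encoding.

```python
import re


def expand_percent_escapes(literal: str) -> str:
    """Replace each %XX (two hex digits) with the character of that code point."""
    return re.sub(
        r"%([0-9A-Fa-f]{2})",
        lambda m: chr(int(m.group(1), 16)),
        literal,
    )


text = expand_percent_escapes("%F4%13%06%C0")

# latin-1 maps code points 0..255 one-to-one onto byte values, so each
# escaped character becomes exactly one byte with the same numeric value.
raw = text.encode("latin-1")

# Same four bytes as the hexBinary fixed value "F41306C0".
assert raw == bytes.fromhex("F41306C0")
```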



Mike Beckerle
STSM, Architect, Scalable Computing
IBM Software Group
Information Platform and Solutions
Westborough, MA 01581
direct: voice and FAX 508-599-7148
assistant: Pam Riordan 
                  priordan at us.ibm.com 
                  508-599-7046
