Fw: [dfdl-wg] Opaque/BLOB/Uninterpreted/Raw - also hexBinary - (was Re: split into multiple topics - Re: [dfdl-wg] Issues: additional data types)

Steve Hanson smh at uk.ibm.com
Wed Sep 7 05:27:33 CDT 2005


I prefer the hexBinary approach.

The disadvantage Mike observes with the array of bytes approach is not
limited to conversion to XML. It would also apply to any in-memory tree
(SDO etc) built by the DFDL parser. And the element unnecessarily becomes
eligible to take the dfdl:occurs annotation.

Regards, Steve

Steve Hanson
WebSphere Business Integration Brokers,
IBM Hursley, England
Internet: smh at uk.ibm.com
Phone (+44)/(0) 1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 07/09/2005 11:23 -----
                                                                           
Mike Beckerle <beckerle at us.ibm.com>
Sent by: owner-dfdl-wg at ggf.org
06/09/2005 21:21
To: Mike Beckerle <beckerle at us.ibm.com>
cc: dfdl-wg at gridforum.org, "Robert E. McGrath" <mcgrath at ncsa.uiuc.edu>, owner-dfdl-wg at ggf.org
Subject: [dfdl-wg] Opaque/BLOB/Uninterpreted/Raw - also hexBinary - (was Re: split into multiple topics - Re: [dfdl-wg] Issues: additional data types)





re: Opaque or uninterpreted or raw fields. These are sometimes called
blobs, though database people reserve the acronym "BLOB" (Binary Large
Object) for binary values too large for the smaller binary SQL types.
I.e., there's no such thing as a small BLOB in databases. I think on our
mailing list we've used blob to mean "opaque bytes" of any size at all.

I believe use of the 'hexBinary' type is also probably this same topic.
I.e., how to deal with data where you don't know its proper interpretation,
though you can express how big it is so that we can at least copy it from
place to place.

I think there are two choices here. One is to just use "occurring" bytes.
E.g., here's uninterpreted data of length 1234 bytes:

<element name="ignoreMe" type="byte" minOccurs="1234" maxOccurs="1234"
dfdl:repType="binary"/>

This is a basic binary byte array. I think this works fine as a blob/opaque
type.  I believe we do not need any other kind of raw/opaque type. If we
had one, we'd have to have a way to express its length, and be specific
about the units of that length, and the above accomplishes that with pretty
much minimum baggage. You can name it whatever you want, e.g., "unused" or
"dummy" or "ignore".

We might want an annotation to indicate that this data should not be
accessed, to distinguish this case from an actual array of bytes that you
DO want to access, but I'm not sure that's worth it. Note that the OMG CAM
model does have an access control attribute. Perhaps we can use that.
However, I doubt it allows distinguishing copy from access.

The alternative is to use the "hexBinary" type for this. In that case we
need to express the size in the DFDL annotation:

<element name="ignoreMe" type="hexBinary" dfdl:repLength="1234"
dfdl:repType="binary"/>

I can think of one advantage of hexBinary over the occurring bytes approach.
Suppose you do want to use DFDL in the obvious way to convert data into XML
format. Never mind that DFDL is supposed to enable avoiding this; suppose
it's what you want to do. Then my above byte array for the "ignoreMe"
element ends up as:

<ignoreMe>0</ignoreMe><ignoreMe>0</ignoreMe><ignoreMe>0</ignoreMe><ignoreMe>0</ignoreMe><ignoreMe>0</ignoreMe><ignoreMe>0</ignoreMe>....<ignoreMe>0</ignoreMe>


Which is big compared to: <ignoreMe>000000000000...00</ignoreMe> which is
what we'd get if we allow hexBinary as a type.
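
To put rough numbers on that size difference, here is a minimal sketch in
Python (not DFDL). The helper names are hypothetical, and rendering each
byte as a signed decimal, as XML Schema's byte type would, is an assumption
about how a converter would emit the values.

```python
def as_byte_elements(data: bytes, name: str = "ignoreMe") -> str:
    """One element per byte, as the occurring-bytes schema would yield."""
    parts = []
    for b in data:
        # XML Schema's byte type is signed, so map 128..255 to -128..-1.
        value = b - 256 if b > 127 else b
        parts.append(f"<{name}>{value}</{name}>")
    return "".join(parts)


def as_hex_binary(data: bytes, name: str = "ignoreMe") -> str:
    """A single element holding two uppercase hex digits per byte."""
    return f"<{name}>{data.hex().upper()}</{name}>"


data = bytes(1234)  # 1234 zero bytes, matching the example length
per_byte = as_byte_elements(data)
hex_form = as_hex_binary(data)
print(len(per_byte), len(hex_form))  # prints: 27148 2489
```

For all-zero data the per-byte form costs 22 characters per byte, against 2
characters per byte plus one pair of tags for the hexBinary form.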

Note that if we add the hexBinary type, you'll still be able to do it the
other way, so the hexBinary notion is not strictly speaking necessary or
minimalist.

...mikeb

Mike Beckerle
Architect, Scalable Computing
IBM Software Group
Information Integration Solutions
Westborough, MA

                                                                           
Mike Beckerle/Worcester/IBM at IBMUS
Sent by: owner-dfdl-wg at ggf.org
09/02/2005 04:34 PM
To: "Robert E. McGrath" <mcgrath at ncsa.uiuc.edu>
cc: dfdl-wg at gridforum.org, owner-dfdl-wg at ggf.org
Subject: split into multiple topics - Re: [dfdl-wg] Issues: additional data types






I'd like to split this topic into several distinct ones:

Arrays - I have a placeholder for this in the doc.

Opaque and "code" types are separate. This is related also to the concept
of "open content".

Enums

Bitfields

Pointers


Mike Beckerle
Architect, Scalable Computing
IBM Software Group
Information Integration Solutions
Westborough, MA
                                                                           
 "Robert E. McGrath"                                                       
 <mcgrath at ncsa.uiuc.edu>                                                   
 Sent by: owner-dfdl-wg at ggf.org
 09/02/2005 03:13 PM
 To: dfdl-wg at gridforum.org
 Subject: [dfdl-wg] Issues: additional data types






Greetings,

Here is an "issue" for the DFDL: additional data types that should
be considered.

Please see attached.

---
Robert E. McGrath
National Center for Supercomputing Applications
University of Illinois, Urbana-Champaign
Champaign, Illinois 61820
(217)-333-6549

mcgrath at ncsa.uiuc.edu (See attached file: DT.htm)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.ogf.org/pipermail/dfdl-wg/attachments/20050907/422b96d8/attachment.htm 

