[dfdl-wg] simple way to study hard DFDL example problem - IBM Format VS records as XML

mike.beckerle at ascentialsoftware.com
Thu Nov 18 20:53:18 CST 2004


I've come up with a way to articulate the difficulties I'm having with DFDL
for complex file formats.
 
This problem may not be that hard for someone with more XML, XPath or XQuery
experience, so I'd appreciate it if you could look it over and, if necessary,
even run it by your resident XML experts.
 
In case the emailer mangles all the line lengths, I've also attached the
below as a file.
 
<!-- Example motivated by DFDL for IBM Format-VS -->
<!-- See http://tinyurl.com/3s2bq for details on IBM Format-VS. -->
 
<!-- Logically, our data is this: -->
 
<ITEM>The first item</ITEM>
<ITEM>This is the second item</ITEM>
<ITEM>The third</ITEM>
 
<!-- That is, data having this "logical" schema -->
 
<sequence>
  <element name="ITEM" type="string" minOccurs="0" maxOccurs="unbounded"/>
</sequence>
 
<!-- But below is the input data we're starting from. It simulates the
     structural issues of IBM Format-VS, recast as an XML-to-XML
     transformation problem. -->
 
<BLOCK>
  <SEGMENT>
    <WHOLE/> <!-- A WHOLE segment holds a whole item (Duh!). This element is
                  really a type tag. -->
    <DATA>The first item</DATA>  
  </SEGMENT>
</BLOCK>
 
<BLOCK>
  <SEGMENT>
    <FIRST/> <!-- a FIRST segment holds the first part of an item. -->
    <DATA>Thi</DATA>
  </SEGMENT>
</BLOCK>
 
<BLOCK>
  <SEGMENT>
    <MIDDLE/> <!-- A MIDDLE segment holds data from the center of an item. -->
    <DATA>s is t</DATA>
  </SEGMENT>
</BLOCK>
 
<BLOCK>
  <SEGMENT>
    <MIDDLE/> 
    <DATA>he sec</DATA>
  </SEGMENT>
</BLOCK>
 
<BLOCK>
  <SEGMENT>
    <LAST/> <!-- a LAST segment holds data from the end of the item.  -->
    <DATA>ond item</DATA>
  </SEGMENT>
  <SEGMENT>
    <WHOLE/> <!-- This second segment in this block is a WHOLE segment. However,
                  in general the 2nd segment of a block could be a WHOLE segment
                  or the FIRST segment of another item that spans multiple
                  segments and multiple blocks. -->
    <DATA>The third</DATA>
  </SEGMENT>
</BLOCK>
 
<!-- Some observations: -->
<!-- Data is organized into BLOCKs -->
<!-- Each block contains 1 or 2 SEGMENTs -->
<!-- Each SEGMENT either holds a WHOLE item, or holds one piece of an item
     that spans 2 or more SEGMENTs -->
<!-- Spanning data is split on arbitrary boundaries (even mid-word) across
     the segments it spans -->
<!-- Spanning involves a FIRST, MIDDLE*, LAST segment structure. -->
<!-- MIDDLE* means zero or more MIDDLE segments. -->
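Read together, these observations say the segment types form a regular grammar, item ::= WHOLE | FIRST MIDDLE* LAST, and that this grammar ignores BLOCK boundaries entirely. A minimal Python sketch of that reading, assuming the segment tags have already been pulled out in document order; the one-letter encoding and the function name are hypothetical, chosen only so the grammar becomes an ordinary regular expression:

```python
import re

# Segment-type grammar from the observations:
#   item   ::= WHOLE | FIRST MIDDLE* LAST
#   stream ::= item*
# Hypothetical one-letter encoding so the grammar is a plain regex.
TAG_CODES = {"WHOLE": "W", "FIRST": "F", "MIDDLE": "M", "LAST": "L"}
STREAM_RE = re.compile(r"(?:W|FM*L)*")


def is_valid_tag_sequence(tags):
    """Return True if the flattened segment tags form zero or more complete items."""
    encoded = "".join(TAG_CODES[t] for t in tags)
    return STREAM_RE.fullmatch(encoded) is not None


# Tags from the example, in document order (block boundaries ignored).
example = ["WHOLE", "FIRST", "MIDDLE", "MIDDLE", "LAST", "WHOLE"]
print(is_valid_tag_sequence(example))             # True
print(is_valid_tag_sequence(["FIRST", "WHOLE"]))  # False: FIRST is never closed
```

The point of the sketch is that the logical item structure is expressible over the flat segment stream; what makes the problem hard for a schema-driven description is that this stream cuts across the physical BLOCK nesting.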
 
<!-- The question: how can we express the transformation into the desired
     logical form? Or is this beyond the call of duty for DFDL?
     Goals include being as declarative as possible and, ideally, expressing
     it as a set of XML Schema annotations in the GGF DFDL style. -->
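For reference, the intended semantics of the transformation can be pinned down procedurally, even though that is exactly the non-declarative style the question hopes to avoid. A Python sketch using the standard-library ElementTree, assuming the input conforms to the XSD and wrapping the example blocks in a hypothetical FORMAT_VS root element (the BLOCKs are otherwise siblings with no common parent); the third item's text is written as in the logical form, and the ITEMS output wrapper is likewise hypothetical:

```python
import xml.etree.ElementTree as ET

# Example input under a hypothetical FORMAT_VS root so it parses as one document.
INPUT = """<FORMAT_VS>
  <BLOCK><SEGMENT><WHOLE/><DATA>The first item</DATA></SEGMENT></BLOCK>
  <BLOCK><SEGMENT><FIRST/><DATA>Thi</DATA></SEGMENT></BLOCK>
  <BLOCK><SEGMENT><MIDDLE/><DATA>s is t</DATA></SEGMENT></BLOCK>
  <BLOCK><SEGMENT><MIDDLE/><DATA>he sec</DATA></SEGMENT></BLOCK>
  <BLOCK>
    <SEGMENT><LAST/><DATA>ond item</DATA></SEGMENT>
    <SEGMENT><WHOLE/><DATA>The third</DATA></SEGMENT>
  </BLOCK>
</FORMAT_VS>"""


def reassemble(root):
    """Concatenate segment DATA into logical items, ignoring BLOCK boundaries."""
    items, pending = [], None  # pending collects the parts of an open spanning item
    for seg in root.iter("SEGMENT"):  # document order, across all BLOCKs
        tag = seg[0].tag  # the type tag: WHOLE, FIRST, MIDDLE or LAST
        data = seg.findtext("DATA", default="")
        if tag == "WHOLE" and pending is None:
            items.append(data)
        elif tag == "FIRST" and pending is None:
            pending = [data]
        elif tag == "MIDDLE" and pending is not None:
            pending.append(data)
        elif tag == "LAST" and pending is not None:
            items.append("".join(pending + [data]))
            pending = None
        else:
            raise ValueError(f"unexpected {tag} segment")
    if pending is not None:
        raise ValueError("input ends inside a spanning item")
    return items


# Emit the logical form: one ITEM element per reassembled item.
out = ET.Element("ITEMS")
for text in reassemble(ET.fromstring(INPUT)):
    ET.SubElement(out, "ITEM").text = text
print(ET.tostring(out, encoding="unicode"))
# prints: <ITEMS><ITEM>The first item</ITEM><ITEM>This is the second item</ITEM><ITEM>The third</ITEM></ITEMS>
```

The state variable `pending` is the piece of context a declarative description would somehow have to carry from one BLOCK to the next.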
 
<!-- here's an XSD (untested) for the input data structure -->
 
<complexType name="Format_VS_t">
  <sequence>
    <element name="BLOCK" type="Block_t" minOccurs="0" maxOccurs="unbounded"/>
  </sequence>
</complexType>
 
<complexType name="Block_t">
  <sequence>
    <element name="SEGMENT" type="Segment_t" minOccurs="1" maxOccurs="2"/>
  </sequence>
</complexType>
 
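<complexType name="Segment_t">
  <sequence>
    <choice>
      <!-- The type-tag elements carry no content; an empty anonymous
           complexType says so explicitly (an element with no type
           would default to anyType and accept any content). -->
      <element name="WHOLE"><complexType/></element>
      <element name="FIRST"><complexType/></element>
      <element name="LAST"><complexType/></element>
      <element name="MIDDLE"><complexType/></element>
    </choice>
    <element name="DATA" type="string"/>
  </sequence>
</complexType>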
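Since the XSD is untested, its content models can be spot-checked procedurally in the meantime. A Python sketch, assuming the same hypothetical FORMAT_VS wrapper root; it checks that the root holds only BLOCKs, that each BLOCK holds 1 or 2 SEGMENTs, and that each SEGMENT holds one empty type tag followed by DATA. It is not a schema validator, only a hand-written mirror of the three types:

```python
import xml.etree.ElementTree as ET

# The four segment-type tags allowed by Segment_t's <choice>.
TAGS = {"WHOLE", "FIRST", "MIDDLE", "LAST"}


def check_structure(root):
    """Return a list of departures from the Format_VS_t/Block_t/Segment_t models."""
    problems = []
    # Format_VS_t: zero or more BLOCK children and nothing else.
    for child in root:
        if child.tag != "BLOCK":
            problems.append(f"unexpected {child.tag} at top level")
    for b, block in enumerate(root.findall("BLOCK")):
        # Block_t: 1 or 2 children, all of them SEGMENTs.
        if not 1 <= len(block) <= 2 or any(s.tag != "SEGMENT" for s in block):
            problems.append(f"BLOCK {b}: expected 1-2 SEGMENT children")
        for s, seg in enumerate(block.findall("SEGMENT")):
            kids = [k.tag for k in seg]
            # Segment_t: one type tag followed by DATA.
            if len(kids) != 2 or kids[0] not in TAGS or kids[1] != "DATA":
                problems.append(f"BLOCK {b} SEGMENT {s}: bad content {kids}")
            # The type tag should be empty, since it is "really a type tag".
            elif len(seg[0]) or (seg[0].text or "").strip():
                problems.append(f"BLOCK {b} SEGMENT {s}: {kids[0]} is not empty")
    return problems


doc = ET.fromstring(
    "<FORMAT_VS>"
    "<BLOCK><SEGMENT><WHOLE/><DATA>x</DATA></SEGMENT></BLOCK>"
    "<BLOCK/>"
    "</FORMAT_VS>"
)
print(check_structure(doc))  # ['BLOCK 1: expected 1-2 SEGMENT children']
```

Note that none of these checks can see the FIRST, MIDDLE*, LAST constraint, which spans BLOCKs; that is the part the XSD cannot state either.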
 

 
Attachment: problem.xml (2943 bytes)
http://www.ogf.org/pipermail/dfdl-wg/attachments/20041118/1c056b6f/attachment.obj

