[DFDL-WG] Recursive use of DFDL for variable markup - use case

Wed Apr 15 07:47:08 CDT 2009

From last week's call:

7. Recursive use of DFDL for variable markup
Use of a DFDL annotated element/type to describe an initiator, length 
prefix, terminator, separator, etc. Steve suggested the most important use 
of "variable markup-like mechanism" in IBM's WTX product is to reference a 
location earlier in the bit stream where a delimiter value is found. We 
handle this already by use of a path expression. The additional variable 
markup mechanism was to avoid proliferation of keywords for various corner 
cases on initiator, terminator and separator. E.g., what if you want the 
initiator to be "Name" or "name" only, not "NAME", "nAmE", etc.? So a 
case-insensitive match is not expressive enough. This can always be modeled, just not 
as an initiator tag. Feeling was to leave out variable markup (other than 
for prefix lengths) for v1.0, and to propose the minimum set of extra 
properties that can be used to address the common use cases, but that IBM 
needed to see whether this satisfied all WTX use cases. 

(Post-call update: it doesn't. There is a use case from WTX, which Steve 
will mail out before the next call.)

The use case is from EDI.  EDI transactions consist of an initial header 
segment which defines, among other things, the separator that is used by 
the data segments that follow. The problem is that EDI transactions may be 
processed in their entirety, or individual data segments may be processed 
without the header segment.  For the former case, DFDL supports this fine, 
using an XPath expression to locate the separator, which is defined as an 
element, the simple type of which enumerates the allowable values, 
enabling validation. But for the latter case, the XPath expression won't 
resolve, as there is no header. An explicit dfdl:separator property could 
be used instead, holding a space-separated list of all the allowable values 
- but that then duplicates the enums on the separator element, leaving a 
maintenance problem. 

<xs:element name="header">
  <xs:complexType>
    <xs:sequence dfdl:lengthKind="implicit">
      <xs:element name="separator" dfdl:lengthKind="explicit"
                  dfdl:length="1" dfdl:representation="text">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="xxx"/>
            <xs:enumeration value="yyy"/>
            <xs:enumeration value="aaa"/>
            <xs:enumeration value="bbb"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>

<xs:element name="850">
  <xs:complexType dfdl:lengthKind="delimited">
    <xs:sequence dfdl:lengthKind="implicit"
                 dfdl:separator="../../header/separator">
      <xs:element name="one" type="xs:string" />
      <xs:element name="two" type="xs:string" />
      <xs:element name="three" type="xs:string" />
      <xs:element name="four" type="xs:string" />
    </xs:sequence>
  </xs:complexType>
</xs:element>

<xs:element name="transaction">
  <xs:complexType>
    <xs:sequence dfdl:lengthKind="implicit">
      <xs:element ref="header"/>
      <xs:element name="segment" maxOccurs="unbounded">
        <xs:complexType>
          <xs:choice>
            <xs:element ref="800" />
            <xs:element ref="810" />
            <xs:element ref="850" />
          </xs:choice>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>
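
For the second case, where a data segment is processed without the header, 
the workaround described above might look like the sketch below. It assumes 
the segment is declared on its own and that dfdl:separator takes a 
space-separated list of literal values, as stated earlier. The values just 
repeat the placeholder enums from the header's separator element, which is 
exactly the duplication that has to be kept in step by hand. Names are 
carried over from the example as written.

```xml
<!-- Sketch only: segment parsed with no header, so there is no path to resolve. -->
<xs:element name="850">
  <xs:complexType dfdl:lengthKind="delimited">
    <!-- The literal list below duplicates the enums on header/separator. -->
    <xs:sequence dfdl:lengthKind="implicit"
                 dfdl:separator="xxx yyy aaa bbb">
      <xs:element name="one" type="xs:string" />
      <xs:element name="two" type="xs:string" />
      <xs:element name="three" type="xs:string" />
      <xs:element name="four" type="xs:string" />
    </xs:sequence>
  </xs:complexType>
</xs:element>
```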

Note that WTX solves this by pointing at a global separator element by 
name, instead of to a separator element in the data by path. At runtime, 
the infoset value of the global element is used, and if it is not set, the 
enums are used to provide a list of possible values.
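
In schema terms, the WTX arrangement could be sketched roughly as below. 
This is an illustration only. The separator is hoisted to a global element 
that the header picks up by ref, so the allowable values live in one place. 
This mail does not propose a notation for pointing a sequence at that 
global element by name, so that part is left as a comment rather than an 
invented property.

```xml
<!-- Global declaration: the single place where the allowable values are listed. -->
<xs:element name="separator" dfdl:lengthKind="explicit"
            dfdl:length="1" dfdl:representation="text">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:enumeration value="xxx"/>
      <xs:enumeration value="yyy"/>
      <xs:enumeration value="aaa"/>
      <xs:enumeration value="bbb"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>

<!-- The header reuses the global element instead of declaring its own. -->
<xs:element name="header">
  <xs:complexType>
    <xs:sequence dfdl:lengthKind="implicit">
      <xs:element ref="separator"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

<!-- A segment's sequence would name the global "separator" element here.
     If a header was parsed, its infoset value is used; if not, the
     enumeration values above serve as the list of possible separators. -->
```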

Regards

Steve Hanson
Programming Model Architect
WebSphere Message Brokers
Hursley, UK
Internet: smh at uk.ibm.com
Phone (+44)/(0) 1962-815848