[DFDL-WG] Puzzle: unparsing format involving lengthKind='pattern'

Fri Nov 30 15:13:10 EST 2012

I have a file format where strings have this unusual discipline for
termination.

The string is either exactly 64 characters long, or it is from 0 to 63
characters long and followed by a NUL terminator.

I can parse this like so:

    <xs:complexType name="myStringType">
      <xs:sequence>
        <xs:element name="s" type="xs:string"
          dfdl:lengthKind="pattern">
          <xs:annotation>
            <xs:appinfo source="http://www.ogf.org/dfdl/dfdl-1.0/">
              <!-- 0 to 63 occurrences of not a NUL, followed by a NUL
                   (final NUL not captured in the pattern match result),
                   OR just 64 non-NULs -->
              <dfdl:element>
                <dfdl:property name="lengthPattern"><![CDATA[([^\x00]{0,63})(?=\x00)|[^\x00]{64}]]></dfdl:property>
              </dfdl:element>
            </xs:appinfo>
          </xs:annotation>
        </xs:element>
        <xs:element name="term" type="xs:string"
          dfdl:lengthKind="explicit" dfdl:length="0" dfdl:initiator="%NUL;"
          dfdl:outputValueCalc="{ '' }"
          minOccurs="0" maxOccurs="1" dfdl:occursCountKind="expression"
          dfdl:occursCount="{ if (fn:string-length(../tns:s) lt 64) then 1 else 0 }" />
      </xs:sequence>
    </xs:complexType>

What I did is use lengthKind='pattern' to pick off the content, excluding
the NUL, and then model the NUL explicitly as the initiator of an empty
element that occurs optionally, depending on the length of the string.
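
To make the parse-side behavior concrete, here is a small Python sketch of what the pattern plus the optional terminator amount to. It is only an illustration of the string discipline, not how a DFDL processor works internally; the function name parse_my_string is made up for this example, and it assumes the data is already decoded so that one Python character is one character of the format.

```python
import re

# The lengthPattern from the schema, written as a Python regex.
LENGTH_PATTERN = re.compile(r"([^\x00]{0,63})(?=\x00)|[^\x00]{64}")


def parse_my_string(data, pos=0):
    """Return (s, next_pos) for the string starting at data[pos].

    Hypothetical helper: mimics element 's' followed by the optional
    zero-length 'term' element whose initiator is the NUL.
    """
    m = LENGTH_PATTERN.match(data, pos)
    if m is None:
        raise ValueError("neither 0-63 chars + NUL nor 64 non-NUL chars")
    s = m.group(0)
    end = m.end()
    if len(s) < 64:
        # Same condition as the occursCount expression: 'term' occurs,
        # so its NUL initiator is consumed.
        end += 1
    return s, end
```

For example, parse_my_string("abc\x00rest") yields the string "abc" and leaves the position just past the NUL, while a run of 64 non-NUL characters is returned whole with nothing extra consumed.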

That will work. I'd like to hide the "term" element in a hidden group, but
other than that it will work for parsing.

Question: How can I unparse this?

I want the schema and DFDL processor to add the "term" element to the
infoset automatically, depending on the length of the 's' element that I
place into the infoset. So I put an outputValueCalc on it to assign it a
value of empty string, but that will only happen if the occursCount
expression causes the optional element to exist at all.

Will this work for unparsing?

The spec currently says that the occursCount expression is used only when
parsing; when unparsing, the number of occurrences in the infoset is used.
But I don't want the application to have to put this syntax-modeling
element into the infoset. I just want the application to create a string,
and then, based on whether it is 0-63 characters long, I want the NUL added
or not.
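
In procedural terms, the output behavior I am after is roughly the following Python sketch. The function name unparse_my_string is hypothetical, and the two error checks are my own assumptions about what should be rejected; this only states the desired result, not a way to express it in DFDL.

```python
def unparse_my_string(s):
    """Return the output representation of string s.

    Hypothetical helper: the application supplies only the string, and
    the NUL terminator is added when the string is shorter than 64.
    """
    if "\x00" in s:
        raise ValueError("string content must not contain NUL")
    if len(s) > 64:
        raise ValueError("string must be at most 64 characters")
    # Exactly 64 characters: no terminator. 0 to 63: append the NUL.
    return s if len(s) == 64 else s + "\x00"
```

So unparse_my_string("abc") produces "abc" followed by a NUL, and a 64-character string comes out unchanged.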

Other ways I've tried to model these strings include as a choice of two
different elements. That has a similar issue that I then have to assemble
my infoset for output using a length dependent element. I can't just put a
string into the infoset and have it decide when unparsing which of the two
choice branches is the right one. (asserts and discriminators are
parse-only also.)

Anyone have better ideas?

...mike

-- 
Mike Beckerle | OGF DFDL WG Co-Chair
Tel:  781-330-0412