[DFDL-WG] Puzzle: unparsing format involving lengthKind='pattern'

Mon Dec 3 08:35:24 EST 2012

Actually, that helps.

I think that, plus an inputValueCalc/outputValueCalc pair, will fix it. The
inputValueCalc strips any trailing NUL; the outputValueCalc adds one if the
length is less than 64.
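
This is a minimal Python sketch of the transformation the pair would perform. It assumes the raw field captured by the lengthPattern still includes the NUL. The function names and the MAX_LEN constant are hypothetical. The sketch models the logic only and is not DFDL expression syntax.

```python
# Hypothetical model of the inputValueCalc/outputValueCalc pair described above.
# Assumption: the raw field is either exactly 64 chars, or 0-63 chars plus one NUL.
MAX_LEN = 64
NUL = "\x00"


def input_value_calc(raw: str) -> str:
    """Parse direction: strip the trailing NUL, if any, from the raw field."""
    return raw[:-1] if raw.endswith(NUL) else raw


def output_value_calc(value: str) -> str:
    """Unparse direction: append a NUL when the value is shorter than 64 chars."""
    if len(value) > MAX_LEN:
        raise ValueError("value longer than 64 characters")
    return value + NUL if len(value) < MAX_LEN else value
```

Passing a short value through output_value_calc and then input_value_calc returns the original string. That round-trip property is what the pair has to preserve. A 64-character value passes through both functions unchanged.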

On Mon, Dec 3, 2012 at 7:21 AM, Steve Hanson <smh at uk.ibm.com> wrote:

> Afraid not. The problem is that the NUL is not a terminator in the DFDL
> sense of the word (i.e., mandatory) but an early end-of-data indicator.
> I can't think of an elegant way to handle this, so I would simply model the
> data as a single string with a dfdl:lengthPattern that consumes either 0-63
> characters plus a NUL, or 64 characters. This puts the NUL in the infoset and
> puts the onus on the user to trim the NUL when reading the infoset, and to
> supply the NUL when creating the infoset.
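
Steve does not spell out the pattern. One plausible form, which is an assumption here, drops the lookahead from the original schema so that the NUL is consumed as part of the string. The Python sketch below uses re.match, which anchors the match at the start of the input. That mirrors how a lengthPattern is applied at the current position. The name parse_field is hypothetical.

```python
import re

# Hypothetical single-string pattern: 0-63 non-NUL chars followed by a NUL
# (the NUL is consumed and kept in the match), OR exactly 64 non-NUL chars.
FIELD = re.compile(r"[^\x00]{0,63}\x00|[^\x00]{64}")


def parse_field(data: str) -> tuple[str, str]:
    """Return (field, rest); the field keeps its NUL terminator, if any."""
    m = FIELD.match(data)
    if m is None:
        raise ValueError("no NUL-terminated 0-63 char or 64 char string at start")
    return m.group(0), data[m.end():]
```

The NUL stays in the returned field. As described above, the application therefore has to trim it after parsing and append it before unparsing.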
>
> Regards
>
> Steve Hanson
> Architect, Data Format Description Language (DFDL)
> Co-Chair, OGF DFDL Working Group <http://www.ogf.org/dfdl/>
> IBM SWG, Hursley, UK
> smh at uk.ibm.com
> tel:+44-1962-815848
>
>
>
> From:        Mike Beckerle <mbeckerle.dfdl at gmail.com>
> To:        dfdl-wg at ogf.org,
> Date:        30/11/2012 21:02
> Subject:        [DFDL-WG] Puzzle: unparsing format involving lengthKind='pattern'
> Sent by:        dfdl-wg-bounces at ogf.org
> ------------------------------
>
>
>
>
> I have a file format where strings have this unusual discipline for
> termination.
>
> The string is either exactly 64 characters long, or it is from 0 to 63
> characters long with a NUL terminator.
>
> I can parse this like so:
>
>     <xs:complexType name="myStringType">
>       <xs:sequence>
>         <xs:element name="s" type="xs:string"
>           dfdl:lengthKind="pattern">
>           <xs:annotation>
>             <xs:appinfo source="http://www.ogf.org/dfdl/dfdl-1.0/">
>               <!-- 0 to 63 occurrences of not a NUL, followed by a NUL
>                    (final NUL not included in the pattern match result),
>                    OR just 64 non-NULs -->
>               <dfdl:element>
>                 <dfdl:property name="lengthPattern"><![CDATA[([^\x00]{0,63})(?=\x00)|[^\x00]{64}]]></dfdl:property>
>               </dfdl:element>
>             </xs:appinfo>
>           </xs:annotation>
>         </xs:element>
>         <!-- Optional empty element whose initiator is the NUL terminator;
>              %NUL; is the DFDL character entity for the NUL character -->
>         <xs:element name="term" type="xs:string"
>           dfdl:lengthKind="explicit" dfdl:length="0" dfdl:initiator="%NUL;"
>           dfdl:outputValueCalc="{ '' }"
>           minOccurs="0" maxOccurs="1" dfdl:occursCountKind="expression"
>           dfdl:occursCount="{ if (fn:string-length(../tns:s) lt 64) then 1 else 0 }" />
>       </xs:sequence>
>     </xs:complexType>
>
> What I did was use lengthKind 'pattern' to pick off the content, excluding
> the NUL, and then model the NUL explicitly as the initiator of an empty
> element that occurs optionally, depending on the length of the string.
>
> That will work. I'd like to hide the "term" element in a hidden group, but
> other than that it will work for parsing.
>
> Question: How can I unparse this?
>
> I want the schema and DFDL processor to put the "term" element into the
> infoset by itself, depending on the length of the 's' element that I place
> into the infoset. So I put an outputValueCalc on it to assign it the empty
> string, but that only takes effect if the occursCount expression causes the
> optional element to exist at all.
>
> Will this work for unparsing?
>
> The spec currently says that the occursCount expression is used only when
> parsing; when unparsing, the number of occurrences in the infoset is used.
> But I don't want to have to put this syntax-modeling element into the
> infoset from the application. I just want the application to create a
> string and, based on whether it is 0-63 characters long, have the NUL added
> or not.
>
> Another way I've tried to model these strings is as a choice of two
> different elements. That has a similar issue: I then have to assemble my
> infoset for output using a length-dependent element. I can't just put a
> string into the infoset and have the processor decide, when unparsing, which
> of the two choice branches is the right one. (Asserts and discriminators are
> also parse-only.)
>
> Anyone have better ideas?
>
> ...mike
>
> --
> Mike Beckerle | OGF DFDL WG Co-Chair
> Tel:  781-330-0412
> --
>  dfdl-wg mailing list
>  dfdl-wg at ogf.org
>  https://www.ogf.org/mailman/listinfo/dfdl-wg
>
>

-- 
Mike Beckerle | OGF DFDL WG Co-Chair
Tel:  781-330-0412