[DFDL-WG] Fw: Issue 140 and empty string - question on escape schemes as empty-string qualifiers
Steve Hanson
smh at uk.ibm.com
Tue Feb 14 08:30:44 EST 2012
I'm not sure that we have discussed this on a WG call, so I'm adding it to today's
agenda. A spec update may be needed.
Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 14/02/2012 13:30 -----
From: Steve Hanson/UK/IBM
To: Mike Beckerle <mbeckerle.dfdl at gmail.com>
Cc: Tim Kimber/UK/IBM at IBMGB
Date: 31/01/2012 13:58
Subject: Re: Issue 140 and empty string - question on escape schemes as empty-string qualifiers
Mike - some replies below
From: Mike Beckerle <mbeckerle.dfdl at gmail.com>
To: Steve Hanson/UK/IBM at IBMGB, Tim Kimber/UK/IBM at IBMGB
Date: 30/01/2012 20:13
Subject: Issue 140 and empty string - question on escape schemes as empty-string qualifiers
I think we forgot about escape schemes and how they can be used to quote
empty strings, and possibly nil indicators. At the least, I'd like
clarification.
E.g.,
<dfdl:defineEscapeScheme name="quotedStrings">
<dfdl:escapeScheme escapeBlockStart="'"
escapeBlockEnd="'" escapeKind="escapeBlock" />
</dfdl:defineEscapeScheme>
<element name="x" type="string" nillable="true" dfdl:nilValue="nil"
dfdl:escapeSchemeRef="quotedStrings"/>
Now, if the data is [nil, nil], I get two nils: <x xsi:nil="true"/><x xsi:nil="true"/>
What if the data is ['nil','nil']? Either I still get two nils, or I get two
non-nil strings with "nil" as their contents: <x>nil</x><x>nil</x>
Which is it?
SMH: According to the property precedence order in section 22 of the spec,
the escape scheme is applied before nil value processing when parsing, and
after nil value processing when unparsing. This is independent of
nilKind. So in your example you would get two nils in the infoset.
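The parse-side ordering in the reply above can be sketched as follows. This is a simplified illustration, not a DFDL processor API: the function names are hypothetical, the escape block is the single-character quote from the quotedStrings scheme, there is no escape-escape character, and nilValue is the literal "nil". Because the escape block is removed before the nil comparison, quoted and unquoted "nil" both produce a nil.

```python
# Hypothetical sketch, not a DFDL processor API: models only the parse order
# described above (escape scheme removed first, then nil value processing).
from typing import Optional

ESCAPE_BLOCK_START = "'"  # from the quotedStrings escape scheme
ESCAPE_BLOCK_END = "'"
NIL_VALUE = "nil"         # dfdl:nilValue="nil"


def remove_escape_block(raw: str) -> str:
    """Strip a surrounding escape block if present (no escape-escape character)."""
    if (len(raw) >= len(ESCAPE_BLOCK_START) + len(ESCAPE_BLOCK_END)
            and raw.startswith(ESCAPE_BLOCK_START)
            and raw.endswith(ESCAPE_BLOCK_END)):
        return raw[len(ESCAPE_BLOCK_START):len(raw) - len(ESCAPE_BLOCK_END)]
    return raw


def parse_x(raw: str) -> Optional[str]:
    """Return None for a nil element, otherwise the string value."""
    value = remove_escape_block(raw)  # step 1: escape scheme
    if value == NIL_VALUE:            # step 2: nil value processing
        return None
    return value


print([parse_x(f) for f in ["nil", "nil"]])      # [None, None]
print([parse_x(f) for f in ["'nil'", "'nil'"]])  # [None, None]
```

Swapping the two steps would instead yield two non-nil "nil" strings for the quoted input, which is the alternative Mike asked about.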
Similarly, please assume that the empty string matches the syntax for empty
per initiator/terminator and emptyValueDelimiterPolicy. Now if I have
<element name="myString" type="string" minOccurs="0" maxOccurs="2"
dfdl:escapeSchemeRef="quotedStrings"/>
It's all optional, so if the data is ['',''] then I either get nothing in
the infoset (because empty creates nothing for optionals), or I get two
empty strings in the infoset.
Which is it?
SMH: I would look at this from the unparsing angle. If there is nothing in
the infoset then I would expect to see nothing in the data; I would not
expect to see an escaped nothing. That's true whether generateEscapeBlock is
'always' or 'whenNeeded'. If I had an empty string in the infoset then I
would expect it to be escaped in the data if I said 'always' but not if I
said 'whenNeeded' (because %ES; is not allowed as a delimiter or as a
value of extraEscapedCharacters, so escaping an empty string can never be
needed). From this, the only way I could get '' in the data would be if I
had escaped an empty string. Therefore on parsing, I would treat '' as an
escaped empty string and add an empty string to the infoset. This sounds right
to me. In our action 140 document, we have defined 'empty' to mean that
the returned length (however obtained) is 0. If I encounter escape
characters then I would claim that slot in the data is not 'empty'.
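A minimal sketch of the empty-string rule proposed above, covering both directions. The function names and the NEEDS_ESCAPE set are hypothetical. 'whenNeeded' is reduced to "the value contains a character that must be escaped", with NEEDS_ESCAPE standing in for in-scope delimiters and extraEscapedCharacters. None models "nothing in the infoset" for an optional occurrence.

```python
# Hypothetical sketch of the empty-string rule proposed above; not a DFDL
# processor API. Simplified: single-character escape block, no escape-escape
# character, and 'whenNeeded' reduced to "value contains a character in NEEDS_ESCAPE".
from typing import Optional

ESCAPE_BLOCK_START = "'"  # from the quotedStrings escape scheme
ESCAPE_BLOCK_END = "'"
# Hypothetical stand-in for in-scope delimiters plus extraEscapedCharacters.
# %ES; can never be a member, so an empty string never needs escaping.
NEEDS_ESCAPE = {","}


def unparse(value: Optional[str], generate_escape_block: str) -> str:
    """Unparse one occurrence; None means nothing is in the infoset."""
    if value is None:
        return ""  # nothing in the infoset -> nothing in the data, never ''
    needed = any(c in value for c in NEEDS_ESCAPE)
    if generate_escape_block == "always" or (
            generate_escape_block == "whenNeeded" and needed):
        return ESCAPE_BLOCK_START + value + ESCAPE_BLOCK_END
    return value


def parse_optional(raw: str) -> Optional[str]:
    """Parse one optional occurrence; None means nothing is added to the infoset."""
    if raw == "":
        return None  # length 0 is 'empty': no infoset item for an optional element
    if (len(raw) >= 2 and raw.startswith(ESCAPE_BLOCK_START)
            and raw.endswith(ESCAPE_BLOCK_END)):
        return raw[1:-1]  # escape block found, so the slot is not 'empty'
    return raw


print(repr(unparse("", "always")))                # "''"
print(repr(unparse("", "whenNeeded")))            # ''
print(repr(unparse(None, "always")))              # ''
print([parse_optional(f) for f in ["''", "''"]])  # ['', '']
print(parse_optional(""))                         # None
```

Under these assumptions, data of ['',''] parses to two empty strings in the infoset, matching the answer above. A zero-length slot adds nothing.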
We should check that this is consistent with how emptyValueDelimiterPolicy
is applied. For parsing, section 22 has this correct: emptyValueDelimiterPolicy
is examined before the escape scheme is applied. But for unparsing, section 22
has it the wrong way round; the property should be applied after any
escaping/padding has taken place.
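The unparsing point above can be illustrated by comparing the two orderings for an empty string with generateEscapeBlock='always' and emptyValueDelimiterPolicy='none'. This is a sketch under stated assumptions, not spec text. The initiator "[" and terminator "]" are hypothetical, padding is omitted, and all function names are invented for illustration. Judging emptiness on the unescaped value writes '' with no delimiters. Judging it on the escaped representation writes [''], which agrees with the parse side, where '' has non-zero length and so is not 'empty'.

```python
# Hypothetical sketch comparing two unparse orderings; not a DFDL processor API.
# Assumptions: initiator "[" and terminator "]" (hypothetical), the quotedStrings
# escape block, generateEscapeBlock='always', and no padding.

INITIATOR = "["   # hypothetical dfdl:initiator
TERMINATOR = "]"  # hypothetical dfdl:terminator


def escape_always(value: str) -> str:
    """Wrap the value in the quotedStrings escape block."""
    return "'" + value + "'"


def delimit(rep: str, is_empty: bool, policy: str) -> str:
    """Add initiator/terminator; when empty, emptyValueDelimiterPolicy decides which."""
    if not is_empty:
        return INITIATOR + rep + TERMINATOR
    init = INITIATOR if policy in ("initiator", "both") else ""
    term = TERMINATOR if policy in ("terminator", "both") else ""
    return init + rep + term


def unparse_policy_first(value: str, policy: str) -> str:
    # Order as section 22 currently has it: emptiness judged on the unescaped value.
    return delimit(escape_always(value), value == "", policy)


def unparse_escape_first(value: str, policy: str) -> str:
    # Proposed order: escape first, then judge emptiness on the escaped result.
    rep = escape_always(value)
    return delimit(rep, rep == "", policy)


print(unparse_policy_first("", "none"))  # ''
print(unparse_escape_first("", "none"))  # ['']
```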
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU