[DFDL-WG] Fw: Issue 140 and empty string - question on escape schemes as empty-string qualifiers

Steve Hanson smh at uk.ibm.com
Tue Feb 14 08:30:44 EST 2012


Not sure that we had discussed this on a WG call, so adding to today's 
agenda. There's a potential spec update needed.

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 14/02/2012 13:30 -----

From:   Steve Hanson/UK/IBM
To:     Mike Beckerle <mbeckerle.dfdl at gmail.com>
Cc:     Tim Kimber/UK/IBM at IBMGB
Date:   31/01/2012 13:58
Subject:        Re: Issue 140 and empty string - question on escape 
schemes as empty-string qualifiers


Mike - some replies below

Regards




From:   Mike Beckerle <mbeckerle.dfdl at gmail.com>
To:     Steve Hanson/UK/IBM at IBMGB, Tim Kimber/UK/IBM at IBMGB
Date:   30/01/2012 20:13
Subject:        Issue 140 and empty string - question on escape schemes as 
empty-string qualifiers



I think we forgot about escape schemes and how they are used to quote 
around empty strings, and possibly nil indicators, or I'd like 
clarification anyway.

E.g., 

<dfdl:defineEscapeScheme name="quotedStrings">
   <dfdl:escapeScheme escapeBlockStart="'"
                escapeBlockEnd="'" escapeKind="escapeBlock" />
</dfdl:defineEscapeScheme>

<element name="x" type="string" nillable="true" dfdl:nilValue="nil" 
dfdl:escapeSchemeRef="quotedStrings"/>

Now, if data is [nil, nil] I get two nils.  <x xsi:nil="true"/><x 
xsi:nil="true"/>

What if data is ['nil','nil'] - either I still get two nils, or I get two 
non-nil strings with "nil" as their contents: <x>nil</x><x>nil</x>

Which is it?

SMH: According to the property precedence order in section 22 of the spec, 
the escape scheme is applied before nil value processing when parsing, and 
after nil value processing on unparsing. That is independent of the 
nilKind. So in your example you would get two nils in the infoset.
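
The ordering described in this reply can be sketched in a few lines of 
Python. This is only an illustration of "escape scheme first, nil value 
second" for the example above; the function name parse_field and its 
parameters are hypothetical, and it does not handle an escapeBlockEnd 
embedded in the data.

```python
def parse_field(raw, nil_value="nil", block_start="'", block_end="'"):
    """Parse one field: strip the escape block first, then test for nil.

    Returns None for a nil element, otherwise the string value.
    Hypothetical helper; it mirrors only the ordering described above.
    """
    text = raw
    wrapped = (
        len(raw) >= len(block_start) + len(block_end)
        and raw.startswith(block_start)
        and raw.endswith(block_end)
    )
    # Step 1: apply the escape scheme (remove the surrounding escape block).
    if wrapped:
        text = raw[len(block_start):len(raw) - len(block_end)]
    # Step 2: nil value processing on the already-unescaped text.
    if text == nil_value:
        return None
    return text


if __name__ == "__main__":
    for field in ["nil", "'nil'", "'abc'"]:
        print(repr(field), "->", repr(parse_field(field)))
```

Under this ordering both nil and 'nil' come out as nil, which is the 
"two nils" answer given above.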

Similarly, please assume that the empty string matches the syntax for 
empty per initiator/terminator and emptyValueDelimiterPolicy. Now if I have 

<element name="myString" type="string" minOccurs="0" maxOccurs="2" 
dfdl:escapeSchemeRef="quotedStrings"/>

It's all optional, so if the data is ['',''] then I either get nothing in 
the infoset (because empty creates nothing for optionals), or I get two 
empty strings in the infoset.

Which is it?

SMH: I would look at this from the unparsing angle. If there is nothing in 
the infoset then I would expect to see nothing in the data; I would not 
expect to see escaped nothing. That's true whether generateEscapeBlock is 
'always' or 'whenNeeded'. If I had an empty string in the infoset then I 
would expect it to be escaped in the data if I said 'always' but not if I 
said 'whenNeeded' (because %ES; is not allowed as a delimiter or as a 
value of extraEscapedCharacters, so escaping empty string can never be 
needed). From this, the only way I could get '' in the data would be if I 
had escaped an empty string. Therefore on parsing, I would treat '' as an 
escaped empty string and add an empty string to the infoset. This sounds 
right to me. In our action 140 document, we have defined 'empty' to mean 
that the returned length (however obtained) is 0. If I encounter escape 
characters then I would claim that slot in the data is not 'empty'.
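
The unparse-then-parse reasoning above can be checked with a minimal 
Python sketch. Everything here is an assumption made for illustration: 
the comma separator, the helper names, and the simplified 'whenNeeded' 
test (value contains the separator or starts with escapeBlockStart). It 
is not a description of any DFDL implementation.

```python
BLOCK_START = "'"
BLOCK_END = "'"
SEPARATOR = ","  # hypothetical delimiter between occurrences


def unparse_value(value, generate_escape_block):
    """Return the data text for one string that is present in the infoset."""
    if generate_escape_block == "always":
        needs_block = True
    elif generate_escape_block == "whenNeeded":
        # An empty string never contains a delimiter, so it is never escaped.
        needs_block = SEPARATOR in value or value.startswith(BLOCK_START)
    else:
        raise ValueError("unknown generateEscapeBlock: " + generate_escape_block)
    return BLOCK_START + value + BLOCK_END if needs_block else value


def unparse_occurrences(values, generate_escape_block):
    """Nothing in the infoset produces nothing in the data."""
    return SEPARATOR.join(unparse_value(v, generate_escape_block) for v in values)


def parse_slot(raw):
    """Return None for an 'empty' slot (length 0), else the unescaped string."""
    if len(raw) == 0:
        return None  # empty: nothing is added for an optional element
    min_len = len(BLOCK_START) + len(BLOCK_END)
    if len(raw) >= min_len and raw.startswith(BLOCK_START) and raw.endswith(BLOCK_END):
        return raw[len(BLOCK_START):len(raw) - len(BLOCK_END)]
    return raw


if __name__ == "__main__":
    print(repr(unparse_occurrences(["", ""], "always")))
    print(repr(unparse_occurrences(["", ""], "whenNeeded")))
    print(repr(parse_slot("''")), repr(parse_slot("")))
```

In this sketch '' in the data can only come from an escaped empty string, 
so parse_slot("''") yields an empty string while a zero-length slot 
yields nothing.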

We should check that this is consistent with how emptyValueDelimiterPolicy 
is applied. For parsing, section 22 has this correct: 
emptyValueDelimiterPolicy is examined before the escape scheme is applied. 
But for unparsing, section 22 has it the wrong way round: the property 
should be applied after any escaping/padding has taken place.







Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU