[DFDL-WG] Action 306 - IBM DFDL behaviour when parsing empty strings

Steve Hanson smh at uk.ibm.com
Wed Apr 3 07:05:51 EDT 2019


306
Confirm IBM DFDL behaviour when parsing empty strings (Steve)
7/8: IBM DFDL has not fully implemented the behaviour changes arising from 
action 140 with respect to empty string elements. Daffodil is about to do 
so. IBM DFDL users have complained about the lack of defaults when parsing 
but otherwise appear happy. Are the rules in the spec for empty strings 
overcomplicated? Steve to document the behaviour of IBM DFDL to inform 
the discussion.
...
1/11: In progress - there are a lot of subtle scenarios
15/11: Not discussed
...
7/2/19: No further progress

Some progress :)
9.4.2.2 Simple element (xs:string or xs:hexBinary)
Required occurrence: If the element has a default value then an item is 
added to the Infoset using the default value, otherwise an item is added 
to the Infoset using empty string (type xs:string) or empty hexBinary 
(type xs:hexBinary) as the value.
Optional occurrence: If dfdl:emptyValueDelimiterPolicy is not 'none' then 
an item is added to the Infoset using empty string (type xs:string) or 
empty hexBinary (type xs:hexBinary) as the value, otherwise nothing is 
added to the Infoset. 
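To make the quoted 9.4.2.2 rules easier to compare with the IBM DFDL behaviour below, here is a minimal Python sketch of them. It is illustrative only: the function names, the use of `None` to mean "nothing is added to the Infoset", and Python `""` / `b""` standing in for empty xs:string / xs:hexBinary are assumptions, not part of the spec or of any DFDL implementation.

```python
from typing import Optional, Union

# Hypothetical representation: str for xs:string, bytes for xs:hexBinary.
Value = Union[str, bytes]

# The four values of dfdl:emptyValueDelimiterPolicy.
POLICIES = {"none", "initiator", "terminator", "both"}


def empty_value_for(type_name: str) -> Value:
    """Empty value for the two types covered by 9.4.2.2."""
    if type_name == "xs:string":
        return ""
    if type_name == "xs:hexBinary":
        return b""
    raise ValueError(f"9.4.2.2 covers only xs:string and xs:hexBinary, not {type_name}")


def spec_empty_simple(type_name: str, required: bool,
                      default: Optional[Value] = None,
                      empty_value_delimiter_policy: str = "none") -> Optional[Value]:
    """Value added to the Infoset for an empty occurrence (hypothetical helper).

    Returns None when nothing is added to the Infoset.
    """
    if empty_value_delimiter_policy not in POLICIES:
        raise ValueError(f"unknown policy {empty_value_delimiter_policy!r}")
    if required:
        # Required occurrence: the default value if there is one, else the empty value.
        return default if default is not None else empty_value_for(type_name)
    # Optional occurrence: an item is added only when the policy is not 'none'.
    if empty_value_delimiter_policy != "none":
        return empty_value_for(type_name)
    return None
```

For example, `spec_empty_simple("xs:string", required=False, empty_value_delimiter_policy="initiator")` returns `""`, whereas leaving the policy as `"none"` returns `None`.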

IBM DFDL behaviour:

Required. IBM DFDL does not implement default values when parsing, so an 
empty occurrence with a default value gives a Schema Definition Error 
(SDE), to prevent backtracking. An empty occurrence with no default gives 
a Processing Error. To add an empty string to the infoset, you would set 
default="" (once default values are implemented, of course).

Optional. IBM DFDL adds nothing to the infoset, regardless of whether an 
initiator and/or terminator is present. There is no way to get an empty 
string into the infoset.
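The IBM DFDL outcomes just described can be sketched the same way, as a rough model under stated assumptions. The exception classes and the function name are hypothetical stand-ins for an SDE and a Processing Error, chosen for illustration; they are not the IBM DFDL API.

```python
from typing import Optional, Union

# Hypothetical representation: str for xs:string, bytes for xs:hexBinary.
Value = Union[str, bytes]


class SchemaDefinitionError(Exception):
    """Hypothetical stand-in for a DFDL Schema Definition Error (SDE)."""


class ProcessingError(Exception):
    """Hypothetical stand-in for a DFDL Processing Error."""


def ibm_empty_simple(required: bool, default: Optional[Value] = None,
                     empty_value_delimiter_policy: str = "none") -> Optional[Value]:
    """Outcome of an empty xs:string/xs:hexBinary occurrence, as described
    for IBM DFDL. Returns None when nothing is added to the infoset."""
    if required:
        if default is not None:
            # Defaults are not applied when parsing, so this is an SDE
            # (the stated reason is to prevent backtracking).
            raise SchemaDefinitionError("default values are not applied when parsing")
        raise ProcessingError("required element is empty and has no default")
    # Optional: nothing is added. empty_value_delimiter_policy is deliberately
    # ignored, because initiators/terminators make no difference here.
    return None
```

Note that `default=""` still lands in the SDE branch in this model; it would only yield an empty string once default values are implemented.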


9.4.2.3 Complex element 
Required occurrence: An item is added to the Infoset. 
Optional occurrence: If dfdl:emptyValueDelimiterPolicy is not 'none' then 
an item is added to the Infoset, otherwise nothing is added to the 
Infoset. 
For both required and optional occurrences, the Infoset item may also have 
a child item. 
 1. If the first child element of the complex type is a required 
    simple element, then an empty string (type xs:string), empty hexBinary 
    (type xs:hexBinary), or default value will also be added to the Infoset. 
 2. If the first child element of the complex type is a required 
    complex element, then an item is added to the Infoset (which may itself 
    have a child via (1)).
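As a sketch of the quoted 9.4.2.3 rules, the following Python models an empty complex occurrence as a nested dict. The declaration classes, the dict-based Infoset and the function names are all hypothetical, and the model only covers xs:string / xs:hexBinary simple children. Rule 2 is applied one level deep, with the nested complex child getting a child only via rule 1, as the quoted text says.

```python
from dataclasses import dataclass
from typing import Optional, Union

Value = Union[str, bytes]  # hypothetical: str for xs:string, bytes for xs:hexBinary


@dataclass
class SimpleDecl:
    name: str
    type_name: str                      # "xs:string" or "xs:hexBinary" only
    required: bool = True
    default: Optional[Value] = None


@dataclass
class ComplexDecl:
    name: str
    required: bool = True
    empty_value_delimiter_policy: str = "none"
    first_child: Optional[Union[SimpleDecl, "ComplexDecl"]] = None


def empty_value_for(type_name: str) -> Value:
    if type_name == "xs:string":
        return ""
    if type_name == "xs:hexBinary":
        return b""
    raise ValueError(f"sketch covers only xs:string and xs:hexBinary, not {type_name}")


def rule1_item(child) -> Optional[dict]:
    """Rule 1: a required simple first child gets its default or the empty value."""
    if isinstance(child, SimpleDecl) and child.required:
        value = child.default if child.default is not None else empty_value_for(child.type_name)
        return {child.name: value}
    return None


def child_item(child) -> Optional[dict]:
    """Child item implied by rules 1 and 2; None if no child item is added."""
    if child is None or not child.required:
        return None
    if isinstance(child, SimpleDecl):
        return rule1_item(child)
    # Rule 2: a required complex first child is added, and may itself
    # have a child via rule 1.
    return {child.name: rule1_item(child.first_child) or {}}


def spec_empty_complex(decl: ComplexDecl) -> Optional[dict]:
    """Infoset fragment for an empty occurrence of decl, or None if nothing is added."""
    if not decl.required and decl.empty_value_delimiter_policy == "none":
        return None
    return {decl.name: child_item(decl.first_child) or {}}
```

For instance, a required `ComplexDecl("rec", first_child=SimpleDecl("f1", "xs:string"))` yields `{"rec": {"f1": ""}}`.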

IBM DFDL behaviour:

Required. IBM DFDL follows the spec, except that in case 1 an error is 
thrown instead, as per its 9.4.2.2 behaviour.

Optional. IBM DFDL follows the spec, except that in case 1 an error is 
thrown instead, as per its 9.4.2.2 behaviour.
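A rough model of the IBM DFDL complex-element behaviour is sketched below: the spec's item-adding rules, except that a required simple first child (case 1) raises the same errors as a required empty string. The classes and function names are hypothetical, and simple children are assumed to be xs:string or xs:hexBinary, so the type is omitted.

```python
from dataclasses import dataclass
from typing import NoReturn, Optional, Union

Value = Union[str, bytes]  # hypothetical: str for xs:string, bytes for xs:hexBinary


class SchemaDefinitionError(Exception):
    """Hypothetical stand-in for an SDE."""


class ProcessingError(Exception):
    """Hypothetical stand-in for a Processing Error."""


@dataclass
class SimpleDecl:
    name: str                           # assumed xs:string or xs:hexBinary
    required: bool = True
    default: Optional[Value] = None


@dataclass
class ComplexDecl:
    name: str
    required: bool = True
    empty_value_delimiter_policy: str = "none"
    first_child: Optional[Union[SimpleDecl, "ComplexDecl"]] = None


def ibm_rule1(child: SimpleDecl) -> NoReturn:
    """Case 1: error out as for a required empty string (9.4.2.2 behaviour)."""
    if child.default is not None:
        raise SchemaDefinitionError(f"{child.name}: default values not applied when parsing")
    raise ProcessingError(f"{child.name}: required, empty, and no default")


def ibm_child_item(child) -> Optional[dict]:
    if child is None or not child.required:
        return None
    if isinstance(child, SimpleDecl):
        ibm_rule1(child)  # always raises
    # Case 2 as in the spec; the nested child's own case 1 also raises.
    grandchild = child.first_child
    if isinstance(grandchild, SimpleDecl) and grandchild.required:
        ibm_rule1(grandchild)
    return {child.name: {}}


def ibm_empty_complex(decl: ComplexDecl) -> Optional[dict]:
    """Infoset fragment for an empty occurrence, or None if nothing is added."""
    if not decl.required and decl.empty_value_delimiter_policy == "none":
        return None
    return {decl.name: ibm_child_item(decl.first_child) or {}}
```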


So ...

The spec today is consistent in one respect: for both complex and string 
elements,
a) a required empty occurrence always adds an item to the infoset;
b) an optional empty occurrence adds an item to the infoset if an 
   initiator/terminator is present; and
c) an optional empty occurrence does not add an item to the infoset if 
   no initiator/terminator is present.

If the simple string behaviour were changed to match IBM DFDL, that 
consistency would be lost, but the string behaviour would then match that 
of the other simple types. Section 9.4.2.2 would disappear, as the 
behaviour would be the same as 9.4.2.1, and section 9.4.2.3 would become 
as below. We would lose the ability to get an empty string into the 
infoset for an optional string with an initiator/terminator.

9.4.2.3 Complex element 
Required occurrence: An item is added to the Infoset. 
Optional occurrence: If dfdl:emptyValueDelimiterPolicy is not 'none' then 
an item is added to the Infoset, otherwise nothing is added to the 
Infoset. 
For both required and optional occurrences, the Infoset item may also have 
a child item. 
 1. If the first child element of the complex type is a required 
    simple element, then a default value will also be added to the Infoset. 
 2. If the first child element of the complex type is a required 
    complex element, then an item is added to the Infoset (which may itself 
    have a child via (1)).

We also need to be sure that no other implementation has already 
implemented the current spec behaviour. We need to check with DFDL4S and 
IBM TPF.

To be discussed on next WG call ...

Regards
 
Steve Hanson
IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh at uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
Note: I work Tuesday to Friday 