[DFDL-WG] Action 306 - IBM DFDL behaviour when parsing empty strings

Mike Beckerle mbeckerle.dfdl at gmail.com
Fri May 3 16:22:42 EDT 2019


While testing the EDIFACT schema (from DFDLSchemas on GitHub) against new
code in Daffodil, I found that my proposal was not sufficient.
Steve Hanson stated that current IBM DFDL behavior for required empty
strings includes "An empty occurrence with no default gives a Processing
Error."

I misinterpreted this. I took "required" to mean a required occurrence of
an array element (that is, one with index <= minOccurs). It should not be
interpreted that narrowly: it covers any required occurrence at all,
including scalar elements. The EDIFACT schema depends on this behavior, and
on the backtracking driven by it, in order to work.

So my suggestion for new properties to control this is revised to:

dfdl:emptyElementPolicy enum with values

noEmptyElements - matches current IBM DFDL behavior where
* required elements without default values that are empty (specifically,
which satisfy the empty syntax, defined below) always cause Processing
Errors.
** If a default value is specified, it is provided as the value instead.
When a default value is specified, implementations that don't support
default values when parsing must issue a runtime SDE here, not a Processing
Error.
* optional elements which satisfy the empty syntax are not added to the
infoset. Defaulting is never considered.

emptyElements - matches current description in the DFDL spec where
* required elements: if the string/hexBinary satisfies the empty syntax,
the element is created with an empty string or empty hexBinary as its
value. If a default value is specified, it is substituted as the value
instead. When a default value is specified, implementations that don't
support default values when parsing must issue an SDE here, not a
Processing Error.
* optional elements: if the string/hexBinary satisfies the empty syntax
and emptyValueDelimiterPolicy is not 'none', then an empty string (or
empty hexBinary) is added to the infoset. If emptyValueDelimiterPolicy is
'none', nothing is added to the infoset.
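
To make the two proposed settings easier to compare, here is a minimal
Python sketch of the decision table described above. It is only an
illustration of the proposal, not code from Daffodil or IBM DFDL; the names
Policy, Outcome and parse_empty are hypothetical, and the sketch assumes the
data has already been found to satisfy the empty syntax.

```python
from enum import Enum


class Policy(Enum):
    # The two proposed values of dfdl:emptyElementPolicy.
    NO_EMPTY_ELEMENTS = "noEmptyElements"
    EMPTY_ELEMENTS = "emptyElements"


class Outcome(Enum):
    # Hypothetical labels for what the parser does with the element.
    PROCESSING_ERROR = "processing error"
    SDE = "schema definition error"
    DEFAULT_VALUE = "default value added to infoset"
    EMPTY_VALUE = "empty string/hexBinary added to infoset"
    NOTHING = "nothing added to infoset"


def parse_empty(policy, required, has_default, supports_defaults, evdp):
    """Outcome for a string/hexBinary element that satisfies the empty syntax.

    evdp is the value of dfdl:emptyValueDelimiterPolicy.
    supports_defaults says whether the implementation supports default
    values when parsing.
    """
    if required:
        if has_default:
            # Same rule under both policies: default, or SDE if unsupported.
            return Outcome.DEFAULT_VALUE if supports_defaults else Outcome.SDE
        if policy is Policy.NO_EMPTY_ELEMENTS:
            return Outcome.PROCESSING_ERROR
        return Outcome.EMPTY_VALUE
    # Optional occurrence: defaulting is never considered.
    if policy is Policy.NO_EMPTY_ELEMENTS:
        return Outcome.NOTHING
    return Outcome.EMPTY_VALUE if evdp != "none" else Outcome.NOTHING
```

The only rows that differ between the two policies are the required element
with no default, and the optional element whose emptyValueDelimiterPolicy is
not 'none'.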

The term "satisfy the empty syntax" means that what is found in the data
stream may be required to include an initiator and/or terminator, depending
on emptyValueDelimiterPolicy. If that policy is 'none', the empty syntax is
satisfied by the empty string alone (or by no bytes for hexBinary).
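
A rough Python sketch of that check follows. It is a simplification under
stated assumptions: the function name and parameters are hypothetical, the
values 'initiator', 'terminator' and 'both' are the other settings of
dfdl:emptyValueDelimiterPolicy as I understand the property (only 'none' is
discussed above), and the sketch does not try to decide whether a delimiter
that is not required is allowed to be present.

```python
def satisfies_empty_syntax(content, saw_initiator, saw_terminator, evdp):
    """Return True if zero-length content matches the empty syntax.

    content        -- the element's content (str for xs:string, bytes for
                      xs:hexBinary), with any delimiters already removed
    saw_initiator  -- True if the element's initiator was found in the data
    saw_terminator -- True if the element's terminator was found in the data
    evdp           -- value of dfdl:emptyValueDelimiterPolicy
    """
    if len(content) != 0:
        return False  # non-empty content is never the empty representation
    if evdp == "none":
        return True  # empty string (or no bytes) is enough on its own
    if evdp == "initiator":
        return saw_initiator
    if evdp == "terminator":
        return saw_terminator
    if evdp == "both":
        return saw_initiator and saw_terminator
    raise ValueError("unknown emptyValueDelimiterPolicy: %r" % (evdp,))
```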

Having said the above, I believe we also have to consider nillable elements.

There are two topics:

1) defaulting to nilled - This covers a nillable element where the data
syntax does NOT match the nil representation. Anywhere above that a default
value is specified and has behavior associated with it, if the element is
nillable and dfdl:useNilAsDefault='true' is specified, then the element is
defaulted to nilled. In that case, implementations that don't support
defaulting to nilled when parsing must issue an SDE here, not a Processing
Error.

That takes care of the defaulting aspect of nillables.
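
As a sketch of how the defaulting choice might be layered, assuming that
dfdl:useNilAsDefault='true' on a nillable element takes the place of an
ordinary default value (the text above does not spell out which wins if
both are present, so that ordering is my assumption), and with all names
hypothetical:

```python
def choose_default(nillable, use_nil_as_default, has_default,
                   supports_default_values, supports_default_nil):
    """Pick the defaulting outcome for an element needing a default.

    Returns one of "nilled", "default value", "SDE", or None when no
    default of either kind is specified (the caller then applies the
    emptyElementPolicy rules for elements without defaults).
    """
    if nillable and use_nil_as_default:
        # Default to nilled, or SDE if the implementation cannot do that.
        return "nilled" if supports_default_nil else "SDE"
    if has_default:
        # Ordinary default value, or SDE if unsupported when parsing.
        return "default value" if supports_default_values else "SDE"
    return None
```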

The second topic is:

2) nillable, and dfdl:nilValue contains %ES; as one of the possible nil
representations. Hence, there is the possibility of empty string (or empty
hexBinary) matching the nil representation.

I think the DFDL spec is clear here that if the data stream satisfies the
nil syntax, then required or optional, you get a nilled element, period.
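
In other words, the nil check comes before any empty-element handling. A
small Python sketch of that ordering follows; the helper names are
hypothetical, only the %ES; entity is handled, and delimiters and
dfdl:nilValueDelimiterPolicy are ignored for brevity.

```python
def nil_literals(nil_value):
    """Split a dfdl:nilValue list into literals, mapping %ES; to ""."""
    return {"" if tok == "%ES;" else tok for tok in nil_value.split()}


def classify(data, nillable, nil_value):
    """Classify element content as "nilled", "empty", or "normal".

    The result does not depend on whether the occurrence is required or
    optional: a match on the nil representation always wins.
    """
    if nillable and data in nil_literals(nil_value):
        return "nilled"
    if data == "":
        return "empty"  # handled by the emptyElementPolicy rules
    return "normal"
```

For example, with dfdl:nilValue="%ES; NIL" an empty string is classified as
nilled, whereas with dfdl:nilValue="NIL" the same empty string falls through
to the empty-element rules.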

Does IBM DFDL implement that behavior? If so, great. If not, I think we may
have to amend the above description of the noEmptyElements case for
dfdl:emptyElementPolicy to specify the special cases.

...mikeb

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
<http://www.ogf.org/About/abt_policies.php>



On Sun, Apr 28, 2019 at 9:36 AM Mike Beckerle <mbeckerle.dfdl at gmail.com>
wrote:

> One clarification: is the IBM DFDL behavior the same for empty hexBinary
> elements as it is for text strings?
>
> I'm going to suggest we need a policy property e.g.,
>
> dfdl:emptyElementPolicy which is an enum with at least these options:
>
> noOptionalEmptyElements  - matches current IBM DFDL behavior
> optionalEmptyElementsWithSyntax - matches current description in the DFDL
> spec where initiator and/or terminator found triggers creation of an empty
> string value. (Daffodil implements this.)
>
> This would apply (I think) to both types xs:string and xs:hexBinary
>
> I'm open to suggestions for better naming for the property and the
> property values, but these are the two settings we need I think.
>
> I do believe that the latter optionalEmptyElementsWithSyntax behavior is
> what the DFDL spec describes, and is most consistent given the available
> properties such as emptyValueDelimiterPolicy.
>
> We can make implementation of optionalEmptyElementsWithSyntax a DFDL
> optional language feature, thereby avoiding issues of conformance with the
> DFDL standard.
>
>
>
> On Fri, Apr 5, 2019 at 12:43 PM Steve Hanson <smh at uk.ibm.com> wrote:
>
>> Daffodil to perform identical tests but the belief is that they implement
>> the spec as published (except maybe for one bug with default values for
>> strings).
>>
>> So there is a mis-match between Daffodil and IBM DFDL.  It sounds like a
>> new property is going to be needed which toggles the way that empty strings
>> are handled.
>>
>> Regards
>>
>> Steve Hanson
>>
>> IBM Hybrid Integration, Hursley, UK
>> Architect, *IBM DFDL*
>> <http://www.ibm.com/developerworks/library/se-dfdl/index.html>
>> Co-Chair, *OGF DFDL Working Group* <http://www.ogf.org/dfdl/>
>> *smh at uk.ibm.com* <smh at uk.ibm.com>
>> tel:+44-1962-815848
>> mob:+44-7717-378890
>> Note: I work Tuesday to Friday
>>
>>
>>
>> From:        Steve Hanson/UK/IBM
>> To:        DFDL-WG <dfdl-wg at ogf.org>
>> Cc:        "Mike Beckerle" <mbeckerle at tresys.com>, "Michele Zundo" <
>> michele.zundo at esa.int>, Bradd Kadlecik/Poughkeepsie/IBM at IBMUS
>> Date:        03/04/2019 12:04
>> Subject:        Action 306 - IBM DFDL behaviour when parsing empty
>> strings
>> ------------------------------
>>
>>
>> *306*
>> *Confirm IBM DFDL behaviour when parsing empty strings (Steve)*
>> 7/8: IBM DFDL has not fully implemented the behaviour changes arising
>> from action 140 with respect to empty string elements. Daffodil is about to
>> do so. IBM DFDL users have complained about lack of defaults when parsing
>> but other than that appear happy. Are the rules in the spec for empty
>> strings over complicated?  Steve to document the behaviour for IBM DFDL to
>> inform the discussion.
>> ...
>> 1/11: In progress - there are a lot of subtle scenarios
>> 15/11: Not discussed
>> ...
>> 7/2/19: No further progress
>>
>> Some progress :)
>>
>> *9.4.2.2        Simple element (xs:string or xs:hexBinary)*
>>
>> *Required occurrence: If the element has a default value then an item is
>> added to the infoset using the default value, otherwise an item is added to
>> the Infoset using empty string (type xs:string) or empty hexBinary (type
>> xs:hexBinary) as the value. *
>>
>> *Optional occurrence: If dfdl:emptyValueDelimiterPolicy is not 'none'
>> then an item is added to the Infoset using empty string (type xs:string) or
>> empty hexBinary (type xs:hexBinary) as the value, otherwise nothing is
>> added to the Infoset. *
>>
>>
>> *IBM DFDL behaviour:*
>>
>> Required. IBM DFDL does not implement default values when parsing, so an
>> empty occurrence with a default value gives an SDE (to prevent
>> backtracking). An empty occurrence with no default gives a Processing
>> Error. If you need to add an empty string to the infoset, you can add
>> *default=""* (when default values are implemented, of course).
>>
>> Optional. IBM DFDL adds nothing to the infoset regardless of presence of
>> initiator and/or terminator. No way to get empty string into the infoset.
>>
>> *9.4.2.3        Complex element *
>>
>> *Required occurrence: An item is added to the Infoset. *
>>
>> *Optional occurrence: If dfdl:emptyValueDelimiterPolicy is not 'none'
>> then an item is added to the Infoset, otherwise nothing is added to the
>> Infoset. *
>>
>> *For both required and optional occurrences, the Infoset item may also
>> have a child item. *
>>
>> * 1.        If the first child element of the complex type is a required
>> simple element, then an empty string (type xs:string), empty hexBinary
>> (type xs:hexBinary), or default value will also be added to the Infoset. *
>>
>> * 2.        If the first child element of the complex type is a required
>> complex element, then an item is added to the Infoset (which may itself
>> have a child via (1))*
>>
>>
>> *IBM DFDL behaviour:*
>>
>> Required. IBM DFDL follows the spec (modulo 1 when an error would have
>> been thrown, as per its 9.4.2.2 behaviour).
>>
>> Optional. IBM DFDL follows the spec (modulo 1 when an error would have
>> been thrown, as per its 9.4.2.2 behaviour).
>>
>>
>> *So ...*
>>
>> The spec today is consistent in one way, in that for both complex &
>> string elements a) a required empty occurrence always adds to the infoset;
>> & b) an optional empty occurrence adds to the infoset if
>> initiator/terminator present; & c) an optional empty occurrence does not
>> add to the infoset if no initiator/terminator present.
>>
>> If the simple string behaviour was to change to match IBM DFDL then that
>> consistency is lost, *but* the string behaviour then matches that for
>> other simple types.  Section 9.4.2.2 disappears as the behaviour is same as
>> 9.4.2.1. Section 9.4.2.3 becomes as below. We lose the ability to get an
>> empty string into the infoset for an optional string with
>> initiator/terminator.
>>
>> *9.4.2.3        Complex element *
>>
>> *Required occurrence: An item is added to the Infoset. *
>>
>> *Optional occurrence: If dfdl:emptyValueDelimiterPolicy is not 'none'
>> then an item is added to the Infoset, otherwise nothing is added to the
>> Infoset. *
>>
>> *For both required and optional occurrences, the Infoset item may also
>> have a child item. *
>>
>> * 1.        If the first child element of the complex type is a required
>> simple element, then a default value will also be added to the Infoset. *
>>
>> * 2.        If the first child element of the complex type is a required
>> complex element, then an item is added to the Infoset (which may itself
>> have a child via (1))*
>>
>>
>> We also need to be sure that any other implementations have not yet
>> implemented the current spec behaviour.  Need to check with *DFDL4S* and
>> *IBM TPF*.
>>
>> To be discussed on next WG call ...
>>
>> Regards
>>
>> Steve Hanson
>>
>> Unless stated otherwise above:
>> IBM United Kingdom Limited - Registered in England and Wales with number
>> 741598.
>> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>>