[DFDL-WG] Action 306 - IBM DFDL behaviour when parsing empty strings

Mike Beckerle mbeckerle.dfdl at gmail.com
Sun Apr 28 09:36:40 EDT 2019


One clarification: is the IBM DFDL behavior the same for empty hexBinary
elements as it is for text strings?

I'm going to suggest we need a policy property, e.g.,

dfdl:emptyElementPolicy, which is an enum with at least these options:

noOptionalEmptyElements - matches current IBM DFDL behavior
optionalEmptyElementsWithSyntax - matches the current description in the DFDL
spec, where finding an initiator and/or terminator triggers creation of an
empty string value. (Daffodil implements this.)

This would apply (I think) to both types, xs:string and xs:hexBinary.

I'm open to suggestions for better naming for the property and the property
values, but these are the two settings we need I think.
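
To make the two proposed settings concrete, here is a minimal Python sketch
of how a parser might treat an optional, empty occurrence of an xs:string or
xs:hexBinary element. The enum values are the names proposed above; the
function name and the found_syntax and is_hex_binary parameters are
hypothetical, and this is not a description of how IBM DFDL or Daffodil is
actually coded.

```python
from enum import Enum
from typing import Optional, Union


class EmptyElementPolicy(Enum):
    # The two settings proposed above; the names are still open to change.
    NO_OPTIONAL_EMPTY_ELEMENTS = "noOptionalEmptyElements"
    OPTIONAL_EMPTY_ELEMENTS_WITH_SYNTAX = "optionalEmptyElementsWithSyntax"


def optional_empty_occurrence(
    policy: EmptyElementPolicy,
    found_syntax: bool,
    is_hex_binary: bool = False,
) -> Optional[Union[str, bytes]]:
    """Return the value to add to the infoset, or None to add nothing.

    found_syntax (hypothetical name) is True when an initiator and/or
    terminator was found around the zero-length content.
    """
    if (policy is EmptyElementPolicy.OPTIONAL_EMPTY_ELEMENTS_WITH_SYNTAX
            and found_syntax):
        # Empty hexBinary is modelled as empty bytes, empty string as "".
        return b"" if is_hex_binary else ""
    # noOptionalEmptyElements, or no initiator/terminator was present.
    return None
```

Under this sketch the two settings only differ when delimiters are actually
found; with no initiator or terminator, both add nothing to the infoset.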

I do believe that the latter optionalEmptyElementsWithSyntax behavior is
what the DFDL spec describes, and is most consistent given the available
properties such as emptyValueDelimiterPolicy.

We can make implementation of optionalEmptyElementsWithSyntax a DFDL
optional language feature, thereby avoiding issues of conformance with the
DFDL standard.


Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
<http://www.ogf.org/About/abt_policies.php>



On Fri, Apr 5, 2019 at 12:43 PM Steve Hanson <smh at uk.ibm.com> wrote:

> Daffodil to perform identical tests but the belief is that they implement
> the spec as published (except maybe for one bug with default values for
> strings).
>
> So there is a mismatch between Daffodil and IBM DFDL.  It sounds like a
> new property is going to be needed which toggles the way that empty strings
> are handled.
>
> Regards
>
> Steve Hanson
>
> IBM Hybrid Integration, Hursley, UK
> Architect, *IBM DFDL*
> <http://www.ibm.com/developerworks/library/se-dfdl/index.html>
> Co-Chair, *OGF DFDL Working Group* <http://www.ogf.org/dfdl/>
> *smh at uk.ibm.com* <smh at uk.ibm.com>
> tel:+44-1962-815848
> mob:+44-7717-378890
> Note: I work Tuesday to Friday
>
>
>
> From:        Steve Hanson/UK/IBM
> To:        DFDL-WG <dfdl-wg at ogf.org>
> Cc:        "Mike Beckerle" <mbeckerle at tresys.com>, "Michele Zundo" <
> michele.zundo at esa.int>, Bradd Kadlecik/Poughkeepsie/IBM at IBMUS
> Date:        03/04/2019 12:04
> Subject:        Action 306 - IBM DFDL behaviour when parsing empty strings
> ------------------------------
>
>
> *306*
> *Confirm IBM DFDL behaviour when parsing empty strings (Steve)*
> 7/8: IBM DFDL has not fully implemented the behaviour changes arising from
> action 140 with respect to empty string elements. Daffodil is about to do
> so. IBM DFDL users have complained about lack of defaults when parsing but
> other than that appear happy. Are the rules in the spec for empty strings
> overcomplicated?  Steve to document the behaviour for IBM DFDL to inform
> the discussion.
> ...
> 1/11: In progress - there are a lot of subtle scenarios
> 15/11: Not discussed
> ...
> 7/2/19: No further progress
>
> Some progress :)
>
> *9.4.2.2        Simple element (xs:string or xs:hexBinary)*
>
> *Required occurrence: If the element has a default value then an item is
> added to the infoset using the default value, otherwise an item is added to
> the Infoset using empty string (type xs:string) or empty hexBinary (type
> xs:hexBinary) as the value. *
>
> *Optional occurrence: If dfdl:emptyValueDelimiterPolicy is not 'none' then
> an item is added to the Infoset using empty string (type xs:string) or
> empty hexBinary (type xs:hexBinary) as the value, otherwise nothing is
> added to the Infoset. *
>
>
> *IBM DFDL behaviour:*
>
> Required. IBM DFDL does not implement default values when parsing, so an
> empty occurrence with a default value gives an SDE (to prevent
> backtracking). An empty occurrence with no default gives a Processing
> Error. If you need to add an empty string to the infoset, you can add
> *default=""* (when default values are implemented, of course).
>
> Optional. IBM DFDL adds nothing to the infoset regardless of the presence
> of an initiator and/or terminator. There is no way to get an empty string
> into the infoset.
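
As a rough illustration of the required-occurrence case for xs:string, the
following Python sketch contrasts the spec rule quoted above with the IBM
DFDL behaviour Steve describes. The function names and the two exception
classes are hypothetical stand-ins, not names from either implementation.

```python
from typing import Optional


class SchemaDefinitionError(Exception):
    """Stand-in for an SDE (hypothetical class name)."""


class ProcessingError(Exception):
    """Stand-in for a Processing Error (hypothetical class name)."""


def required_empty_string_spec(default: Optional[str]) -> str:
    # Spec 9.4.2.2: use the default value if there is one,
    # otherwise add an empty string to the infoset.
    return default if default is not None else ""


def required_empty_string_ibm(default: Optional[str]) -> str:
    # IBM DFDL as described above: defaults are not implemented when
    # parsing, so a default gives an SDE; no default gives a Processing Error.
    if default is not None:
        raise SchemaDefinitionError("default values not implemented when parsing")
    raise ProcessingError("empty occurrence of a required element with no default")
```

In this model the IBM variant never returns a value, which is the gap that
default="" is meant to close once defaults are implemented.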
>
> *9.4.2.3        Complex element *
>
> *Required occurrence: An item is added to the Infoset. *
>
> *Optional occurrence: If dfdl:emptyValueDelimiterPolicy is not 'none' then
> an item is added to the Infoset, otherwise nothing is added to the Infoset.*
>
> *For both required and optional occurrences, the Infoset item may also
> have a child item. *
>
> * 1.        If the first child element of the complex type is a required
> simple element, then an empty string (type xs:string), empty hexBinary
> (type xs:hexBinary), or default value will also be added to the Infoset. *
>
> * 2.        If the first child element of the complex type is a required
> complex element, then an item is added to the Infoset (which may itself
> have a child via (1))*
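
The complex-element rules quoted above can be sketched in Python as follows.
The Elem class is a hypothetical, much-simplified schema model (string
children only, and every complex element is assumed to have at least one
child), and applying the rules again inside a required complex child is my
assumption about how (2) chains.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Elem:
    # Hypothetical schema model, for illustration only.
    name: str
    required: bool = True
    default: Optional[str] = None  # meaningful for simple elements only
    children: List["Elem"] = field(default_factory=list)  # non-empty => complex

    @property
    def is_complex(self) -> bool:
        return bool(self.children)


def child_item(first: Elem):
    """Infoset item for the first child of an empty complex element, or None."""
    if not first.required:
        return None
    if not first.is_complex:
        # Rule 1: default value if there is one, else an empty string.
        value = first.default if first.default is not None else ""
        return (first.name, value)
    # Rule 2: required complex child; assume the same rules apply inside it.
    return (first.name, child_item(first.children[0]))


def empty_complex_item(elem: Elem, empty_value_delimiter_policy: str):
    """Infoset item for an empty occurrence of a complex element, or None."""
    if not elem.required and empty_value_delimiter_policy == "none":
        return None  # optional occurrence with policy 'none' adds nothing
    return (elem.name, child_item(elem.children[0]))
```

Items are modelled as (name, content) tuples, so an optional outer element
with a required complex child holding a required string yields a
three-level nesting when the policy is not 'none'.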
>
>
> *IBM DFDL behaviour:*
>
> Required. IBM DFDL follows the spec (except for rule 1 in cases where an
> error would have been thrown, as per its 9.4.2.2 behaviour).
>
> Optional. IBM DFDL follows the spec (except for rule 1 in cases where an
> error would have been thrown, as per its 9.4.2.2 behaviour).
>
>
> *So ...*
>
> The spec today is consistent in one way, in that for both complex & string
> elements a) a required empty occurrence always adds to the infoset; & b) an
> optional empty occurrence adds to the infoset if initiator/terminator
> present; & c) an optional empty occurrence does not add to the infoset if
> no initiator/terminator present.
>
> If the simple string behaviour were to change to match IBM DFDL then that
> consistency is lost, *but* the string behaviour then matches that for
> other simple types.  Section 9.4.2.2 disappears as the behaviour is the
> same as 9.4.2.1. Section 9.4.2.3 becomes as below. We lose the ability to
> get an empty string into the infoset for an optional string with
> initiator/terminator.
>
> *9.4.2.3        Complex element *
>
> *Required occurrence: An item is added to the Infoset. *
>
> *Optional occurrence: If dfdl:emptyValueDelimiterPolicy is not 'none' then
> an item is added to the Infoset, otherwise nothing is added to the Infoset.*
>
> *For both required and optional occurrences, the Infoset item may also
> have a child item. *
>
> * 1.        If the first child element of the complex type is a required
> simple element, then a default value will also be added to the Infoset. *
>
> * 2.        If the first child element of the complex type is a required
> complex element, then an item is added to the Infoset (which may itself
> have a child via (1))*
>
>
> We also need to be sure that any other implementations have not yet
> implemented the current spec behaviour.  Need to check with *DFDL4S* and
> *IBM TPF*.
>
> To be discussed on next WG call ...
>
> Regards
>
> Steve Hanson
>
> IBM Hybrid Integration, Hursley, UK
> Architect, *IBM DFDL*
> <http://www.ibm.com/developerworks/library/se-dfdl/index.html>
> Co-Chair, *OGF DFDL Working Group* <http://www.ogf.org/dfdl/>
> *smh at uk.ibm.com* <smh at uk.ibm.com>
> tel:+44-1962-815848
> mob:+44-7717-378890
> Note: I work Tuesday to Friday
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>
>