[DFDL-WG] clarification needed - ambiguity about empty string and optional element

Mike Beckerle mbeckerle.dfdl at gmail.com
Thu Aug 2 12:28:38 EDT 2018


I concur with all of this. In the example, element 'y' has no syntax in
the data, so it should not be created for an optional index position.
I will report a Daffodil bug about this.

The proposed wording change to 9.4.2.2, which just inserts "is applicable and",
fixes a number of problems.

The same wording change should also be made in 9.4.2.3.



Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
<http://www.ogf.org/About/abt_policies.php>


On Thu, Aug 2, 2018 at 12:08 PM, Steve Hanson <smh at uk.ibm.com> wrote:

> The first thing to note is that 'anyEmpty' means the sequence is
> non-positional, and in such a sequence I would expect initiators to be
> defined.
>
> dfdl:emptyValueDelimiterPolicy is not relevant, as there is no initiator or
> terminator.
>
> "Since the 'y' element decl does not specify a XSD default value, the
> concept of 'empty' and defaulting doesn't apply here." This is not correct.
> The concept of empty applies; defaulting happens only if the element is
> empty, required, and has a default value set.
>
> For your "foo;" example, the infoset should not contain <y/> because y is
> optional, empty, and has no initiator (spec 9.4.2.2):
>
> *Optional occurrence: If dfdl:emptyValueDelimiterPolicy is not 'none' then
> an item is added to the Infoset using empty string (type xs:string) or
> empty hexBinary (type xs:hexBinary) as the value, otherwise nothing is
> added to the Infoset.*
>
>
> I think that the sentence can be clarified to say:
>
> *Optional occurrence: If dfdl:emptyValueDelimiterPolicy is* **applicable
> and** *not 'none' then an item is added to the Infoset using empty string
> (type xs:string) or empty hexBinary (type xs:hexBinary) as the value,
> otherwise nothing is added to the Infoset.*
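A minimal Python sketch of 9.4.2.2 with the proposed wording, for an xs:string or xs:hexBinary element whose representation is empty. It assumes, following the remark above that dfdl:emptyValueDelimiterPolicy is not relevant without an initiator or terminator, that "applicable" means at least one of dfdl:initiator or dfdl:terminator is defined. The required branch follows the rule that defaulting happens only if the element is empty, required, and has a default. All function and parameter names are hypothetical and are not part of Daffodil's or IBM DFDL's API.

```python
from typing import Optional, Union

# Value of an infoset item: str for xs:string, bytes for xs:hexBinary.
Value = Union[str, bytes]


def evdp_applicable(initiator: str, terminator: str) -> bool:
    # Assumption: dfdl:emptyValueDelimiterPolicy is only applicable when the
    # element has an initiator or a terminator that could surround an empty value.
    return initiator != "" or terminator != ""


def empty_value(xsd_type: str) -> Value:
    # 9.4.2.2 only covers xs:string and xs:hexBinary.
    if xsd_type == "xs:string":
        return ""
    if xsd_type == "xs:hexBinary":
        return b""
    raise ValueError(f"9.4.2.2 does not cover {xsd_type}")


def item_for_empty_rep(required: bool, xsd_type: str, default: Optional[Value],
                       evdp: str, initiator: str = "",
                       terminator: str = "") -> Optional[Value]:
    """Value to add to the infoset for an empty representation, or None."""
    if required:
        # Required occurrence: use the default if there is one, else the empty value.
        return default if default is not None else empty_value(xsd_type)
    # Optional occurrence, with the proposed "is applicable and" wording.
    if evdp_applicable(initiator, terminator) and evdp != "none":
        return empty_value(xsd_type)
    return None


# Element 'y' from the example: optional xs:string, policy 'both',
# and neither an initiator nor a terminator.
print(item_for_empty_rep(False, "xs:string", None, "both"))  # prints None
```

Under this reading the 'y' element in the "foo;" example gets no infoset item, while the same element with, say, an initiator defined would still get an empty string.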
>
>
> Regards
>
> Steve Hanson
>
> IBM Hybrid Integration, Hursley, UK
> Architect, *IBM DFDL*
> <http://www.ibm.com/developerworks/library/se-dfdl/index.html>
> Co-Chair, *OGF DFDL Working Group* <http://www.ogf.org/dfdl/>
> *smh at uk.ibm.com* <smh at uk.ibm.com>
> tel:+44-1962-815848
> mob:+44-7717-378890
> Note: I work Tuesday to Friday
>
>
>
> From:        Mike Beckerle <mbeckerle.dfdl at gmail.com>
> To:        dfdl-wg at ogf.org
> Date:        01/08/2018 19:42
> Subject:        Re: [DFDL-WG] clarification needed - ambiguity about
> empty string and optional element
> Sent by:        "dfdl-wg" <dfdl-wg-bounces at ogf.org>
> ------------------------------
>
>
>
> I omitted to mention that dfdl:emptyValueDelimiterPolicy is 'both' here,
> though neither dfdl:initiator nor dfdl:terminator is defined.
>
> On Wed, Jul 11, 2018 at 8:16 AM, Mike Beckerle <mbeckerle.dfdl at gmail.com>
> wrote:
> Consider this data of 4 characters:
>
> foo;
>
> Consider this schema where the default format is the basic general set of
> text-oriented defaults.
>
> <xs:element name="ex_infix" dfdl:lengthKind="implicit">
>   <xs:complexType>
>     <xs:sequence dfdl:separator=";" dfdl:separatorSuppressionPolicy="anyEmpty"
>                  dfdl:separatorPosition="infix">
>       <xs:element name="x" type="xs:string" dfdl:lengthKind="delimited"/>
>       <xs:element name="y" type="xs:string" minOccurs="0"
>                   dfdl:lengthKind="delimited"
>                   dfdl:occursCountKind="implicit"/>
>     </xs:sequence>
>   </xs:complexType>
> </xs:element>
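To see why 'y' is in question at all, here is a deliberately simplified Python sketch of how the infix ';' separator divides the 4 characters of data. It only splits on the separator and ignores escape schemes, lengths, and dfdl:separatorSuppressionPolicy, so it is an illustration under those assumptions, not how Daffodil or IBM DFDL actually scan data; split_infix is a hypothetical name.

```python
from typing import List


def split_infix(data: str, separator: str) -> List[str]:
    # With infix separators, k separators delimit k + 1 fields, so a trailing
    # separator leaves a final zero-length field.
    return data.split(separator)


fields = split_infix("foo;", ";")
print(fields)  # ['foo', '']
```

The first field is the text for 'x'; the only text available for the optional 'y' is the zero-length field after the ';'.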
>
> This is in a current Daffodil unit test, and produces this infoset:
>
> <ex_infix><x>foo</x><y/></ex_infix>
>
> That is, an empty string element is created for element 'y'.
>
> I'd like to know what IBM DFDL produces as the infoset for this example.
>
> I believe the DFDL spec is actually self-contradictory here, and therefore
> ambiguous about what the right behavior is.
>
>    - DFDL Spec 14.2.1 description of anyEmpty: "...any occurrences that
>    have zero length representation MAY be omitted from the data, along with
>    their associated separator."
>       - Note that it says "may be", not "must be". So anyEmpty is "lax": it
>       does not insist that zero-length elements be absent.
>       - This doesn't clarify anything for us. But it admits the
>       possibility that the ";" separator appears even if the 'y' element
>       occurrence is determined to not exist.
>
>
>    - DFDL Spec 9.3.1.1 says an element is known-to-exist if it has the
>    nil, empty, or normal representation.
>       - In the example, element 'y' is zero-length which is either empty
>       or normal representation since a string can have "" (empty string) as a
>       value.
>       - Since the 'y' element declaration does not specify an XSD default
>       value, the concept of 'empty' and defaulting doesn't apply here, so a
>       zero-length string is a normal representation and, according to this
>       section, it is known-to-exist.
>       - This contradicts 9.4.2.2 below.
>
>
>    - DFDL Spec 9.3.1.3 says "Note: based on the above, when processing a
>    sequence for which a separator is defined, the presence of a match in the
>    data for the separator is not sufficient to cause the parser to determine
>    that an associated component is known-to-exist." It then refers you to
>    14.2.1.
>       - I don't think this changes anything. Again, it just admits that
>       the separator ";" can appear even without the following element. That
>       is, I think it just allows for lax processing of excess separators.
>
>
>    - DFDL Spec 9.4.2 Element Defaults When Parsing - Subsection
>    9.4.2.2 Simple element (xs:string or xs:hexBinary) (emphasis below is
>    mine)
>       - Here's the excerpted text:
>          - "Required occurrence: *If the element has a default value*
>          then an item is added to the infoset using the default value, otherwise an
>          item is added to the Infoset using empty string (type xs:string) or empty
>          hexBinary (type xs:hexBinary) as the value. Optional occurrence: If
>          dfdl:emptyValueDelimiterPolicy is not 'none'[12] then an item is
>          added to the Infoset using empty string (type xs:string) or empty hexBinary
>          (type xs:hexBinary) as the value, *otherwise nothing is added to
>          the Infoset.*
>          Note: *To prevent unwanted empty strings* or empty hexBinary values
>          from being added to the Infoset, use XSD minLength > '0' and a
>          dfdl:assert that uses the dfdl:checkConstraints() function, to raise
>          a processing error."
>       - Note that the language states "if the element has a default
>       value", which indicates that the section deals with both defaultable
>       AND non-defaultable elements, and is not exclusively about defaultable
>       elements as the title of 9.4.2 would imply.
>       - The second statement is about optional occurrences, and it does
>       not distinguish between defaultable and non-defaultable elements. Hence,
>       I read "nothing is added to the infoset" as applying whether or not the
>       element is defaultable. So a zero-length (ZL) string is never going to
>       create an empty-string value for an optional element.
>       - However, this contradicts the note about preventing unwanted
>       empty strings. That note is only sensible if optional elements of
>       zero-length will get added to the infoset and extra steps are required to
>       force a facet check to prevent them.
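The workaround in the quoted Note can be modelled the same way. This Python sketch is only an illustration under stated assumptions: it treats "minLength > '0'" as minLength = 1, models dfdl:checkConstraints() as checking that one facet only, and assumes the processing error raised by the assert backtracks out of the optional occurrence so that it is treated as absent. ProcessingError, check_constraints, assert_constraints and parse_optional are hypothetical names.

```python
from typing import Optional


class ProcessingError(Exception):
    """Hypothetical stand-in for a DFDL processing error."""


def check_constraints(value: str, min_length: int) -> bool:
    # Only models the XSD minLength facet; the real dfdl:checkConstraints()
    # checks all facets of the element's type.
    return len(value) >= min_length


def assert_constraints(value: str, min_length: int) -> str:
    # Models a dfdl:assert whose test calls dfdl:checkConstraints():
    # a false result raises a processing error.
    if not check_constraints(value, min_length):
        raise ProcessingError(f"{value!r} violates minLength {min_length}")
    return value


def parse_optional(text: str, min_length: int) -> Optional[str]:
    # Assumption: the optional occurrence is a point of uncertainty, so a
    # processing error backtracks and the occurrence is treated as absent.
    try:
        return assert_constraints(text, min_length)
    except ProcessingError:
        return None


print(parse_optional("", 1))  # prints None: the zero-length 'y' is not added
```

With min_length set to 0 the empty string passes the check and would be returned, which is the point made above: the guard only changes the outcome if a zero-length optional element would otherwise reach the infoset.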
>
>
> Unless I'm missing another place in the DFDL spec that clarifies this, I
> think we need to revise this area to make things clearer.
>
> But first we have to pick which is the intended semantics. In the example
> above, which infoset is the one we want:
>
>     <ex_infix><x>foo</x><y/></ex_infix> (empty string as normal
> representation takes priority over optionality)
> or
>     <ex_infix><x>foo</x></ex_infix> (optionality takes priority over
> empty string as normal representation)
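The two candidates differ only in whether a zero-length 'y' survives. A small Python sketch renders each infoset from the fields obtained by splitting "foo;" on ';'; the function render_ex_infix and its optionality_first flag are hypothetical, chosen only to name the two readings.

```python
from typing import List, Optional
from xml.sax.saxutils import escape


def render_ex_infix(fields: List[str], optionality_first: bool) -> str:
    """Render the ex_infix infoset as XML text under either candidate reading."""
    x = fields[0]
    y: Optional[str] = fields[1] if len(fields) > 1 else None
    if y == "" and optionality_first:
        # Second reading: a zero-length optional 'y' is simply absent.
        y = None
    out = "<ex_infix><x>" + escape(x) + "</x>"
    if y is not None:
        # First reading: the empty string is kept and serializes as <y/>.
        out += "<y/>" if y == "" else "<y>" + escape(y) + "</y>"
    return out + "</ex_infix>"


print(render_ex_infix(["foo", ""], optionality_first=False))  # <ex_infix><x>foo</x><y/></ex_infix>
print(render_ex_infix(["foo", ""], optionality_first=True))   # <ex_infix><x>foo</x></ex_infix>
```

When 'y' has non-empty text, both readings agree; the flag only matters for the zero-length case.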
>
> Either way I think this change is needed:
>
>    - Section 9.4.2 - change section title to "Element Defaults and
>    Optionality When Parsing"
>
> But a bunch of other clarifications are also needed.
>
> Today, Daffodil 2.1.0 implements the first behavior:
> <ex_infix><x>foo</x><y/></ex_infix>, with the empty 'y' element.
>
> What does IBM DFDL do?
>
>
> --
>  dfdl-wg mailing list
>  dfdl-wg at ogf.org
>  https://www.ogf.org/mailman/listinfo/dfdl-wg
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>
>

