[DFDL-WG] clarification needed - ambiguity about empty string and optional element
Mike Beckerle
mbeckerle.dfdl at gmail.com
Thu Aug 2 12:28:38 EDT 2018
I concur with all of this. In the example the element 'y' has no syntax in
the data, and so should not be created for an optional index position.
I will report a Daffodil bug about this.
The wording change to 9.4.2.2 to just insert "is applicable and" fixes a
bunch of problems.
The same wording change should appear in 9.4.2.3.
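
As a rough sketch of how I read the revised 9.4.2.2 rule for an optional
occurrence with zero-length representation: the function and parameter names
below are made up for illustration, and treating the policy as "applicable"
only when an initiator or terminator is defined is an assumption based on
Steve's note, not spec text.

```python
from typing import Optional


def empty_optional_string_item(
    empty_value_delimiter_policy: str,
    has_initiator: bool,
    has_terminator: bool,
) -> Optional[str]:
    """Return "" if an empty-string item is added to the Infoset, else None.

    Hypothetical helper for an optional xs:string occurrence whose
    representation is zero-length. Not taken from any DFDL implementation.
    """
    # Assumption: dfdl:emptyValueDelimiterPolicy is "applicable" only when
    # the element defines at least one delimiter for it to refer to.
    # Finer cases (e.g. policy 'initiator' with only a terminator defined)
    # are deliberately ignored in this sketch.
    applicable = has_initiator or has_terminator

    if applicable and empty_value_delimiter_policy != "none":
        return ""  # item added to the Infoset with empty string as value
    return None  # nothing is added to the Infoset
```

For the "foo;" example (policy 'both', no initiator or terminator) this
returns None, i.e. no 'y' item, which corresponds to the infoset
<ex_infix><x>foo</x></ex_infix>.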
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
<http://www.ogf.org/About/abt_policies.php>
On Thu, Aug 2, 2018 at 12:08 PM, Steve Hanson <smh at uk.ibm.com> wrote:
> First thing to note is that 'anyEmpty' means the sequence is
> non-positional, and in such a sequence I would expect initiators to be
> defined.
>
> dfdl:emptyValueDelimiterPolicy is not relevant, as there is no initiator
> or terminator.
>
> "Since the 'y' element decl does not specify an XSD default value, the
> concept of 'empty' and defaulting doesn't apply here". That is not correct.
> The concept of empty applies; defaulting happens if the element is empty,
> is required, and has a default set.
>
> For your "foo;" example, the infoset should not contain <y/> because y is
> optional, is empty, and does not have an initiator (spec 9.4.2.2):
>
> *Optional occurrence: If dfdl:emptyValueDelimiterPolicy is not 'none' then
> an item is added to the Infoset using empty string (type xs:string) or
> empty hexBinary (type xs:hexBinary) as the value, otherwise nothing is
> added to the Infoset.*
>
>
> I think that the sentence can be clarified to say:
>
> *Optional occurrence: If dfdl:emptyValueDelimiterPolicy is **applicable
> and** not 'none' then an item is added to the Infoset using empty string
> (type xs:string) or empty hexBinary (type xs:hexBinary) as the value,
> otherwise nothing is added to the Infoset.*
>
>
> Regards
>
> Steve Hanson
>
> IBM Hybrid Integration, Hursley, UK
> Architect, *IBM DFDL*
> <http://www.ibm.com/developerworks/library/se-dfdl/index.html>
> Co-Chair, *OGF DFDL Working Group* <http://www.ogf.org/dfdl/>
> *smh at uk.ibm.com* <smh at uk.ibm.com>
> tel:+44-1962-815848
> mob:+44-7717-378890
> Note: I work Tuesday to Friday
>
>
>
> From: Mike Beckerle <mbeckerle.dfdl at gmail.com>
> To: dfdl-wg at ogf.org
> Date: 01/08/2018 19:42
> Subject: Re: [DFDL-WG] clarification needed - ambiguity about
> empty string and optional element
> Sent by: "dfdl-wg" <dfdl-wg-bounces at ogf.org>
> ------------------------------
>
>
>
> I omitted that dfdl:emptyValueDelimiterPolicy is 'both' here, though neither
> dfdl:initiator nor dfdl:terminator is defined.
>
>
>
> On Wed, Jul 11, 2018 at 8:16 AM, Mike Beckerle <mbeckerle.dfdl at gmail.com>
> wrote:
> Consider this data of 4 characters:
>
> foo;
>
> Consider this schema where the default format is the basic general set of
> text-oriented defaults.
>
> <xs:element name="ex_infix" dfdl:lengthKind="implicit">
>   <xs:complexType>
>     <xs:sequence dfdl:separator=";" dfdl:separatorSuppressionPolicy="anyEmpty"
>                  dfdl:separatorPosition="infix">
>       <xs:element name="x" type="xs:string" dfdl:lengthKind="delimited"/>
>       <xs:element name="y" type="xs:string" minOccurs="0"
>                   dfdl:lengthKind="delimited"
>                   dfdl:occursCountKind="implicit"/>
>     </xs:sequence>
>   </xs:complexType>
> </xs:element>
>
> This is in a current Daffodil unit test, and produces this infoset:
>
> <ex_infix><x>foo</x><y/></ex_infix>
>
> That is, an empty string element is created for element 'y'.
>
> I'd like to know what IBM DFDL produces as the infoset for this example.
>
> I believe the DFDL spec is actually self-contradictory and so ambiguous
> here about what is the right behavior.
>
> - DFDL Spec 14.2.1 description of anyEmpty: "...any occurrences that
>   have zero length representation MAY be omitted from the data, along with
>   their associated separator."
>   - Note that it says "may", not "must be". So anyEmpty is "lax": it does
>     not insist that zero-length elements be absent from the data.
>   - This doesn't clarify anything for us. But it admits the
>     possibility that the ";" separator appears even if the 'y' element
>     occurrence is determined not to exist.
>
>
> - DFDL Spec 9.3.1.1 says an element is known to exist if it has the
>   nil, empty, or normal representation.
>   - In the example, element 'y' is zero-length, which is either the empty
>     or the normal representation, since a string can have "" (empty string)
>     as a value.
>   - Since the 'y' element decl does not specify an XSD default value,
>     the concept of 'empty' and defaulting doesn't apply here, so a zero-length
>     string is a normal representation, and according to this section, it is
>     known-to-exist.
>   - This contradicts 9.4.2.2 below.
>
>
> - DFDL Spec 9.3.1.3 says "Note: based on the above, when processing a
>   sequence for which a separator is defined, the presence of a match in the
>   data for the separator is not sufficient to cause the parser to determine
>   that an associated component is known-to-exist." It then refers you to
>   14.2.1.
>   - I don't think this changes anything. Again it just admits that
>     the separator ";" can appear even without the following element. I.e., I
>     think it just allows for lax processing of excess separators.
>
>
> - DFDL Spec 9.4.2 Element Defaults When Parsing - Subsection
>   9.4.2.2 Simple element (xs:string or xs:hexBinary) (Emphasis below is
>   mine)
>   - Here's the excerpted text:
>     "Required occurrence: *If the element has a default value*
>     then an item is added to the infoset using the default value, otherwise an
>     item is added to the Infoset using empty string (type xs:string) or empty
>     hexBinary (type xs:hexBinary) as the value. Optional occurrence: If
>     dfdl:emptyValueDelimiterPolicy is not 'none' then an item is
>     added to the Infoset using empty string (type xs:string) or empty hexBinary
>     (type xs:hexBinary) as the value, *otherwise nothing is added to
>     the Infoset.*
>     Note: *To prevent unwanted empty strings* or empty hexBinary values
>     from being added to the Infoset, use XSD minLength > '0' and a dfdl:assert
>     that uses the dfdl:checkConstraints() function, to raise a processing
>     error."
>   - Note that the language states "if the element has a default
>     value", which indicates that the section deals with both defaultable
>     AND non-defaultable elements, and is not exclusively discussing defaultable
>     elements as the title of 9.4.2 would imply.
>   - The second statement is about optional occurrences, and it does
>     not say whether it applies only to defaultable elements. Hence, I read
>     "nothing is added to the infoset" as applying whether or not the element is
>     defaultable. So a zero-length (ZL) string is never going to create an
>     empty-string value for an optional element.
>   - However, this contradicts the note about preventing unwanted
>     empty strings. That note is only sensible if optional elements of
>     zero length will get added to the infoset and extra steps are required to
>     force a facet check to prevent them.
>
>
> Unless I'm missing another place in the DFDL spec that clarifies this, I
> think we need to revise this area to make things clearer.
>
> But first we have to pick which is the intended semantics. In the example
> above, which infoset is the one we want:
>
> <ex_infix><x>foo</x><y/></ex_infix> (empty string as normal
> representation takes priority over optionality)
> or
> <ex_infix><x>foo</x></ex_infix> (optionality takes priority over
> empty string as normal representation)
>
> Either way I think this change is needed:
>
> - Section 9.4.2 - change section title to "Element Defaults and
> Optionality When Parsing"
>
> But a bunch of other clarifications are also needed.
>
> Today Daffodil 2.1.0 implements the first behavior.
> <ex_infix><x>foo</x><y/></ex_infix> with the empty 'y' element.
>
> What does IBM DFDL do?
>
>