[DFDL-WG] clarification needed - ambiguity about empty string and optional element

Steve Hanson smh at uk.ibm.com
Thu Aug 2 12:08:23 EDT 2018


First thing to note is that 'anyEmpty' means the sequence is 
non-positional, and in such a sequence I would expect initiators to be 
defined.

dfdl:emptyValueDelimiterPolicy is not relevant, as there is no initiator or 
terminator.

"Since the 'y' element decl does not specify a XSD default value, the 
concept of 'empty' and defaulting doesn't apply here". Not correct. The 
concept of empty applies; defaulting happens if empty & required & default 
set. 

For your "foo;" example, the infoset should not contain <y/> because y is 
optional & empty & has no initiator or terminator (spec 9.4.2.2):

Optional occurrence: If dfdl:emptyValueDelimiterPolicy is not 'none' then 
an item is added to the Infoset using empty string (type xs:string) or 
empty hexBinary (type xs:hexBinary) as the value, otherwise nothing is 
added to the Infoset. 

I think that the sentence can be clarified to say:

Optional occurrence: If dfdl:emptyValueDelimiterPolicy is applicable and 
not 'none' then an item is added to the Infoset using empty string (type 
xs:string) or empty hexBinary (type xs:hexBinary) as the value, otherwise 
nothing is added to the Infoset. 
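
A minimal Python sketch of the clarified rule, for xs:string only. The 
function name and parameters are hypothetical and not part of any DFDL 
processor API. It also assumes that the policy is "applicable" exactly when 
an initiator or terminator is defined, which is how I read the comment above.

```python
from typing import Optional


def empty_occurrence_value(
    required: bool,
    default: Optional[str],
    evdp: str,
    has_initiator: bool,
    has_terminator: bool,
) -> Optional[str]:
    """Value added to the infoset for an empty xs:string occurrence.

    Returns None when nothing is added to the infoset.
    evdp stands for dfdl:emptyValueDelimiterPolicy.
    """
    if required:
        # Required occurrence: use the default if there is one, else "".
        return default if default is not None else ""
    # Optional occurrence. Assumption: the policy is only applicable when
    # the element actually defines an initiator or a terminator.
    applicable = has_initiator or has_terminator
    if applicable and evdp != "none":
        return ""
    return None


# Element 'y' from the "foo;" example: optional, no default, policy 'both',
# but neither initiator nor terminator defined.
y_value = empty_occurrence_value(
    required=False,
    default=None,
    evdp="both",
    has_initiator=False,
    has_terminator=False,
)
```

Under these assumptions y_value is None, i.e. no item for 'y' is added to 
the infoset.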

Regards
 
Steve Hanson
IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh at uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
Note: I work Tuesday to Friday 



From:   Mike Beckerle <mbeckerle.dfdl at gmail.com>
To:     dfdl-wg at ogf.org
Date:   01/08/2018 19:42
Subject:        Re: [DFDL-WG] clarification needed - ambiguity about empty 
string  and optional element
Sent by:        "dfdl-wg" <dfdl-wg-bounces at ogf.org>



I omitted that dfdl:emptyValueDelimiterPolicy is 'both' here, though 
neither dfdl:initiator nor dfdl:terminator is defined. 

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | 
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are 
subject to the OGF Intellectual Property Policy


On Wed, Jul 11, 2018 at 8:16 AM, Mike Beckerle <mbeckerle.dfdl at gmail.com> 
wrote:
Consider this data of 4 characters:

foo;

Consider this schema where the default format is the basic general set of 
text-oriented defaults.

<xs:element name="ex_infix" dfdl:lengthKind="implicit">
  <xs:complexType>
    <xs:sequence dfdl:separator=";"
                 dfdl:separatorSuppressionPolicy="anyEmpty"
                 dfdl:separatorPosition="infix">
      <xs:element name="x" type="xs:string" dfdl:lengthKind="delimited"/>
      <xs:element name="y" type="xs:string" minOccurs="0"
                  dfdl:lengthKind="delimited"
                  dfdl:occursCountKind="implicit"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
           
This is in a current Daffodil unit test, and produces this infoset:

<ex_infix><x>foo</x><y/></ex_infix>

That is, an empty string element is created for element 'y'. 

I'd like to know what IBM DFDL produces as the infoset for this example. 

I believe the DFDL spec is actually self-contradictory and so ambiguous 
here about what is the right behavior.

DFDL Spec 14.2.1 description of anyEmpty: "...any occurrences that have 
zero length representation MAY be omitted from the data, along with their 
associated separator." 
Note that it says "may", not "must". So anyEmpty is "lax": it does not 
insist that zero-length elements be absent.
This doesn't clarify anything for us, but it admits the possibility that 
the ";" separator appears even if the 'y' element occurrence is determined 
not to exist.  

DFDL Spec 9.3.1.1 says an element is known to exist if it has the nil, 
empty, or normal representation.
In the example, element 'y' is zero-length, which is either the empty or the 
normal representation, since a string can have "" (empty string) as a value. 
Since the 'y' element decl does not specify a XSD default value, the 
concept of 'empty' and defaulting doesn't apply here, so a zero-length 
string is a normal representation, and according to this section, it is 
known-to-exist.  
This contradicts 9.4.2.2 below.

DFDL Spec 9.3.1.3 says "Note: based on the above, when processing a 
sequence for which a separator is defined, the presence of a match in the 
data for the separator is not sufficient to cause the parser to determine 
that an associated component is known-to-exist." It then refers you to 
14.2.1.
I don't think this changes anything. Again it just admits that the 
separator ";" can appear even without the following element. I.e., I think 
it just allows for lax processing of excess separators.

DFDL Spec 9.4.2 Element Defaults When Parsing - Subsection 9.4.2.2 
Simple element (xs:string or xs:hexBinary) (Emphasis below is mine)
Here's the excerpted text: 
"Required occurrence: If the element has a default value then an item is 
added to the infoset using the default value, otherwise an item is added 
to the Infoset using empty string (type xs:string) or empty hexBinary 
(type xs:hexBinary) as the value. 
Optional occurrence: If dfdl:emptyValueDelimiterPolicy is not 'none'[12] 
then an item is added to the Infoset using empty string (type xs:string) or 
empty hexBinary (type xs:hexBinary) as the value, otherwise nothing is 
added to the Infoset. 
Note: To prevent unwanted empty strings or empty hexBinary values from 
being added to the Infoset, use XSD minLength > '0' and a dfdl:assert that 
uses the dfdl:checkConstraints() function, to raise a processing error."
Note that the language states "if the element has a default value" - which 
denotes that the section is dealing with both defaultable AND 
non-defaultable elements, and is not exclusively discussing defaultable 
elements as the title of 9.4.2 would imply.
The second statement is about optional occurrences, and it is not 
qualified by whether the element is defaultable. Hence, I read 
"nothing is added to the infoset" as applying whether or not the element is 
defaultable. So a zero length (ZL) string is never going to create an 
empty-string value for an optional element. 
However, this contradicts the note about preventing unwanted empty 
strings. That note is only sensible if optional elements of zero-length 
will get added to the infoset and extra steps are required to force a 
facet check to prevent them. 

Unless I'm missing another place in the DFDL spec that clarifies this, I 
think we need to revise this area to make things clearer.

But first we have to pick which is the intended semantics. In the example 
above, which infoset is the one we want: 

    <ex_infix><x>foo</x><y/></ex_infix>
    (empty string as normal representation takes priority over optionality)
or 
    <ex_infix><x>foo</x></ex_infix>
    (optionality takes priority over empty string as normal representation)
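
To make the two candidate readings concrete, here is a toy Python sketch. 
It is not how Daffodil or IBM DFDL parse. It hard-codes the ex_infix 
example (';' infix separator, required 'x', optional 'y'), and the function 
name and the boolean flag are hypothetical.

```python
def parse_ex_infix(data: str, empty_optional_is_value: bool) -> str:
    """Toy model of the ex_infix example, not a DFDL parser.

    empty_optional_is_value selects the reading:
      True  -> a zero-length optional 'y' becomes an empty-string item
      False -> a zero-length optional 'y' adds nothing to the infoset
    """
    # Split at the first ';' only; sep is '' when no separator is present.
    x_repr, sep, y_repr = data.partition(";")
    items = [f"<x>{x_repr}</x>"]
    if sep:
        if y_repr:
            # Non-empty representation: 'y' exists under either reading.
            items.append(f"<y>{y_repr}</y>")
        elif empty_optional_is_value:
            items.append("<y/>")
    return "<ex_infix>" + "".join(items) + "</ex_infix>"


first_reading = parse_ex_infix("foo;", empty_optional_is_value=True)
second_reading = parse_ex_infix("foo;", empty_optional_is_value=False)
```

Here first_reading is the infoset with the empty 'y' element, and 
second_reading is the infoset without it.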

Either way I think this change is needed:
Section 9.4.2 - change section title to "Element Defaults and Optionality 
When Parsing"
But a bunch of other clarifications are also needed. 

Today Daffodil 2.1.0 implements the first behavior. 
<ex_infix><x>foo</x><y/></ex_infix> with the empty 'y' element.

What does IBM DFDL do?


