[DFDL-WG] Clarification needed: separator for empty sequence

Steve Hanson smh at uk.ibm.com
Thu Aug 2 12:21:56 EDT 2018


Yes, we can add some clarification, although by implication any child that 
does not carry dfdl:inputValueCalc will give rise to a separator.

The same is therefore true for an empty choice, though a choice with no 
branches is a schema definition error, as is a choice branch that carries 
dfdl:inputValueCalc, so it is clearer than the empty sequence case. 

Regards
 
Steve Hanson
IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh at uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
Note: I work Tuesday to Friday 



From:   Mike Beckerle <mbeckerle.dfdl at gmail.com>
To:     Steve Hanson <smh at uk.ibm.com>
Cc:     dev at daffodil.apache.org, dfdl-wg at ogf.org, dfdl-wg 
<dfdl-wg-bounces at ogf.org>
Date:   01/08/2018 14:57
Subject:        Re: [DFDL-WG] Clarification needed: separator for empty 
sequence



Added Daffodil bug https://issues.apache.org/jira/browse/DAFFODIL-1975. 

I think we should add a very positive statement like "All sequences, 
including empty sequences, are considered represented terms that are 
required, and hence they imply framing such as alignment and presence of 
separators in separated sequences even if they have zero length. Separator 
suppression based on dfdl:separatorSuppressionPolicy does not apply."

Does this go for empty choices as well? I think it should. 


Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | 
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are 
subject to the OGF Intellectual Property Policy


On Wed, Aug 1, 2018 at 8:35 AM, Steve Hanson <smh at uk.ibm.com> wrote:
Any sequence, empty or otherwise, which is a child of an outer sequence 
follows the separator rules for the outer sequence. 

The empty sequence in your example will require a separator when parsing, 
and cause a separator when unparsing. 

Tested with IBM DFDL, and that is what we have implemented. Missing a 
leading separator (e.g. a,b,c,d) gives a processing error. 

I think that section 14.1 implies that, but you can add extra words to 
clarify if you like. 

We have not encountered the need for the concept of a DFDL non-represented 
sequence. 

Regards
 
Steve Hanson 
IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh at uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
Note: I work Tuesday to Friday 



From:        Mike Beckerle <mbeckerle.dfdl at gmail.com> 
To:        dfdl-wg at ogf.org 
Cc:        dev at daffodil.apache.org 
Date:        19/07/2018 18:37 
Subject:        [DFDL-WG] Clarification needed: separator for empty 
sequence 
Sent by:        "dfdl-wg" <dfdl-wg-bounces at ogf.org> 



I believe Daffodil has an incorrect behavior in the way it treats 
separators today, but I want to clarify things relative to the DFDL spec 
before fixing it. 

Consider this element: 

    <xs:element name="NS_13">
      <xs:complexType>
        <xs:sequence dfdl:separator="," dfdl:separatorPosition="infix"
                     dfdl:separatorSuppressionPolicy="trailingEmptyStrict">
          <!-- Empty sequence: carries only an assert, no child elements -->
          <xs:sequence>
            <xs:annotation>
              <xs:appinfo source="http://www.ogf.org/dfdl/">
                <dfdl:assert>{fn:true()}</dfdl:assert>
              </xs:appinfo>
            </xs:annotation>
          </xs:sequence>
          <xs:element name="e2" type="xs:int" minOccurs="1"
                      maxOccurs="unbounded" dfdl:textNumberPattern="#####"
                      dfdl:occursCountKind="implicit" />
        </xs:sequence>
      </xs:complexType>
    </xs:element>

The outermost sequence has infix separator. 

Its content begins with another sequence, but this sequence is empty, 
having only assert statements in it. 

Question: does this empty sequence cause a separator to be inserted after 
it? 

E.g., should the data ",1,2,3" parse, and is that initial comma required? 

An argument can be made that such empty sequences can be detected by a 
DFDL implementation and treated as having "no representation" akin to how 
an element with dfdl:inputValueCalc is treated. Such elements are 
invisible as far as the representation is concerned. They cause no 
separators in sequences, no alignment regions, etc. 

In that case the data would not require nor accept the initial comma. 

But the DFDL spec is not clear on whether empty sequences are treated in 
this way, so I am assuming they are not treated specially, so the comma is 
required because all model groups are treated as required regardless of 
whether they have zero-length representations. 

Is that correct? If so, then Daffodil has a bug, because it does NOT put a 
separator in for this today. 

What if the empty sequence carries a dfdl:initiator="A" annotation? 

Such a sequence is still empty in the XSDL sense of empty sequence, but 
clearly has a representation that is not zero length. In that case I think 
the data has to be "A,1,2,3" so that there is a separator after the empty 
sequence's non-ZL representation. I think this is not controversial. 
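
For illustration, this variation might be written as below. This is a 
sketch only: it reuses the element and the namespace prefixes of the NS_13 
example above, and the expected data "A,1,2,3" is the reading argued for 
here, not a tested result. 

```xml
<xs:sequence dfdl:separator="," dfdl:separatorPosition="infix"
             dfdl:separatorSuppressionPolicy="trailingEmptyStrict">
  <!-- Still empty in the XSD sense (no child elements), but the
       initiator gives it a representation that is not zero length -->
  <xs:sequence dfdl:initiator="A">
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <dfdl:assert>{fn:true()}</dfdl:assert>
      </xs:appinfo>
    </xs:annotation>
  </xs:sequence>
  <xs:element name="e2" type="xs:int" minOccurs="1"
              maxOccurs="unbounded" dfdl:textNumberPattern="#####"
              dfdl:occursCountKind="implicit" />
</xs:sequence>
```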

Other variations: 
What if the sequence contains only elements with dfdl:inputValueCalc? 
Then there is a content model, but it is all non-represented elements. 
Would it still be a "DFDL empty sequence" or a "DFDL non-represented 
sequence"? 
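
A sketch of this variation, for illustration only. The element name 
"computed" and its expression are hypothetical; the fragment assumes the 
same namespace prefixes as the NS_13 example above and would replace the 
inner empty sequence there. 

```xml
<!-- Inner sequence whose content model has only a non-represented element -->
<xs:sequence>
  <!-- "computed" is a hypothetical name; its value is calculated, so it
       occupies no bytes in the data stream -->
  <xs:element name="computed" type="xs:int"
              dfdl:inputValueCalc="{ 0 }" />
</xs:sequence>
```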

What if the sequence isn't empty, but contains only "optional" elements? 
In that case, is the whole sequence "optional", so that the comma becomes 
sensitive to the dfdl:separatorSuppressionPolicy? 
I guess this means a model group inherits the optionality/required-ness of 
its contents, unless it has its own required framing. 
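
A sketch of the optional-only variation, for illustration only. The 
element name "opt1" is hypothetical, and the fragment assumes the same 
namespace prefixes as the NS_13 example above; it would take the place of 
the inner empty sequence. It only shows the shape of the case and does not 
settle how the separator should behave. 

```xml
<!-- Inner sequence with no framing of its own and only an optional child -->
<xs:sequence>
  <!-- "opt1" is a hypothetical name; minOccurs="0" makes it optional -->
  <xs:element name="opt1" type="xs:int" minOccurs="0" maxOccurs="1"
              dfdl:occursCountKind="implicit"
              dfdl:textNumberPattern="#####" />
</xs:sequence>
```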

I believe the simplest thing is to require the comma here. This is, 
however, a backward incompatible change to Daffodil to conform, so I want 
to be sure this is correct. 

The element declaration can be rewritten so that the empty sequence comes 
before the separated sequence. Presumably the reason someone would insert 
an empty sequence like this is to get the assert statement executed at the 
beginning of the sequence, not afterwards. However, if the sequence has 
separators, then you can't just insert an initial empty sequence to carry 
that assert without requiring a separator. 

The element can be rewritten: 

    <xs:element name="NS_13">
      <xs:complexType>
        <xs:sequence>
          <!-- Empty sequence sits outside the separated sequence, so the
               assert runs first and no separator is involved -->
          <xs:sequence>
            <xs:annotation>
              <xs:appinfo source="http://www.ogf.org/dfdl/">
                <dfdl:assert>{fn:true()}</dfdl:assert>
              </xs:appinfo>
            </xs:annotation>
          </xs:sequence>
          <xs:sequence dfdl:separator="," dfdl:separatorPosition="infix"
                       dfdl:separatorSuppressionPolicy="trailingEmptyStrict">
            <xs:element name="e2" type="xs:int" maxOccurs="unbounded"
                        dfdl:textNumberPattern="#####" />
          </xs:sequence>
        </xs:sequence>
      </xs:complexType>
    </xs:element>

This achieves the desired behavior where the assertion executes first, but 
does not cause a separator to be needed, and does so regardless of the 
treatment of empty sequences. 

Ultimately, we need a clarification that states either: 

1) empty sequences are considered represented terms that are required, and 
hence, they imply framing such as alignment and presence of separators in 
separated sequences even if they have zero length. 

or 

2) introduce the concept of a "DFDL non-represented sequence". These are 
sequences that have no framing, no delimiters, and either an empty content 
model or a content model containing only elements with dfdl:inputValueCalc 
or other DFDL non-represented sequences (recursively). They have no 
representation, so imply no alignment, no need of separators, etc. A group 
ref (hidden or not) to a DFDL non-represented sequence is also a DFDL 
non-represented sequence. 

Note that these could be generalized to choice groups also. 

Comments? 

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | 
www.tresys.com 
Please note: Contributions to the DFDL Workgroup's email discussions are 
subject to the OGF Intellectual Property Policy 
--
 dfdl-wg mailing list
 dfdl-wg at ogf.org
 https://www.ogf.org/mailman/listinfo/dfdl-wg 

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

