[DFDL-WG] Clarification needed: separator for empty sequence

Mike Beckerle mbeckerle.dfdl at gmail.com
Wed Aug 1 09:57:40 EDT 2018


Added Daffodil bug https://issues.apache.org/jira/browse/DAFFODIL-1975.

I think we should add a very positive statement like "All sequences,
including empty sequences, are considered represented terms that are
required, and hence they imply framing such as alignment and the presence
of separators in separated sequences even if they have zero length.
Separator suppression based on dfdl:separatorSuppressionPolicy does not
apply."

Does this go for empty choices as well? I think it should.
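
To make the non-zero-length variant concrete, here is a sketch of the NS_13
element from the original question with dfdl:initiator="A" added to the
empty sequence. It is a fragment only: it assumes the xs, dfdl and fn
prefixes are bound and that a default dfdl:format is in scope, none of which
is shown. As I read the proposed wording, this variant would require data of
the form "A,1,2,3", and the same declaration without the initiator would
require ",1,2,3".

```xml
<!-- Sketch only: assumes xs, dfdl and fn prefixes are bound and a
     default dfdl:format is in scope (neither is shown here). -->
<xs:element name="NS_13">
  <xs:complexType>
    <xs:sequence dfdl:separator="," dfdl:separatorPosition="infix"
                 dfdl:separatorSuppressionPolicy="trailingEmptyStrict">
      <!-- Empty in the XSD sense, but the initiator gives it a
           non-zero-length representation ("A"). -->
      <xs:sequence dfdl:initiator="A">
        <xs:annotation>
          <xs:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:assert>{fn:true()}</dfdl:assert>
          </xs:appinfo>
        </xs:annotation>
      </xs:sequence>
      <!-- The infix separator falls between the sequence above and the
           first occurrence of e2. -->
      <xs:element name="e2" type="xs:int" minOccurs="1" maxOccurs="unbounded"
                  dfdl:textNumberPattern="#####"
                  dfdl:occursCountKind="implicit"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```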


Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
<http://www.ogf.org/About/abt_policies.php>


On Wed, Aug 1, 2018 at 8:35 AM, Steve Hanson <smh at uk.ibm.com> wrote:

> Any sequence, empty or otherwise, which is a child of an outer sequence
> follows the separator rules for the outer sequence.
>
> The empty sequence in your example will require a separator when parsing,
> and cause a separator when unparsing.
>
> Tested with IBM DFDL and that's what we have implemented. Missing a
> leading separator (e.g. a,b,c,d) gives a processing error.
>
> I think that section 14.1 implies that, but you can add extra words to
> clarify if you like.
>
> We have not encountered the need for the concept of a DFDL non-represented
> sequence.
>
> Regards
>
> Steve Hanson
>
> IBM Hybrid Integration, Hursley, UK
> Architect, IBM DFDL
> <http://www.ibm.com/developerworks/library/se-dfdl/index.html>
> Co-Chair, OGF DFDL Working Group <http://www.ogf.org/dfdl/>
> smh at uk.ibm.com
> tel:+44-1962-815848
> mob:+44-7717-378890
> Note: I work Tuesday to Friday
>
>
>
> From:        Mike Beckerle <mbeckerle.dfdl at gmail.com>
> To:        dfdl-wg at ogf.org
> Cc:        dev at daffodil.apache.org
> Date:        19/07/2018 18:37
> Subject:        [DFDL-WG] Clarification needed: separator for empty
> sequence
> Sent by:        "dfdl-wg" <dfdl-wg-bounces at ogf.org>
> ------------------------------
>
>
>
> I believe Daffodil has an incorrect behavior in the way it treats
> separators today, but I want to clarify things relative to the DFDL spec
> before fixing it.
>
> Consider this element:
>
>     <xs:element name="NS_13">
>       <xs:complexType>
>         <xs:sequence dfdl:separator="," dfdl:separatorPosition="infix"
>             dfdl:separatorSuppressionPolicy="trailingEmptyStrict">
>           <xs:sequence>
>             <xs:annotation>
>               <xs:appinfo source="http://www.ogf.org/dfdl/">
>                 <dfdl:assert>{fn:true()}</dfdl:assert>
>               </xs:appinfo>
>             </xs:annotation>
>           </xs:sequence>
>           <xs:element name="e2" type="xs:int" minOccurs="1"
>               maxOccurs="unbounded" dfdl:textNumberPattern="#####"
>               dfdl:occursCountKind="implicit" />
>         </xs:sequence>
>       </xs:complexType>
>     </xs:element>
>
> The outermost sequence has an infix separator.
>
> Its content begins with another sequence, but this sequence is empty,
> having only assert statements in it.
>
> Question: does this empty sequence cause a separator to be inserted after
> it?
>
> E.g., should the data ",1,2,3" parse, and is that initial comma required?
>
> An argument can be made that such empty sequences can be detected by a
> DFDL implementation and treated as having "no representation" akin to how
> an element with dfdl:inputValueCalc is treated. Such elements are invisible
> as far as the representation is concerned. They cause no separators in
> sequences, no alignment regions, etc.
>
> In that case the data would not require nor accept the initial comma.
>
> But the DFDL spec is not clear on whether empty sequences are treated in
> this way. I am therefore assuming they are not treated specially: the
> comma is required because all model groups are treated as required,
> regardless of whether they have zero-length representations.
>
> Is that correct? If so then Daffodil has a bug, because it does NOT put a
> separator in for this today.
>
> What if the empty sequence carries a dfdl:initiator="A" annotation?
>
> Such a sequence is still empty in the XSDL sense of empty sequence, but
> clearly has a representation that is not zero length. In that case I think
> the data has to be "A,1,2,3" so that there is a separator after the empty
> sequence's non-ZL representation. I think this is not controversial.
>
> Other variations:
> What if the empty sequence contains only elements with
> dfdl:inputValueCalc? Then there is a content model, but it consists
> entirely of non-represented elements. Would it still be a "DFDL empty
> sequence" or "DFDL non-represented sequence"?
>
> What if the sequence isn't empty, but contains only "optional" elements?
> In that case, is the whole sequence "optional", so that the comma becomes
> sensitive to the dfdl:separatorSuppressionPolicy?
> I guess this means a model group inherits the optionality/required-ness of
> its contents, unless it has its own required framing.
>
> I believe the simplest thing is to require the comma here. This is,
> however, a backward incompatible change to Daffodil to conform, so I want
> to be sure this is correct.
>
> The element declaration can be rewritten so that the empty sequence comes
> before the separated sequence. Presumably the reason someone would insert
> an empty sequence like this is to get the assert statement executed at the
> beginning of the sequence, not afterwards. However, if the sequence has
> separators, then you can't just insert an initial empty sequence to carry
> that assert without requiring a separator.
>
> The element can be rewritten:
>
>     <xs:element name="NS_13">
>       <xs:complexType>
>         <xs:sequence>
>           <xs:sequence>
>             <xs:annotation>
>               <xs:appinfo source="http://www.ogf.org/dfdl/">
>                 <dfdl:assert>{fn:true()}</dfdl:assert>
>               </xs:appinfo>
>             </xs:annotation>
>           </xs:sequence>
>           <xs:sequence dfdl:separator="," dfdl:separatorPosition="infix"
>               dfdl:separatorSuppressionPolicy="trailingEmptyStrict">
>             <xs:element name="e2" type="xs:int" maxOccurs="unbounded"
>                 dfdl:textNumberPattern="#####" />
>           </xs:sequence>
>         </xs:sequence>
>       </xs:complexType>
>     </xs:element>
>
> This achieves the desired behavior where the assertion executes first, but
> does not cause a separator to be needed, and does so regardless of the
> treatment of empty sequences.
>
> Ultimately, we need a clarification that states either:
>
> 1) empty sequences are considered represented terms that are required, and
> hence, they imply framing such as alignment and presence of separators in
> separated sequences even if they have zero length.
>
> or
>
> 2) Introduce the concept of a "DFDL non-represented sequence". These are
> sequences that have no framing, no delimiters, and either an empty content
> model or a content model containing only elements with dfdl:inputValueCalc
> or other DFDL non-represented sequences (recursively). They have no
> representation, so they imply no alignment, no need for separators, etc. A
> group ref (hidden or not) to a DFDL non-represented sequence is also a DFDL
> non-represented sequence.
>
> Note that these could be generalized to choice groups also.
>
> Comments?
>
> Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
> www.tresys.com
> Please note: Contributions to the DFDL Workgroup's email discussions are
> subject to the OGF Intellectual Property Policy
> <http://www.ogf.org/About/abt_policies.php>