[DFDL-WG] Clarification needed: separator for empty sequence

Mike Beckerle mbeckerle.dfdl at gmail.com
Thu Jul 19 13:37:00 EDT 2018


I believe Daffodil currently treats separators incorrectly in one case, but
I want to clarify the intended behavior against the DFDL spec before fixing
it.

Consider this element:

    <xs:element name="NS_13">
      <xs:complexType>
        <xs:sequence dfdl:separator="," dfdl:separatorPosition="infix"
                     dfdl:separatorSuppressionPolicy="trailingEmptyStrict">
          <xs:sequence>
            <xs:annotation>
              <xs:appinfo source="http://www.ogf.org/dfdl/">
                <dfdl:assert>{fn:true()}</dfdl:assert>
              </xs:appinfo>
            </xs:annotation>
          </xs:sequence>
          <xs:element name="e2" type="xs:int" minOccurs="1"
                      maxOccurs="unbounded" dfdl:textNumberPattern="#####"
                      dfdl:occursCountKind="implicit" />
        </xs:sequence>
      </xs:complexType>
    </xs:element>

The outermost sequence has an infix separator.

Its content begins with another sequence, but this sequence is empty,
containing only an assert statement.

Question: does this empty sequence cause a separator to be inserted after
it?

E.g., should the data ",1,2,3" parse, and is that initial comma required?

An argument can be made that such empty sequences can be detected by a DFDL
implementation and treated as having "no representation" akin to how an
element with dfdl:inputValueCalc is treated. Such elements are invisible as
far as the representation is concerned. They cause no separators in
sequences, no alignment regions, etc.

In that case the data would neither require nor accept the initial comma.

But the DFDL spec is not clear on whether empty sequences are treated in
this way. I am therefore assuming they are not treated specially, and that
the comma is required because all model groups are treated as required
regardless of whether they have zero-length representations.

Is that correct? If so, then Daffodil has a bug, because it does NOT put a
separator in for this today.

What if the empty sequence carries a dfdl:initiator="A" annotation?

Such a sequence is still empty in the XSDL sense of empty sequence, but
clearly has a representation that is not zero length. In that case I think
the data has to be "A,1,2,3" so that there is a separator after the empty
sequence's non-zero-length representation. I think this is not controversial.
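
For concreteness, the initiator variation might be written as below. This is
an illustrative sketch only, not a tested schema; it reuses the outer
sequence and the e2 element from the example above and adds nothing but the
dfdl:initiator on the inner sequence. Under the reading described here, the
expected data would be "A,1,2,3".

    <xs:sequence dfdl:separator="," dfdl:separatorPosition="infix"
                 dfdl:separatorSuppressionPolicy="trailingEmptyStrict">
      <!-- Empty content model, but non-zero-length framing: initiator "A" -->
      <xs:sequence dfdl:initiator="A">
        <xs:annotation>
          <xs:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:assert>{fn:true()}</dfdl:assert>
          </xs:appinfo>
        </xs:annotation>
      </xs:sequence>
      <xs:element name="e2" type="xs:int" minOccurs="1"
                  maxOccurs="unbounded" dfdl:textNumberPattern="#####"
                  dfdl:occursCountKind="implicit" />
    </xs:sequence>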

Other variations:
What if the sequence contains only elements with dfdl:inputValueCalc? Then
there is a content model, but it consists entirely of non-represented
elements. Would it still be a "DFDL empty sequence", or rather a "DFDL
non-represented sequence"?
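
A sketch of that variation, for illustration only. The element name
"computed" and its constant expression are hypothetical placeholders, not
taken from any real schema.

    <xs:sequence>
      <!-- Only child is computed, so nothing here appears in the data -->
      <xs:element name="computed" type="xs:int"
                  dfdl:inputValueCalc="{ 0 }" />
    </xs:sequence>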

What if the sequence isn't empty, but contains only "optional" elements? In
that case, is the whole sequence "optional", so that the comma becomes
sensitive to the dfdl:separatorSuppressionPolicy?
I guess this would mean a model group inherits the optionality/required-ness
of its contents, unless it has its own required framing.
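
A sketch of the "only optional elements" variation, again for illustration
only. The element name "opt1" is a hypothetical placeholder; the other
properties are copied from the e2 element above.

    <xs:sequence>
      <!-- No framing of its own; the only child may be absent -->
      <xs:element name="opt1" type="xs:int" minOccurs="0" maxOccurs="1"
                  dfdl:textNumberPattern="#####"
                  dfdl:occursCountKind="implicit" />
    </xs:sequence>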

I believe the simplest thing is to require the comma here. Making Daffodil
conform would, however, be a backward-incompatible change, so I want to be
sure this is correct.

The element declaration can be rewritten so that the empty sequence comes
before the separated sequence. Presumably the reason someone would insert
an empty sequence like this is to get the assert statement executed at the
beginning of the sequence, not afterwards. However, if the sequence has
separators, then you can't just insert an initial empty sequence to carry
that assert without requiring a separator.

The rewritten element looks like this:

    <xs:element name="NS_13">
      <xs:complexType>
        <xs:sequence>
          <xs:sequence>
            <xs:annotation>
              <xs:appinfo source="http://www.ogf.org/dfdl/">
                <dfdl:assert>{fn:true()}</dfdl:assert>
              </xs:appinfo>
            </xs:annotation>
          </xs:sequence>
          <xs:sequence dfdl:separator="," dfdl:separatorPosition="infix"
                       dfdl:separatorSuppressionPolicy="trailingEmptyStrict">
            <xs:element name="e2" type="xs:int" maxOccurs="unbounded"
                        dfdl:textNumberPattern="#####" />
          </xs:sequence>
        </xs:sequence>
      </xs:complexType>
    </xs:element>

This achieves the desired behavior where the assertion executes first, but
does not cause a separator to be needed, and does so regardless of the
treatment of empty sequences.

Ultimately, we need a clarification that states either:

1) Empty sequences are considered represented terms that are required, and
hence they imply framing, such as alignment and the presence of separators
in separated sequences, even if they have zero length.

or

2) A new concept, the "DFDL non-represented sequence", is introduced. These
are sequences that have no framing, no delimiters, and either an empty
content model or a content model containing only elements with
dfdl:inputValueCalc or other DFDL non-represented sequences (recursively).
They have no representation, so they imply no alignment, no need of
separators, etc. A group ref (hidden or not) to a DFDL non-represented
sequence is also a DFDL non-represented sequence.

Note that these could be generalized to choice groups also.
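
The group-reference case in option 2 might look like the sketch below. It is
illustrative only: the group name "checks" and the "ex" prefix (assumed to
be bound to the schema's target namespace) are hypothetical. Under option 2
the group ref would imply no separator; under option 1 it would require one.

    <xs:group name="checks">
      <xs:sequence>
        <xs:annotation>
          <xs:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:assert>{fn:true()}</dfdl:assert>
          </xs:appinfo>
        </xs:annotation>
      </xs:sequence>
    </xs:group>

    <xs:sequence dfdl:separator="," dfdl:separatorPosition="infix"
                 dfdl:separatorSuppressionPolicy="trailingEmptyStrict">
      <!-- Reference to a sequence with no framing and no content -->
      <xs:group ref="ex:checks" />
      <xs:element name="e2" type="xs:int" minOccurs="1"
                  maxOccurs="unbounded" dfdl:textNumberPattern="#####"
                  dfdl:occursCountKind="implicit" />
    </xs:sequence>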

Comments?

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
<http://www.ogf.org/About/abt_policies.php>