[DFDL-WG] clarification on choice unparsing and empty sequences

Mike Beckerle mbeckerle.dfdl at gmail.com
Wed Apr 22 14:15:14 EDT 2015


The spec language around how choice branches are resolved when unparsing
suggests that each branch must have an element in it somewhere.


On unparsing there is the question of how one identifies the appropriate
schema choice branch corresponding to the data in the infoset. This is
complicated by the fact that the children may not be elements. They may
themselves be sequences or choices. The selection of the choice branch is
as follows: The element in the infoset is used to search the choice
branches in the schema, in schema definition order, but without looking
inside any complex elements. If the element occurs in a branch, then that
branch is selected and if subsequently a processing error occurs, this
selection is not revisited (that is, there is no backtracking).
To avoid any unintended behavior, all the children of a choice can be
modeled as elements.
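
As a rough illustration only (this is not from the spec), the selection
rule quoted above might be sketched in Python as follows. The classes
Element, Sequence and Choice and the functions occurs_in and select_branch
are hypothetical names made up for this sketch. It also assumes that
"occurs in a branch" means "is declared anywhere in the branch's nested
model groups", which ignores optionality and ordering.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Element:
    name: str  # children of complex elements are deliberately not modeled


@dataclass
class Sequence:
    children: List["Term"] = field(default_factory=list)


@dataclass
class Choice:
    branches: List["Term"] = field(default_factory=list)


Term = Union[Element, Sequence, Choice]


def occurs_in(term: Term, name: str) -> bool:
    """True if an element called `name` is declared in `term`, looking
    through nested sequences and choices but never inside an element."""
    if isinstance(term, Element):
        return term.name == name
    kids = term.children if isinstance(term, Sequence) else term.branches
    return any(occurs_in(kid, name) for kid in kids)


def select_branch(choice: Choice, name: str) -> Optional[int]:
    """Index of the first branch, in schema definition order, in which
    `name` occurs. Returns None if no branch declares the element."""
    for index, branch in enumerate(choice.branches):
        if occurs_in(branch, name):
            return index  # selected once, never revisited
    return None
```

With a choice whose branches are an empty sequence and an element foo,
select_branch finds foo in the second branch, but returns None for any
other element name. That None case is where the quoted rule gives no
answer.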

However, that passage of the spec now seems incomplete to me, though at
the time it was written and reviewed it seemed to resolve the issue. It
does imply that each model group in a choice branch has to have an element
somewhere, but I couldn't find that statement explicitly in the spec, and
the spec does say that empty model groups are specifically allowed:

14.1     Empty Sequences

A sequence having no children is syntactically legal in DFDL. In the data
stream, such a sequence can have non-zero length *LeftFraming* and
*RightFraming* regions, but the SequenceContent region in between must be
empty. It is a processing error if the SequenceContent region of an empty
sequence has non-zero length when parsing.


This leaves open the issue of hidden groups. For example, is <sequence
dfdl:hiddenGroupRef="mygroup"/> an empty sequence? Or does the hidden
group count as if there really are children of this sequence? I suspect the
latter, but need to see how others have interpreted this.


XML schema does not define an empty sequence that is the content model of a
complex type definition as effective content so any DFDL annotations on
such a construct would be ignored. It is a schema definition error if the
empty sequence is the content model of a complex type, or if a complex type
has nothing in its content model at all.


That makes it clear that both of these are SDEs:


<complexType><sequence/></complexType>


<complexType></complexType>


But it leaves many scenarios unclear still.

Consider this schema fragment:

....
<choice>
  <sequence dfdl:terminator=";"/>
  <element name="foo"/>
</choice>
<element name="bar"/>
....

If we are unparsing and the infoset is just <bar>, then we can compute that
finding <bar> in the infoset rules out the <foo> branch, which means the
first branch is selected. So we would unparse that and output a ";".

However, if there is true ambiguity like:

<choice>
  <sequence dfdl:terminator=";"/>
  <sequence dfdl:terminator="/"/>
  <element name="foo"/>
</choice>
<element name="bar"/>

That's effectively saying that on parse either a ";" or a "/" may be found.
On unparse, there is nothing to guide which choice branch to select: finding
<bar> in the infoset tells us only that the <foo> element branch is NOT
selected. However, it is OK to just always output the first (";") if the
infoset has a <bar> element.
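
To make the interpretation I am floating concrete, here is a minimal
Python sketch. It is an assumption for discussion, not something the spec
states. If the infoset element is declared in no branch, fall back to the
first branch that declares no elements at all. The names
select_branch_with_fallback and element_names are hypothetical. A branch is
written as either an element name or a list of child terms, and framing
such as dfdl:terminator is left out because it plays no part in the
selection.

```python
from typing import List, Optional, Union

# A term is an element name (str) or a model group (list of child terms).
# [] stands for an empty sequence.
Term = Union[str, list]


def element_names(term: Term) -> List[str]:
    """All element names declared in a term, through nested groups."""
    if isinstance(term, str):
        return [term]
    names: List[str] = []
    for child in term:
        names.extend(element_names(child))
    return names


def select_branch_with_fallback(branches: List[Term],
                                name: str) -> Optional[int]:
    # Step 1: first branch, in schema order, that declares the element.
    for index, branch in enumerate(branches):
        if name in element_names(branch):
            return index
    # Step 2 (assumed, not in the spec): first branch with no elements.
    for index, branch in enumerate(branches):
        if not element_names(branch):
            return index
    return None
```

For the ambiguous schema above, the branches would be [[], [], "foo"].
Unparsing <bar> selects branch 0, so ";" is output, and unparsing <foo>
selects branch 2.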

However, I wanted to check and see what others' interpretation of the spec
is for this issue.



Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
<http://www.ogf.org/About/abt_policies.php>