[DFDL-WG] clarification on choice unparsing and empty sequences

Steve Hanson smh at uk.ibm.com
Tue Aug 11 10:26:14 EDT 2015


I did some testing of IBM DFDL using Mike's example below ...

<choice>
   <sequence dfdl:terminator=";"/>
   <sequence dfdl:terminator="/"/>
   <element name="foo" dfdl:length="5"/>
</choice>
<element name="bar" dfdl:length="5"/>

IBM DFDL parses all of the below ...

;bbbbb
/bbbbb
fffffbbbbb

But it fails when serializing the infosets from the first two of the above 
...

CTDU4035E: The DFDL serializer cannot assign a default value to a choice 
group. 

IBM DFDL has not implemented the 'defaulting an empty choice' algorithm 
yet. But if it did then I would expect it to create ;bbbbb for both of the 
first two.
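
To make the expected round trip concrete, here is a toy model in Python. It 
is only a sketch of the behaviour I describe above, not IBM DFDL code: the 
names parse, unparse, TERMINATORS and FIELD_LEN are made up for 
illustration, and the "default to the first branch" rule in unparse is the 
assumed defaulting behaviour, which IBM DFDL does not implement today.

```python
# Toy model of the example schema: a choice of two empty sequences
# (terminated by ";" and "/") or a 5-character element foo, followed by a
# 5-character element bar. All names here are hypothetical.
TERMINATORS = [";", "/"]  # empty-sequence branches, in schema definition order
FIELD_LEN = 5


def parse(data):
    """Return an infoset (a dict) for the data, or raise ValueError."""
    infoset = {}
    if data[:1] in TERMINATORS:
        # An empty-sequence branch matched; it adds nothing to the infoset.
        rest = data[1:]
    else:
        infoset["foo"] = data[:FIELD_LEN]
        rest = data[FIELD_LEN:]
    if len(rest) != FIELD_LEN or len(infoset.get("foo", "x" * FIELD_LEN)) != FIELD_LEN:
        raise ValueError("data does not match the schema")
    infoset["bar"] = rest
    return infoset


def unparse(infoset):
    """Serialize the infoset, defaulting the choice to its first branch."""
    if "foo" in infoset:
        head = infoset["foo"]
    else:
        # Assumed rule: no element identifies a branch, so take the first one.
        head = TERMINATORS[0]
    return head + infoset["bar"]


for text in (";bbbbb", "/bbbbb", "fffffbbbbb"):
    print(text, "->", unparse(parse(text)))
```

Note that the infosets for ;bbbbb and /bbbbb are identical, so the "/" 
cannot survive a round trip under this rule.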

Regards
 
Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848



From:   Mike Beckerle <mbeckerle.dfdl at gmail.com>
To:     "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>
Date:   22/04/2015 19:15
Subject:        [DFDL-WG] clarification on choice unparsing and empty 
sequences
Sent by:        dfdl-wg-bounces at ogf.org




The spec language around how choice branches are resolved when unparsing 
suggests that each branch must have an element in it somewhere.


On unparsing there is the question of how one identifies the appropriate 
schema choice branch corresponding to the data in the infoset. This is 
complicated by the fact that the children may not be elements. They may 
themselves be sequences or choices. The selection of the choice branch is 
as follows: The element in the infoset is used to search the choice 
branches in the schema, in schema definition order, but without looking 
inside any complex elements. If the element occurs in a branch, then that 
branch is selected and if subsequently a processing error occurs, this 
selection is not revisited (that is, there is no backtracking). 
To avoid any unintended behavior, all the children of a choice can be 
modeled as elements.
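
As a rough illustration of how I read that passage, here is a Python 
sketch of the search. The Element and Group classes and the functions 
mentions and select_branch are hypothetical names invented for this 
example; they are not taken from the spec or from any implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Element:
    name: str
    # Children of a complex element. The search never looks inside these.
    children: List["Term"] = field(default_factory=list)


@dataclass
class Group:
    kind: str  # "sequence" or "choice"
    children: List["Term"] = field(default_factory=list)


Term = Union[Element, Group]


def mentions(term: Term, name: str) -> bool:
    """True if the element name occurs in term, not descending into elements."""
    if isinstance(term, Element):
        return term.name == name
    return any(mentions(child, name) for child in term.children)


def select_branch(choice: Group, name: str) -> Optional[int]:
    """Index of the first branch, in schema order, containing the element."""
    for index, branch in enumerate(choice.children):
        if mentions(branch, name):
            return index
    return None  # no branch contains the element


choice = Group("choice", [Group("sequence"), Group("sequence"), Element("foo")])
print(select_branch(choice, "foo"))  # 2
print(select_branch(choice, "bar"))  # None
```

An empty sequence branch can never be returned by this search, because it 
contains no element to match.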

However, that passage of the spec now seems incomplete to me, though at 
the time it was written and reviewed it seemed to resolve the issue. It 
does imply that each model group in a choice branch must contain an 
element somewhere, but I couldn't find that stated explicitly in the spec, 
and the spec does say that empty model groups are specifically allowed.

14.1     Empty Sequences
A sequence having no children is syntactically legal in DFDL. In the data 
stream, such a sequence can have non-zero length LeftFraming and 
RightFraming regions, but the SequenceContent region in between must be 
empty. It is a processing error if the SequenceContent region of an empty 
sequence has non-zero length when parsing.
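
A minimal Python sketch of that rule, assuming the framing consists only 
of an optional initiator and terminator (real framing can also include 
alignment and other regions). ProcessingError and parse_empty_sequence are 
hypothetical names used just for this illustration.

```python
class ProcessingError(Exception):
    """Stands in for a DFDL processing error in this sketch."""


def parse_empty_sequence(data, pos=0, initiator="", terminator=""):
    """Consume the framing of an empty sequence and return the new position.

    The content between the framing must be empty, so the terminator has
    to follow the initiator immediately.
    """
    if not data.startswith(initiator, pos):
        raise ProcessingError("initiator not found")
    pos += len(initiator)
    if not data.startswith(terminator, pos):
        raise ProcessingError("content is not empty or terminator is missing")
    return pos + len(terminator)


print(parse_empty_sequence(";bbbbb", terminator=";"))  # 1
```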

This leaves open the issue of hidden groups. For example, is <sequence 
dfdl:hiddenGroupRef="mygroup"/> an empty sequence? Or does the hidden 
group count as if the sequence really has children? I suspect the latter, 
but need to see how others have interpreted this.

XML schema does not define an empty sequence that is the content model of 
a complex type definition as effective content so any DFDL annotations on 
such a construct would be ignored. It is a schema definition error if the 
empty sequence is the content model of a complex type, or if a complex 
type has nothing in its content model at all.

That makes clear that both of these are schema definition errors (SDE):

<complexType><sequence/></complexType>

<complexType></complexType>

But it leaves many scenarios unclear still.

Consider this schema fragment:

....
<choice>
  <sequence dfdl:terminator=";"/>
  <element name="foo"/>
</choice>
<element name="bar"/>
....

If we are unparsing and the infoset is just <bar> then we can compute that 
finding <bar> in the infoset means the first branch is selected. So we 
would unparse that and output a ";".

However, if there is true ambiguity like:

<choice>
   <sequence dfdl:terminator=";"/>
   <sequence dfdl:terminator="/"/>
   <element name="foo"/>
</choice>
<element name="bar"/>

That's effectively saying that on parse either a ";" or a "/" may be 
found. On unparse, nothing guides the selection of a choice branch: 
finding <bar> in the infoset tells us only that the <foo> element branch 
is NOT selected. However, it is OK to always output the first branch (;) 
if the infoset has a <bar> element.

However, I wanted to check how others interpret the spec on this issue.



Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | 
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are 
subject to the OGF Intellectual Property Policy
--
  dfdl-wg mailing list
  dfdl-wg at ogf.org
  https://www.ogf.org/mailman/listinfo/dfdl-wg

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU