[DFDL-WG] How to choose the correct choice branch when serializing

Mike Beckerle mbeckerle.dfdl at gmail.com
Fri Apr 13 09:26:46 EDT 2012


Another alternative (d) would be to say that the first child element in a
choice alternative must be required (minOccurs >= 1; I think UPA requires
this already) and cannot have a default value or an outputValueCalc (don't
forget about that one!). That is to say, it must have a value in the infoset.
The UPA rules combined with that would then make everything unambiguous.

This restriction is actually about points of uncertainty generally,
not just choices. E.g., if I have an optional element of complex type
sequence, then its first child element and the next element following the
optional element cannot be optional and cannot have a default value.

The easiest way to always satisfy this restriction is just to wrap anything
at a point of uncertainty in another element tier. That always works and
always fixes it. However, using the restriction described above, we can
eliminate the need to do this wrapping, and still make it trivially
decidable which alternative to take when serializing.

The result of rule (d) above is that your model then has an SDE unless
you change firstName to not be defaultable. Making that change fixes
your model (into something much more rational in my opinion: an optional
firstName is already achieved by the choice, so you don't also need it done
by optionality), though you still have the missing phone number to deal
with.
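
As a rough illustration of rule (d), here is a minimal Python sketch. The
ElementDecl class and the function names are hypothetical, made up for this
example; they are not part of the DFDL spec or of any implementation. It
assumes each choice branch is a flat list of element declarations, and that
UPA already guarantees the first elements of the branches have distinct names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ElementDecl:
    # Hypothetical, simplified stand-in for an element declaration.
    name: str
    min_occurs: int = 1
    default: Optional[str] = None
    output_value_calc: Optional[str] = None


def rule_d_violations(branches: List[List[ElementDecl]]) -> List[str]:
    """Return one message per way a branch's first child breaks rule (d)."""
    problems = []
    for index, decls in enumerate(branches):
        if not decls:
            problems.append(f"branch {index}: no first child element")
            continue
        first = decls[0]
        if first.min_occurs < 1:
            problems.append(f"branch {index}: {first.name} is optional")
        if first.default is not None:
            problems.append(f"branch {index}: {first.name} has a default")
        if first.output_value_calc is not None:
            problems.append(f"branch {index}: {first.name} has outputValueCalc")
    return problems


def select_branch(branches: List[List[ElementDecl]],
                  first_infoset_name: str) -> Optional[int]:
    """With rule (d) satisfied, the first infoset element picks the branch."""
    for index, decls in enumerate(branches):
        if decls and decls[0].name == first_infoset_name:
            return index
    return None


# The model from the original question, with a defaultable firstName.
original = [
    [ElementDecl("firstName", default="UNKNOWN"),
     ElementDecl("lastName"), ElementDecl("postcode")],
    [ElementDecl("lastName"), ElementDecl("telephoneNumber")],
]
# The same model after making firstName non-defaultable.
fixed = [
    [ElementDecl("firstName"), ElementDecl("lastName"), ElementDecl("postcode")],
    original[1],
]
```

In this sketch rule_d_violations(original) reports the defaultable firstName,
rule_d_violations(fixed) reports nothing, and select_branch(fixed, "lastName")
picks the second branch without any trial serialization.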

If you don't like my choice (d) above, then I'd say choice (a), i.e., first
possible wins, is the right behavior. In this case, a suboptimal but
correct algorithm is to try serializing each choice branch one by one in
turn, and stop when one succeeds. That's what the semantics should be.
It's much too hard to reason about anything else, and this is symmetric
with parsing, which does not search for the alternative that best matches
the data; it just takes the first one that succeeds.
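
The "try each branch in turn, stop when one succeeds" algorithm can be
sketched as follows. This is only an illustration under simplifying
assumptions: branches are flat sequences, the infoset is a list of
(name, value) pairs, and ElementDecl, UnparseError, unparse_sequence and
unparse_choice are hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ElementDecl:
    # Hypothetical, simplified stand-in for an element declaration.
    name: str
    min_occurs: int = 1
    default: Optional[str] = None


class UnparseError(Exception):
    """Raised when a branch cannot serialize the given infoset."""


def unparse_sequence(decls: List[ElementDecl],
                     infoset: List[Tuple[str, str]]) -> List[str]:
    out = []
    pos = 0
    for decl in decls:
        if pos < len(infoset) and infoset[pos][0] == decl.name:
            out.append(infoset[pos][1])   # value supplied by the infoset
            pos += 1
        elif decl.min_occurs == 0:
            continue                      # optional and absent: skip it
        elif decl.default is not None:
            out.append(decl.default)      # required and absent: use default
        else:
            raise UnparseError(f"required element {decl.name!r} missing")
    if pos != len(infoset):
        raise UnparseError(f"unexpected element {infoset[pos][0]!r}")
    return out


def unparse_choice(branches: List[List[ElementDecl]],
                   infoset: List[Tuple[str, str]]) -> Tuple[int, List[str]]:
    # First branch that serializes successfully wins; no "best match" search.
    for index, decls in enumerate(branches):
        try:
            return index, unparse_sequence(decls, infoset)
        except UnparseError:
            continue
    raise UnparseError("no choice branch could be serialized")


branches = [
    [ElementDecl("firstName", default="UNKNOWN"),
     ElementDecl("lastName"), ElementDecl("postcode")],
    [ElementDecl("lastName"), ElementDecl("telephoneNumber")],
]
infoset = [("lastName", "Smith"), ("telephoneNumber", "555-0100")]
result = unparse_choice(branches, infoset)
```

For the model and infoset from the original question, the first branch fails
at postcode, so the sketch falls through to the second branch. Note that
this retries whole branches, which is the suboptimal part: output produced
by a failed branch is simply thrown away.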


...mikeb





On Fri, Apr 13, 2012 at 6:52 AM, Steve Hanson <smh at uk.ibm.com> wrote:

> Hi Tim
>
> I've made some minor corrections to your summary of the problem.
>
> If the user restructures his model to wrap the sequences in elements then
> the problem goes away.  So I think we should keep the solution to this as
> simple as we can while not being unnecessarily restrictive.
>
> Regards
>
> Steve Hanson
> Architect, Data Format Description Language (DFDL)
> Co-Chair, OGF DFDL Working Group <http://www.ogf.org/dfdl/>
> IBM SWG, Hursley, UK
> smh at uk.ibm.com
> tel:+44-1962-815848
>
>
>
> From:        Tim Kimber/UK/IBM at IBMGB
> To:        dfdl-wg at ogf.org
> Date:        13/04/2012 10:54
> Subject:        [DFDL-WG] How to choose the correct choice branch when
> serializing
> Sent by:        dfdl-wg-bounces at ogf.org
> ------------------------------
>
>
>
> There is an interesting edge case which arises when the serializer
> encounters a choice group.
> A DFDL xsd is structured as follows:
>
> <root>
>    <choice>
>        <sequence>
>            <firstname/>
>            *<lastname/>*
>            <postcode/>
>        </sequence>
>        <sequence>
>            *<lastname/>*
>            <telephoneNumber/>
>        </sequence>
>    </choice>
> </root>
>
> Note that both branches of the choice are sequences, not elements.
>
> The infoset is
>
> <root>
>    <lastName/>
>    <telephoneNumber/>
> </root>
>
> The likely action of the serializer is:
> - pick the first branch of the choice ( because it contains lastname )
>
> - output the default value of firstname ( assuming that firstname has
> minOccurs = 1 and has a default )
> - output lastname
> - issue a processing error because telephoneNumber is found in the info
> set but is not in the first branch.
>
> ...but from the infoset the user clearly intended:
> - select the second branch of the choice and successfully process the
> entire info set
>
>
> The DFDL specification does not state what the behaviour should be. I
> think the options are:
> a) state explicitly that the serializer will choose the first branch that
> contains a matching element, regardless of minOccurs
> b) invent a new rule that causes the serializer to back out of a branch and
> try another branch if there is a minOccurs error while processing the branch
> c) disallow sequences and choices as immediate children of a choice group
>
> Currently I'm leaning toward a) by process of elimination, for the
> following reasons:
> b) would make this scenario work, but I think it would impose a lot of
> work on implementers because it would require the serializer to do
> backtracking.
> c) would simplify a lot of things, but I think it's too restrictive - I
> can imagine complex data formats where it might be useful to have a choice
> as the direct child of a choice because the discrimination rules might be
> easier to express in a two-level structure.
>
> regards,
>
> Tim Kimber, Common Transformation Team,
> Hursley, UK
> Internet:  kimbert at uk.ibm.com
> Tel. 01962-816742
> Internal tel. 246742
>
>
>
>  ------------------------------
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>
>
>
>
>
> --
>  dfdl-wg mailing list
>  dfdl-wg at ogf.org
>  https://www.ogf.org/mailman/listinfo/dfdl-wg



-- 
Mike Beckerle | OGF DFDL WG Co-Chair
Tel:  781-330-0412