[DFDL-WG] How to choose the correct choice branch when serializing

Steve Hanson smh at uk.ibm.com
Fri Apr 13 06:52:35 EDT 2012


Hi Tim

I've made some minor corrections to your summary of the problem.

If the user restructures his model to wrap the sequences in elements then 
the problem goes away.  So I think we should keep the solution to this as 
simple as we can while not being unnecessarily restrictive. 
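To illustrate why wrapping helps, here is a minimal Python sketch using the model from Tim's note quoted below. The wrapper element names (postalContact, phoneContact) are hypothetical and do not appear in the original model. Once each branch is an element, the infoset itself names the branch, so the serializer does not have to guess.

```python
# Hypothetical wrapper elements around the two sequences of the choice.
# The names postalContact and phoneContact are assumptions for illustration.
CHOICE = {
    "postalContact": ["firstname", "lastname", "postcode"],
    "phoneContact": ["lastname", "telephoneNumber"],
}


def select_branch(choice, root_children):
    """Pick the branch whose wrapper element is the first child of root."""
    wrapper_name, _children = root_children[0]
    if wrapper_name not in choice:
        raise KeyError("no choice branch named " + wrapper_name)
    return wrapper_name


# Infoset: <root><phoneContact><lastname/><telephoneNumber/></phoneContact></root>
infoset = [("phoneContact", ["lastname", "telephoneNumber"])]
selected = select_branch(CHOICE, infoset)
```

Here the shared lastname element no longer matters, because the decision is made on the wrapper name alone.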

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848



From:   Tim Kimber/UK/IBM at IBMGB
To:     dfdl-wg at ogf.org
Date:   13/04/2012 10:54
Subject:        [DFDL-WG] How to choose the correct choice branch when 
serializing
Sent by:        dfdl-wg-bounces at ogf.org



There is an interesting edge case which arises when the serializer 
encounters a choice group. 
A DFDL xsd is structured as follows: 

<root> 
    <choice> 
        <sequence> 
            <firstname/> 
            <lastname/> 
            <postcode/> 
        </sequence> 
        <sequence> 
            <lastname/> 
            <telephoneNumber/> 
        </sequence> 
    </choice> 
</root> 

Note that both branches of the choice are sequences, not elements. 

The infoset is 

<root> 
    <lastname/> 
    <telephoneNumber/> 
</root> 

The likely action of the serializer is: 
- pick the first branch of the choice ( because it contains lastname ) 
- output the default value of firstname ( assuming that firstname has 
minOccurs = 1 and has a default ) 
- output lastname 
- issue a processing error because telephoneNumber is found in the 
infoset but is not in the first branch. 

...but from the infoset the user clearly intended: 
- select the second branch of the choice and successfully process the 
entire infoset 
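The walk-through above can be sketched in Python. This is only a rough model under stated assumptions: an infoset is a flat list of element names, every element has minOccurs = 1, and only firstname has a default. None of the function names come from the DFDL specification.

```python
# Assumed model: each choice branch is a sequence of required element names.
CHOICE = [
    ["firstname", "lastname", "postcode"],
    ["lastname", "telephoneNumber"],
]
DEFAULTS = {"firstname"}  # assumption: only firstname has a default value


class ProcessingError(Exception):
    pass


def serialize_sequence(branch, infoset):
    """Walk one branch in order, consuming infoset items as they match."""
    out, pos = [], 0
    for name in branch:
        current = infoset[pos] if pos < len(infoset) else None
        if current == name:
            out.append(name)
            pos += 1
        elif current is not None and current not in branch:
            raise ProcessingError(current + " is not in this branch")
        elif name in DEFAULTS:
            out.append(name + " (default)")
        else:
            raise ProcessingError("required element " + name + " is missing")
    if pos < len(infoset):
        raise ProcessingError("unexpected element " + infoset[pos])
    return out


def first_match(choice, infoset):
    """Commit to the first branch that contains the first infoset element."""
    for branch in choice:
        if infoset[0] in branch:
            return serialize_sequence(branch, infoset)
    raise ProcessingError("no branch contains " + infoset[0])
```

Calling first_match(CHOICE, ["lastname", "telephoneNumber"]) commits to the first branch, emits the firstname default and lastname, and then raises ProcessingError on telephoneNumber, even though serialize_sequence(CHOICE[1], ...) would have succeeded.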

The DFDL specification does not state what the behaviour should be. I 
think the options are: 
a) state explicitly that the serializer will choose the first branch that 
contains a matching element, regardless of minOccurs 
b) invent a new rule that causes the serializer to back out of a branch and 
try another branch if there is a minOccurs error while processing the 
branch 
c) disallow sequences and choices as immediate children of a choice group 

Currently I'm leaning toward a) by process of elimination, for the 
following reasons: 
b) would make this scenario work, but I think it would impose a lot of 
work on implementers because it would require the serializer to do 
backtracking. 
c) would simplify a lot of things, but I think it's too restrictive - I 
can imagine complex data formats where it might be useful to have a choice 
as the direct child of a choice because the discrimination rules might be 
easier to express in a two-level structure. 
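For comparison, option b) might look roughly like the Python sketch below. It rests on the same simplifying assumptions as the problem statement: an infoset is a flat list of element names, every element has minOccurs = 1, and only firstname has a default. The function names are hypothetical and are not taken from the DFDL specification.

```python
# Assumed model: each choice branch is a sequence of required element names.
CHOICE = [
    ["firstname", "lastname", "postcode"],
    ["lastname", "telephoneNumber"],
]
DEFAULTS = {"firstname"}  # assumption: only firstname has a default value


class ProcessingError(Exception):
    pass


def try_branch(branch, infoset):
    """Serialize one branch into a buffer, or raise ProcessingError."""
    out, pos = [], 0
    for name in branch:
        if pos < len(infoset) and infoset[pos] == name:
            out.append(name)
            pos += 1
        elif name in DEFAULTS:
            out.append(name + " (default)")
        else:
            raise ProcessingError("required element " + name + " is missing")
    if pos < len(infoset):
        raise ProcessingError("unexpected element " + infoset[pos])
    return out


def serialize_with_backtracking(choice, infoset):
    """Try each branch in schema order; discard the output of failed branches."""
    errors = []
    for index, branch in enumerate(choice):
        try:
            return index, try_branch(branch, infoset)
        except ProcessingError as err:
            errors.append("branch %d: %s" % (index, err))
    raise ProcessingError("; ".join(errors))
```

With the infoset ["lastname", "telephoneNumber"], the first branch fails on postcode and the second branch is selected. The cost is visible in try_branch: output has to be held in a buffer until the branch is known to succeed, which is the extra work for implementers mentioned above.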

regards,

Tim Kimber, Common Transformation Team,
Hursley, UK
Internet:  kimbert at uk.ibm.com
Tel. 01962-816742 
Internal tel. 246742





Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU 





--
  dfdl-wg mailing list
  dfdl-wg at ogf.org
  https://www.ogf.org/mailman/listinfo/dfdl-wg










