[DFDL-WG] Action 248 (was Thoughts on a discriminator scenario)
Steve Hanson
smh at uk.ibm.com
Wed Feb 5 07:09:26 EST 2014
Thanks Tim, all good points. My replies to your comments are below.
Regards
Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848
From: Tim Kimber/UK/IBM
To: Steve Hanson/UK/IBM at IBMGB,
Cc: dfdl-wg at ogf.org, dfdl-wg-bounces at ogf.org
Date: 05/02/2014 11:01
Subject: Re: [DFDL-WG] Action 248 (was Thoughts on a discriminator
scenario)
A couple of comments below.
regards,
Tim Kimber,
IBM Integration Bus Development (Industry Packs)
Hursley, UK
Internet: kimbert at uk.ibm.com
Tel. 01962-816742
Internal tel. 37246742
From: Steve Hanson/UK/IBM at IBMGB
To: dfdl-wg at ogf.org,
Date: 05/02/2014 10:50
Subject: [DFDL-WG] Action 248 (was Thoughts on a discriminator
scenario)
Sent by: dfdl-wg-bounces at ogf.org
248
Discriminators and potential points of uncertainty (Steve)
28/1: Steve to write up a proposal to prevent a discriminator from
behaving in a non-obvious manner when used with a potential point of
uncertainty that turns out not to be an actual point of uncertainty.
5/2: With Steve
I started on this by reading section 9.3.3 on points of uncertainty (PoUs),
which lists the potential PoUs. Here's the list, to save getting the spec out.
1. An xs:choice branch
2. All xs:elements in an unordered xs:sequence (dfdl:sequenceKind
is 'unordered')
3. An optional xs:element
4. An array xs:element
5. All xs:elements in an xs:sequence containing one or more
floating xs:elements.
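To make the list concrete, here is a hypothetical schema fragment containing one of each kind of potential PoU. It is a sketch only. The element names and initiators are invented for illustration, the usual xs and dfdl namespace prefixes are assumed, and properties such as lengthKind, terminators and encoding are assumed to come from a surrounding dfdl:format.

```xml
<!-- Hypothetical fragment: names and initiators are illustrative only.
     Numbers in the comments refer to the list of potential PoUs above. -->
<xs:sequence>
  <!-- 1. each branch of an xs:choice -->
  <xs:choice>
    <xs:element name="Rec1" type="xs:string" dfdl:initiator="R1:"/>
    <xs:element name="Rec2" type="xs:string" dfdl:initiator="R2:"/>
  </xs:choice>
  <!-- 2. every member of an unordered sequence -->
  <xs:sequence dfdl:sequenceKind="unordered">
    <xs:element name="U1" type="xs:string" dfdl:initiator="U1:"/>
    <xs:element name="U2" type="xs:string" dfdl:initiator="U2:"/>
  </xs:sequence>
  <!-- 3. an optional element -->
  <xs:element name="Opt" type="xs:string" minOccurs="0"
              dfdl:occursCountKind="implicit" dfdl:initiator="O:"/>
  <!-- 4. an array element -->
  <xs:element name="Arr" type="xs:string" maxOccurs="5"
              dfdl:occursCountKind="implicit" dfdl:initiator="X:"/>
  <!-- 5. every element of a sequence that contains a floating element -->
  <xs:sequence>
    <xs:element name="F1" type="xs:string" dfdl:initiator="F1:"/>
    <xs:element name="Note" type="xs:string" dfdl:initiator="N:"
                dfdl:floating="yes"/>
  </xs:sequence>
</xs:sequence>
```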
The section then looks at each in turn and gives the circumstances in which
it is an actual PoU. As currently written, 3 and 4 are the only cases where
a potential PoU might not be an actual PoU. For 1, 2 and 5 it says they are
always actual PoUs.
But I'm not sure that's correct. A deeper analysis of what is actually
going on with 1, 2 and 5 says to me that there are times when there might
not be an actual PoU.
1. Given that DFDL has no concept of optional choice branches, once the
last branch is reached there is no longer a PoU. The data must match that
branch, otherwise it is a processing error.
TK: I think of it slightly differently. It is a PoU, even if the branch is
the only remaining branch. If we say that the final choice branch is not a
PoU then diagnostics become confused - the parser reports the error code
as 'error while parsing root/choice/lastBranch/field1' when the correct
error code would be 'none of the branches of root/choice were found in the
data'.
SMH: I see your point. My thinking was that a choice has a finite number of
branches and is (1,1). If I have got to the last branch then the data is
not any of the other branches, so it must be this one. If there is any
other possibility then the model is missing a branch, even if it is just
one that contains an empty sequence with an assert {fn:false()}. In
practice, of course, users forget to add that last branch (there's no XSDL
equivalent of the 'default' branch of a switch/case statement), so yes,
they could end up with an unclear diagnostic.
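A 'default' branch of the kind described above might look something like the hypothetical sketch below: a final choice branch that is an empty sequence carrying a dfdl:assert whose test is {fn:false()}. It is reached only if every earlier branch has failed, and it then fails with an explicit message. The element names, initiators and message text are illustrative assumptions.

```xml
<!-- Hypothetical sketch: the last branch plays the role of a
     switch/case 'default' and always fails with a clear message -->
<xs:choice>
  <xs:element name="Rec1" type="xs:string" dfdl:initiator="R1:"/>
  <xs:element name="Rec2" type="xs:string" dfdl:initiator="R2:"/>
  <xs:sequence>
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <dfdl:assert test="{fn:false()}"
                     message="None of the branches of the choice were found in the data"/>
      </xs:appinfo>
    </xs:annotation>
  </xs:sequence>
</xs:choice>
```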
2. There can come a point in an unordered sequence when all that can be
encountered is one element, and if that is (1,1) then there is no longer a
PoU.
TK: It's still a PoU. The specification says that occursCountKind is
'parsed' for all members of an unordered group, so min/maxOccurs do not
come into play.
SMH: Interesting. The spec says that if a member is optional or an array
then it must be 'parsed'. If it is (1,1) though it does not have an
occursCountKind. The specific case I was thinking of is when all members
are (1,1), so when you have one element to go there is no PoU. However,
the rewrite into a repeating choice has the effect of making everything
'parsed', which is really the point you are making. So I agree with you: it
is easier to say that everything is an actual PoU, otherwise it complicates
the semantics of the rewrite.
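The case under discussion can be sketched as a hypothetical unordered sequence whose members are all (1,1). The names and initiators are illustrative, and other properties are assumed to come from a surrounding dfdl:format.

```xml
<!-- Hypothetical fragment: three required members, in any order -->
<xs:sequence dfdl:sequenceKind="unordered">
  <xs:element name="A" type="xs:string" dfdl:initiator="A:"/>
  <xs:element name="B" type="xs:string" dfdl:initiator="B:"/>
  <xs:element name="C" type="xs:string" dfdl:initiator="C:"/>
</xs:sequence>
<!-- Conceptually this is parsed as a repeating choice over A, B and C,
     with every member treated as 'parsed'. After A and B have been found
     only C can legally follow, but under the rewrite the next position is
     still a choice among the members, so it remains an actual PoU. -->
```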
5. If all floating elements are (1,1) and all are encountered, then from
that point on there are no longer any PoUs due to floating elements.
TK: I suspect that floating elements are somewhat like unordered branches:
most users will not want min/maxOccurs to affect the parsing of the
group. Schema validation (or more complex validation applied in the
receiving application) will deal with non-conformances.
SMH: Possibly yes. With something like X12 NTE segments, that is the case.
But we don't express the floating semantic as a rewrite of the whole
sequence like we do for unordered; it's more of a per-element thing. And
if that is done dynamically as we go through the sequence, it can result
in there being no PoU.
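A hypothetical sequence with two (1,1) floating elements illustrates the per-element view. The names and initiators are invented, and the comment describes the dynamic behaviour suggested above rather than confirmed spec wording.

```xml
<!-- Hypothetical fragment: X and Y are (1,1) floating elements that may
     appear anywhere among A, B and C -->
<xs:sequence>
  <xs:element name="A" type="xs:string" dfdl:initiator="A:"/>
  <xs:element name="X" type="xs:string" dfdl:initiator="X:"
              dfdl:floating="yes"/>
  <xs:element name="B" type="xs:string" dfdl:initiator="B:"/>
  <xs:element name="Y" type="xs:string" dfdl:initiator="Y:"
              dfdl:floating="yes"/>
  <xs:element name="C" type="xs:string" dfdl:initiator="C:"/>
</xs:sequence>
<!-- If X and Y have both already been found by the time C is reached, a
     per-element floating check has nothing left to look for, so C would
     no longer be a PoU on account of floating elements. -->
```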
I'd like us to get straight on this before I proceed with the action
proper.
Regards
Steve Hanson
----- Forwarded by Steve Hanson/UK/IBM on 05/02/2014 10:12 -----
From: Steve Hanson/UK/IBM
To: dfdl-wg at ogf.org,
Date: 27/01/2014 17:39
Subject: Fw: Thoughts on a discriminator scenario
Been thinking some more on the discriminator scenario below that I mailed
out before xmas, and discussing it with the IBM DFDL team.
The 'confusing' aspect of the behaviour is that a discriminator within a
potential PoU will act on a higher-level PoU if the potential PoU is not
an actual PoU. In the example, the array element 'Type1' is not an actual
PoU for occurrence 1, only for occurrences 2+. So when the discriminator
fires for occurrence 1 it will resolve a higher-level unresolved PoU, if
one exists.
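To see where the discriminator 'leaks' to, suppose (hypothetically) that Type1 from the example schema in the original 20/12 mail sits inside one branch of an enclosing choice. This is a sketch: the Type2 branch, the wrapping sequence and the A/B/C types are invented for illustration. Under the proposal in this mail, the discriminator would have no effect for occurrence 1.

```xml
<!-- Hypothetical context: Type1 is part of one branch of a choice -->
<xs:choice>
  <xs:sequence>
    <xs:element name="Type1" maxOccurs="10"
                dfdl:occursCountKind="implicit">
      <xs:annotation>
        <xs:appinfo source="http://www.ogf.org/dfdl/">
          <!-- Occurrence 1 is not an actual PoU, so this discriminator
               resolves the enclosing choice instead. Occurrences 2 to 10
               are actual PoUs, so for those it resolves Type1 itself. -->
          <dfdl:discriminator test="{fn:exists(A)}"/>
        </xs:appinfo>
      </xs:annotation>
      <xs:complexType>
        <xs:sequence>
          <xs:element name="A" type="xs:string" dfdl:initiator="A:"/>
          <xs:element name="B" type="xs:string" dfdl:initiator="B:"/>
          <xs:element name="C" type="xs:string" dfdl:initiator="C:"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
  </xs:sequence>
  <xs:element name="Type2" type="xs:string" dfdl:initiator="T2:"/>
</xs:choice>
```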
Perhaps the spec should say that a discriminator can't 'leak' beyond the
potential PoU that encloses it? If so, then for occurrence 1 the
discriminator has no effect, and only has an effect for occurrences 2+.
This would make for more predictable and robust schemas.
We'd need to go through spec section 9.3.3 carefully to check that this
does not break any of the listed potential PoUs.
Regards
Steve Hanson
----- Forwarded by Steve Hanson/UK/IBM on 16/01/2014 09:55 -----
From: Steve Hanson/UK/IBM
To: dfdl-wg at ogf.org,
Date: 20/12/2013 13:20
Subject: Thoughts on a discriminator scenario
Take the following (simplified) schema for element Type1 (1,10), which
loops over elements A, B and C. Type1 does not have an initiator, so I need
to use a discriminator to establish the existence of an occurrence of Type1
so that incorrect backtracking does not occur after an error. Because
occursCountKind is 'implicit', the 1st occurrence is not a point of
uncertainty, so the discriminator acts instead on any enclosing point of
uncertainty; for the 2nd and subsequent occurrences it acts on Type1. That
is all working as designed, but I think users will find the 1st occurrence
behaviour a bit confusing. There are workarounds to avoid the problem, e.g.,
use occursCountKind 'parsed', or split Type1 into two elements, (1,1) and
(0,9). I think this is worth documenting in a tutorial, as this is quite
subtle stuff.
<xs:element name="Type1" maxOccurs="10"
            dfdl:occursCountKind="implicit">
  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:discriminator test="{fn:exists(A)}" />
    </xs:appinfo>
  </xs:annotation>
  <xs:complexType>
    <xs:sequence>
      <xs:element name="A" dfdl:initiator="A:" ... />
      <xs:element name="B" dfdl:initiator="B:" ... />
      <xs:element name="C" dfdl:initiator="C:" ... />
    </xs:sequence>
  </xs:complexType>
</xs:element>
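The two workarounds mentioned above can be sketched as follows. This is a hypothetical illustration, not a tested schema: Type1First, Type1Rest and Type1Content are invented names, Type1Content is assumed to be a global complexType in a schema with no target namespace, and other properties are assumed to come from a surrounding dfdl:format.

```xml
<!-- Workaround (a): with occursCountKind 'parsed', every occurrence of
     Type1, including the first, is treated as an actual PoU, so the
     discriminator always acts on Type1 itself. -->
<xs:element name="Type1" maxOccurs="10" dfdl:occursCountKind="parsed">
  <!-- xs:annotation and xs:complexType unchanged from the example -->
</xs:element>

<!-- Workaround (b): a required first occurrence followed by an optional
     array of up to 9 more occurrences. -->
<xs:element name="Type1First" type="Type1Content"/>
<!-- No discriminator above: Type1First is (1,1), so there is nothing to
     discriminate and nothing can leak to an enclosing PoU. -->
<xs:element name="Type1Rest" type="Type1Content"
            minOccurs="0" maxOccurs="9" dfdl:occursCountKind="implicit">
  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <!-- minOccurs is 0, so every occurrence of Type1Rest is an actual
           PoU and the discriminator always acts on Type1Rest itself -->
      <dfdl:discriminator test="{fn:exists(A)}"/>
    </xs:appinfo>
  </xs:annotation>
</xs:element>

<!-- At schema top level -->
<xs:complexType name="Type1Content">
  <xs:sequence>
    <xs:element name="A" type="xs:string" dfdl:initiator="A:"/>
    <xs:element name="B" type="xs:string" dfdl:initiator="B:"/>
    <xs:element name="C" type="xs:string" dfdl:initiator="C:"/>
  </xs:sequence>
</xs:complexType>
```

The trade-off with (b) is that the infoset contains two differently named elements rather than a single Type1 array.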
Regards
Steve Hanson
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
--
dfdl-wg mailing list
dfdl-wg at ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg