[DFDL-WG] Fw: Action 248 (was Thoughts on a discriminator scenario)

Wed May 14 18:30:53 EDT 2014

I agree that the wording is not easy to get right. However, I think the 
current wording needs some adjustment so I'm going to make some 
suggestions and see where it leads.

"A point of uncertainty occurs in the data stream when there is more than 
one schema component that might occur at that point." 
I don't think this is precise enough. 
- if an optional element occurs at the end of the input data then there is 
only *one* schema component that might occur at that point. The end of the 
data stream might occur instead.
- if an optional element occurs before the last required element in a 
sequence AND the separatorSuppressionPolicy is not 'anyEmpty' then there 
is exactly one schema component that can occur at that point in the data 
stream. But it might be 'empty', in which case it will not be put into the 
Infoset. 
This is not pedantry. The parser will never need to backtrack in either of 
these cases and in the second case it is obvious in advance which schema 
component the parser should select for parsing. 
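A minimal sketch of the second case, assuming hypothetical element names and that the other required DFDL properties (lengthKind, encoding and so on) come from a dfdl:format default. Because separatorSuppressionPolicy is 'trailingEmpty' rather than 'anyEmpty', the separator after 'opt' must be present whenever 'last' follows, so the parser always knows when it has reached the position of 'opt' and never needs to backtrack there:

```xml
<!-- Hypothetical fragment: xs and dfdl prefixes are assumed to be declared,
     and other DFDL properties are assumed to come from a dfdl:format default. -->
<xs:sequence dfdl:separator="," dfdl:separatorPosition="infix"
             dfdl:separatorSuppressionPolicy="trailingEmpty">
  <xs:element name="first" type="xs:string"/>
  <!-- Optional, but it comes before the last required element, so its
       position is always delimited in the data (e.g. "a,,c"). It may be
       empty, but only this one schema component can occur here. -->
  <xs:element name="opt" type="xs:string" minOccurs="0"
              dfdl:occursCountKind="implicit"/>
  <xs:element name="last" type="xs:string"/>
</xs:sequence>
```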

Points of uncertainty can be nested. 
Any one of the following constructs is a potential point of uncertainty: 
1. An xs:choice 
2. All xs:elements in an unordered xs:sequence (dfdl:sequenceKind is 
'unordered') 
3. An optional xs:element 
4. An array xs:element. 
5. All xs:elements in an xs:sequence containing one or more floating 
xs:elements. 
1. should say 'A member of an xs:choice' because it is the member, not the 
group itself, that is the point of uncertainty. I think the confusion has 
arisen because only one member of a choice group can exist in the data. So 
if any member exists, it automatically ends any speculation about the 
content of the choice group. But I insist that the real point of 
uncertainty is the member. A choice group is always 'known to exist' 
because according to DFDL rules it must have minOccurs=maxOccurs=1. FWIW, 
I have no problem with talking about 'resolving a choice', provided that 
we define that as 'Determining which member of a choice group (if any) 
is known to exist in the data'.
2. Should say 'All members of an unordered xs:sequence' to keep the 
language consistent with 1. The section on unordered groups clearly 
restricts members to elements only.
3. See above - an optional element is not always a 'point of uncertainty' 
according to the literal definition that we are currently using.
4. Should say 'An optional occurrence of an array element, unless the 
separator properties make it a positional array and the occurrence is 
required in the data'
5. Should say 'All members...' for consistency.
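For point 4, a sketch of what a positional array might look like. It assumes a hypothetical element named 'code', and it assumes the DFDL rule that with separatorSuppressionPolicy 'never' and occursCountKind 'implicit' all maxOccurs positions are present in the data, even when some occurrences are empty. Under that assumption no occurrence is an actual point of uncertainty, even though minOccurs is 0:

```xml
<!-- Hypothetical fragment: prefixes assumed declared, other DFDL properties
     assumed to come from a dfdl:format default. -->
<xs:sequence dfdl:separator="," dfdl:separatorPosition="infix"
             dfdl:separatorSuppressionPolicy="never">
  <!-- Positional array: data such as "x,,z" supplies all three positions,
       so each occurrence is required in the data even if it is empty. -->
  <xs:element name="code" type="xs:string" minOccurs="0" maxOccurs="3"
              dfdl:occursCountKind="implicit"/>
</xs:sequence>
```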

regards,

Tim Kimber, 
IBM Integration Bus Development (Industry Packs)
Hursley, UK
Internet:  kimbert at uk.ibm.com
Tel. 01962-816742 
Internal tel. 37246742

From:   Steve Hanson/UK/IBM at IBMGB
To:     , 
Date:   13/05/2014 10:28
Subject:        [DFDL-WG] Fw: Action 248 (was Thoughts on a discriminator 
scenario)
Sent by:        dfdl-wg-bounces at ogf.org

This will be discussed on today's call. Please have a position on the 
paragraph below that ends 'What do others think?' 

Thanks

Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848 
----- Forwarded by Steve Hanson/UK/IBM on 13/05/2014 10:19 ----- 

From:        Steve Hanson/UK/IBM 
To:        Tim Kimber/UK/IBM at IBMGB, 
Cc:        dfdl-wg at ogf.org 
Date:        30/04/2014 12:25 
Subject:        Re: [DFDL-WG] Action 248 (was Thoughts on a discriminator 
scenario) 

Tim 

Responses below. 

Regards

Steve Hanson

From:        Tim Kimber/UK/IBM at IBMGB 
To:        dfdl-wg at ogf.org, 
Date:        11/04/2014 14:03 
Subject:        Re: [DFDL-WG] Action 248 (was Thoughts on a discriminator 
scenario) 
Sent by:        dfdl-wg-bounces at ogf.org 

"2. If a potential point of uncertainty is sometimes an actual point of 
uncertainty (ock 'implicit') then a discriminator that applies to it will 
only ever resolve, or have no effect on, that point of uncertainty. It 
never has an effect on any enclosing point of uncertainty." 
This could be misinterpreted. The discriminator could evaluate to 'false' 
and thus cause the POU to be resolved negatively (the component would be 
'known not to exist'). 

SMH: Agree, and I can improve the words here. 

1. and 3. will both apply if an element with ock='fixed' appears as a 
choice branch. Is the POU always an actual POU or never? 

SMH: No. There are two independent points of uncertainty, the choice 
branch and the array. 
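A sketch of the case Tim raises, with hypothetical element names and initiators: an array with ock 'fixed' used as a choice branch. On Steve's reading there are two separate potential points of uncertainty here, and only the branch is an actual one:

```xml
<!-- Hypothetical fragment: prefixes assumed declared, other DFDL properties
     assumed to come from a dfdl:format default. -->
<xs:choice>
  <!-- Point of uncertainty 1: this choice branch, which is always an
       actual point of uncertainty. -->
  <!-- Point of uncertainty 2: the array itself. With ock 'fixed'
       (minOccurs equal to maxOccurs) the number of occurrences is known,
       so it is only ever a potential point of uncertainty. -->
  <xs:element name="Pair" type="xs:string" minOccurs="2" maxOccurs="2"
              dfdl:occursCountKind="fixed" dfdl:initiator="P:"/>
  <xs:element name="Other" type="xs:string" dfdl:initiator="O:"/>
</xs:choice>
```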

The wording of 3. reads very strangely. 'If a potential point of 
uncertainty is never an actual point of uncertainty' begs the question 
'why is it even a potential point of uncertainty?'. The current wording 
follows from our definition of the term 'point of uncertainty': 
"A point of uncertainty occurs in the data stream when there is more than 
one schema component that might occur at that point." Points of uncertainty 
can be nested. 
Any one of the following constructs is a potential point of uncertainty: 
1. An xs:choice 
2. All xs:elements in an unordered xs:sequence (dfdl:sequenceKind is 
'unordered') 
3. An optional xs:element 
4. An array xs:element. 
5. All xs:elements in an xs:sequence containing one or more floating 
xs:elements. 
I think this definition is too broad. It forces us to discuss potential 
POUs that will never be actual POUs according to the first sentence. 

SMH: Yes, it does read a bit strangely, but there's a reason for this. If 
we said that ock 'fixed', 'expression' or 'stopValue' are never POUs then 
what does it mean if a discriminator is placed on such an element? A 
discriminator gets evaluated for each occurrence of an array. For that 
reason we cannot let a discriminator within an array leak beyond the 
array - regardless of whether it is a POU or not - otherwise what does 
that mean to enclosing POUs? So even if we said that ock 'fixed', 
'expression' or 'stopValue' are never POUs we would still need the spec to 
state that a discriminator never leaks beyond them. I think it is clearer 
to say that a discriminator never leaks beyond a potential POU and keep 
the existing definition.  What do others think? 
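To illustrate Steve's argument, a sketch of a discriminator on an array whose occurrences are counted by an expression. The element names and the count path are hypothetical. The discriminator is evaluated once per occurrence, but under the proposed rule it never resolves the array (which is never an actual point of uncertainty) and never leaks to an enclosing point of uncertainty either:

```xml
<!-- Hypothetical fragment: prefixes assumed declared, other DFDL properties
     assumed to come from a dfdl:format default. -->
<xs:sequence>
  <xs:element name="count" type="xs:unsignedInt"/>
  <!-- ock 'expression': the number of occurrences is computed. -->
  <xs:element name="item" minOccurs="0" maxOccurs="unbounded"
              dfdl:occursCountKind="expression"
              dfdl:occursCount="{ ../count }">
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <!-- Evaluated for each occurrence of 'item'. Under the proposal it
             affects no point of uncertainty, here or in any enclosing group. -->
        <dfdl:discriminator test="{ fn:exists(code) }"/>
      </xs:appinfo>
    </xs:annotation>
    <xs:complexType>
      <xs:sequence>
        <xs:element name="code" type="xs:string" dfdl:initiator="C:"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:sequence>
```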

regards,

Tim Kimber, 

From:        Steve Hanson/UK/IBM at IBMGB 
To:        dfdl-wg at ogf.org, 
Date:        11/04/2014 11:44 
Subject:        Re: [DFDL-WG] Action 248 (was Thoughts on a discriminator 
scenario) 
Sent by:        dfdl-wg-bounces at ogf.org 

248
Discriminators and potential points of uncertainty (Steve) 
28/1: Steve to write up a proposal to prevent a discriminator from 
behaving in a non-obvious manner when used with a potential point of 
uncertainty that turns out not to be an actual point of uncertainty. 
5/2: Steve sent an email to check whether choice branches, unordered 
elements and floating elements should always be actual points of 
uncertainty, as there are times when there is no uncertainty, e.g. the 
last choice branch, or once all floating elements have been found. It was 
decided that they are 
always actual points of uncertainty. To do otherwise will complicate 
implementations and result in fragile schemas. Steve will proceed with the 
proposal on that basis.
Based on the above, which reflects the email discussion below, here is 
what I propose to resolve this action. 
1.        If a potential point of uncertainty is always an actual point of 
uncertainty (choice branch, element in unordered sequence, floating 
element, ock 'parsed') then a discriminator that applies to it will only 
ever resolve that point of uncertainty. It never has an effect on any 
enclosing point of uncertainty.   
2.        If a potential point of uncertainty is sometimes an actual point 
of uncertainty (ock 'implicit') then a discriminator that applies to it will 
only ever resolve, or have no effect on, that point of uncertainty. It 
never has an effect on any enclosing point of uncertainty. 
3.        If a potential point of uncertainty is never an actual point of 
uncertainty (ock 'fixed', 'expression', 'stopValue') then a discriminator 
that applies to it will never have an effect on that point of uncertainty. 
Nor does it ever have an effect on any enclosing point of uncertainty. 
I think 1 and 2 are not controversial, but there is an alternative for 3: 
 3.   If a potential point of uncertainty is never an actual point of 
uncertainty (ock 'fixed', 'expression', 'stopValue') then a discriminator 
that applies to it will never have an effect on that point of uncertainty. 
Instead the discriminator is applied to any enclosing point of 
uncertainty. 
The alternative means that changing an element from (say) ock 'parsed' to 
ock 'expression' has the same effect on a discriminator as changing the 
element to (1,1): the discriminator that applied to it now applies to any 
enclosing PoU. 
SMH: Afternote: The alternative does not work for the reason given in my 
reply to Tim above. 
Regards

Steve Hanson

From:        Steve Hanson/UK/IBM 
To:        Tim Kimber/UK/IBM at IBMGB, 
Cc:        dfdl-wg at ogf.org, dfdl-wg-bounces at ogf.org 
Date:        05/02/2014 12:04 
Subject:        Re: [DFDL-WG] Action 248 (was Thoughts on a discriminator 
scenario) 

Thanks Tim, all good points. Comments to your comments. 

Regards

Steve Hanson

From:        Tim Kimber/UK/IBM 
To:        Steve Hanson/UK/IBM at IBMGB, 
Cc:        dfdl-wg at ogf.org, dfdl-wg-bounces at ogf.org 
Date:        05/02/2014 11:01 
Subject:        Re: [DFDL-WG] Action 248 (was Thoughts on a discriminator 
scenario) 

A couple of comments below. 

regards,

Tim Kimber, 

From:        Steve Hanson/UK/IBM at IBMGB 
To:        dfdl-wg at ogf.org, 
Date:        05/02/2014 10:50 
Subject:        [DFDL-WG] Action 248 (was Thoughts on a discriminator 
scenario) 
Sent by:        dfdl-wg-bounces at ogf.org 

248
Discriminators and potential points of uncertainty (Steve) 
28/1: Steve to write up a proposal to prevent a discriminator from 
behaving in a non-obvious manner when used with a potential point of 
uncertainty that turns out not to be an actual point of uncertainty. 
5/2: With Steve
I started on this by reading section 9.3.3 on points of uncertainty, which 
lists the potential PoUs. Here's the list to save getting the spec out. 
1.        An xs:choice branch 
2.        All xs:elements in an unordered xs:sequence (dfdl:sequenceKind 
is 'unordered') 
3.        An optional xs:element 
4.        An array xs:element 
5.        All xs:elements in an xs:sequence containing one or more 
floating xs:elements. 
The section then looks at each in turn and gives the circumstances when it 
is an actual PoU or not. As currently written, it is only 3 and 4 where a 
potential PoU might not be an actual PoU. For 1, 2 and 5 it says they are 
always actual PoUs. 
But I'm not sure that's correct. A deeper analysis of what is actually 
going on with 1, 2 and 5 says to me that there are times when there might 
not be an actual PoU. 
1. Given that there is no concept in DFDL of optional choice branches, 
then if the last branch is reached then there is no longer a PoU. It must 
be that branch else it is a processing error. 
TK: I think of it slightly differently. It is a PoU, even if the branch is 
the only remaining branch. If we say that the final choice branch is not a 
PoU then diagnostics become confused - the parser reports the error code 
as 'error while parsing root/choice/lastBranch/field1' when the correct 
error code would be 'none of the branches of root/choice were found in the 
data'. 
SMH: I see your point. My thinking was that choices have finite branches 
and a choice is (1,1). If I have got to the last branch then I am not one 
of the other branches so I must be this one. If there is any other 
possibility then the model is missing a branch, even if it is just one 
that contains an empty sequence with an assert {fn:false()}. In practice 
of course users forget to add that last branch (there's no XSDL equivalent 
to the 'default' branch of a switch/case statement), so yes they could end 
up with an unclear diagnostic. 
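A sketch of the 'default' branch Steve describes, with hypothetical branch names and message text. The extra empty sequence always fails, so Branch2 is no longer the last branch, and if no real branch matches, the choice fails through the assert rather than (on Steve's original reading) through an error reported inside Branch2:

```xml
<!-- Hypothetical fragment: prefixes assumed declared, other DFDL properties
     assumed to come from a dfdl:format default. -->
<xs:choice>
  <xs:element name="Branch1" type="xs:string" dfdl:initiator="B1:"/>
  <xs:element name="Branch2" type="xs:string" dfdl:initiator="B2:"/>
  <!-- 'Default' branch: consumes no data and always fails, playing the
       role of the default case of a switch statement. -->
  <xs:sequence>
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <dfdl:assert test="{ fn:false() }"
                     message="None of the branches of the choice were found"/>
      </xs:appinfo>
    </xs:annotation>
  </xs:sequence>
</xs:choice>
```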
2. There can come a point in an unordered sequence when all that can be 
encountered is one element, and if that is (1,1) then there is no longer a 
PoU. 
TK: It's still a PoU. The specification says that occursCountKind is 
'parsed' for all members of an unordered group, so min/maxOccurs do not 
come into play. 
SMH: Interesting. The spec says that if a member is optional or an array 
then it must be 'parsed'. If it is (1,1) though it does not have an 
occursCountKind. The specific case I was thinking of is when all members 
are (1,1), so when you have one element to go there is no PoU. However, 
the rewrite into a repeating choice has the effect of making everything 
'parsed', which is really the point you are making. So I agree with you, 
it is easier to say that everything is an actual PoU else it complicates 
the rewrite semantic. 
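A sketch of an unordered group, with hypothetical names and initiators. As Steve notes, an optional or array member must use ock 'parsed'; under the position agreed here, every member is an actual point of uncertainty, even a (1,1) member when it is the only one still to be found:

```xml
<!-- Hypothetical fragment: prefixes assumed declared, other DFDL properties
     assumed to come from a dfdl:format default. -->
<xs:sequence dfdl:sequenceKind="unordered" dfdl:separator=";"
             dfdl:separatorPosition="infix">
  <!-- (1,1) members: may appear in any order, and are still treated as
       points of uncertainty. -->
  <xs:element name="X" type="xs:string" dfdl:initiator="X:"/>
  <xs:element name="Y" type="xs:string" dfdl:initiator="Y:"/>
  <!-- Optional/array member: occursCountKind must be 'parsed'. -->
  <xs:element name="Z" type="xs:string" minOccurs="0" maxOccurs="3"
              dfdl:occursCountKind="parsed" dfdl:initiator="Z:"/>
</xs:sequence>
```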
5. If all floating elements are (1,1) and all are encountered, then from 
that point on there are no longer any PoUs due to floating elements. 
TK: I suspect that floating elements are somewhat like unordered branches 
- most users will not want min/maxOccurs to affect the parsing of the 
group. Schema validation ( or more complex validation applied in the 
receiving application ) will deal with non-conformances. 
SMH: Possibly yes. With something like X12 NTE segments, that is the case. 
But we don't express the floating semantic as a rewrite of the whole 
sequence like we do for unordered, it's more of a per element thing. And 
if that is done dynamically as we go through the sequence, having no PoU 
can result. 
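A sketch of the floating case, loosely modelled on an X12 NTE segment. The element names, initiators and separator are illustrative assumptions, not a real X12 schema. With NTE (1,1), Steve's point is that once NTE has been found a dynamic implementation would face no further uncertainty from it; the decision recorded in the 5/2 action minutes is nonetheless to treat every member as an actual point of uncertainty:

```xml
<!-- Hypothetical fragment: prefixes assumed declared, other DFDL properties
     assumed to come from a dfdl:format default. -->
<xs:sequence dfdl:separator="~" dfdl:separatorPosition="postfix">
  <xs:element name="ST" type="xs:string" dfdl:initiator="ST*"/>
  <!-- Floating: may occur at any position in the sequence, so every member
       of the sequence is a point of uncertainty. -->
  <xs:element name="NTE" type="xs:string" dfdl:floating="yes"
              dfdl:initiator="NTE*"/>
  <xs:element name="BEG" type="xs:string" dfdl:initiator="BEG*"/>
</xs:sequence>
```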
I'd like us to get straight on this before I proceed with the action 
proper. 
Regards

Steve Hanson
----- Forwarded by Steve Hanson/UK/IBM on 05/02/2014 10:12 ----- 

From:        Steve Hanson/UK/IBM 
To:        dfdl-wg at ogf.org, 
Date:        27/01/2014 17:39 
Subject:        Fw: Thoughts on a discriminator scenario 

Been thinking some more on the discriminator scenario below that I mailed 
out before xmas, and discussing it with the IBM DFDL team. 

The 'confusing' aspect of the behaviour is that a discriminator within a 
potential PoU will act on a higher level PoU if the potential PoU is not 
an actual PoU. In the example, the array element 'Type1' is not an actual 
PoU for occurrence 1, only for occurrences 2+. So when the discriminator 
fires for occurrence 1 it will resolve a higher level unresolved PoU if 
one exists.   

Perhaps the spec should say that a discriminator can't 'leak' beyond the 
potential PoU that encloses it? If so, then for occurrence 1 the 
discriminator has no effect, and only has an effect for occurrences 2+. 
This makes for more predictable and robust schemas. 

We'd need to go through spec section 9.3.3 carefully to check that this 
does not break any of the potential PoUs that are listed. 

Regards

Steve Hanson
----- Forwarded by Steve Hanson/UK/IBM on 16/01/2014 09:55 ----- 

From:        Steve Hanson/UK/IBM 
To:        dfdl-wg at ogf.org, 
Date:        20/12/2013 13:20 
Subject:        Thoughts on a discriminator scenario 

Take the following schema (simplified) for element Type1 (1,10), which is a 
loop over elements A, B and C. Type1 does not have an initiator, so I need 
to use a discriminator to establish the existence of an occurrence of Type1 
so that incorrect backtracking does not occur after an error. Because 
occursCountKind is 'implicit' and minOccurs is 1, the 1st occurrence is not 
a point of uncertainty, so the discriminator acts instead on any enclosing 
point of uncertainty; for the 2nd and subsequent occurrences it acts on 
Type1. That is all working as designed, but I think users will find the 
1st occurrence behaviour a bit confusing. There are workarounds to avoid 
the problem, e.g. use occursCountKind 'parsed', or split Type1 into two 
elements, (1,1) and (0,9). I think this is worth documenting in a tutorial, 
as this is quite subtle stuff. 

     <xs:element name="Type1" maxOccurs="10" dfdl:occursCountKind="implicit">
       <xs:annotation>
         <xs:appinfo source="http://www.ogf.org/dfdl/">
           <dfdl:discriminator test="{fn:exists(A)}" />
         </xs:appinfo>
       </xs:annotation>
       <xs:complexType>
         <xs:sequence>
           <xs:element name="A" dfdl:initiator="A:" ... />
           <xs:element name="B" dfdl:initiator="B:" ... />
           <xs:element name="C" dfdl:initiator="C:" ... />
         </xs:sequence>
       </xs:complexType>
     </xs:element>
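The two workarounds might look like this. This is a sketch only: it reuses the element names above, assumes a hypothetical named type Type1Type holding the A, B, C sequence (XSD requires two same-named elements in one content model to share a type), and leaves other DFDL properties to defaults:

```xml
<!-- Workaround 1 (sketch): with ock 'parsed', minOccurs is not used while
     parsing, so every occurrence, including the first, is an actual point
     of uncertainty and the discriminator always acts on Type1 itself. -->
<xs:element name="Type1" type="Type1Type" maxOccurs="10"
            dfdl:occursCountKind="parsed">
  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:discriminator test="{ fn:exists(A) }"/>
    </xs:appinfo>
  </xs:annotation>
</xs:element>

<!-- Workaround 2 (sketch): split Type1 into (1,1) and (0,9). The first
     occurrence is required and so is plainly not a point of uncertainty;
     every occurrence of the second element is optional, so its
     discriminator always acts on that element. -->
<xs:sequence>
  <xs:element name="Type1" type="Type1Type"/>
  <xs:element name="Type1" type="Type1Type" minOccurs="0" maxOccurs="9"
              dfdl:occursCountKind="implicit">
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <dfdl:discriminator test="{ fn:exists(A) }"/>
      </xs:appinfo>
    </xs:annotation>
  </xs:element>
</xs:sequence>

<!-- Hypothetical shared type: the A, B, C sequence from the schema above. -->
<xs:complexType name="Type1Type">
  <xs:sequence>
    <!-- elements A, B and C as in the original schema -->
  </xs:sequence>
</xs:complexType>
```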

Regards

Steve Hanson
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

--
dfdl-wg mailing list
dfdl-wg at ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg 
