[DFDL-WG] validating expressions on elements in a choice or unordered sequence

Tue Apr 29 05:32:41 EDT 2014

Tim, thanks for clarifying where you are coming from on this.

I interpreted "some component of the static context" differently to you. 
The components are the 'variables' in the static context, not schema 
objects. The XPST0001 error says "It is a static error if analysis of an 
expression relies on some component of the static context that has not 
been assigned a value.", so I think this message would be issued if, for 
example, an XPath processor tried to look up an element in the In-scope 
schema definitions component but found that the component itself had not 
been built. I think the error that applies to the case we are discussing 
is XPST0008 as quoted below.  Otherwise I don't see when XPST0008 would be 
issued.

Consequently I am not sure that XPath rules prohibit the 
genuinely-impossible situation where an expression on choiceElement[1]/A 
refers to choiceElement[1]/B, because I can't see a crisp definition of 
in-scope. Maybe I'm missing it? Common sense says that it should be 
prohibited. But I agree that XPath does not prohibit choiceElement/A 
referring to choiceElement/B generally. 

My conclusions about the rules in 23.1 were perhaps too general, and they 
should be looked at on an individual basis. But the reason they exist is 
because the parser is not acting on the whole message and in general is 
prohibited from looking ahead. Clearly the first two rules are needed 
(assert/discriminator can look down but not ahead, and outputValueCalc can 
look ahead). The other three are the ones in question. The current wording 
came from this action which was spun off from the unordered action 199:

214
Expression Language Data Model (All)
16/7: Augment section 23.1 so that it covers the cases where an XPath 
references an element that is in a different choice branch or that is in 
the same unordered sequence or that is floating. These cases could be 
detected statically (though to do this 100% reliably is not easy) or they 
could be left until runtime and fail if the element does not exist. Both 
are schema definition errors as explained by errata 2.120. 
23/7: Closed. Runtime schema definition error if the element does not 
exist. Errata taken.

(Erratum 2.120 is just the one that categorised all the different errors.)
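
The runtime option recorded in that action can be pictured with a minimal Python sketch. It is only an illustration: SchemaDefinitionError and select_existing are hypothetical names, not part of any DFDL implementation, and ElementTree's limited path support stands in for the DFDL expression language.

```python
import xml.etree.ElementTree as ET


class SchemaDefinitionError(Exception):
    """Hypothetical stand-in for a DFDL schema definition error."""


def select_existing(context, path):
    """Evaluate a simple child path and fail if it selects nothing."""
    nodes = context.findall(path)
    if not nodes:
        # Plain XPath would return the empty sequence here instead.
        raise SchemaDefinitionError(f"{path!r} does not exist in the data")
    return nodes


# One occurrence of a choice in which branch el_b was taken.
choice = ET.fromstring("<el_choice><el_b>2</el_b></el_choice>")

taken = select_existing(choice, "el_b")  # the branch that exists

try:
    select_existing(choice, "el_a")      # the branch that was not taken
    outcome = "no error"
except SchemaDefinitionError as err:
    outcome = f"schema definition error: {err}"

print(outcome)
```

Failing at the point of evaluation avoids the static analysis that the action describes as hard to do 100% reliably, at the cost of reporting the problem only when data that exercises the path is processed.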

For the choice branch rule: If XPath disallows choice/x[1] referring to 
choice/y[1] but allows general refs, then I am ok with changing this to 
match XPath (ie, we drop our rule as it is implied). If it doesn't 
disallow choice/x[1] referring to choice/y[1] then perhaps we should keep 
our rule but tighten it up.

I think the unordered rule and floating rule both stem from the choice 
branch rule - because of the rewrite semantic that turns unordered into a 
choice. 

Regards

Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848

From:   Tim Kimber/UK/IBM
To:     Steve Hanson/UK/IBM at IBMGB, 
Cc:     Mark Frost/UK/IBM at IBMGB
Date:   28/04/2014 22:03
Subject:        Re: [DFDL-WG] validating expressions on elements in a 
choice or unordered sequence

I don't think I am missing the point. DFDL's usage of a partial XDM is not 
in play here.

The static context is defined thus: [Definition: The static context of an 
expression is the information that is available during static analysis of 
the expression, prior to its evaluation.] This information can be used to 
decide whether the expression contains a static error. If analysis of an 
expression relies on some component of the static context that has not 
been assigned a value, a static error is raised [err:XPST0001]. 

This is the definition that is relevant, because the errors are issued by 
our static analysis ( performed by the DFDL validator ). I believe the 
rule already prohibits the genuinely-impossible situation where an 
expression on choiceElement[1]/A refers to choiceElement[1]/B. But it does 
not prohibit choiceElement/A referring to choiceElement/B - and I don't 
think DFDL should either.

regards,

Tim Kimber, 
IBM Integration Bus Development (Industry Packs)
Hursley, UK
Internet:  kimbert at uk.ibm.com
Tel. 01962-816742 
Internal tel. 37246742

From:   Steve Hanson/UK/IBM
To:     Tim Kimber/UK/IBM at IBMGB, 
Cc:     dfdl-wg at ogf.org, dfdl-wg-bounces at ogf.org
Date:   28/04/2014 13:51
Subject:        Re: [DFDL-WG] validating expressions on elements in a 
choice or unordered sequence

It is certainly easier if we can just do the same as XPath 2.0 stipulates. 
But I think that this misses the point here.

The XPath error for statically detecting that an expression refers to 
something that can never exist is XPST0008, which says:

It is a static error if an expression refers to an element name, attribute 
name, schema type name, namespace prefix, or variable name that is not 
defined in the static context, except for an ElementName in an ElementTest 
or an AttributeName in an AttributeTest. 

The static context has the notion of "In-scope schema definitions" being 
"a generic term for all the element declarations, attribute declarations, 
and schema type definitions that are in scope during processing of an 
expression." It doesn't define exactly what is meant by "in-scope" but 
XPath assumes that it acts on a complete instance of an XDM. 

In DFDL we are different to typical XPath usage as we are applying 
expressions during parsing when the document is incomplete. We can use 
that as the justification for applying extra constraints, which is exactly 
why there are additional rules in section 23.1.

So, if there are scenarios where a rule is going to be restrictive then we 
should consider dropping it. If there are not, but it makes the life of an 
implementer harder because it is hard to code the rule, then we should 
consider dropping it. Otherwise keep it.

Regards

Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848

From:   Tim Kimber/UK/IBM at IBMGB
To:     dfdl-wg at ogf.org, 
Date:   11/04/2014 14:03
Subject:        Re: [DFDL-WG] validating expressions on elements in a 
choice or unordered sequence
Sent by:        dfdl-wg-bounces at ogf.org

I would be quite uncomfortable with DFDL not being a 'proper subset' of 
XPath 2.0. I understand the motivation ( having personally been involved 
in coding a query engine for DFDL ) but I think the cure would be worse 
than the complaint. Consistent with that, I think I agree with Mark's 
suggestion - a DFDL processor should just 'do what an XPath processor 
would do'. 

regards,

Tim Kimber, 
IBM Integration Bus Development (Industry Packs)
Hursley, UK
Internet:  kimbert at uk.ibm.com
Tel. 01962-816742 
Internal tel. 37246742

From:        Mike Beckerle <mbeckerle.dfdl at gmail.com> 
To:        Mark Frost/UK/IBM at IBMGB, 
Cc:        "dfdl-wg at ogf.org" <dfdl-wg at ogf.org> 
Date:        11/04/2014 13:23 
Subject:        Re: [DFDL-WG] validating expressions on elements in a 
choice or unordered sequence 
Sent by:        dfdl-wg-bounces at ogf.org 

Comments inline

On Fri, Apr 11, 2014 at 6:22 AM, Mark Frost <FROSTMAR at uk.ibm.com> wrote: 
When we were implementing unordered sequences, this raised some questions 
around evaluating relative paths in expressions, for elements in a choice 
or unordered sequence : 

DFDL spec: (gwdrp-dfdl-v1.0.4 section 15)
"When processing a choice group the parser validates 
any contained path expressions. If a path 
expression contained inside a choice branch refers 
to any other branch of the choice, then it is a 
schema definition error."

1.        I'm not clear what benefit this restriction on path expressions 
gives.
It seems redundant since in any single instance of a choice group, if the 
branch being processed exists, then by definition none of its sibling 
branches exist. Any expression path referring to a non-existent branch 
would correctly return <empty sequence> 

Typically in XPath, such paths would just be empty-sequence at runtime. 
Making it an SDE hoists the error to (hopefully) compile time, and making 
it SDE (non-recoverable) changes the way one must write expressions. You 
can't write utter nonsense paths and have them be runnable. 

If the choice group is inside a repeating structure, then expressions 
referring to choice branches within other instances of the choice could be 
useful.
Should an expression referring to branches in other instances of a choice 
cause a schemadef error? 

Should be no issue if you are looking at say, position() - n. If you reach 
to something that doesn't exist, then you'll get empty sequence.

My experience so far with XPath is that this notion that non-existence 
returns empty sequence is painful at best and a nightmare at worst. 
Expressions that are utter nonsense are accepted, executed, and silently 
fail by returning empty sequence.  The most common mistake is writing 
/a/b/c when you needed /ns1:a/ns2:b/ns3:c. 
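
The namespace mistake is easy to reproduce. The sketch below uses Python's xml.etree.ElementTree, which supports only a small subset of XPath, so it illustrates the behaviour rather than being an XPath 2.0 evaluation. The namespace URIs and the document are made up for the example.

```python
import xml.etree.ElementTree as ET

# Made-up document whose three levels live in three different namespaces.
DOC = """<ns1:a xmlns:ns1="urn:example:one"
               xmlns:ns2="urn:example:two"
               xmlns:ns3="urn:example:three">
  <ns2:b>
    <ns3:c>payload</ns3:c>
  </ns2:b>
</ns1:a>"""

NS = {
    "ns1": "urn:example:one",
    "ns2": "urn:example:two",
    "ns3": "urn:example:three",
}

root = ET.fromstring(DOC)  # root is the ns1:a element

# Unqualified steps match nothing, and nothing complains about it.
unqualified = root.findall("b/c")

# The qualified steps find the element that was intended.
qualified = root.findall("ns2:b/ns3:c", NS)

print(len(unqualified), len(qualified))
```

The unqualified path is accepted and simply selects nothing, which is the silent failure described above.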

Example
expression on el_b could be { fn:count(../../el_choice/el_a) }

- parent
 [sequence]
   - el_choice [minOccurs=5 maxOccurs=5]
     [choice]
       - el_a
       - el_b
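
A small Python sketch of this first example, assuming a made-up infoset in which the five occurrences of el_choice alternate between the two branches. ElementTree's path subset stands in for the DFDL expression language, and the document is complete here, whereas a DFDL parser evaluating the expression on el_b would see only the occurrences parsed so far.

```python
import xml.etree.ElementTree as ET

# Hypothetical infoset: five el_choice occurrences, one branch in each.
DOC = """<parent>
  <el_choice><el_a>1</el_a></el_choice>
  <el_choice><el_b>2</el_b></el_choice>
  <el_choice><el_a>3</el_a></el_choice>
  <el_choice><el_b>4</el_b></el_choice>
  <el_choice><el_a>5</el_a></el_choice>
</parent>"""

parent = ET.fromstring(DOC)
occurrences = parent.findall("el_choice")

# Within an occurrence that took branch el_b, the sibling branch el_a
# never exists, so a reference to it always selects nothing.
same_occurrence = [
    len(occ.findall("el_a"))
    for occ in occurrences
    if occ.find("el_b") is not None
]

# Across all occurrences, the counterpart of fn:count(../../el_choice/el_a)
# can select el_a elements from the other occurrences.
across_occurrences = len(parent.findall("el_choice/el_a"))

print(same_occurrence, across_occurrences)
```

The two results show the distinction under discussion: a reference to another branch of the same occurrence can never match, while a reference that spans occurrences can.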

2.        Should an expression that potentially refers to branches in the 
choice cause a schemadef error?

Example
identically named elements in and out of a choice
expression on el_c could be { fn:count(../el_a) }

- parent
 [sequence]
   - el_a
   - el_b
   - [embedded choice group]
      - el_a
      - el_c 

I'd love to restrict this, because we're looking at having to create a 
DFDL expression language implementation for performance reasons, and 
complex things like this require a very complex implementation tantamount 
to a query-engine. 

I would claim that these two el_a elements are different, and we could 
choose to restrict a DFDL path expression to return only nodes described 
by the same schema component, with "same schema component" meaning same 
path from document element to the schema component where an element or 
group or type reference counts as part of that path. So two different 
element references to the same global element would be two different 
schema components. 
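
To make the "same schema component" idea concrete, here is a minimal Python sketch for the second example. The Node class, the child_components function and the positional path steps are all hypothetical; a real implementation would also have to include element, group and type references in the path, as described above.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    kind: str                      # "element", "sequence" or "choice"
    name: str = ""                 # only used for elements
    children: List["Node"] = field(default_factory=list)


def child_components(group: Node, name: str,
                     path: Tuple[str, ...]) -> List[Tuple[str, ...]]:
    """Return the schema path of every child element called `name`.

    Nested model groups are searched too, because groups do not appear
    as nodes in the infoset that a path expression navigates.
    """
    found = []
    for index, child in enumerate(group.children):
        step = path + (f"{child.kind}[{index}]",)
        if child.kind == "element":
            if child.name == name:
                found.append(step)
        else:
            found.extend(child_components(child, name, step))
    return found


# Content of 'parent': el_a, el_b, then an embedded choice of el_a | el_c.
content = Node("sequence", children=[
    Node("element", "el_a"),
    Node("element", "el_b"),
    Node("choice", children=[Node("element", "el_a"),
                             Node("element", "el_c")]),
])

# Which schema components could the step 'el_a' in ../el_a select?
targets = child_components(content, "el_a", ("parent",))
single_component = len(targets) == 1

print(targets, single_component)
```

Under the proposed restriction, the expression on el_c would be rejected because ../el_a can resolve to two different schema components.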

But I suspect that this is too restrictive, and implementations are just 
going to have to be sophisticated enough to execute queries like this one, 
and a good implementation will optimize simpler cases for faster 
execution. 

...mikeb
--
 dfdl-wg mailing list
 dfdl-wg at ogf.org
 https://www.ogf.org/mailman/listinfo/dfdl-wg 

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU