[DFDL-WG] validating expressions on elements in a choice or unordered sequence

Fri Apr 11 08:12:15 EDT 2014

Comments inline

On Fri, Apr 11, 2014 at 6:22 AM, Mark Frost <FROSTMAR at uk.ibm.com> wrote:

> When we were implementing unordered sequences, this raised some questions
> around evaluating relative paths in expressions, for elements in a choice
> or unordered sequence :
>
> DFDL spec: (gwdrp-dfdl-v1.0.4 section 15)
> "When processing a choice group the parser validates
> any contained path expressions. If a path
> expression contained inside a choice branch refers
> to any other branch of the choice, then it is a
> schema definition error."
>
> 1.        I'm not clear what benefit this restriction on path expressions
> gives.
> It seems redundant since in any single instance of a choice group, if the
> branch being processed exists, then by definition none of its sibling
> branches exist. Any expression path referring to a non-existent branch
> would correctly return <empty sequence>
>

Typically in XPath, such paths would just be empty sequence at runtime.
Making it an SDE hoists the error to (hopefully) compile time, and because
an SDE is non-recoverable, it changes the way one must write expressions:
you can't write utter nonsense paths and have them be runnable.

> If the choice group is inside a repeating structure, then expressions
> referring to choice branches within *other* instances of the choice could
> be useful.
> Should an expression referring to branches in *other instances* of a
> choice cause a schemadef error?
>

Should be no issue if you are looking at say, position() - n. If you reach
to something that doesn't exist, then you'll get empty sequence.
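
A minimal sketch of what I mean, in Python with the standard library's
ElementTree standing in for a DFDL infoset. The five el_choice occurrences,
their contents, and the helper name branch_in_earlier_instance are all made
up for illustration; this is not how any DFDL processor is implemented.

```python
import xml.etree.ElementTree as ET

# Hypothetical infoset: five occurrences of el_choice, one branch each.
parent = ET.fromstring(
    "<parent>"
    "<el_choice><el_a>1</el_a></el_choice>"
    "<el_choice><el_b>2</el_b></el_choice>"
    "<el_choice><el_a>3</el_a></el_choice>"
    "<el_choice><el_b>4</el_b></el_choice>"
    "<el_choice><el_b>5</el_b></el_choice>"
    "</parent>"
)


def branch_in_earlier_instance(parent, current, n, branch):
    """Look for `branch` in the el_choice occurrence n positions before
    occurrence `current` (1-based). Returns [] when that occurrence does
    not exist or took a different branch."""
    occurrences = parent.findall("el_choice")
    target = current - n
    if target < 1 or target > len(occurrences):
        return []
    return occurrences[target - 1].findall(branch)


# Rough analogue of fn:count(../../el_choice/el_a) evaluated from any el_b.
count_a = len(parent.findall("el_choice/el_a"))

found = branch_in_earlier_instance(parent, 4, 1, "el_a")        # occurrence 3 took el_a
other_branch = branch_in_earlier_instance(parent, 3, 1, "el_a")  # occurrence 2 took el_b
out_of_range = branch_in_earlier_instance(parent, 1, 1, "el_a")  # no occurrence 0
```

Reaching into another occurrence either finds the branch or comes back
empty; nothing in it refers to a sibling branch of the same occurrence.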

My experience so far with XPath is that this notion that non-existence
returns empty sequence is painful at best and a nightmare at worst.
Expressions that are utter nonsense are accepted, executed, and silently
fail by returning empty sequence.  The most common mistake is writing
/a/b/c when you needed /ns1:a/ns2:b/ns3:c.
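
To make that concrete, here is a small sketch using Python's standard
xml.etree.ElementTree, which implements only a subset of XPath. The
namespace URIs and the document are invented for the example. The
unqualified path is accepted and simply matches nothing.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a, b and c each live in a different namespace.
DOC = (
    '<a xmlns="urn:example:ns1">'
    '<b xmlns="urn:example:ns2">'
    '<c xmlns="urn:example:ns3">42</c>'
    "</b>"
    "</a>"
)

root = ET.fromstring(DOC)  # root is the element a

# Prefix bindings used only by the path expressions below.
ns = {
    "ns1": "urn:example:ns1",
    "ns2": "urn:example:ns2",
    "ns3": "urn:example:ns3",
}

# The mistake: unqualified steps. No error is raised, the result is empty.
unqualified = root.findall("b/c")

# The intended path, with namespace-qualified steps.
qualified = root.findall("ns2:b/ns3:c", ns)
```

Here unqualified is an empty list while qualified holds the single c
element, and nothing at runtime tells you the first path was wrong.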

>
> Example
> expression on el_b could be { fn:count(../../el_choice/el_a) }
>
> - parent
>  [sequence]
>    - el_choice [minOccurs=5 maxOccurs=5]
>      [choice]
>        - el_a
>        - el_b
>
>
> 2.        Should an expression that *potentially* refers to branches in
> the choice cause a schemadef error?
>
> Example
> identically named elements in and out of a choice
> expression on el_c could be { fn:count(../el_a) }
>
> - parent
>  [sequence]
>    - *el_a*
>    - el_b
>    - [embedded choice group]
>       - *el_a*
>       - el_c
>
>
I'd love to restrict this, because we're looking at having to create a DFDL
expression language implementation for performance reasons, and complex
things like this require a very complex implementation tantamount to a
query-engine.

I would claim that these two el_a elements are different, and we could
choose to restrict a DFDL path expression to return only nodes described by
the same schema component. Here "same schema component" means the same path
from the document element to the schema component, where each element,
group, or type reference counts as part of that path. So two different
element references to the same global element would be two different schema
components.
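
As a rough sketch of that idea (the Node class, the (kind, index) path
encoding, and the function name components_named are my own assumptions for
illustration, not anything from the spec), a compile-time check could
collect the schema component paths that a step such as ../el_a can reach and
flag the expression when there is more than one:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                 # "element", "sequence", or "choice"
    name: str = ""            # element name; empty for model groups
    children: list = field(default_factory=list)


def components_named(node, name, path=()):
    """Return the schema component path of every element called `name` that
    can appear as a child of `node`, descending through nested groups but
    not through other elements."""
    found = []
    for index, child in enumerate(node.children):
        step = path + ((child.kind, index),)
        if child.kind == "element":
            if child.name == name:
                found.append(step)
        else:
            found.extend(components_named(child, name, step))
    return found


# The schema from the quoted example: el_a both outside and inside the choice.
parent = Node("element", "parent", [
    Node("sequence", children=[
        Node("element", "el_a"),
        Node("element", "el_b"),
        Node("choice", children=[
            Node("element", "el_a"),
            Node("element", "el_c"),
        ]),
    ]),
])

el_a_components = components_named(parent, "el_a")
el_c_components = components_named(parent, "el_c")

# Under the restriction, a step that reaches several components is rejected.
ambiguous = len(el_a_components) > 1
```

For this schema ../el_a reaches two distinct components, so it would be
rejected under the restriction, while ../el_c reaches exactly one.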

But I suspect that this is too restrictive, and implementations are just
going to have to be sophisticated enough to execute queries like this one,
and a good implementation will optimize simpler cases for faster execution.

...mikeb