[DFDL-WG] Action 188 - Path expressions, empty node sequences, and errors

Mike Beckerle mbeckerle.dfdl at gmail.com
Mon Oct 22 12:23:02 EDT 2012


Comments summarized from the WG call on 2012-10-22

IBM commented that its implementation checks that each path expression
returns exactly one node: neither zero nodes nor multiple nodes.

It was proposed that a DFDL implementation could reuse an existing XPath
implementation, but only with some additional effort to:

(a) analyze expressions statically and issue an SDE for malformed paths, or
for paths known to return zero or multiple nodes rather than exactly one;
(b) impose the semantics of fn:exactly-one on all remaining paths at
processing time.

Issue: is (b) an SDE or a PE?

Further question (not from the call, but for discussion): do DFDL
expressions automatically take on a type? E.g.,
<dfdl:discriminator>true</dfdl:discriminator> versus
<dfdl:discriminator>xs:boolean("true")</dfdl:discriminator>

...mike



On Wed, Oct 3, 2012 at 6:51 PM, Mike Beckerle <mbeckerle.dfdl at gmail.com>wrote:

>
> Issue: what are the semantics of a path expression that returns an empty
> node sequence?
>
> Current spec language says it behaves as if it returned nil.
>
> This isn't well-formed: nil is not an empty node sequence; it's a special
> reserved value. This definition is consistent neither with XPath (which
> lets each function decide how to treat an empty node sequence) nor with
> the use of nil elsewhere in DFDL.
>
> *Discussion:*
>
> Possible changes:
> 1) Any path expression that evaluates to an empty node sequence causes an
> SDE.
> 2) Same as 1), except that it causes a PE.
> 3) XPath-consistent: let the functions decide. For string functions, an
> empty node sequence could be treated as "", as in XPath. An empty node
> sequence returned as the value of a DFDL Infoset item would be interpreted
> according to the type of the infoset item: for a string it could be "",
> for a boolean it could be false, etc.
> 4) Anything else?
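A minimal Python sketch of option 3, for illustration only: node sequences are modelled as Python lists, and the defaults table covers only the two cases named above (string and boolean). The names `value_for_item` and `EMPTY_SEQUENCE_DEFAULTS` are hypothetical, not part of any DFDL implementation or of the spec.

```python
from typing import List, Union

Atomic = Union[str, bool, int]

# Hypothetical option-3 defaults: the value an empty node sequence would
# take when assigned to an infoset item of the given type. Only the two
# cases mentioned in the discussion are filled in.
EMPTY_SEQUENCE_DEFAULTS = {"xs:string": "", "xs:boolean": False}


def value_for_item(seq: List[Atomic], xsd_type: str) -> Atomic:
    """Return the value to store in an infoset item of type `xsd_type`."""
    if len(seq) == 1:
        return seq[0]
    if len(seq) > 1:
        raise ValueError(f"{len(seq)} items returned where one was expected")
    # Empty node sequence: fall back to a type-dependent default, if any.
    if xsd_type not in EMPTY_SEQUENCE_DEFAULTS:
        raise ValueError(f"no empty-sequence default is defined for {xsd_type}")
    return EMPTY_SEQUENCE_DEFAULTS[xsd_type]
```

Under this sketch, `value_for_item([], "xs:boolean")` yields `False`, while a type with no agreed default still fails; that per-type table is exactly what option 3 would oblige the spec to define.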
>
> It is very desirable that these be schema definition errors (SDEs),
> because the most likely usage pattern is a relative path reaching to a
> part of the structure that is supposed to exist unconditionally. Since DFDL
> path expressions are a first-order language (you can't construct a path
> from a string), the DFDL compiler can find the vast majority of path
> mistakes (a misspelled path step name, for example, or the wrong number of
> "../.." steps in a relative path) at compile time and issue SDEs for
> them. Cases where a path might or might not exist will be far rarer.
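To illustrate the kind of compile-time check described above, here is a hypothetical sketch that resolves a relative path against a schema modelled as nested Python dicts. It assumes every element occurs exactly once (no choices, optional elements, or arrays), which is the common case; `resolve_relative_path`, `children_at`, and `SchemaDefinitionError` are illustrative names only.

```python
from typing import Dict, List

# Hypothetical schema model: element name -> dict of its child elements.
Schema = Dict[str, dict]


class SchemaDefinitionError(Exception):
    """Stand-in for a DFDL schema definition error (SDE)."""


def children_at(schema: Schema, location: List[str]) -> Schema:
    # Walk from the top level down to the element at `location`.
    node = schema
    for name in location:
        node = node[name]
    return node


def resolve_relative_path(schema: Schema, context: List[str], path: str) -> List[str]:
    """Resolve `path` (e.g. "../../header/count") starting from the element
    whose location from the root is `context`. Raises an SDE if a step
    names no element or there are too many ".." steps."""
    location = list(context)
    for step in path.split("/"):
        if step == "..":
            if not location:
                raise SchemaDefinitionError(f"{path!r}: too many '..' steps")
            location.pop()
        else:
            if step not in children_at(schema, location):
                where = "/" + "/".join(location)
                raise SchemaDefinitionError(
                    f"{path!r}: no element {step!r} under {where}")
            location.append(step)
    return location
```

With `SCHEMA = {"record": {"header": {"count": {}}, "body": {"item": {}}}}`, resolving `"../../header/count"` from `["record", "body", "item"]` returns `["record", "header", "count"]`, while a misspelled step such as `cnt` raises the SDE stand-in before any data is parsed.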
>
> However, there is the issue of a path embedded deep inside an
> expression. If we want a DFDL processor to be (roughly) XPath compatible,
> and to be implementable by reusing an XPath implementation, then the DFDL
> implementation uses the XPath implementation as a black box. It does not
> get to see which path expressions return empty node sequences unless those
> sequences are returned to it by the XPath evaluator.
>
> An XPath implementation embedded inside a DFDL implementation would
> happily evaluate concat(path1, path2); if path1 turned out to be an empty
> node sequence, it would get "" for it, and the DFDL implementation might
> have no way to intercept this to implement the more rigorous semantics
> that issue an SDE (or even a PE).
>
> Adopting XPath semantics entirely makes things like
> concat(../a/complete/nonsense/path, "foobar") into valid code. The path may
> be meaningless, but it will simply be treated as "".
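The black-box problem can be seen in a small Python model of XPath 2.0 fn:concat semantics, in which atomized arguments are represented as lists of zero or one string (an assumption of this sketch; real XPath also casts non-string atomic values to strings). `xpath_concat` is an illustrative name. Because the empty sequence is folded into "", nothing reports that a path argument selected no node.

```python
from typing import Sequence


def xpath_concat(*args: Sequence[str]) -> str:
    """Model of XPath 2.0 fn:concat over already-atomized arguments.

    Each argument is a sequence of zero or one string. An empty sequence
    contributes the zero-length string, so an evaluator applying these
    rules never reports that a path argument selected nothing."""
    if len(args) < 2:
        raise TypeError("fn:concat requires at least two arguments")
    parts = []
    for seq in args:
        if len(seq) > 1:
            raise TypeError("each fn:concat argument must be zero or one item")
        parts.append(seq[0] if seq else "")
    return "".join(parts)
```

Modelling `../a/complete/nonsense/path` as the empty list it would select, `xpath_concat([], ["foobar"])` simply returns `"foobar"`.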
>
> *Suggested Solution:*
>
> We can, however, have our cake and eat it too.
>
> Assume we embed ordinary XPath semantics inside DFDL (choice 3 above).
> Implementors embed XPath implementations as black boxes.
>
> In this case I believe we badly need the fn:exactly-one(arg) function in
> the DFDL library, so that one can wrap it around almost every path
> expression to get a processing error if it does not select exactly one
> node. We also need to add a dfdl:nodePath(arg) function (the name
> 'nodePath' meaning 'is expected to be a path to just one node'; entertain a
> different name if you prefer), which is the same but issues an SDE, and
> signals to the implementation that it should be checked before runtime.
>
> This would let a cautious DFDL schema author wrap path expressions with
> fn:exactly-one or dfdl:nodePath to get the strong checking and behaviour
> they want.
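A sketch of how the two proposed functions could differ only in how the failure is classified. `fn_exactly_one` follows the cardinality rule of XPath's fn:exactly-one; `dfdl_node_path` models the proposed dfdl:nodePath. The Python names and the two exception classes are hypothetical stand-ins for the PE and SDE categories.

```python
from typing import List, Type, TypeVar

T = TypeVar("T")


class ProcessingError(Exception):
    """Stand-in for a DFDL processing error (PE)."""


class SchemaDefinitionError(Exception):
    """Stand-in for a DFDL schema definition error (SDE)."""


def _single(seq: List[T], expr: str, error: Type[Exception]) -> T:
    # Shared cardinality rule: anything other than one item is an error.
    if len(seq) != 1:
        raise error(f"{expr!r} selected {len(seq)} nodes; exactly one expected")
    return seq[0]


def fn_exactly_one(seq: List[T], expr: str) -> T:
    # Proposed fn:exactly-one behaviour: a runtime processing error.
    return _single(seq, expr, ProcessingError)


def dfdl_node_path(seq: List[T], expr: str) -> T:
    # Proposed dfdl:nodePath behaviour: the same check, reported as an SDE.
    # A compiler could additionally check these paths before runtime.
    return _single(seq, expr, SchemaDefinitionError)
```

The runtime check is identical; the difference is the error class, plus the permission for an implementation to examine dfdl:nodePath arguments statically and report the SDE at compile time.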
>
> This is tedious, but gives us XPath compatibility and ease of
> implementation.
>
> *Details:*
>
> This has the following implication for the spec, among others:
>
> In the spec, the function signatures for the expression language use '?'
> after a parameter or return type to mean that it can be either a single
> value or the empty sequence.
>
> If we decide these paths cannot be empty node sequences, then all of these
> '?' markers must be removed. If we decide they can be empty node sequences,
> then we must specify the behavior of each function when the empty sequence
> is the argument.
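A hypothetical sketch of what the '?' marker means for argument checking, using XPath-style occurrence indicators: no indicator allows exactly one item, and '?' allows zero or one. `check_occurrence` is an illustrative name, not part of the spec, and only these two indicators are modelled.

```python
from typing import Sequence


def check_occurrence(seq: Sequence[object], occurrence: str) -> None:
    """Check an argument against an XPath-style occurrence indicator.

    Only the two cases relevant here are modelled: no indicator means
    exactly one item, and '?' means zero or one item."""
    bounds = {"": (1, 1), "?": (0, 1)}
    if occurrence not in bounds:
        raise ValueError(f"unsupported occurrence indicator {occurrence!r}")
    low, high = bounds[occurrence]
    if not low <= len(seq) <= high:
        raise TypeError(
            f"{len(seq)} items do not match occurrence {occurrence!r}")
```

With the '?' markers removed, `check_occurrence([], "")` fails at the call boundary, so no individual function would need its own empty-sequence rule.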
>
> --
> Mike Beckerle | OGF DFDL WG Co-Chair
> Tel:  781-330-0412
>

