[DFDL-WG] Action 188 - Path expressions, empty node sequences, and errors

Steve Hanson smh at uk.ibm.com
Tue Oct 23 11:26:55 EDT 2012


Ideally we would want to say that path locations that can never result in 
a nodeset of exactly one node are schema definition errors, and any 
remaining runtime errors are processing errors. But the former is not easy 
to establish, and there are grey areas. E.g., an element uses a path that 
goes outside of its global containing object. E.g., an element uses a path 
that refers to an element that is declared multiple times but each one is 
optional.

XPath 2.0 defines a set of error codes. Let's look at these and see 
whether it is possible to carve these up between schema definition error 
and processing error.

Mike will see whether XPath 2.0 autocasts its result to match the expected 
context, where possible to do so. 
 
Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848



From:   Mike Beckerle <mbeckerle.dfdl at gmail.com>
To:     dfdl-wg at ogf.org, 
Date:   22/10/2012 17:23
Subject:        Re: [DFDL-WG] Action 188 - Path expressions, empty node sequences, and errors
Sent by:        dfdl-wg-bounces at ogf.org



Comments summarized from the WG call on 2012-10-22

IBM commented that its implementation checks that each path expression 
returns exactly one node, rather than zero nodes or multiple nodes.

It is proposed that an existing XPath implementation could be used by a 
DFDL implementation, but not without some effort to:

(a) analyze expressions so as to statically detect malformed paths, or 
paths that are known to return zero or multiple nodes (not exactly one), 
and report them as SDEs.
(b) impose the semantics of fn:exactly-one on other paths at processing 
time. 

Issue: is (b) an SDE or a PE?

Further question (not from the call, but for discussion): do DFDL 
expressions automatically take on type? E.g.,
<dfdl:discriminator>true</dfdl:discriminator> versus 
<dfdl:discriminator>xs:boolean("true")</dfdl:discriminator>

...mike



On Wed, Oct 3, 2012 at 6:51 PM, Mike Beckerle <mbeckerle.dfdl at gmail.com> 
wrote:

Issue: what are the semantics of a path expression returning an empty node 
sequence?

Current spec language says it behaves as if it returned nil.

This isn't well formed. nil is not an empty node sequence; it's a special 
reserved value. This definition is neither consistent with XPath (which 
lets each function decide what its behavior for an empty node sequence 
is), nor consistent with the use of nil elsewhere in DFDL.

Discussion:

Possible changes 
1) Any path expression that evaluates to empty node sequence causes an SDE
2) ditto except PE
3) XPath consistent - let the functions decide. So for string functions, 
an empty node sequence could be treated as "" as in XPath. An empty node 
sequence returned as the value of a DFDL Infoset item would depend on the 
type of the infoset item. For a string it could be "", for a boolean it 
could be false, etc.
4) Anything else?

It is very desirable that they should be schema definition errors because 
the most likely usage pattern is to create a relative path reaching to a 
part of the structure that is supposed to exist unconditionally. Since 
DFDL path expressions are a first order language (meaning you can't 
construct a path from a string), the DFDL compiler can find the vast 
majority of Path mistakes (misspelling a path step name for example, or 
wrong number of "../.." steps in a relative path), all at compile time and 
issue SDEs for them. The cases where a path might or might not exist will 
be far more rare.
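
As a rough illustration of the compile-time check described above, here is 
a minimal sketch. It assumes the schema can be modelled as nested 
dictionaries of element names; SCHEMA, resolve and SchemaDefinitionError 
are hypothetical names used only for illustration, not part of any DFDL 
implementation or of the spec.

```python
class SchemaDefinitionError(Exception):
    """Hypothetical stand-in for a DFDL schema definition error (SDE)."""


# Hypothetical schema model: each element name maps to its child elements.
SCHEMA = {
    "record": {
        "header": {"length": {}},
        "body": {"payload": {}},
    }
}


def resolve(schema, context, path):
    """Statically resolve a relative path against the schema.

    `context` lists the element names from the root down to the element
    that carries the expression. Returns the resolved location as a list,
    or raises SchemaDefinitionError if any step cannot exist.
    """
    location = list(context)
    for step in path.split("/"):
        if step == "..":
            if len(location) <= 1:
                raise SchemaDefinitionError(f"path climbs above the root: {path}")
            location.pop()
        else:
            location.append(step)
        # Walk from the root to confirm every name so far is declared.
        node = schema
        for name in location:
            if name not in node:
                raise SchemaDefinitionError(f"no element '{name}' in path {path}")
            node = node[name]
    return location
```

Resolving "../../header/length" from ["record", "body", "payload"] succeeds, 
while a misspelled step such as "lenght", or one "../" too many, is 
rejected before any data is parsed. The sketch does not model optional 
elements or arrays, which are exactly the cases that cannot be decided 
statically.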

However, there is the issue of deep embedding of a path inside an 
expression. If we want a DFDL processor to be XPath compatible (roughly), 
and to be able to be implemented by reusing an XPath implementation, then 
there is the problem that the DFDL implementation reuses the XPath 
implementation as a black box, and it does not get to see the path 
expressions that return empty node sequences unless they are returned to 
it from the XPath evaluator.

An XPath implementation embedded inside a DFDL implementation would 
happily evaluate concat(path1, path2), and if path1 turned out to be an 
empty node sequence, it would get "" for that. The DFDL implementation 
might not have any way to intercept this to implement the more rigorous 
semantics that issues an SDE (or even a PE).

Adopting XPath semantics entirely makes things like  
concat(../a/complete/nonsense/path, "foobar") into valid code. The path 
may be meaningless, but that means it will just be treated as "". 
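
To make the concern concrete, here is a small model of that behavior. It 
is a sketch only: the infoset is assumed to be a nested dictionary, select 
and xpath_concat are hypothetical helpers rather than the API of any real 
XPath library, and the path is simplified to omit the leading "..". The 
one thing taken from XPath 2.0 itself is that fn:concat treats an empty 
sequence argument as the zero-length string.

```python
# Hypothetical infoset: element names map to children or to a string value.
INFOSET = {"a": {"b": "hello"}}


def select(infoset, path):
    """Return the sequence of values reached by `path` (empty list if none)."""
    node = infoset
    for step in path.split("/"):
        if not isinstance(node, dict) or step not in node:
            return []  # no such node: an empty sequence, not an error
        node = node[step]
    return [node]


def xpath_concat(*args):
    """Model of fn:concat: an empty sequence argument contributes ''."""
    parts = []
    for arg in args:
        if isinstance(arg, list):
            if len(arg) > 1:
                # fn:concat accepts at most one item per argument.
                raise TypeError("sequence of more than one item")
            parts.append(arg[0] if arg else "")
        else:
            parts.append(arg)
    return "".join(parts)


valid = xpath_concat(select(INFOSET, "a/b"), "foobar")
nonsense = xpath_concat(select(INFOSET, "a/complete/nonsense/path"), "foobar")
```

Both calls complete without any error. The caller only ever sees the 
final string, so it has no opportunity to notice that the second path 
matched nothing.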

Suggested Solution:

We can, however, have our cake and eat it too. 

Assume we embed an ordinary XPath semantics inside DFDL (choice 3 above). 
Implementors embed XPath implementations black-box.

In this case I believe we badly need the fn:exactly-one(arg) function in 
the DFDL library so that one can wrap it around almost every path 
expression to get a processing error if it is not one node, and we need to 
add a dfdl:nodePath(arg) function (the name 'nodePath' meaning 'is 
expected to be a path to just one node' - entertain a different name if 
you prefer) which is the same, but issues an SDE and suggests to the 
implementation that it should be checked before runtime. 

This would let a cautious DFDL schema author wrap path expressions with 
fn:exactly-one or dfdl:nodePath to get the strong checking and behaviour 
they want.
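
The intended difference between the two wrappers might be sketched as 
follows. This is an assumption-laden illustration: sequences are modelled 
as Python lists, the exception classes are hypothetical names, and 
node_path stands in for the proposed dfdl:nodePath, whose real checking 
would ideally happen before runtime rather than in a runtime function like 
this one.

```python
class ProcessingError(Exception):
    """Hypothetical stand-in for a DFDL processing error (PE)."""


class SchemaDefinitionError(Exception):
    """Hypothetical stand-in for a DFDL schema definition error (SDE)."""


def exactly_one(seq):
    """Model of fn:exactly-one: return the single item or raise a PE."""
    if len(seq) != 1:
        raise ProcessingError(f"expected exactly one node, got {len(seq)}")
    return seq[0]


def node_path(seq):
    """Model of the proposed dfdl:nodePath: same check, but reports an SDE."""
    if len(seq) != 1:
        raise SchemaDefinitionError(f"expected exactly one node, got {len(seq)}")
    return seq[0]
```

With wrappers like these, concat(fn:exactly-one(path1), "foobar") would 
fail when path1 is empty instead of silently producing "foobar".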

This is tedious, but gives us XPath compatibility and ease of 
implementation. 

Details:

There is the below implication for the spec, among others:

In the spec, our function signatures for expression language functions use 
'?' after a parameter or return type to mean that it can be either a 
single value or the empty sequence. 

If we decide these paths cannot be empty node sequences, then all of these 
'?' must be removed. If we decide they can be empty node sequences, then 
we must specify the behavior of each function when the empty sequence is 
the argument. 

-- 
Mike Beckerle | OGF DFDL WG Co-Chair 
Tel:  781-330-0412




--
  dfdl-wg mailing list
  dfdl-wg at ogf.org
  https://www.ogf.org/mailman/listinfo/dfdl-wg

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU