[DFDL-WG] Action 315: fn:count(.), fn:exists(.)

Mike Beckerle mbeckerle.dfdl at gmail.com
Thu Nov 5 15:42:01 EST 2020

There are 4 functions that operate on the infoset, and their behavior is
unclear because it depends on when they are evaluated during parse/unparse.

The 4 functions are fn:count, fn:exists, fn:exactly-one, and fn:empty.

The behavior when unparsing is less problematic, because one could simply
require the infoset nodes being referenced to be fully-constructed before
these functions are allowed to evaluate.

However, when parsing, the behavior is more subtle, and we may want
unparsing to be consistent with whatever we decide for parsing.

Our call minutes about this action item suggest reviewing the
known-to-exist and known-not-to-exist definitions to see whether these
functions should be defined in terms of them. I have reviewed those
sections, and so far I'm not sure they will contribute.

The general problem is this, in terms of fn:count(path). The path is to an
infoset node or an array of occurrences that is currently being parsed. It
is possible that whether the node is known-to-exist has simply not been
determined yet at the point the expression is being evaluated.

The answer to fn:count(path) should always be the same as if the infoset
were fully constructed at the time the expression is evaluated. But because
evaluation may occur during parsing, the result is simply not defined
whenever evaluating the expression itself determines whether the referenced
item is known to exist or not.


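<xs:element name="outerArray" maxOccurs="unbounded">
  <xs:complexType><xs:sequence>
    <xs:element name="innerArray" maxOccurs="unbounded">
      <xs:complexType><xs:sequence>
        <xs:element name="count" type="xs:int" dfdl:inputValueCalc='{ fn:count(../..) }'/>
      </xs:sequence></xs:complexType>
    </xs:element>
  </xs:sequence></xs:complexType>
</xs:element>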

In the above, we see that fn:count has as argument a relative path to the
array element named "outerArray".

There are a few observations here.

1) If we define fn:count in this case to actually have anything to do with
the current number of array elements in outerArray, then we tightly
constrain implementations to a very sequential notion of parsing. The
notion of the "current" state of the array implies an algorithm where the
number of current occurrences is changing. E.g., we would preclude an
implementation that knows the length of all outerArray elements from
parsing all the children simultaneously in parallel, or at minimum make
this quite hard to achieve, because each parallel computation would have to
somehow simulate the right "current number" of occurrences.

2) The question arises of fn:count(../..) vs.
fn:count(../../../outerArray) vs. fn:count(../../../outerArray[i]), where i
is the index of the enclosing parent outerArray instance that contains this
calculation. Arguably, fn:count(../..) could be considered equivalent to
fn:count(../../../outerArray[i]), and both seem like they should always
return '1', since a path that selects one specific occurrence selects a
single node, and the count of a single node is 1.

3) Arguably, fn:count(path) could be illegal whenever the path is to an
enclosing element. We could simply define this usage to be illegal. I
cannot come up with any reason to actually need this functionality. When
parsing, we could require the path argument to refer to a pre-existing part
of the infoset, and when unparsing, it would have to refer to either
pre-existing or later parts of the infoset, but specifically not the
current infoset elements. If we make this an SDE, then this would seem to
be the conservative design point, which preserves our ability to assign a
future meaning to this usage should a need arise.
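
Observations 1 and 2 can be made concrete with a toy model. The Python
sketch below is only an illustration under stated assumptions: the Node
class, its occurrences() helper, and the "root" parent of outerArray are
all hypothetical and are not part of DFDL or of any implementation. It
mimics a strictly sequential parser and shows that a path ending on one
occurrence always counts 1, while a path naming the whole array gives an
answer that depends on how far the parse has progressed.

```python
class Node:
    """One infoset node: a hypothetical stand-in, not a real DFDL API."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def occurrences(self, name):
        """All children with the given name, i.e. the array as built so far."""
        return [c for c in self.children if c.name == name]


# Infoset as a strictly sequential parser might have built it so far:
# outerArray[1] is complete, outerArray[2] is still being parsed.
root = Node("root")
outer1 = Node("outerArray", root)
Node("innerArray", outer1)
outer2 = Node("outerArray", root)
inner = Node("innerArray", outer2)
count_elem = Node("count", inner)  # the element whose value is being computed

# fn:count(../..): the path ends on one node, the enclosing occurrence.
single = [count_elem.parent.parent]
count_single = len(single)

# fn:count(../../../outerArray): the path names the whole array.
count_mid_parse = len(count_elem.parent.parent.parent.occurrences("outerArray"))

# The parser later adds a third occurrence, so the array answer changes.
Node("outerArray", root)
count_after_parse = len(root.occurrences("outerArray"))

print(count_single, count_mid_parse, count_after_parse)
```

In this model the whole-array count differs mid-parse and after the parse,
which is exactly the "current number of occurrences" that a parallel
implementation would have to simulate.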

My recommendation: Expressions evaluated as part of parsing or unparsing an
element cannot refer to the count or existence of the current element
occurrence being parsed, nor any enclosing element occurrence, nor any
enclosing array element.

This would seem to rule out any use of absolute paths in arguments to
fn:count, because the root element is not (necessarily) known-to-exist
until the entire parse completes successfully. Yet clearly we want to be
able to refer to the fn:count of a prior sibling array, and that reference
should be able to use either a relative or absolute path.

So the problem is not that the argument path "passes through" a node that
may or may not exist. The requirement is that the path must end on a node
whose existence does not depend on the existence of the current node.
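
One possible way to picture this constraint is as a check on where the path
ends. The Python sketch below is a rough illustration only, not proposed
spec language: the Step representation, the function name
refers_to_enclosing, and the element names "root" and "priorArray" are all
assumptions made up for the example. A path is modeled as a list of (name,
index) steps from the root, where an index of None means either a scalar
element or an array taken as a whole.

```python
from typing import Optional, Sequence, Tuple

# (element name, 1-based occurrence index, or None for "no index given")
Step = Tuple[str, Optional[int]]


def refers_to_enclosing(current: Sequence[Step], target: Sequence[Step]) -> bool:
    """True if target ends on the current element, an enclosing element
    occurrence, or an enclosing array taken as a whole (hypothetical rule)."""
    if len(target) > len(current):
        return False
    for (t_name, t_idx), (c_name, c_idx) in zip(target, current):
        if t_name != c_name:
            return False  # the paths diverge, so target is not enclosing
        if t_idx is not None and t_idx != c_idx:
            return False  # a different occurrence of the same array
    return True


# Location of the "count" element from the example schema, assuming the
# enclosing document element is called "root".
current = [("root", None), ("outerArray", 2), ("innerArray", 1), ("count", None)]

# Would be a schema definition error under the recommendation:
enclosing_occurrence = refers_to_enclosing(current, [("root", None), ("outerArray", 2)])
enclosing_array = refers_to_enclosing(current, [("root", None), ("outerArray", None)])

# Allowed: an absolute path that passes through the root but ends on a
# prior sibling array.
prior_sibling = refers_to_enclosing(current, [("root", None), ("priorArray", None)])

print(enclosing_occurrence, enclosing_array, prior_sibling)
```

The design choice in this sketch is that only the end of the path matters:
the absolute path to priorArray passes through the root, which is not yet
known-to-exist, but it is still accepted because it does not end on the
current node or anything enclosing it.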

I'm a bit uncertain of good language to express this constraint on what the
path argument is allowed to refer to, but the notion is one of a sort of
circular definition; hence, it's a schema definition error.

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Owl Cyber Defense |
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy