[DFDL-WG] Action 315: fn:count(.), fn:exists(.)
Steve Hanson
smh at uk.ibm.com
Thu Dec 10 13:48:09 EST 2020
Action 315 ... IBM DFDL and Daffodil do not have any tests of significance
that use self/parent for the set of affected functions. The proposal is to
make such usage an SDE. Say now if this is a problem.
Regards
Steve Hanson
IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh at uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
Note: I work Tuesday to Friday
From: Mike Beckerle <mbeckerle.dfdl at gmail.com>
To: Steve Lawrence <slawrence at tresys.com>
Cc: DFDL-WG <dfdl-wg at ogf.org>
Date: 06/11/2020 21:23
Subject: [EXTERNAL] Re: [DFDL-WG] Action 315: fn:count(.), fn:exists(.)
Sent by: "dfdl-wg" <dfdl-wg-bounces at ogf.org>
Yes I think that would be disallowed.
I think dfdl:occursIndex() is the function to call to decide if you are at
index 1 or not.
However, we only have dfdl:occursIndex() defined for the innermost array.
There's no way to ask for the current index of an enclosing array of the
nest.
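
As a minimal, untested sketch of that suggestion, the running-sum example quoted below might be rewritten with dfdl:occursIndex() in place of fn:count on the enclosing array. It assumes dfdl:occursIndex() may be used inside a predicate and that an xs:int() cast is acceptable here. It also adds the current val to the previous sum, which the quoted expression does not do. Representation properties are omitted:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">
  <!-- Sketch only: dfdl:format and other representation properties are omitted. -->
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="array" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="val" type="xs:int" />
              <!-- dfdl:occursIndex() is taken to be the 1-based index of the
                   current occurrence of the innermost array ("array"). -->
              <xs:element name="sum" type="xs:int"
                dfdl:inputValueCalc="{
                  if (dfdl:occursIndex() eq 1)
                  then ../val
                  else xs:int(../../array[dfdl:occursIndex() - 1]/sum + ../val)
                }" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

The path in the else branch ends on the sum of an earlier occurrence rather than on the current or enclosing occurrence, so it does not ask for the count or existence of anything still being parsed.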
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Owl Cyber Defense |
www.owlcyberdefense.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
On Thu, Nov 5, 2020 at 4:07 PM Steve Lawrence <slawrence at tresys.com> wrote:
I know of uses where fn:count has been used as a way to keep a running
sum via inputValueCalc. For example:
<xs:element name="root">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="array" maxOccurs="unbounded">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="val" type="xs:int" ... />
            <xs:element name="sum" type="xs:int"
              dfdl:inputValueCalc="{
                if (fn:count(../../array) eq 1)
                then ../val
                else ../../array[fn:count(../../array) - 1]/sum
              }" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="total" type="xs:int"
        dfdl:inputValueCalc="{ ../array[fn:count(../array)]/sum }" />
    </xs:sequence>
  </xs:complexType>
</xs:element>
Would something like this no longer be allowed under this proposal?
On 11/5/20 3:42 PM, Mike Beckerle wrote:
> There are 4 functions which operate on the infoset, and their behavior is
> unclear depending on when they are evaluated during parse/unparse.
>
> The 4 functions are fn:count, fn:exists, fn:exactly-one, and fn:empty.
>
> The behavior when unparsing is less problematic, because one could simply
> require the infoset nodes being referenced to be fully constructed before
> these functions are allowed to evaluate.
>
> However, when parsing, the behavior is more subtle, and unparsing may want
> to be made consistent with decisions about behavior for parsing.
>
> Our call minutes about this action item suggest reviewing the
> known-to-exist and known-not-to-exist definitions to see whether these
> functions should be defined in terms of them. I have reviewed those
> sections, and so far I'm not sure they will contribute.
>
> The general problem is this, in terms of fn:count(path). The path is to an
> infoset node or an array of occurrences that is currently being parsed. It
> is possible that the status of known-to-exist or not is simply not well
> known at the point the expression is being evaluated.
>
> The answer to fn:count(path) wants to always be the same as if the infoset
> were fully constructed at the time the expression is evaluated. As
> evaluation may occur during parsing, it is just not defined if the
> evaluation of the expression itself determines whether the item itself is
> known to exist or not.
>
> Ex:
>
> <xs:element name="outerArray" maxOccurs="unbounded">
>   <xs:complexType>
>     <xs:sequence>
>       <xs:element name="innerArray" maxOccurs="unbounded">
>         <xs:complexType>
>           <xs:sequence>
>             <xs:element name="count" type="xs:int"
>               dfdl:inputValueCalc='{ fn:count(../..) }'/>
>             ....
>           </xs:sequence>
>         </xs:complexType>
>       </xs:element>
>     </xs:sequence>
>   </xs:complexType>
> </xs:element>
>
> In the above, we see that fn:count has as argument a relative path to the
> array element named "outerArray".
>
> There are a few observations here.
>
> 1) If we define fn:count in this case to actually have anything to do with
> the current number of array elements in outerArray, then we will have
> tightly constrained implementations to a very sequential notion of parsing.
> The notion of "current" state of the array implies an algorithm where the
> number of current occurrences is changing. E.g., we would preclude an
> implementation that knows the length of all outerArray elements from
> parsing all the children simultaneously in parallel, or at minimum make
> this quite hard to achieve because each parallel computation would have to
> somehow simulate the right "current number" of occurrences.
>
> 2) The question arises of fn:count(../..) vs.
> fn:count(../../../outerArray), vs. fn:count(../../../outerArray[i]) where i
> is the index of the enclosing parent outerArray instance that contains this
> calculation. Arguably, fn:count(../..) could be considered equivalent to
> fn:count(../../../outerArray[i]), both of which seem like they should
> always return '1' since the count of instances of a single index point,
> single node, is 1.
>
> 3) Arguably, fn:count(path) could be illegal whenever the path is to an
> enclosing element. We could simply define this usage to be illegal. I
> cannot come up with any reason to actually need this functionality. When
> parsing we could require the path argument to be to a pre-existing part of
> the infoset, and when unparsing it would have to be to either pre-existing
> or later parts of the infoset, but specifically not the current infoset
> elements. If we make this an SDE, then this would seem to be the
> conservative design point which preserves our ability to assign a future
> meaning to this usage, should a need arise.
>
> My recommendation: Expressions evaluated as part of an element parsing or
> unparsing cannot refer to the count or existence of the current element
> occurrence being parsed, nor any enclosing element occurrence, nor any
> enclosing array element.
>
> This would seem to rule out any use of absolute paths in arguments to
> fn:count, because the root element is not (necessarily) known-to-exist
> until the entire parse completes successfully. Yet clearly we want to be
> able to refer to the fn:count of a prior sibling array, and that reference
> should be able to use either a relative or absolute path.
>
> So the constraint is not that the argument path must avoid "passing
> through" a node that may or may not exist, but that it must end on one
> whose existence does not depend on the existence of the current node.
>
> I'm a bit uncertain of good language to express this constraint on what
> the path argument is allowed to refer to, but the notion is one of a sort
> of circular definition; hence, it's a schema definition error.
>
>
> Mike Beckerle | OGF DFDL Workgroup Co-Chair | Owl Cyber Defense |
> www.owlcyberdefense.com <http://www.owlcyberdefense.com>
> Please note: Contributions to the DFDL Workgroup's email discussions are
> subject to the OGF Intellectual Property Policy
> <http://www.ogf.org/About/abt_policies.php>
>
>
> --
> dfdl-wg mailing list
> dfdl-wg at ogf.org
> https://www.ogf.org/mailman/listinfo/dfdl-wg
>
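
To illustrate the recommendation in the quoted message, here is a small sketch contrasting a reference to a prior sibling array, which the proposal would keep legal, with a reference to an enclosing array occurrence, which would become an SDE. The element names items and n are hypothetical, the comments describe my reading of the proposal rather than any implementation's behavior, and representation properties are omitted:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
           xmlns:fn="http://www.w3.org/2005/xpath-functions">
  <!-- Sketch only: dfdl:format and other representation properties are omitted. -->
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="items" type="xs:int" maxOccurs="unbounded" />
        <!-- Expected to stay legal: items is a prior sibling array, and its
             existence does not depend on the existence of n. -->
        <xs:element name="n" type="xs:int"
          dfdl:inputValueCalc="{ fn:count(../items) }" />
        <xs:element name="outerArray" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <!-- Expected to be an SDE under the proposal: the path ends on
                   the enclosing outerArray occurrence that is still being parsed. -->
              <xs:element name="count" type="xs:int"
                dfdl:inputValueCalc="{ fn:count(..) }" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```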
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU