[DFDL-WG] Action 315: fn:count(.), fn:exists(.)

Thu Dec 10 13:48:09 EST 2020

Action 315 ... IBM DFDL and Daffodil do not have any tests of significance 
that use self/parent, for the set of affected functions.  Proposal is to 
make such usage an SDE.  Say now if this is a problem.
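
For illustration only, a minimal sketch of the kind of self/parent usage
that would become an SDE. The element names (root, record, selfExists,
parentCount) are hypothetical and not taken from any existing test, and
representation properties are omitted. The affected functions discussed in
this thread are fn:count, fn:exists, fn:exactly-one and fn:empty.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
           xmlns:fn="http://www.w3.org/2005/xpath-functions">
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="record" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <!-- Self: "." is the element whose value is being calculated. -->
              <xs:element name="selfExists" type="xs:boolean"
                dfdl:inputValueCalc="{ fn:exists(.) }" />
              <!-- Parent: ".." is the enclosing occurrence of record,
                   which is still being parsed at this point. -->
              <xs:element name="parentCount" type="xs:int"
                dfdl:inputValueCalc="{ fn:count(..) }" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```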

Regards

Steve Hanson
IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh at uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
Note: I work Tuesday to Friday 

From:   Mike Beckerle <mbeckerle.dfdl at gmail.com>
To:     Steve Lawrence <slawrence at tresys.com>
Cc:     DFDL-WG <dfdl-wg at ogf.org>
Date:   06/11/2020 21:23
Subject:        [EXTERNAL] Re: [DFDL-WG] Action 315: fn:count(.), 
fn:exists(.)
Sent by:        "dfdl-wg" <dfdl-wg-bounces at ogf.org>

Yes I think that would be disallowed. 

I think dfdl:occursIndex() is the function to call to decide if you are at 
index 1 or not.

However, we only have dfdl:occursIndex() defined for the innermost array. 
There's no way to ask for the current index of an enclosing array of the 
nest.
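
As an untested sketch, the running-sum schema from Steve's message quoted
below might be rewritten with dfdl:occursIndex() roughly as follows. It
assumes the intent is "previous sum plus current val", and it omits the
representation properties that the original elided. The total element still
uses fn:count, but only on a prior sibling array.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
           xmlns:fn="http://www.w3.org/2005/xpath-functions">
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="array" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <!-- Representation properties for val are omitted here. -->
              <xs:element name="val" type="xs:int" />
              <!-- dfdl:occursIndex() is the 1-based index within the
                   innermost enclosing array, which here is "array". -->
              <xs:element name="sum" type="xs:int"
                dfdl:inputValueCalc="{
                  if (dfdl:occursIndex() eq 1)
                  then ../val
                  else ../../array[dfdl:occursIndex() - 1]/sum + ../val
                }" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <!-- fn:count applied to a prior sibling array, evaluated after
             that array has been parsed. -->
        <xs:element name="total" type="xs:int"
          dfdl:inputValueCalc="{ ../array[fn:count(../array)]/sum }" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```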

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Owl Cyber Defense | 
www.owlcyberdefense.com
Please note: Contributions to the DFDL Workgroup's email discussions are 
subject to the OGF Intellectual Property Policy

On Thu, Nov 5, 2020 at 4:07 PM Steve Lawrence <slawrence at tresys.com> wrote:
I know of uses where fn:count has been used as a way to keep a running
sum via inputValueCalc. For example:

  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="array" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="val" type="xs:int" ... />
              <xs:element name="sum" type="xs:int"
                dfdl:inputValueCalc="{
                  if (fn:count(../../array) eq 1)
                  then ../val
                  else ../../array[fn:count(../../array) - 1]/sum + ../val
                }" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="total" type="xs:int"
          dfdl:inputValueCalc="{ ../array[fn:count(../array)]/sum }" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>

Would something like this no longer be allowed under this proposal?

On 11/5/20 3:42 PM, Mike Beckerle wrote:
> There are 4 functions which operate on the infoset, and their behavior is
> unclear depending on when they are evaluated during parse/unparse.
> 
> The 4 functions are fn:count, fn:exists, fn:exactly-one, and fn:empty.
> 
> The behavior when unparsing is less problematic, because one could simply
> require the infoset nodes being referenced to be fully constructed before these
> functions are allowed to evaluate.
> 
> However, when parsing, the behavior is more subtle, and unparsing may want to be
> made consistent with decisions about behavior for parsing.
> 
> Our call minutes about this action item suggest reviewing the known-to-exist and
> known-not-to-exist definitions to see whether these functions should be defined
> in terms of them. I have reviewed those sections, and so far I'm not sure they
> will contribute.
> 
> The general problem is this, in terms of fn:count(path). The path is to an
> infoset node or an array of occurrences that is currently being parsed. It is
> possible that the status of known-to-exist or not is simply not well known at
> the point the expression is being evaluated.
> 
> The answer to fn:count(path) wants to always be the same as if the infoset were
> fully constructed at the time the expression is evaluated. As evaluation may
> occur during parsing, it is just not defined if the evaluation of the expression
> itself determines whether the item itself is known to exist or not.
> 
> Ex:
> 
> <xs:element name="outerArray" maxOccurs="unbounded">
>   <xs:complexType>
>     <xs:sequence>
>       <xs:element name="innerArray" maxOccurs="unbounded">
>         <xs:complexType>
>           <xs:sequence>
>             <xs:element name="count" type="xs:int"
>               dfdl:inputValueCalc='{ fn:count(../..) }'/>
>             ....
>           </xs:sequence>
>         </xs:complexType>
>       </xs:element>
>     </xs:sequence>
>   </xs:complexType>
> </xs:element>
> 
> In the above, we see that fn:count has as argument a relative path to the array
> element named "outerArray".
> 
> There are a few observations here.
> 
> 1) If we define fn:count in this case to actually have anything to do with the
> current number of array elements in outerArray, then we will have tightly
> constrained implementations to a very sequential notion of parsing. The notion
> of "current" state of the array implies an algorithm where the number of current
> occurrences is changing. E.g., we would preclude an implementation that knows
> the length of all outerArray elements from parsing all the children
> simultaneously in parallel, or at minimum make this quite hard to achieve
> because each parallel computation would have to somehow simulate the right
> "current number" of occurrences.
> 
> 2) The question arises of fn:count(../..) vs. fn:count(../../../outerArray), vs.
> fn:count(../../../outerArray[i]) where i is the index of the enclosing parent
> outerArray instance that contains this calculation. Arguably, fn:count(../..)
> could be considered equivalent to fn:count(../../../outerArray[i]), both of
> which seem like they should always return '1' since the count of the number of
> instances of a single index point, single node, is 1.
> 
> 3) Arguably, fn:count(path) could be illegal whenever the path is to an
> enclosing element. We could simply define this usage to be illegal. I cannot
> come up with any reason to actually need this functionality.  When parsing we
> could require the path argument to be to a pre-existing part of the infoset, and
> when unparsing it would have to be to either pre-existing or later parts of the
> infoset, but specifically not the current infoset elements. If we make this an
> SDE, then this would seem to be the conservative design point which preserves
> our ability to assign a future meaning to this usage, should a need arise.
> 
> My recommendation: Expressions evaluated as part of parsing or unparsing an
> element cannot refer to the count or existence of the current element
> occurrence being parsed, nor any enclosing element occurrence, nor any enclosing
> array element.
> 
> This would seem to rule out any use of absolute paths in arguments to fn:count,
> because the root element is not (necessarily) known-to-exist until the entire
> parse completes successfully. Yet clearly we want to be able to refer to the
> fn:count of a prior sibling array, and that reference should be able to use
> either a relative or absolute path.
> 
> So the constraint is not about the argument path "passing through" a node that
> may or may not exist; it is that the path must end on a node whose existence
> does not depend on the existence of the current node.
> 
> I'm a bit uncertain of good language to express this constraint on what the path
> argument is allowed to refer to, but the notion is one of a sort of circular
> definition; hence, it's a schema definition error.
> 
> 
> Mike Beckerle | OGF DFDL Workgroup Co-Chair | Owl Cyber Defense | 
> www.owlcyberdefense.com <http://www.owlcyberdefense.com>
> Please note: Contributions to the DFDL Workgroup's email discussions are subject
> to the OGF Intellectual Property Policy <http://www.ogf.org/About/abt_policies.php>
> 
> 
> --
>   dfdl-wg mailing list
>   dfdl-wg at ogf.org
>   https://www.ogf.org/mailman/listinfo/dfdl-wg
> 

--
  dfdl-wg mailing list
  dfdl-wg at ogf.org
  https://www.ogf.org/mailman/listinfo/dfdl-wg

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU