[DFDL-WG] Action 315: fn:count(.), fn:exists(.)

Steve Lawrence slawrence at tresys.com
Thu Nov 5 16:07:40 EST 2020


I know of schemas that use fn:count to keep a running sum via
dfdl:inputValueCalc. For example:

  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="array" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="val" type="xs:int" ... />
              <xs:element name="sum" type="xs:int"
                dfdl:inputValueCalc="{
                  if (fn:count(../../array) eq 1)
                  then ../val
                  else ../val + ../../array[fn:count(../../array) - 1]/sum
                }" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="total" type="xs:int"
          dfdl:inputValueCalc="{ ../array[fn:count(../array)]/sum }" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>

Would something like this no longer be allowed under this proposal?
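
To make the dependence on parse order concrete, here is a rough Python
simulation (not DFDL) of the running sum that schema is meant to compute. It
assumes a strictly sequential parser in which fn:count(../../array) evaluates
to the number of occurrences parsed so far, including the current one. The
names running_sums and total are made up for illustration.

```python
def running_sums(vals):
    """Simulate the 'sum' inputValueCalc for each occurrence of 'array'.

    Assumption: the parser is sequential, so the count seen while parsing
    occurrence i is i (occurrences so far, current one included).
    """
    array = []  # occurrences parsed so far
    for val in vals:
        occurrence = {"val": val}
        array.append(occurrence)
        count = len(array)  # stands in for fn:count(../../array)
        if count == 1:
            occurrence["sum"] = val
        else:
            # XPath array[count - 1] is 1-based, so it is array[count - 2] here.
            occurrence["sum"] = val + array[count - 2]["sum"]
    return array


def total(array):
    """Stands in for ../array[fn:count(../array)]/sum on the final array."""
    return array[len(array) - 1]["sum"]


if __name__ == "__main__":
    parsed = running_sums([3, 4, 5])
    print([o["sum"] for o in parsed], total(parsed))
```

The sum expression only works because the count is taken partway through the
array; the total expression counts an array that is already complete.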


On 11/5/20 3:42 PM, Mike Beckerle wrote:
> There are 4 functions that operate on the infoset whose behavior is unclear
> because it depends on when they are evaluated during parse/unparse.
> 
> The 4 functions are fn:count, fn:exists, fn:exactly-one, and fn:empty.
> 
> The behavior when unparsing is less problematic, because one could simply 
> require the infoset nodes being referenced to be fully-constructed before these 
> functions are allowed to evaluate.
> 
> However, when parsing, the behavior is more subtle, and we may want unparsing
> to be consistent with whatever is decided for parsing.
> 
> Our call minutes about this action item suggest reviewing the known-to-exist and 
> known-not-to-exist definitions to see whether these function definitions should 
> be defined in terms of that. I have reviewed those sections, and so far I'm not 
> sure they will contribute.
> 
> The general problem, in terms of fn:count(path), is this. The path refers to an
> infoset node or an array of occurrences that is currently being parsed. Whether
> that node is known-to-exist may simply not yet be determined at the point the
> expression is evaluated.
> 
> Ideally, fn:count(path) would always return the same answer as if the infoset
> were fully constructed at the time the expression is evaluated. But because
> evaluation may occur during parsing, the result is undefined when the
> evaluation of the expression itself determines whether the item is known to
> exist.
> 
> Ex:
> 
> <xs:element name="outerArray" maxOccurs="unbounded">
>   <xs:complexType>
>     <xs:sequence>
>       <xs:element name="innerArray" maxOccurs="unbounded">
>         <xs:complexType>
>           <xs:sequence>
>             <xs:element name="count" type="xs:int"
>               dfdl:inputValueCalc='{ fn:count(../..) }'/>
>             ....
>           </xs:sequence>
>         </xs:complexType>
>       </xs:element>
>     </xs:sequence>
>   </xs:complexType>
> </xs:element>
> 
> In the above, we see that fn:count has as argument a relative path to the array 
> element named "outerArray".
> 
> There are a few observations here.
> 
> 1) If we define fn:count in this case to depend on the current number of
> occurrences in outerArray, then we tightly constrain implementations to a
> strictly sequential notion of parsing. The notion of the "current" state of the
> array implies an algorithm in which the number of occurrences changes over
> time. E.g., we would preclude an implementation that knows the length of all
> outerArray elements from parsing all the children in parallel, or at minimum
> make this quite hard to achieve, because each parallel computation would have
> to somehow simulate the right "current number" of occurrences.
> 
> 2) The question arises of fn:count(../..) vs. fn:count(../../../outerArray) vs.
> fn:count(../../../outerArray[i]), where i is the index of the enclosing
> outerArray occurrence that contains this calculation. Arguably, fn:count(../..)
> could be considered equivalent to fn:count(../../../outerArray[i]), and both
> seem like they should always return 1, since the count of a single node at a
> single index is 1.
> 
> 3) Arguably, fn:count(path) could be illegal whenever the path is to an 
> enclosing element. We could simply define this usage to be illegal. I cannot 
> come up with any reason to actually need this functionality. When parsing, we
> could require the path argument to refer to a pre-existing part of the infoset,
> and when unparsing it would have to refer to either pre-existing or later parts
> of the infoset, but specifically not the current infoset elements. If we make
> this an SDE (schema definition error), it would seem to be the conservative
> design point, preserving our ability to assign a future meaning to this usage
> should a need arise.
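
As a rough illustration of observation 1, the Python sketch below (not DFDL)
contrasts two possible readings of fn:count on an enclosing array: the count as
seen by a strictly sequential parser while each occurrence is being parsed, and
the count in the fully constructed infoset. Both function names are made up,
and the "sequential" reading is only an assumption about how an implementation
might behave.

```python
def counts_seen_sequentially(n_occurrences):
    """Count of the enclosing array as seen while each occurrence is parsed.

    Assumption: a strictly sequential parser that counts the occurrences
    parsed so far, including the one currently being parsed.
    """
    seen = []
    parsed = 0
    for _ in range(n_occurrences):
        parsed += 1          # the current occurrence now exists
        seen.append(parsed)  # value an expression inside it would observe
    return seen


def counts_in_final_infoset(n_occurrences):
    """Count every occurrence would observe if the infoset were complete."""
    return [n_occurrences] * n_occurrences


if __name__ == "__main__":
    print(counts_seen_sequentially(3))
    print(counts_in_final_infoset(3))
```

The two readings agree only for the last occurrence, and a parser working on
occurrences in parallel has no natural "so far" value to report at all.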
> 
> My recommendation: Expressions evaluated as part of an element parsing or 
> unparsing cannot refer to the count or existence of the current element 
> occurrence being parsed, nor any enclosing element occurrence, nor any enclosing 
> array element.
> 
> This would seem to rule out any use of absolute paths in arguments to fn:count, 
> because the root element is not (necessarily) known-to-exist until the entire 
> parse completes successfully. Yet clearly we want to be able to refer to the 
> fn:count of a prior sibling array, and that reference should be able to use 
> either a relative or absolute path.
> 
> So the constraint is not that the argument path must avoid "passing through" a
> node that may or may not exist, but that it must end on a node whose existence
> does not depend on the existence of the current node.
> 
> I'm a bit uncertain of good language to express this constraint on what the path 
> argument is allowed to refer to, but the notion is one of a sort of circular 
> definition; hence, it's a schema definition error.
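
One possible reading of that constraint, sketched in Python (not DFDL): treat
the path argument as circular when it ends on the current element or on one of
its ancestors. This is only an assumption about how the rule might be phrased.
Paths are modeled as tuples of element names from the root, array indexes and
predicates are ignored, and the names resolve, is_circular, "root" and
"priorArray" are hypothetical.

```python
def resolve(current, relative):
    """Resolve a relative path such as '../../x' against the path of the
    element whose expression is being evaluated."""
    path = list(current)
    for step in relative.split("/"):
        if step == "..":
            path.pop()            # move up to the parent
        elif step not in ("", "."):
            path.append(step)     # move down to a named child
    return tuple(path)


def is_circular(current, relative):
    """True if the target is the current element or one of its ancestors."""
    target = resolve(current, relative)
    return current[:len(target)] == target


if __name__ == "__main__":
    count_elem = ("root", "outerArray", "innerArray", "count")
    print(is_circular(count_elem, "../.."))                   # enclosing array
    print(is_circular(count_elem, "../../../outerArray"))     # same array
    print(is_circular(count_elem, "../../../priorArray"))     # prior sibling
```

Under this reading, a path that merely passes through an enclosing element on
its way to a prior sibling is fine; only the node it ends on matters.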
> 
> 
> Mike Beckerle | OGF DFDL Workgroup Co-Chair | Owl Cyber Defense | 
> www.owlcyberdefense.com <http://www.owlcyberdefense.com>
> Please note: Contributions to the DFDL Workgroup's email discussions are subject 
> to the OGF Intellectual Property Policy <http://www.ogf.org/About/abt_policies.php>
> 
> 


