[DFDL-WG] Action 315: fn:count(.), fn:exists(.)

Mike Beckerle mbeckerle.dfdl at gmail.com
Fri Nov 6 16:23:22 EST 2020


Yes, I think that would be disallowed.

I think dfdl:occursIndex() is the function to call to decide if you are at
index 1 or not.

However, we only have dfdl:occursIndex() defined for the innermost array.
There's no way to ask for the current index of an enclosing array when
arrays are nested.
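
As a minimal, untested sketch of what I mean, the "sum" element from the
example quoted below might be rewritten like this. It assumes that
dfdl:occursIndex(), when called from a child of "array", returns the index
of the enclosing "array" occurrence, and that the intent is to add ../val
at each step:

  <!-- Sketch only: assumes dfdl:occursIndex() gives the index of the
       innermost enclosing array ("array") when called from its child. -->
  <xs:element name="sum" type="xs:int"
    dfdl:inputValueCalc="{
      if (dfdl:occursIndex() eq 1)
      then ../val
      else ../val + ../../array[dfdl:occursIndex() - 1]/sum
    }" />

The "total" element would still use fn:count(../array), but that refers to
a prior sibling array rather than an enclosing one.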

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Owl Cyber Defense |
www.owlcyberdefense.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
<http://www.ogf.org/About/abt_policies.php>



On Thu, Nov 5, 2020 at 4:07 PM Steve Lawrence <slawrence at tresys.com> wrote:

> I know of uses where fn:count has been used as a way to keep a running
> sum via inputValueCalc. For example:
>
>   <xs:element name="root">
>     <xs:complexType>
>       <xs:sequence>
>         <xs:element name="array" maxOccurs="unbounded">
>           <xs:complexType>
>             <xs:sequence>
>               <xs:element name="val" type="xs:int" ... />
>               <xs:element name="sum" type="xs:int"
>                 dfdl:inputValueCalc="{
>                   if (fn:count(../../array) eq 1)
>                   then ../val
>                   else ../val + ../../array[fn:count(../../array) - 1]/sum
>                 }" />
>             </xs:sequence>
>           </xs:complexType>
>         </xs:element>
>         <xs:element name="total" type="xs:int"
>           dfdl:inputValueCalc="{ ../array[fn:count(../array)]/sum }" />
>       </xs:sequence>
>     </xs:complexType>
>   </xs:element>
>
> Would something like this no longer be allowed under this proposal?
>
>
> On 11/5/20 3:42 PM, Mike Beckerle wrote:
> > There are 4 functions that operate on the infoset, and their behavior
> > is unclear because it depends on when they are evaluated during
> > parse/unparse.
> >
> > The 4 functions are fn:count, fn:exists, fn:exactly-one, and fn:empty.
> >
> > The behavior when unparsing is less problematic, because one could
> > simply require the infoset nodes being referenced to be fully
> > constructed before these functions are allowed to evaluate.
> >
> > However, when parsing, the behavior is more subtle, and unparsing may
> > need to be made consistent with the decisions about behavior for
> > parsing.
> >
> > Our call minutes about this action item suggest reviewing the
> > known-to-exist and known-not-to-exist definitions to see whether these
> > functions should be defined in terms of them. I have reviewed those
> > sections, and so far I'm not sure they will contribute.
> >
> > The general problem is this, in terms of fn:count(path). The path is to
> > an infoset node or an array of occurrences that is currently being
> > parsed. It is possible that whether it is known-to-exist or not is
> > simply not yet determined at the point the expression is evaluated.
> >
> > The answer to fn:count(path) should always be the same as if the infoset
> > were fully constructed at the time the expression is evaluated. As
> > evaluation may occur during parsing, the result is simply not defined
> > when the evaluation of the expression itself determines whether the
> > item is known to exist or not.
> >
> > Ex:
> >
> > <xs:element name="outerArray" maxOccurs="unbounded">
> >   <xs:complexType>
> >     <xs:sequence>
> >       <xs:element name="innerArray" maxOccurs="unbounded">
> >         <xs:complexType>
> >           <xs:sequence>
> >             <xs:element name="count" type="xs:int"
> >               dfdl:inputValueCalc='{ fn:count(../..) }'/>
> >             ....
> >           </xs:sequence>
> >         </xs:complexType>
> >       </xs:element>
> >     </xs:sequence>
> >   </xs:complexType>
> > </xs:element>
> >
> > In the above, fn:count takes as its argument a relative path to the
> > array element named "outerArray".
> >
> > There are a few observations here.
> >
> > 1) If we define fn:count in this case to actually have anything to do
> > with the current number of array elements in outerArray, then we will
> > have tightly constrained implementations to a very sequential notion of
> > parsing. The notion of the "current" state of the array implies an
> > algorithm where the number of current occurrences is changing. E.g., we
> > would preclude an implementation that knows the length of all outerArray
> > elements from parsing all the children simultaneously in parallel, or at
> > minimum make this quite hard to achieve, because each parallel
> > computation would have to somehow simulate the right "current number" of
> > occurrences.
> >
> > 2) The question arises of fn:count(../..) vs.
> > fn:count(../../../outerArray) vs. fn:count(../../../outerArray[i]), where
> > i is the index of the enclosing parent outerArray instance that contains
> > this calculation. Arguably, fn:count(../..) could be considered
> > equivalent to fn:count(../../../outerArray[i]), both of which seem like
> > they should always return '1', since the count of instances at a single
> > index, i.e. a single node, is 1.
> >
> > 3) Arguably, fn:count(path) could be illegal whenever the path is to an
> > enclosing element. We could simply define this usage to be illegal. I
> > cannot come up with any reason to actually need this functionality. When
> > parsing, we could require the path argument to refer to a pre-existing
> > part of the infoset, and when unparsing it would have to refer to either
> > pre-existing or later parts of the infoset, but specifically not the
> > current infoset elements. If we make this an SDE, then this would seem to
> > be the conservative design point, which preserves our ability to assign a
> > future meaning to this usage, should a need arise.
> >
> > My recommendation: Expressions evaluated as part of parsing or unparsing
> > an element cannot refer to the count or existence of the current element
> > occurrence being parsed, nor any enclosing element occurrence, nor any
> > enclosing array element.
> >
> > This would seem to rule out any use of absolute paths in arguments to
> > fn:count, because the root element is not (necessarily) known-to-exist
> > until the entire parse completes successfully. Yet clearly we want to be
> > able to refer to the fn:count of a prior sibling array, and that
> > reference should be able to use either a relative or absolute path.
> >
> > So the constraint is not that the argument path must not "pass through"
> > a node that may or may not exist, but that it must end on one whose
> > existence or not doesn't depend on the existence or not of the current
> > node.
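> >
> > As a rough, untested illustration of that distinction (the element
> > names "item", "soFar", and "itemCount" are hypothetical), the first
> > calculation below refers to an array that encloses the element being
> > parsed and would be an SDE under this recommendation, while the second
> > refers to a prior sibling array and would remain legal:
> >
> >   <xs:sequence>
> >     <xs:element name="item" maxOccurs="unbounded">
> >       <xs:complexType>
> >         <xs:sequence>
> >           <!-- would be an SDE: "item" encloses the current element -->
> >           <xs:element name="soFar" type="xs:int"
> >             dfdl:inputValueCalc="{ fn:count(../../item) }"/>
> >         </xs:sequence>
> >       </xs:complexType>
> >     </xs:element>
> >     <!-- still legal: "item" is a prior sibling array here -->
> >     <xs:element name="itemCount" type="xs:int"
> >       dfdl:inputValueCalc="{ fn:count(../item) }"/>
> >   </xs:sequence>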
> >
> > I'm a bit uncertain of good language to express this constraint on what
> > the path argument is allowed to refer to, but the notion is one of a sort
> > of circular definition; hence, it's a schema definition error.
> >
> >
> > --
> >   dfdl-wg mailing list
> >   dfdl-wg at ogf.org
> >   https://www.ogf.org/mailman/listinfo/dfdl-wg
> >
>