[DFDL-WG] when is the separator expression evaluated?

Mike Beckerle mbeckerle.dfdl at gmail.com
Wed Dec 17 13:19:06 EST 2014


Great. I concur. Anybody have the opposite perspective?

Where should a clarification go?

In general, suppose

<xs:sequence dfdl:terminator="....some expression..."> ....

Does the expression get evaluated when the xs:sequence is first "entered"
by the parser (whatever "entered" means; say, when the parser conceptually
walks into this construct of the schema), or as late as possible, when the
terminator is actually needed for something?

Consider: when parsing, we may need the terminator quite soon, as the
terminator may play a role in delimiting the very first thing one finds
inside the sequence.

When unparsing, if you happen to know there are 5 things in the sequence
from the Infoset, you don't really need the terminator at all until after
you have unparsed the 5th thing, i.e., much later.

This asymmetry is of concern.
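
To make the asymmetry concrete, here is a toy Python model. It is only a
sketch, not how any DFDL processor works: LazyProperty, parse_sequence and
unparse_sequence are hypothetical names, and the "expression" is just a
Python callable. The property is evaluated as late as possible, and the log
records the moment at which it was first needed.

```python
from typing import Callable, List, Optional


class LazyProperty:
    """Hypothetical wrapper: evaluates an expression on first use only."""

    def __init__(self, expression: Callable[[], str], log: List[str]) -> None:
        self._expression = expression
        self._log = log
        self._value: Optional[str] = None

    def get(self, moment: str) -> str:
        # Evaluate once, and record when the value was first needed.
        if self._value is None:
            self._value = self._expression()
            self._log.append(moment)
        return self._value


def parse_sequence(data: str, separator: str, terminator: LazyProperty) -> List[str]:
    # The terminator bounds the sequence, so it is needed before the
    # first child can be delimited.
    term = terminator.get("parse: before child 1")
    body, _, _ = data.partition(term)
    return body.split(separator)


def unparse_sequence(children: List[str], separator: str, terminator: LazyProperty) -> str:
    # The children come from the Infoset, so the terminator is not needed
    # until after the last child has been written.
    text = separator.join(children)
    return text + terminator.get(f"unparse: after child {len(children)}")


parse_log: List[str] = []
parsed = parse_sequence("a,b,c,d,e;", ",", LazyProperty(lambda: ";", parse_log))

unparse_log: List[str] = []
unparsed = unparse_sequence(parsed, ",", LazyProperty(lambda: ";", unparse_log))

print(parse_log)    # ['parse: before child 1']
print(unparse_log)  # ['unparse: after child 5']
```

With evaluation on entry to the sequence, both logs would instead record the
same moment, which is one way of removing the asymmetry.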

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
<http://www.ogf.org/About/abt_policies.php>


On Wed, Dec 17, 2014 at 10:28 AM, Tim Kimber <KIMBERT at uk.ibm.com> wrote:
>
> A separator is something that applies to the entire group, so I'm
> uncomfortable with the idea of (potentially) changing it for every member
> of the group.
> So I would vote for:
> 1) The separator is evaluated once per 'data' element; occursIndex
> evaluates to index in the 'data' array;
>
> If 2) was desired it could be achieved by setting the terminator on num:
> <element name="e2">
>   <sequence separator="|" separatorPosition="infix">
>     <element name="seps" minOccurs="3" maxOccurs="3"/>
>     <element name="data" maxOccurs='10'>
>       <sequence>
>         <element name="num" maxOccurs='10' terminator="{ /e2/seps[dfdl:occursIndex()] }" />
>       </sequence>
>     </element>
>   </sequence>
> </element>
> ..and the infix-ness could be emulated by setting the terminator to ""
> when dfdl:occursIndex() eq count( /e2/seps).
>
> regards,
>
> Tim Kimber,
> Technical Lead for IBM Integration Bus Healthcare Pack
> Hursley, UK
> Internet:  kimbert at uk.ibm.com
> Tel. 01962-816742
> Internal tel. 37246742
>
>
>
>
> From:        Mike Beckerle <mbeckerle.dfdl at gmail.com>
> To:        "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>
> Cc:        Norm Patrick <npatrick at tresys.com>, Jessie Chab <
> jchab at tresys.com>
> Date:        16/12/2014 22:40
> Subject:        [DFDL-WG] when is the separator expression evaluated?
> Sent by:        dfdl-wg-bounces at ogf.org
> ------------------------------
>
>
>
> Jessie Chab came up with this interesting case. I am hoping someone else
> remembers somewhere in the spec where this order of evaluation issue is
> taken up in detail.
>
> Consider:
>
> <element name="e2">
>   <sequence separator="|" separatorPosition="infix">
>     <element name="seps" minOccurs="3" maxOccurs="3"/>
>     <element name="data" maxOccurs='10'>
>       <sequence separator="{ /e2/seps[dfdl:occursIndex()] }">
>         <element name="num" maxOccurs='10' />
>       </sequence>
>     </element>
>   </sequence>
> </element>
>
> So we first parse 3 strings separated by a pipe. After that's parsed,
> let's assume our infoset looks like this:
>
> <e2>
>   <seps>;</seps>
>   <seps>-</seps>
>   <seps>#</seps>
> </e2>
>
> After that we will have some 'data' elements (separated by pipes) which
> each have a sequence of 'num' elements. The question is what are the
> valid separators of the 'num' elements. I see two potential
> interpretations.
>
> 1) The separator is evaluated once per 'data' element; occursIndex
> evaluates to index in the 'data' array; valid data might look something
> like:
>
> ;|-|#|a;b;c;d|e-f-g-h|i#j#k#l
>
> Note that this means the size of the data array must be less than or
> equal to the size of the seps array (though that could be worked around
> using mod 3 arithmetic.)
>
> 2) Every time we need to look for a separator between num elements, we
> reevaluate the separator expression. This means the occursIndex()
> references the index in the 'num' array, and so valid data might look
> something like:
>
> ;|-|#|a;b-c#d|e;f-g#h|i;j-k#l
>
> Note that this means the size of the num array must be less than or
> equal to the size of the seps array.
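
The two interpretations above can be compared with a small Python sketch that
produces the data each one implies. This is an illustration only:
unparse_e2 and seps_at are hypothetical names, and for interpretation 2 it
assumes the index in effect when a separator is looked up is that of the
'num' element just before it.

```python
from typing import List


def seps_at(seps: List[str], occurs_index: int) -> str:
    """Model of /e2/seps[dfdl:occursIndex()] with a 1-based index."""
    return seps[occurs_index - 1]


def unparse_e2(seps: List[str], data: List[List[str]], per_item: bool) -> str:
    parts = list(seps)
    for data_index, nums in enumerate(data, start=1):
        if per_item:
            # Interpretation 2: re-evaluate before every separator, with
            # occursIndex taken from the 'num' array (assumed: preceding num).
            text = nums[0]
            for num_index, num in enumerate(nums[1:], start=1):
                text += seps_at(seps, num_index) + num
        else:
            # Interpretation 1: evaluate once per 'data' element, with
            # occursIndex taken from the 'data' array.
            text = seps_at(seps, data_index).join(nums)
        parts.append(text)
    # The outer sequence always uses the fixed "|" infix separator.
    return "|".join(parts)


seps = [";", "-", "#"]
data = [list("abcd"), list("efgh"), list("ijkl")]

print(unparse_e2(seps, data, per_item=False))  # ;|-|#|a;b;c;d|e-f-g-h|i#j#k#l
print(unparse_e2(seps, data, per_item=True))   # ;|-|#|a;b-c#d|e;f-g#h|i;j-k#l
```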
>
> I recall we were considering an argument to dfdl:occursIndex() to make
> exactly this kind of issue clear. I believe we decided against it, as we
> weren't able to pin down the semantics quite clearly.  E.g., in the above,
> how would you add an argument to the dfdl:occursIndex(...) call that points
> to the num array, which isn't even in scope at that point?
>
> I know we say somewhere in the spec that a separator can be defined in, say,
> the default format of some other schema file. It can be an expression, and
> that expression isn't evaluated until the parser reaches some sequence that
> has that separator in scope. This means the expression can refer to path
> steps and such that are meaningless at the point where it appears lexically,
> but will be meaningful for a sequence where that separator expression is in
> scope.
>
> But this problem is slightly different. The question is whether the
> evaluation is per-item of the sequence, or once for the sequence.
>
>
> ...mikeb
>
> Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
> *www.tresys.com* <http://www.tresys.com/>
> Please note: Contributions to the DFDL Workgroup's email discussions are
> subject to the *OGF Intellectual Property Policy*
> <http://www.ogf.org/About/abt_policies.php>
> --
>  dfdl-wg mailing list
>  dfdl-wg at ogf.org
>  https://www.ogf.org/mailman/listinfo/dfdl-wg
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>
>