[DFDL-WG] Action 277: when is the separator expression evaluated?

Mike Beckerle mbeckerle.dfdl at gmail.com
Thu Jan 29 16:54:36 EST 2015


This resolution to the matter (no change required, as 16.3 is clear enough)
was presented to the Daffodil team that raised the original issue on
2015-01-08, and there has been no further comment or discussion.

I believe we can close this action.

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
<http://www.ogf.org/About/abt_policies.php>


On Mon, Jan 19, 2015 at 1:01 PM, Steve Hanson <smh at uk.ibm.com> wrote:

> Mike to run the below past the Daffodil team.
>
> Regards
>
> Steve Hanson
> Architect, IBM DFDL
> <http://www.ibm.com/developerworks/library/se-dfdl/index.html>
> Co-Chair, OGF DFDL Working Group <http://www.ogf.org/dfdl/>
> IBM SWG, Hursley, UK
> smh at uk.ibm.com
> tel:+44-1962-815848
> ----- Forwarded by Steve Hanson/UK/IBM on 19/01/2015 17:58 -----
>
> From:        Steve Hanson/UK/IBM
> To:        Tim Kimber/UK/IBM at IBMGB
> Cc:        DFDL-WG <dfdl-wg at ogf.org>
> Date:        19/12/2014 11:51
> Subject:        Re: [DFDL-WG] when is the separator expression evaluated?
> ------------------------------
>
>
> Section 16.3 of the spec says:
>
> *16.3        Arrays with DFDL Expressions*
>
> *If the value of a DFDL property of an array element (other than
> dfdl:occursCount) is given by a DFDL Expression, then the expression must
> be re-evaluated for each occurrence of the element in case the value
> changes. *
>
>
> Relating this to Mike's original question, I would say that the separator
> for the sequence within 'data' is re-evaluated for each *occurrence* of
> 'data', but it is not re-evaluated for each *occurrence* of 'num'.
>
> The order in which properties are referenced is given by section 22 of the
> spec. (I am sure this does not cover every nuance, but let's assume it
> does). It should not make a difference whether a property is a fixed value
> or an expression: when a property is referenced, its expression is evaluated. I
> am happy for implementations to defer the evaluation of the expression BUT
> only as long as deferral does not change the result that would have been
> obtained if the expression had been evaluated at the original time of
> reference.
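
A minimal Python sketch of the deferral condition described above. It is not taken from the spec or from any DFDL implementation: the names `eager`, `deferred`, `separator_expr`, and the dictionary-based `state` are hypothetical stand-ins for a processor's evaluation context. It shows a case where deferring evaluation changes the result, which is the kind of deferral the condition rules out:

```python
def eager(expr, state):
    """Evaluate the property expression at the point of reference."""
    value = expr(state)
    return lambda: value


def deferred(expr, state):
    """Postpone evaluation until the property value is actually needed."""
    return lambda: expr(state)


def separator_expr(state):
    # Hypothetical stand-in for { /e2/seps[dfdl:occursIndex()] } (1-based index).
    return state["seps"][state["occursIndex"] - 1]


state = {"seps": [";", "-", "#"], "occursIndex": 1}

at_reference = eager(separator_expr, state)
when_needed = deferred(separator_expr, state)

# The evaluation context moves on before the separator is actually used.
state["occursIndex"] = 2

print(at_reference())  # value fixed at the time of reference
print(when_needed())   # value computed after the context changed
```

In this sketch the two calls disagree, so an implementation could defer here only if it captured the context as it stood at the time of reference.
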
>
> Regards
>
> Steve Hanson
>
>
>
> From:        Tim Kimber/UK/IBM at IBMGB
> To:        DFDL-WG <dfdl-wg at ogf.org>
> Date:        18/12/2014 16:24
> Subject:        Re: [DFDL-WG] when is the separator expression evaluated?
> Sent by:        dfdl-wg-bounces at ogf.org
> ------------------------------
>
>
>
> Good questions. I think the questions apply equally to separator and
> terminator, which can both be defined on the sequence group.
>
> Parsing: The first member may have lengthKind='explicit' and will
> therefore not need the separator until the parsing of the first member is
> complete. The terminator will be required as soon as the parser has to look
> for delimiters in the 'trailing optional' area of the sequence group.
>
> So we need to decide whether
> a) DFDL expressions for separators/terminators are evaluated upon entering
> the sequence group or
> b) DFDL expressions for separators/terminators can be evaluated lazily or
> c) DFDL expressions for separators/terminators must be evaluated lazily
>
> Serializing: The separator will be required after the first member,
> regardless. The terminator may be required before the end of the group if
> one or more group members have an escape scheme.
>
> I'm inclined to suggest that implementations should be free to evaluate
> eagerly or lazily, as long as the behaviour conforms to the DFDL spec. But
> there may be scope for conforming implementations to exhibit material
> differences in behaviour if we allow that much latitude. I just can't think
> what those differences would be.
>
> regards,
>
> Tim Kimber,
> Technical Lead for IBM Integration Bus Healthcare Pack
> Hursley, UK
> Internet:  kimbert at uk.ibm.com
> Tel. 01962-816742
> Internal tel. 37246742
>
>
>
>
> From:        Mike Beckerle <mbeckerle.dfdl at gmail.com>
> To:        Tim Kimber/UK/IBM at IBMGB
> Cc:        "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>
> Date:        17/12/2014 18:19
> Subject:        Re: [DFDL-WG] when is the separator expression evaluated?
>  ------------------------------
>
>
>
> Great. I concur. Anybody have the opposite perspective?
>
> Where should a clarification go?
>
> In general, suppose
>
> <xs:sequence dfdl:terminator="....some expression..."> ....
>
> Does the expression get evaluated when the xs:sequence is first "entered"
> by the parser (whatever "entered" means - when the parser conceptually
> walks into this construct of the schema), or as late as possible, when the
> terminator is actually needed for something?
>
> Consider: when parsing, we may need the terminator quite soon, as the
> terminator may play a role in delimiting the very first thing one finds
> inside the sequence.
>
> When unparsing, if you happen to know there are 5 things in the sequence
> from the Infoset, you don't really need the terminator at all until after
> you have unparsed the 5th thing, i.e., much later.
>
> This asymmetry is of concern.
>
>
>
> On Wed, Dec 17, 2014 at 10:28 AM, Tim Kimber <KIMBERT at uk.ibm.com> wrote:
> A separator is something that applies to the entire group, so I'm
> uncomfortable with the idea of (potentially) changing it for every member
> of the group.
> So I would vote for:
> 1) The separator is evaluated once per 'data' element; occursIndex
> evaluates to index in the 'data' array;
>
> If 2) was desired it could be achieved by setting the terminator on num:
> <element name="e2">
>  <sequence separator="|" separatorPosition="infix">
>    <element name="seps" minOccurs="3" maxOccurs="3"/>
>    <element name="data" maxOccurs='10'>
>      <sequence>
>        <element name="num" maxOccurs='10' terminator="{ /e2/seps[dfdl:occursIndex()] }" />
>      </sequence>
>    </element>
>  </sequence>
> </element>
> ...and the infix-ness could be emulated by setting the terminator to ""
> when dfdl:occursIndex() eq count(/e2/seps).
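
A rough Python sketch of the terminator-based emulation suggested above, under stated assumptions: `terminator_for` and `unparse_nums` are hypothetical helpers, not part of DFDL or any implementation, and the empty-terminator rule is copied literally from the condition `dfdl:occursIndex() eq count(/e2/seps)`:

```python
def terminator_for(seps, occurs_index):
    """Terminator of the 'num' occurrence at 1-based occurs_index."""
    if occurs_index == len(seps):
        # Suppress the terminator on this occurrence to mimic infix placement.
        return ""
    return seps[occurs_index - 1]


def unparse_nums(seps, nums):
    """Write each 'num' followed by its own, per-occurrence terminator."""
    return "".join(
        num + terminator_for(seps, index)
        for index, num in enumerate(nums, start=1)
    )


print(unparse_nums([";", "-", "#"], ["a", "b", "c"]))
```

As written, the sketch only looks infix when the 'num' array has exactly count(seps) occurrences; a shorter array keeps a trailing terminator after its last 'num'.
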
>
> regards,
>
> Tim Kimber,
>
>
>
>
> From:        Mike Beckerle <mbeckerle.dfdl at gmail.com>
> To:        "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>
> Cc:        Norm Patrick <npatrick at tresys.com>,
> Jessie Chab <jchab at tresys.com>
> Date:        16/12/2014 22:40
> Subject:        [DFDL-WG] when is the separator expression evaluated?
> Sent by:        dfdl-wg-bounces at ogf.org
>  ------------------------------
>
>
>
> Jessie Chab came up with this interesting case. I am hoping someone else
> remembers somewhere in the spec where this order of evaluation issue is
> taken up in detail.
>
> Consider:
>
> <element name="e2">
>  <sequence separator="|" separatorPosition="infix">
>    <element name="seps" minOccurs="3" maxOccurs="3"/>
>    <element name="data" maxOccurs='10'>
>      <sequence separator="{ /e2/seps[dfdl:occursIndex()] }">
>        <element name="num" maxOccurs='10' />
>      </sequence>
>    </element>
>  </sequence>
> </element>
>
> So we first parse 3 strings separated by a pipe. After that's parsed,
> let's assume our infoset looks like this:
>
> <e2>
>  <seps>;</seps>
>  <seps>-</seps>
>  <seps>#</seps>
> </e2>
>
> After that we will have some 'data' elements (separated by pipes) which
> each have a sequence of 'num' elements. The question is what are the
> valid separators of the 'num' elements. I see two potential
> interpretations.
>
> 1) The separator is evaluated once per 'data' element; occursIndex
> evaluates to index in the 'data' array; valid data might look something
> like:
>
> ;|-|#|a;b;c;d|e-f-g-h|i#j#k#l
>
> Note that this means the size of the data array must be less than or
> equal to the size of the seps array (though that could be worked around
> using mod 3 arithmetic.)
>
> 2) Every time we need to look for a separator between 'num' elements, we
> re-evaluate the separator expression. This means the occursIndex()
> references the index in the 'num' array, and so valid data might look
> something like:
>
> ;|-|#|a;b-c#d|e;f-g#h|i;j-k#l
>
> Note that this means the size of the num array must be less than or
> equal to the size of the seps array.
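
The two interpretations above can be sketched in Python. This is an illustration only, not how any DFDL processor is implemented: `render` and its `per_num` flag are hypothetical, and under interpretation 2 the sketch assumes the separator is indexed by the 'num' occurrence that precedes it, which is what reproduces the sample data in this message:

```python
def render(seps, data, per_num):
    """Serialize 'seps' and 'data' (a list of 'num' lists) joined by pipes.

    per_num=False: the separator is chosen once per 'data' occurrence (1).
    per_num=True:  the separator is chosen again for each 'num' (2).
    """
    parts = list(seps)
    for data_index, nums in enumerate(data, start=1):
        out = nums[0]
        for num_index, num in enumerate(nums[1:], start=1):
            # 1-based index into 'seps', as dfdl:occursIndex() would supply it.
            index = num_index if per_num else data_index
            out += seps[index - 1] + num
        parts.append(out)
    return "|".join(parts)


seps = [";", "-", "#"]
data = [list("abcd"), list("efgh"), list("ijkl")]

print(render(seps, data, per_num=False))
print(render(seps, data, per_num=True))
```
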
>
> I recall we were considering an argument to dfdl:occursIndex() to make
> exactly this kind of issue clear. I believe we decided against it, as we
> weren't able to pin down the semantics quite clearly.  E.g., in the above,
> how would you add an argument to the dfdl:occursIndex(...) call that points
> to the num array, which isn't even in scope at that point?
>
> I know we say somewhere in the spec that separator can be defined, in say,
> the default format of some other schema file. It can be an expression, and
> that expression isn't evaluated until it is used by a sequence that has that
> separator in scope. This means the expression can refer to path steps and
> such that are meaningless at the point where it appears lexically, but will
> be meaningful for a sequence where that separator expression is in scope.
>
> But this problem is slightly different. The question is whether the
> evaluation is per-item of the sequence, or once for the sequence.
>
>
> ...mikeb
>
> --
> dfdl-wg mailing list
> dfdl-wg at ogf.org
> https://www.ogf.org/mailman/listinfo/dfdl-wg
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>