[DFDL-WG] Consolidated Notes including both DFDL WG Call 2018-08-07 along with other recent clarification email threads

Mike Beckerle mbeckerle.dfdl at gmail.com
Wed Jul 3 09:36:20 EDT 2019


Ah. I had the sense flipped there.

I like the simpler wording you provided better.

I am not sure we capture, anywhere, the exact situations where a
potentially trailing group, with zero-length, gets no separator. This is
essential for formats like EDIFACT.
The difficulty is that the language about separator suppression talks in
terms of "optional" vs. "required", and model groups have no dimension in
DFDL, so these concepts don't apply to them.
Nevertheless that can be a separate issue if it even still exists.  At this
point I need to edit the spec, fold in all errata and changes, and then
read the resulting sections to see if they match my understanding of how
the algorithms actually have to work.
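
To make the trailing-group case concrete, here is a minimal Python sketch.
It assumes infix separators and that every zero-length potentially trailing
child is dropped along with its separator. The function name and the
list-of-strings model of the children are illustrative only, not spec
language.

```python
def unparse_trailing_suppressed(children, separator):
    """Join child representations with an infix separator,
    dropping zero-length children at the end of the sequence.

    Assumption: each child (element or model group) is modelled as the
    string it contributes to the data stream; "" means zero-length.
    """
    kept = list(children)
    # Potentially trailing zero-length children get no separator at all.
    while kept and kept[-1] == "":
        kept.pop()
    # Zero-length children that are not trailing keep their position.
    return separator.join(kept)
```

With "+" as the separator, ["A", "B", "", ""] comes out as A+B, while
["A", "", "C"] comes out as A++C because the empty child is not trailing.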

I have updated the tracker with the improved language.

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
<http://www.ogf.org/About/abt_policies.php>



On Tue, Jul 2, 2019 at 4:13 AM Steve Hanson <smh at uk.ibm.com> wrote:

> Hi Mike
>
> That's not correct. Trailing suppression occurs for *positional*
> sequences. It's 'anyEmpty' that means a group is non-positional.
>
> I think this is sufficient:
>
> "Separators occur in the data either before, between or after all
> occurrences of the elements or groups that are the children of the
> sequence, in accordance with dfdl:separatorPosition and
> dfdl:separatorSuppressionPolicy. Elements with dfdl:inputValueCalc have
> no representation in the data stream, and so never have an associated
> separator."
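
As a rough illustration of that sentence (not spec text), the Python sketch
below lays out the children of a sequence. The (text, computed) pair model
and the name lay_out are assumptions made for the example. The position
values mirror the dfdl:separatorPosition enumeration, and
dfdl:separatorSuppressionPolicy is deliberately left out.

```python
def lay_out(children, separator, position="infix"):
    """Place separators around the represented children of a sequence.

    children: list of (text, computed) pairs, a hypothetical model where
              computed=True stands for an element with dfdl:inputValueCalc.
    position: 'infix', 'prefix' or 'postfix'.
    """
    # Elements with dfdl:inputValueCalc have no representation: skip them.
    represented = [text for text, computed in children if not computed]
    if position == "infix":
        return separator.join(represented)
    if position == "prefix":
        return "".join(separator + text for text in represented)
    if position == "postfix":
        return "".join(text + separator for text in represented)
    raise ValueError("unknown separatorPosition: " + position)
```

For children a, x (computed) and b with separator ",", the three positions
give a,b and ,a,b and a,b, respectively; x never gets a separator.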
>
> Regards
>
> Steve Hanson
>
> IBM Hybrid Integration, Hursley, UK
> Architect, *IBM DFDL*
> <http://www.ibm.com/developerworks/library/se-dfdl/index.html>
> Co-Chair, *OGF DFDL Working Group* <http://www.ogf.org/dfdl/>
> *smh at uk.ibm.com* <smh at uk.ibm.com>
> tel:+44-1962-815848
> mob:+44-7717-378890
> Note: I work Tuesday to Friday
>
>
>
> From:        Mike Beckerle <mbeckerle.dfdl at gmail.com>
> To:        Steve Hanson <smh at uk.ibm.com>
> Cc:        DFDL-WG <dfdl-wg at ogf.org>
> Date:        27/06/2019 19:24
> Subject:        Re: [DFDL-WG] Consolidated Notes including both DFDL WG
> Call 2018-08-07 along with other recent clarification email threads
> ------------------------------
>
>
>
>
> Reawakening this thread from last August as I'm trying to distill this
> into tracker issues.
>
> I believe now we know that the suggestion below about Section 14.2 isn't
> correct. Pulling that discussion up to the top here, we suggested this:
>
> Section 14.2
>    For property dfdl:separator. The sentence: "Separators occur in the
> data either before, between or after all occurrences of the elements or
> groups that are the children of the sequence." replaced with "Separators
> occur in the data either before, between or after all occurrences of
> *represented* elements (that is, elements without the dfdl:inputValueCalc
> property) or model groups that are the children of the sequence. Elements
> with dfdl:inputValueCalc have no representation in the data stream, and so
> never have separators. Children of a sequence that are model groups are
> always separated, even if they are empty (meaning have no children of their
> own - which is allowed for sequence groups), or both the model group child
> and its contained children occupy zero-length in the data stream."
>
> From tests IBM did, and experience with the DFDL schema for EDIFACT, I
> think we neglected the potentially-trailing group case in the above
> description. I've revised it, with the new phrasing in blue.
>
> Section 14.2
>    For property dfdl:separator. The sentence: "Separators occur in the
> data either before, between or after all occurrences of the elements or
> groups that are the children of the sequence." replaced with "Separators
> occur in the data either before, between or after all occurrences of
> *represented* elements (that is, elements without the dfdl:inputValueCalc
> property) or model groups that are the children of the sequence. Elements
> with dfdl:inputValueCalc have no representation in the data stream, and so
> never have separators. Children of a sequence that are model groups are
> separated if the sequence is positional, even if they are empty (meaning
> have no children of their own - which is allowed for sequence groups), or
> both the model group child and its contained children occupy zero-length in
> the data stream. If the sequence is not positional, then separators are
> suppressed for trailing groups that are zero-length according to the
> dfdl:separatorSuppressionPolicy."
>
> This accommodates the common situation where a trailing sequence group
> contains an entirely optional array element. If none of the array elements
> exists we do not want a separator for the sequence group at all.
>
> If this makes sense, I will distill these to one or more tracker items.
>
>
>
>
> On Wed, Aug 15, 2018 at 11:51 AM Mike Beckerle
> <mbeckerle.dfdl at gmail.com> wrote:
> I'm fine with your suggested revised wording. The point is just to make a
> broader statement about empty representation than the one there which
> suggests it is *only* used for deciding default values to be used or not,
> when it is in fact used more broadly to determine two different things
> about absent/missing - defaulting, and optional-element occurrence.
>
>
>
>
> On Wed, Aug 15, 2018 at 10:12 AM Steve Hanson <smh at uk.ibm.com> wrote:
> Mike, thanks for writing this up.
>
> I am in the process of incorporating it into the minutes of the WG call.
> There is one paragraph that on re-reading and comparing with what is in the
> spec already, I am not so sure about.
>
> Section 9.2.2
>     The sentence:  "The *empty representation* is special in DFDL,
> because when parsing it is this condition that can trigger the creation of
> a default value for an element occurrence." replace with: "The empty
> representation is special in DFDL because when parsing it is used to
> determine when default values are created in the Infoset, and when optional
> recurring elements are omitted from the Infoset. The empty representation
> can require initiators or terminators be present so as to enable data
> formats to explicitly distinguish empty-string/hexBinary values (which
> might cause default values to be used) from emptiness meaning the absence
> of any representation."
>      (This is to clarify an error of omission - prior language suggested
> that EVDP is only relevant when the element has a default value, because
> only that need was mentioned.)
>
> What is the significance of 'optionally recurring elements'? I would have
> thought that is just 'optional occurrences' as it applies to (0,1) elements
> too.
>
> Actually I'm not convinced that clause is needed at all. The point of this
> paragraph is to call out defaulting. Not adding occurrences to the infoset
> happens for absent and missing occurrences too, as described in later
> paragraphs.
>
> I would prefer:
>
> "The empty representation is special in DFDL because when parsing it is
> used to determine when default values are created in the Infoset. The empty
> representation can require initiators or terminators be present so as to
>  enable data formats to explicitly distinguish occurrences with empty
> string/hexBinary values from occurrences that are missing or are absent."
>
> Regards
>
> Steve Hanson
>
>
>
> From:        Mike Beckerle <mbeckerle.dfdl at gmail.com>
> To:        DFDL-WG <dfdl-wg at ogf.org>
> Date:        07/08/2018 18:46
> Subject:        [DFDL-WG] Consolidated Notes including both DFDL WG Call
> 2018-08-07 along with other recent clarification email threads
> Sent by:        "dfdl-wg" <dfdl-wg-bounces at ogf.org>
> ------------------------------
>
>
>
> This message includes two parts.
>
> Part 1 is the things we discussed on the DFDL WG Call of 2018-08-07.
> Part 2 is the other recent emails where the conclusion from the email
> thread is repeated here for finalization/refinement, and for consolidation
> so all these changes can be considered together.
>
> ================================
>
> Part 1 - Discussed on the call.
> * Re: [DFDL-WG] Clarification discussion points for next call(s) - DFDL
> Spec issues around nil, empty, normal, absent, defaulting, and separator
> suppression.*
> Dated August 2nd
>
> Conclusions:
>
> Section 9.3.1.1
>    Delete phrase "...this of course implies that....". Note that there is
> already a correction to create numbered bullets of 3 sentences. The
> sentence containing this phrase will be bullet #3 of that list.
>
> Section 9.4
>    Item 2 under "For elements and element refs:" Change to: "dfdl:element
> following property scoping rules, which includes establishing
> representation as described in Section 9.3.2 and conversion to element type
> for simple types."
>
> Section 9.3.2
>   The phrase "The first step is to see if the content is trivially of
> length zero." Change to: "The first step is to see if the SimpleContent or
> ComplexContent region is of length zero as a first approximation."
>   The bullet "delimited => length is zero (delimiter is immediately
> encountered)" Insert "in scope" after the open parenthesis.
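
A minimal Python sketch of the delimited case, assuming delimiters are plain
literal strings and that the caller supplies the list of those in scope.
Both are simplifications for illustration, and the function name is
hypothetical.

```python
def content_is_zero_length(data, pos, delimiters_in_scope):
    """First approximation for a delimited element: the content region is
    zero-length if an in-scope delimiter starts exactly at pos.

    delimiters_in_scope: literal delimiter strings, e.g. the separators
    and terminators of the element and of its enclosing groups.
    """
    # Empty delimiter strings are ignored; they can never be "encountered".
    return any(d != "" and data.startswith(d, pos)
               for d in delimiters_in_scope)
```

For the data a,,b with "," in scope, the content starting at offset 2 is
zero-length, while the content starting at offset 0 is not.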
>
> Section 9.4.2.3.
>   We agreed that the paragraphs beginning with "For both required and
> optional..." need to be better tied to the material above. Wording TBD -
> pending Steve Hanson doing some tests on IBM DFDL.
>
> ============================
>
> Part 2 - Below are conclusions from the prior email threads. This is for
> email review in lieu of discussion on this week's call.
> * Re: [DFDL-WG] clarification: on suppressed ZL string/hexBinary - do we
> keep variable assignments?*
> Of Aug 6 (last date in the thread)
>
> These corrections apply:
>
> Sections 9.4.2.2 and 9.4.2.3
>    The phrase "Optional occurrence: If dfdl:emptyValueDelimiterPolicy is
> not 'none'*[12]* <http://daffodil.apache.org/docs/dfdl/#_ftn12>,"  Change
> to "Optional occurrence: if dfdl:emptyValueDelimiterPolicy is applicable
> and is not 'none',...." (retaining the footnote)
>
> Section 9.4.2
>    Before the final phrase "There are three main cases to consider:"
> Insert this sentence: "The sections below indicate when an item is added to
> the infoset, and whether it has a default or other value. If there is no
> processing error then regardless of whether an item is added to the infoset
> or not, any side-effects due to dfdl:discriminator statements evaluating to
> true, or dfdl:setVariable statements, are retained."
>
> Section 12.2
>    For property emptyValueDelimiterPolicy, before the phrase "It is a
> schema definition error if...", insert this sentence: "The value of
> dfdl:emptyValueDelimiterPolicy  should only be checked if there is a
> dfdl:initiator or dfdl:terminator in scope. If so, and
> dfdl:emptyValueDelimiterPolicy is not set, it is a schema definition error.
> If dfdl:initiator is not "" and dfdl:terminator is "" and
> dfdl:emptyValueDelimiterPolicy is 'terminator' it is a schema definition
> error. If dfdl:terminator is not "" and dfdl:initiator is "" and
> dfdl:emptyValueDelimiterPolicy is 'initiator' it is a schema definition
> error."
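
The inserted sentence amounts to a small decision procedure. Here is a
hedged Python sketch of it, assuming that "" models an initiator or
terminator that is not in scope and that None models an unset policy. The
function name and the prop parameter are hypothetical; prop only labels the
message, so the same shape fits dfdl:nilValueDelimiterPolicy.

```python
def check_delimiter_policy(initiator, terminator, policy,
                           prop="dfdl:emptyValueDelimiterPolicy"):
    """Return a schema definition error message, or None if there is none.

    Assumptions: "" means the initiator/terminator is not in scope, and
    policy is None when the property is not set.
    """
    if initiator == "" and terminator == "":
        return None  # nothing in scope, so the policy is not checked
    if policy is None:
        return "SDE: " + prop + " is not set"
    if initiator != "" and terminator == "" and policy == "terminator":
        return "SDE: " + prop + " is 'terminator' with no terminator"
    if terminator != "" and initiator == "" and policy == "initiator":
        return "SDE: " + prop + " is 'initiator' with no initiator"
    return None
```

For example, an initiator of "[" with no terminator and a policy of
'terminator' yields an error message, while the same delimiters with a
policy of 'initiator' yield None.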
>
> Section 13.16
>     For property nilValueDelimiterPolicy, before the phrase "It is a
> schema definition error if...", insert this sentence: "The value of
> dfdl:nilValueDelimiterPolicy  should only be checked if there is a
> dfdl:initiator or dfdl:terminator in scope. If so, and
> dfdl:nilValueDelimiterPolicy is not set, it is a schema definition error.
> If dfdl:initiator is not "" and dfdl:terminator is "" and
> dfdl:nilValueDelimiterPolicy is 'terminator' it is a schema definition
> error. If dfdl:terminator is not "" and dfdl:initiator is "" and
> dfdl:nilValueDelimiterPolicy is 'initiator' it is a schema definition
> error."
>
> Section 9.2.2
>     The phrase  "the occurrence's content in the data..." replace with
> "the occurrence's SimpleContent or ComplexContent region in the data..."
>     The sentence:  "The *empty representation* is special in DFDL,
> because when parsing it is this condition that can trigger the creation of
> a default value for an element occurrence." replace with: "The empty
> representation is special in DFDL because when parsing it is used to
> determine when default values are created in the Infoset, and when optional
> recurring elements are omitted from the Infoset. The empty representation
> can require initiators or terminators be present so as to enable data
> formats to explicitly distinguish empty-string/hexBinary values (which
> might cause default values to be used) from emptiness meaning the absence
> of any representation."
>      (This is to clarify an error of omission - prior language suggested
> that EVDP is only relevant when the element has a default value, because
> only that need was mentioned.)
>
> * Re: [DFDL-WG] Clarification needed: separator for empty sequence*
> Of Aug 2
>
> Section 14.2
>    For property dfdl:separator. The sentence: "Separators occur in the
> data either before, between or after all occurrences of the elements or
> groups that are the children of the sequence." replaced with "Separators
> occur in the data either before, between or after all occurrences of
> *represented* elements (that is, elements without the dfdl:inputValueCalc
> property) or model groups that are the children of the sequence. Elements
> with dfdl:inputValueCalc have no representation in the data stream, and so
> never have separators. Children of a sequence that are model groups are
> always separated, even if they are empty (meaning have no children of their
> own - which is allowed for sequence groups), or both the model group child
> and its contained children occupy zero-length in the data stream."
>    (note: Some of the above is redundant with stipulations in the
> dfdl:inputValueCalc property description, but I believe it is wise to have
> this little redundancy.)
>
> ======================
>
> These email threads are mentioned here to indicate that they are resolved
> by one or another of the above corrections:
>
> * Re: [DFDL-WG] clarification needed - ambiguity about empty string and
> optional element*
> Of Aug 2
> * Re: [DFDL-WG] Spec correction ? - Section 9.3.2.1 - second list missing
> "empty" representation*
> Of Aug 2
>
> -----------
>
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>
>