[DFDL-WG] Consolidated Notes including both DFDL WG Call 2018-08-07 along with other recent clarification email threads
Steve Hanson
smh at uk.ibm.com
Tue Jul 2 04:13:31 EDT 2019
Hi Mike
That's not correct. Trailing suppression occurs for positional sequences.
It's 'anyEmpty' that means a group is non-positional.
I think this is sufficient:
"Separators occur in the data either before, between or after all
occurrences of the elements or groups that are the children of the
sequence, in accordance with dfdl:separatorPosition and
dfdl:separatorSuppressionPolicy. Elements with dfdl:inputValueCalc have no
representation in the data stream, and so never have an associated
separator."
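As a rough illustration of the wording above (not taken from the specification or from any DFDL implementation), the Python sketch below lays out separators for the three dfdl:separatorPosition values and skips children that carry dfdl:inputValueCalc. The function name, the tuple layout of `children`, and the decision to ignore dfdl:separatorSuppressionPolicy entirely are assumptions made for the example.

```python
def unparse_sequence(children, separator, position="infix"):
    """Lay out child representations with separators.

    children: list of (text, has_input_value_calc) pairs (hypothetical layout).
    position: 'prefix', 'infix' or 'postfix', mirroring dfdl:separatorPosition.
    No separator suppression is modelled here.
    """
    if position not in ("prefix", "infix", "postfix"):
        raise ValueError("unknown separator position: " + position)
    # Elements with dfdl:inputValueCalc have no representation in the data,
    # so they are dropped before any separator is placed.
    represented = [text for text, calculated in children if not calculated]
    if position == "prefix":
        return "".join(separator + text for text in represented)
    if position == "postfix":
        return "".join(text + separator for text in represented)
    return separator.join(represented)


children = [("a", False), ("computed", True), ("", False), ("c", False)]
print(unparse_sequence(children, ",", "infix"))
print(unparse_sequence(children, ",", "prefix"))
print(unparse_sequence(children, ",", "postfix"))
```

The calculated child contributes neither text nor a separator, while the zero-length represented child still occupies its position.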
Regards
Steve Hanson
IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh at uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
Note: I work Tuesday to Friday
From: Mike Beckerle <mbeckerle.dfdl at gmail.com>
To: Steve Hanson <smh at uk.ibm.com>
Cc: DFDL-WG <dfdl-wg at ogf.org>
Date: 27/06/2019 19:24
Subject: Re: [DFDL-WG] Consolidated Notes including both DFDL WG
Call 2018-08-07 along with other recent clarification email threads
Reawakening this thread from last August as I'm trying to distill this
into tracker issues.
I believe now we know that the suggestion below about Section 14.2 isn't
correct. Pulling that discussion up to the top here, we suggested this:
Section 14.2
For property dfdl:separator. The sentence: "Separators occur in the
data either before, between or after all occurrences of the elements or
groups that are the children of the sequence." replaced with "Separators
occur in the data either before, between or after all occurrences of
represented elements (that is, elements without the dfdl:inputValueCalc
property) or model groups that are the children of the sequence. Elements
with dfdl:inputValueCalc have no representation in the data stream, and so
never have separators. Children of a sequence that are model groups are
always separated, even if they are empty (meaning have no children of
their own - which is allowed for sequence groups), or both the model group
child and its contained children occupy zero-length in the data stream."
From tests IBM did, and experience with the DFDL schema for EDIFACT, I
think we neglected the potentially-trailing group case in the above
description. I've revised it, with the new phrasing in blue.
Section 14.2
For property dfdl:separator. The sentence: "Separators occur in the
data either before, between or after all occurrences of the elements or
groups that are the children of the sequence." replaced with "Separators
occur in the data either before, between or after all occurrences of
represented elements (that is, elements without the dfdl:inputValueCalc
property) or model groups that are the children of the sequence. Elements
with dfdl:inputValueCalc have no representation in the data stream, and so
never have separators. Children of a sequence that are model groups are
separated if the sequence is positional, even if they are empty (meaning
have no children of their own - which is allowed for sequence groups), or
both the model group child and its contained children occupy zero-length
in the data stream. If the sequence is not positional, then separators are
suppressed for trailing groups that are zero-length according to the
dfdl:separatorSuppressionPolicy."
This accommodates the common situation where a trailing sequence group
contains an entirely optional array element. If none of the array elements
exists we do not want a separator for the sequence group at all.
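The trailing-group situation described above can be sketched in Python as follows. This is a simplified model only: it assumes infix separators, treats each child (element or model group) as a single string, and uses a hypothetical flag `suppress_trailing` to stand in for whichever dfdl:separatorSuppressionPolicy setting enables trailing suppression. None of the names come from the specification.

```python
def unparse_children(children, separator, suppress_trailing=True):
    """Join child representations with infix separators.

    children: list of strings, one per child of the sequence; a group whose
    optional array has no occurrences is represented by "" (zero length).
    suppress_trailing: hypothetical flag modelling trailing suppression.
    """
    kept = list(children)
    if suppress_trailing:
        # Drop zero-length children from the end, together with the
        # separators that would otherwise have preceded them.
        while kept and kept[-1] == "":
            kept.pop()
    return separator.join(kept)


# The last child models a sequence group holding an optional array
# with no occurrences.
print(unparse_children(["UNH", "1", ""], "+"))
print(unparse_children(["UNH", "1", ""], "+", suppress_trailing=False))
```

With suppression the empty trailing group leaves no separator behind; without it, a dangling separator remains. A zero-length child in the middle of the sequence keeps its separator in both cases.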
If this makes sense, I will distill these to one or more tracker items.
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
On Wed, Aug 15, 2018 at 11:51 AM Mike Beckerle <mbeckerle.dfdl at gmail.com>
wrote:
I'm fine with your suggested revised wording. The point is just to make a
broader statement about empty representation than the one there which
suggests it is *only* used for deciding default values to be used or not,
when it is in fact used more broadly to determine two different things
about absent/missing - defaulting, and optional-element occurrence.
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
On Wed, Aug 15, 2018 at 10:12 AM Steve Hanson <smh at uk.ibm.com> wrote:
Mike, thanks for writing this up.
I am in the process of incorporating it into the minutes of the WG call.
There is one paragraph that on re-reading and comparing with what is in
the spec already, I am not so sure about.
Section 9.2.2
The sentence: "The empty representation is special in DFDL, because
when parsing it is this condition that can trigger the creation of a
default value for an element occurrence." replace with: "The empty
representation is special in DFDL because when parsing it is used to
determine when default values are created in the Infoset, and when
optional recurring elements are omitted from the Infoset. The empty
representation can require initiators or terminators be present so as to
enable data formats to explicitly distinguish empty-string/hexBinary
values (which might cause default values to be used) from emptiness
meaning the absence of any representation."
(This is to clarify an error of omission - prior language suggested
that EVDP is only relevant when the element has a default value, because
only that need was mentioned.)
What is the significance of 'optionally recurring elements'? I would have
thought that is just 'optional occurrences' as it applies to (0,1)
elements too.
Actually I'm not convinced that clause is needed at all. The point of this
paragraph is to call out defaulting. Not adding occurrences to the infoset
happens for absent and missing occurrences too, as described in later
paragraphs.
I would prefer:
"The empty representation is special in DFDL because when parsing it is
used to determine when default values are created in the Infoset. The
empty representation can require initiators or terminators be present so
as to enable data formats to explicitly distinguish occurrences with empty
string/hexBinary values from occurrences that are missing or are absent."
Regards
Steve Hanson
IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh at uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
Note: I work Tuesday to Friday
From: Mike Beckerle <mbeckerle.dfdl at gmail.com>
To: DFDL-WG <dfdl-wg at ogf.org>
Date: 07/08/2018 18:46
Subject: [DFDL-WG] Consolidated Notes including both DFDL WG Call
2018-08-07 along with other recent clarification email threads
Sent by: "dfdl-wg" <dfdl-wg-bounces at ogf.org>
This message includes two parts.
Part 1 is the things we discussed on the DFDL WG Call of 2018-08-07.
Part 2 is the other recent emails where the conclusion from the email
thread is repeated here for finalization/refinement, and for consolidation
so all these changes can be considered together.
================================
Part 1 - Discussed on the call.
Re: [DFDL-WG] Clarification discussion points for next call(s) - DFDL Spec
issues around nil, empty, normal, absent, defaulting, and separator
suppression.
Dated August 2nd
Conclusions:
Section 9.3.1.1
Delete phrase "...this of course implies that....". Note that there is
already a correction to create numbered bullets of 3 sentences. The
sentence containing this phrase will be bullet #3 of that list.
Section 9.4
Item 2 under "For elements and element refs:" Change to: "dfdl:element
following property scoping rules, which includes establishing
representation as described in Section 9.3.2 and conversion to element
type for simple types."
Section 9.3.2
The phrase "The first step is to see if the content is trivially of
length zero." Change to: "The first step is to see if the SimpleContent or
ComplexContent region is of length zero as a first approximation."
The bullet "delimited => length is zero (delimiter is immediately
encountered)" Insert "in scope" after the open parenthesis.
Section 9.4.2.3.
We agreed that the paragraphs beginning with "For both required and
optional..." need to be better tied to the material above. Wording TBD -
pending Steve Hanson doing some tests on IBM DFDL.
============================
Part 2 - Below are conclusions from the prior email threads. This is for
email review in lieu of discussion on this week's call.
Re: [DFDL-WG] clarification: on suppressed ZL string/hexBinary - do we
keep variable assignments?
Of Aug 6 (last date in the thread)
These corrections apply:
Sections 9.4.2.2 and 9.4.2.3
The phrase "Optional occurrence: If dfdl:emptyValueDelimiterPolicy is
not 'none'[12]," Change to "Optional occurrence: if
dfdl:emptyValueDelimiterPolicy is applicable and is not 'none',...."
(retaining the footnote)
Section 9.4.2
Before the final phrase "There are three main cases to consider:"
Insert this sentence: "The sections below indicate when an item is added
to the infoset, and whether it has a default or other value. If there is
no processing error then regardless of whether an item is added to the
infoset or not, any side-effects due to dfdl:discriminator statements
evaluating to true, or dfdl:setVariable statements, are retained."
Section 12.2
For property emptyValueDelimiterPolicy, before the phrase "It is a
schema definition error if...", insert this sentence: "The value of
dfdl:emptyValueDelimiterPolicy should only be checked if there is a
dfdl:initiator or dfdl:terminator in scope. If so, and
dfdl:emptyValueDelimiterPolicy is not set, it is a schema definition
error. If dfdl:initiator is not "" and dfdl:terminator is "" and
dfdl:emptyValueDelimiterPolicy is 'terminator' it is a schema definition
error. If dfdl:terminator is not "" and dfdl:initiator is "" and
dfdl:emptyValueDelimiterPolicy is 'initiator' it is a schema definition
error."
Section 13.16
For property nilValueDelimiterPolicy, before the phrase "It is a
schema definition error if...", insert this sentence: "The value of
dfdl:nilValueDelimiterPolicy should only be checked if there is a
dfdl:initiator or dfdl:terminator in scope. If so, and
dfdl:nilValueDelimiterPolicy is not set, it is a schema definition error.
If dfdl:initiator is not "" and dfdl:terminator is "" and
dfdl:nilValueDelimiterPolicy is 'terminator' it is a schema definition
error. If dfdl:terminator is not "" and dfdl:initiator is "" and
dfdl:nilValueDelimiterPolicy is 'initiator' it is a schema definition
error."
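The checks proposed for Sections 12.2 and 13.16 have the same shape, so they can be sketched once. The Python below is an illustrative reading of the inserted sentences, not code from any DFDL processor. It assumes that "in scope" can be approximated as "has a non-empty value", that an unset policy is passed as None, and that the function name and message strings are free to invent.

```python
def check_delimiter_policy(initiator, terminator, policy):
    """Return a list of schema definition error messages (empty if none).

    initiator, terminator: property values, "" meaning no delimiter.
    policy: value of dfdl:emptyValueDelimiterPolicy or
            dfdl:nilValueDelimiterPolicy, or None when not set.
    """
    has_initiator = initiator != ""
    has_terminator = terminator != ""
    # Assumption: with neither delimiter present the policy is not checked.
    if not (has_initiator or has_terminator):
        return []
    if policy is None:
        return ["policy is not set although a delimiter is in scope"]
    if has_initiator and not has_terminator and policy == "terminator":
        return ["policy is 'terminator' but there is no terminator"]
    if has_terminator and not has_initiator and policy == "initiator":
        return ["policy is 'initiator' but there is no initiator"]
    return []


print(check_delimiter_policy("", "", None))
print(check_delimiter_policy("[", "", "terminator"))
print(check_delimiter_policy("[", "]", "both"))
```

Returning a list rather than raising keeps the sketch easy to exercise; a real processor would report a schema definition error through its own diagnostics.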
Section 9.2.2
The phrase "the occurrence's content in the data..." replace with
"the occurrence's SimpleContent or ComplexContent region in the data..."
The sentence: "The empty representation is special in DFDL, because
when parsing it is this condition that can trigger the creation of a
default value for an element occurrence." replace with: "The empty
representation is special in DFDL because when parsing it is used to
determine when default values are created in the Infoset, and when
optional recurring elements are omitted from the Infoset. The empty
representation can require initiators or terminators be present so as to
enable data formats to explicitly distinguish empty-string/hexBinary
values (which might cause default values to be used) from emptiness
meaning the absence of any representation."
(This is to clarify an error of omission - prior language suggested
that EVDP is only relevant when the element has a default value, because
only that need was mentioned.)
Re: [DFDL-WG] Clarification needed: separator for empty sequence
Of Aug 2
Section 14.2
For property dfdl:separator. The sentence: "Separators occur in the
data either before, between or after all occurrences of the elements or
groups that are the children of the sequence." replaced with "Separators
occur in the data either before, between or after all occurrences of
represented elements (that is, elements without the dfdl:inputValueCalc
property) or model groups that are the children of the sequence. Elements
with dfdl:inputValueCalc have no representation in the data stream, and so
never have separators. Children of a sequence that are model groups are
always separated, even if they are empty (meaning have no children of
their own - which is allowed for sequence groups), or both the model group
child and its contained children occupy zero-length in the data stream."
(note: Some of the above is redundant with stipulations in the
dfdl:inputValueCalc property description, but I believe it is wise to have
this little redundancy.)
======================
These email threads are mentioned here to indicate that they are resolved
by one or another of the above corrections:
Re: [DFDL-WG] clarification needed - ambiguity about empty string and
optional element
Of Aug 2
Re: [DFDL-WG] Spec correction ? - Section 9.3.2.1 - second list missing
"empty" representation
Of Aug 2
-----------
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
--
dfdl-wg mailing list
dfdl-wg at ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU