[DFDL-WG] Fw: Fw: Action 260
Steve Hanson
smh at uk.ibm.com
Tue Nov 25 13:38:57 EST 2014
http://redmine.ogf.org/issues/243
Regards
Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848
From: Steve Hanson/UK/IBM
To: Mike Beckerle <mbeckerle.dfdl at gmail.com>
Cc: DFDL-WG <dfdl-wg at ogf.org>
Date: 19/11/2014 10:55
Subject: Re: [DFDL-WG] Fw: Fw: Action 260
Agreed on the WG call that the behaviour in the table below will be
adopted including the restrictions for OCK 'parsed'.
The spec changes resulting from this action are listed below. Note that the
behaviour for OCK 'stopValue' in the table was not correct for unparsing.
SSP 'anyEmpty' behaves differently from the others: it causes
zero-length occurrences not to be output, which is fine. Please review
so we can close on the next call.
1) 14.2 - updates to paragraphs that describe positional and
non-positional sequences
Positional sequence - Each occurrence in the sequence can be identified by
its position in the data. Typically the components of such a sequence do
not have an initiator. In some such sequences, the separators for optional
zero-length occurrences may or must be omitted when at the end of the
group. In DFDL, a sequence is considered positional if it contains only
required elements and/or optional and array elements that have
dfdl:occursCountKind 'implicit', 'fixed' or 'expression', and it has
dfdl:separatorSuppressionPolicy 'never', 'trailingEmptyStrict' or
'trailingEmpty'.
Non-positional sequence - Occurrences in the sequence cannot be identified
by their position in the data alone. Often the components of such a
sequence have an initiator. Such sequences sometimes allow the separator
to be omitted for optional zero-length occurrences anywhere in the
sequence. Speculative parsing might need to be employed to identify
each occurrence. In DFDL, a sequence is non-positional if it contains any
optional or array elements that have dfdl:occursCountKind 'parsed' or
'stopValue', and/or it has dfdl:separatorSuppressionPolicy 'anyEmpty'.
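As a reading aid, the two definitions above can be expressed as a small classification function. This is a minimal Python sketch under stated assumptions: each child is summarised only by its dfdl:occursCountKind (None for a required, non-array element), and the name classify_sequence is hypothetical, not part of any DFDL processor API.

```python
# Hypothetical helper: classify a sequence using the positional /
# non-positional definitions above. Not part of any DFDL processor API.
POSITIONAL_OCKS = {"implicit", "fixed", "expression"}
NON_POSITIONAL_OCKS = {"parsed", "stopValue"}
POSITIONAL_SSPS = {"never", "trailingEmptyStrict", "trailingEmpty"}


def classify_sequence(ssp, child_ocks):
    """Return 'positional' or 'non-positional'.

    ssp        -- the sequence's dfdl:separatorSuppressionPolicy
    child_ocks -- one entry per child: None for a required, non-array
                  element, else the dfdl:occursCountKind of an optional
                  or array element
    """
    # Any 'parsed'/'stopValue' optional or array child, or SSP 'anyEmpty',
    # makes the sequence non-positional.
    if ssp == "anyEmpty" or any(ock in NON_POSITIONAL_OCKS for ock in child_ocks):
        return "non-positional"
    # Otherwise every child must be required or use a positional OCK.
    if ssp in POSITIONAL_SSPS and all(
        ock is None or ock in POSITIONAL_OCKS for ock in child_ocks
    ):
        return "positional"
    raise ValueError(f"unrecognised property values: {ssp!r}, {child_ocks!r}")


print(classify_sequence("trailingEmpty", [None, "expression"]))  # positional
```

Under these definitions a sequence mixing 'expression' arrays with required count fields stays positional, which is the case Tim's example (further down the thread) depends on.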
2) 14.2.2 - updates to the last sentences of the 'When
dfdl:occursCountKind is ...' paragraphs to match the table and notes
below.
When an element is required and is not an array then one occurrence is
always expected along with its separator. The
dfdl:separatorSuppressionPolicy of the sequence has no effect (nothing is
eligible for suppression).
Otherwise the behaviour is dependent on dfdl:occursCountKind.
When dfdl:occursCountKind is 'fixed' then XSDL minOccurs must equal
maxOccurs and that many occurrences are always expected along with their
separators. The dfdl:separatorSuppressionPolicy of the sequence has no
effect (nothing is eligible for suppression).
When dfdl:occursCountKind is 'expression' the number of occurrences is
given by dfdl:occursCount and exactly that many occurrences are always
expected along with their separators. The dfdl:separatorSuppressionPolicy
of the sequence has no effect (nothing is eligible for suppression).
When dfdl:occursCountKind is 'parsed' any number of occurrences and their
separators are expected. The dfdl:separatorSuppressionPolicy of the
sequence must be 'anyEmpty' and it is a schema definition error otherwise.
When dfdl:occursCountKind is 'stopValue', any number of occurrences and
their separators are expected followed by the stop value and its
separator. The dfdl:separatorSuppressionPolicy of the sequence has no
effect.
When dfdl:occursCountKind is 'implicit', between XSDL minOccurs and
maxOccurs (inclusive) occurrences and their separators are expected,
according to the dfdl:separatorSuppressionPolicy of the sequence.
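The parsing rules above can be summarised per occursCountKind. The Python sketch below is one possible encoding, assuming an optional or array element, an already-evaluated dfdl:occursCount, and a ValueError standing in for a schema definition error. The function name and the third return value (whether the sequence's SSP influences which occurrences may be absent) are hypothetical and reflect one reading of the "has no effect" sentences.

```python
import math


# Hypothetical encoding of the 14.2.2 (parsing) rules above for an optional
# or array element. A ValueError stands in for a schema definition error.
def expected_occurrences(ock, ssp, min_occurs=0, max_occurs=math.inf, occurs_count=None):
    """Return (low, high, ssp_has_effect) for the occurrences expected when parsing.

    high is math.inf for maxOccurs 'unbounded'. For 'stopValue' the range
    excludes the stop value itself, which always follows with its separator.
    """
    if ock == "fixed":
        if min_occurs != max_occurs:
            raise ValueError("SDE: 'fixed' requires minOccurs == maxOccurs")
        return (max_occurs, max_occurs, False)
    if ock == "expression":
        # occurs_count is the already-evaluated value of dfdl:occursCount
        return (occurs_count, occurs_count, False)
    if ock == "parsed":
        if ssp != "anyEmpty":
            raise ValueError("SDE: 'parsed' requires separatorSuppressionPolicy 'anyEmpty'")
        return (0, math.inf, True)
    if ock == "stopValue":
        return (0, math.inf, False)
    if ock == "implicit":
        return (min_occurs, max_occurs, True)
    raise ValueError(f"unknown occursCountKind {ock!r}")
```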
3) 14.2.3 - updates to the last sentences of the 'When
dfdl:occursCountKind is ...' paragraphs to match the table and notes
below.
When an element is required and is not an array then one occurrence is
always output along with its separator. The
dfdl:separatorSuppressionPolicy of the sequence has no effect (nothing is
eligible for suppression).
Otherwise the behaviour is dependent on dfdl:occursCountKind.
When dfdl:occursCountKind is 'fixed' or 'expression' the occurrences in
the augmented Infoset are always output along with their separators. The
dfdl:separatorSuppressionPolicy of the sequence has no effect (nothing is
eligible for suppression).
When dfdl:occursCountKind is 'parsed', non-zero-length occurrences in the
augmented Infoset are output along with their separators. The
dfdl:separatorSuppressionPolicy of the sequence must be 'anyEmpty' and it
is a schema definition error otherwise.
When dfdl:occursCountKind is 'stopValue' the occurrences in the augmented
Infoset are output along with their separators followed by the stop value
and its separator, according to the dfdl:separatorSuppressionPolicy of the
sequence.
When dfdl:occursCountKind is 'implicit' the occurrences in the augmented
Infoset are output along with their separators, according to the
dfdl:separatorSuppressionPolicy of the sequence.
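The unparsing rules above can be illustrated with a toy unparser. This Python sketch assumes, for illustration only, that the array is the only child of its sequence (so "trailing" means the end of the array), that the separator is infix, and that '' represents a zero-length occurrence. The names apply_ssp and unparse_array are hypothetical.

```python
# Toy unparser for the 14.2.3 rules above. Assumptions: the array is the only
# child of its sequence, the separator is infix, occurrences are strings, and
# '' is a zero-length occurrence.
def apply_ssp(values, ssp):
    if ssp == "never":
        return list(values)
    if ssp in ("trailingEmpty", "trailingEmptyStrict"):
        kept = list(values)
        while kept and kept[-1] == "":
            kept.pop()  # drop trailing zero-length occurrences
        return kept
    if ssp == "anyEmpty":
        return [v for v in values if v != ""]  # drop every zero-length occurrence
    raise ValueError(f"unknown separatorSuppressionPolicy {ssp!r}")


def unparse_array(values, ock, ssp, sep="|", stop_value=None):
    if ock in ("fixed", "expression"):
        kept = list(values)  # nothing is eligible for suppression
    elif ock == "parsed":
        if ssp != "anyEmpty":
            raise ValueError("SDE: 'parsed' requires 'anyEmpty'")
        kept = apply_ssp(values, ssp)  # only non-zero-length occurrences
    elif ock == "implicit":
        kept = apply_ssp(values, ssp)
    elif ock == "stopValue":
        if not stop_value:
            raise ValueError("the stop value must not be empty string")
        # The stop value follows the occurrences; the SSP is applied to the
        # whole run, so with a non-empty stop value nothing is 'trailing'.
        kept = apply_ssp(list(values) + [stop_value], ssp)
    else:
        raise ValueError(f"unknown occursCountKind {ock!r}")
    return sep.join(kept)
```

For example, unparse_array(["a", "", "b"], "stopValue", "anyEmpty", stop_value="END") drops the zero-length occurrence, whereas the same call with 'trailingEmpty' keeps it because the stop value follows it.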
4) 16 - update to occursStopValue property description
The property is a list of logical values, so the following needs to be
added: "The dfdl:stopValue property must not be empty string."
Regards
Steve Hanson
From: Steve Hanson/UK/IBM
To: Mike Beckerle <mbeckerle.dfdl at gmail.com>
Cc: DFDL-WG <dfdl-wg at ogf.org>
Date: 18/11/2014 12:23
Subject: Re: [DFDL-WG] Fw: Fw: Action 260
Mike
Comments in-line.
Regards
Steve Hanson
From: Mike Beckerle <mbeckerle.dfdl at gmail.com>
To: Steve Hanson/UK/IBM at IBMGB
Cc: "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>
Date: 17/11/2014 23:11
Subject: Re: [DFDL-WG] Fw: Fw: Action 260
So I looked into what would need to change in the spec to implement the
intent of the adjustment suggested in this email thread.
Rewriting the section is very undesirable at this stage for DFDL, so I was
looking for incremental changes.
What I came up with is this. In section 14.2.2 simply drop the sentences
that mention this idea of an implied SSP, because they suggest the need to
reconcile conflicting behaviors in the mixed case. I.e., drop all "The
dfdl:separatorSuppressionPolicy is not applicable and the implied
behaviour is '....'."
With the omission of these sentences, the notion of an implied separator
suppression policy that conflicts with one on the sequence goes away.
There is only the SSP property of the sequence, and it is either relevant
to the decision of whether to expect an item and its separator, or it
isn't.
The current descriptions in 14.2.2 of the behavior around suppression for
each occursCountKind seem to be correct and match the table in this email
thread, except for parsed.
I'm not sure I agree with the assertion in this email thread that OCK
'parsed' only makes sense with 'anyEmpty' behavior. We currently say in the
spec that it has anyEmpty behavior, but if we drop that sentence (per
suggestion above), then we would be loosening the behavior to allow empty
elements in some cases.
SMH: The intent of the table is that it is a schema definition error if
OCK is 'parsed' and SSP is not 'anyEmpty'.
Suppose:
<sequence dfdl:separatorSuppressionPolicy="trailingEmpty"
dfdl:separator="|" dfdl:separatorPosition="infix"
dfdl:terminator="%NL;">
<element name="a" type="xs:string" maxOccurs="unbounded"
dfdl:occursCountKind='parsed'/>
</sequence>
In this case, the array is declared last in its sequence. The occurrences
will all be element a, so this is positional. The number of them is
determined by OCK parsed, and if enabled, validation will check, in this
case, that at least 1 occurrence (the default for minOccurs) appears.
In this case, empty string is a legal value, and we're not strict about
trailing separators, so data like:
5|6|7||
is fine, isn't it? With OCK 'parsed' you'll get 2 more empty-string
elements for the array when parsing that will not be re-created when
unparsing, as they would be suppressed. I believe that is ok. There are
many formats that can have that sort of asymmetry.
SMH: To be clear, 'anyEmpty' already allows empty content when parsing; it
has lax semantics.
Change the above example to SSP trailingEmptyStrict, and now data like:
5|6|7||9
makes sense, and you get one empty string in position 4. On unparsing,
this empty string would even get written out.
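The parse/unparse asymmetry described in the two examples above can be sketched in Python. This models Mike's proposal (OCK 'parsed' behaving like 'implicit' with minOccurs 0 and maxOccurs 'unbounded'), which is the combination the WG instead made a schema definition error. It assumes '|' is the infix separator declared in the schema and that the array is the only child of the sequence; parse_infix and unparse_infix are hypothetical names, and the model ignores validation and defaulting.

```python
# Toy model of the proposal above: OCK 'parsed' treated like 'implicit' with
# minOccurs 0 / maxOccurs unbounded, infix separator '|', array as the only
# child of the sequence. Validation and defaulting are ignored.
def parse_infix(data, ssp, sep="|"):
    values = data.split(sep)
    if ssp == "trailingEmptyStrict" and values[-1] == "":
        raise ValueError("trailing empty occurrence not allowed with 'trailingEmptyStrict'")
    return values


def unparse_infix(values, ssp, sep="|"):
    kept = list(values)
    if ssp in ("trailingEmpty", "trailingEmptyStrict"):
        while kept and kept[-1] == "":
            kept.pop()  # trailing zero-length occurrences are suppressed
    return sep.join(kept)


lax = parse_infix("5|6|7||", "trailingEmpty")             # ['5', '6', '7', '', '']
print(unparse_infix(lax, "trailingEmpty"))                # 5|6|7 (not round-tripped)
strict = parse_infix("5|6|7||9", "trailingEmptyStrict")   # ['5', '6', '7', '', '9']
print(unparse_infix(strict, "trailingEmptyStrict"))       # 5|6|7||9
```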
I agree 'parsed' and SSP 'never' don't make sense together (as they don't
for OCK implicit - SMH: only if maxOccurs 'unbounded' - see table 18 in
14.2.2), but the other 3 SSPs seem ok to me for declared-last elements.
OCK parsed behaves (w.r.t. suppression, and ignoring defaulting and
validation) just like OCK implicit with minOccurs 0 and maxOccurs
'unbounded'. (SMH: To be clear, defaulting and validation are independent
of OCK).
If we want to preserve the current restriction that parsed behaves like
'anyEmpty', then we can stipulate that when OCK is 'parsed', any number of
non-empty occurrences and their separators are expected. (I would not be
in favor of this.)
SMH: So what you are saying is that 'parsed' is same as 'implicit'
(0..unbounded). There are circumstances when this combination gives a
schema definition error, as per Table 18 in 14.2.2 - this must therefore
be the same for 'parsed'. It turns out that you only end up adding one
extra legal behaviour for 'parsed', namely where element is declared last
and SSP is 'trailingEmpty/Strict'. That's the example you quote above, but
it's the only one. So is it buying much?
SMH: When IBM put the table below together and said that 'parsed' with an
SSP other than 'anyEmpty' is an error, it was to prevent a subtle change in
behaviour for existing schemas with 'trailingEmpty/Strict'. Today the
behaviour is 'anyEmpty'; in the future it would be 'trailingEmpty/Strict'.
Hence we made it an error.
As already noted in this thread, we also should add a sentence to
stopValue: "The dfdl:stopValue property must not include empty string."
The net result of these changes still isn't all that great, but it does
remove one source of confusion - the 'implied' SSP conflict.
SMH: If your 'parsed' proposal is adopted, you would need to document the
'parsed' behaviour as some combinations are schema definition errors.
Arguably you would need the equivalent of Tables 18 and 19.
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
On Mon, Oct 13, 2014 at 10:08 AM, Steve Hanson <smh at uk.ibm.com> wrote:
IBM has discussed this issue at some length internally, and come to the
conclusion that the optional and array elements in a sequence should
follow the separator suppression policy (SSP), and not have an implied SSP
which is at odds with the sequence's SSP. In the table below, cells marked
with a cross imply a schema definition error, and cells marked ok imply
that there is a behaviour of the element for that OCK which is in keeping
with the SSP of the sequence.
SSP (1) \ OCK                        fixed   implicit  expression  parsed  stopValue (2)
never                                ok      ok        ok          x (5)   ok (6)
trailingEmpty / trailingEmptyStrict  ok (3)  ok        ok (4)      x (5)   ok (6)
anyEmpty                             ok (3)  ok        ok (4)      ok      ok
Notes:
(1) The SSP property applies only to an ordered sequence. An unordered
sequence assumes 'anyEmpty' (as all optional/array elements must be
'parsed').
(2) Missing restriction - for 'stopValue' the dfdl:stopValue property must
not include empty string.
(3) maxOccurs provides the count, so nothing is eligible for suppression, so
SSP has no practical effect (same as a required element).
(4) The Infoset provides the count, so nothing is eligible for suppression,
so SSP has no practical effect (same as a required element).
(5) 'parsed' only makes sense with 'anyEmpty'
(6) Because a stop value must appear, and from (2) empty string is not
allowed, SSP has no practical effect.
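The table and notes (1) and (2) can be encoded as a simple lookup. This is a minimal Python sketch; combination_allowed and its parameters are hypothetical names, and only the SSP/OCK cells of this table are modelled, not other constraints in the spec such as Table 18 in 14.2.2.

```python
# Hypothetical encoding of the table above. Only the SSP/OCK cells and notes
# (1) and (2) are modelled; other spec constraints (e.g. Table 18) are not.
OCKS = ("fixed", "implicit", "expression", "parsed", "stopValue")
ALLOWED = {
    "never":               (True, True, True, False, True),
    "trailingEmpty":       (True, True, True, False, True),
    "trailingEmptyStrict": (True, True, True, False, True),
    "anyEmpty":            (True, True, True, True, True),
}


def combination_allowed(ssp, ock, ordered=True, stop_values=()):
    """Return False where the table (or note 2) implies a schema definition error."""
    if not ordered:
        ssp = "anyEmpty"  # note (1): an unordered sequence assumes 'anyEmpty'
    if ock == "stopValue" and "" in stop_values:
        return False  # note (2): the stop value list must not include empty string
    return ALLOWED[ssp][OCKS.index(ock)]
```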
The issue of maxOccurs = '0' is discussed in a separate email.
Regards
Steve Hanson
----- Forwarded by Steve Hanson/UK/IBM on 13/10/2014 14:40 -----
From: Mike Beckerle <mbeckerle.dfdl at gmail.com>
To: Steve Hanson/UK/IBM at IBMGB,
Cc: "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>
Date: 29/08/2014 21:21
Subject: Re: [DFDL-WG] Fw: Action 260
I reviewed this. It looks good to me.
The note at the bottom, that we don't say what happens on a zero-trip
(i.e., a represented element where occursCount evaluates to 0), is also a
useful clarification.
Do we want to create an erratum for this?
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
On Thu, Aug 28, 2014 at 10:13 AM, Steve Hanson <smh at uk.ibm.com> wrote:
Please review for Tuesday's WG call ...
Regards
Steve Hanson
----- Forwarded by Steve Hanson/UK/IBM on 28/08/2014 15:02 -----
From: Steve Hanson/UK/IBM
To: dfdl-wg at ogf.org,
Date: 06/08/2014 13:50
Subject: Fw: [DFDL-WG] Action 260
So my suggestion below, to wrap the array in a sequence, does not work; it
just moves the problem down into the new sequence.
After much deliberation, we think that the definitions of Positional
sequence and Non-positional sequence should not be viewed as driving the
behaviour of a sequence, but simply as the resultant characteristics of a
sequence that has certain properties. That leaves modellers free to mix
occursCountKinds, as in Tim's example. No need for any new SDE scenarios.
Positional sequence - Each occurrence in the sequence can be identified by
its position in the data. Typically the components of such a sequence do
not have an initiator. In some such sequences, the separators for optional
zero-length occurrences may or must be omitted when at the end of the
group. In DFDL, a sequence is considered positional if it contains only
required elements and/or optional and array elements that have
dfdl:occursCountKind 'implicit', 'fixed' or 'expression', and it has
dfdl:separatorSuppressionPolicy 'never', 'trailingEmptyStrict' or
'trailingEmpty'.
Non-positional sequence - Occurrences in the sequence cannot be identified
by their position in the data alone. Often the components of such a
sequence have an initiator. Such sequences sometimes allow the separator
to be omitted for optional zero-length occurrences anywhere in the
sequence. Speculative parsing might need to be employed to identify
each occurrence. In DFDL, a sequence is non-positional if it contains any
optional or array elements that have dfdl:occursCountKind 'parsed' or
'stopValue', and/or it has dfdl:separatorSuppressionPolicy 'anyEmpty'.
See parallel email for action 261 that ensures 'expression' behaves
itself.
One behaviour is missing from the spec. For a sequence with separators,
what is expected in the data stream if dfdl:occursCountKind is 'fixed' or
'implicit' and maxOccurs is '0', or dfdl:occursCountKind is 'expression'
and dfdl:occursCount evaluates to 0? We believe that no separator should
be expected when parsing and none output when unparsing (same behaviour as
inputValueCalc).
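The proposed zero-occurrence behaviour can be sketched with a toy model in which each child's occurrence count is already known. This Python sketch assumes ',' as the infix separator; parse_children and unparse_children are hypothetical names. The point it illustrates is that a zero-count child contributes no field and therefore no separator, so the data is "a,b" rather than "a,,b".

```python
# Sketch of the proposed zero-occurrence rule: a child whose count is 0
# (maxOccurs '0' with 'fixed'/'implicit', or an occursCount expression that
# evaluates to 0) contributes no field and therefore no separator.
# Toy model: each child is a list of occurrence strings; counts are known.
def unparse_children(children, sep=","):
    fields = []
    for occurrences in children:
        fields.extend(occurrences)  # an empty list adds nothing, not even ''
    return sep.join(fields)


def parse_children(data, counts, sep=","):
    fields = data.split(sep) if data else []
    out, pos = [], 0
    for n in counts:
        out.append(fields[pos:pos + n])  # n == 0 consumes no field or separator
        pos += n
    if pos != len(fields):
        raise ValueError("field count does not match the occurrence counts")
    return out
```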
Regards
Steve Hanson
----- Forwarded by Steve Hanson/UK/IBM on 06/08/2014 12:42 -----
From: Steve Hanson/UK/IBM
To: Tim Kimber/UK/IBM at IBMGB,
Cc: dfdl-wg at ogf.org, dfdl-wg-bounces at ogf.org
Date: 30/06/2014 10:04
Subject: Re: [DFDL-WG] Action 260
You would wrap the array and its count in a sequence. Then the
'count+array' is treated as a single entity as far as the parent sequence
is concerned.
Regards
Steve Hanson
From: Tim Kimber/UK/IBM at IBMGB
To: dfdl-wg at ogf.org,
Date: 26/06/2014 20:06
Subject: Re: [DFDL-WG] Action 260
Sent by: dfdl-wg-bounces at ogf.org
Before we settle one way or the other, I would like the following data
format to be taken into consideration.
chars,5,A,B,C,D,E,integers,1,2,3
chars,3,C,,,integers,2,10,11
I am assuming that the occursCountKind for the arrays is 'expression' and
the occursCount refers to the integer field that precedes the array. In
order to represent the empty strings on the second line it is essential to
specify SSP as 'trailingEmpty' or 'never'. If we disallow the combination
of ock='expression' and SSP='trailingEmpty' then how would this format be
modelled?
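Tim's point can be illustrated with a toy parser. It assumes, as Tim does, that each array uses occursCountKind 'expression' with dfdl:occursCount referring to the integer field immediately before it, and that ',' is an infix separator. Because every count is known, the zero-length occurrences keep their positions. parse_counted_line is a hypothetical name, and the example uses the second data line, which is the one containing empty strings.

```python
# Toy parser for Tim's format: each array's count is the integer field just
# before it (occursCountKind 'expression'), ',' is an infix separator, and
# zero-length occurrences keep their positions because the count is known.
def parse_counted_line(line, sep=","):
    fields = line.split(sep)
    result, pos = {}, 0
    while pos < len(fields):
        if pos + 1 >= len(fields):
            raise ValueError(f"missing count after {fields[pos]!r}")
        name, count = fields[pos], int(fields[pos + 1])
        start, end = pos + 2, pos + 2 + count
        if end > len(fields):
            raise ValueError(f"too few occurrences for {name!r}")
        result[name] = fields[start:end]
        pos = end
    return result


print(parse_counted_line("chars,3,C,,,integers,2,10,11"))
# {'chars': ['C', '', ''], 'integers': ['10', '11']}
```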
regards,
Tim Kimber,
Technical Lead for IBM Integration Bus Healthcare Pack
Hursley, UK
Internet: kimbert at uk.ibm.com
Tel. 01962-816742
Internal tel. 37246742
From: Mike Beckerle <mbeckerle.dfdl at gmail.com>
To: Steve Hanson/UK/IBM at IBMGB,
Cc: "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>
Date: 25/06/2014 16:25
Subject: Re: [DFDL-WG] Action 260
Sent by: dfdl-wg-bounces at ogf.org
I prefer choice (a) for two reasons:
* It is more restrictive and therefore more conservative (preserving
freedom to change in future if needed)
* If a user has a positional data format, you don't want them to even have
to understand the concept of speculation in order to model their data. So
choice (a) allows a simpler description that doesn't need to introduce the
notion that the parser might be speculating.
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
On Wed, Jun 25, 2014 at 5:20 AM, Steve Hanson <smh at uk.ibm.com> wrote:
260
Positional and non-positional sequences (All)
10/6: Spec defines the above but also allows different occursCountKinds
within the same sequence which may have different (implied)
separatorSuppressionPolicy, which results in a sequence which is a mixture
of both. Should this be allowed? If so what are the rules? Can certain
combinations be disallowed?
17/6: IBM have discussed internally and will submit a proposal.
In the spec we define Positional Sequence and Non-Positional Sequence:
Positional sequence - Each occurrence in the sequence can be identified by
its position in the data. Typically the components of such a sequence do
not have an initiator. In some such sequences, the separators for optional
zero-length occurrences may or must be omitted when at the end of the
group. A positional sequence can be modelled by setting
dfdl:separatorSuppressionPolicy to 'never', 'trailingEmptyStrict' or
'trailingEmpty'.
Non-positional sequence - Occurrences in the sequence cannot be identified
by their position in the data alone. Typically the components of such a
sequence have an initiator. Such sequences allow the separator to be
omitted for optional zero-length occurrences anywhere in the sequence.
Speculative parsing is employed by the parser to identify each
occurrence. A non-positional sequence can be modelled by setting
dfdl:separatorSuppressionPolicy to 'anyEmpty'.
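To make the idea of identifying occurrences without relying on position concrete, here is a toy Python sketch of an ordered 'anyEmpty' sequence. Assumptions, for illustration only: every child is optional, each is identified by a literal initiator, and ',' is the separator. The one-field initiator check is a simplified stand-in for real speculative parsing (which may need to backtrack over arbitrary amounts of data), and parse_with_initiators is a hypothetical name.

```python
# Toy illustration of identifying occurrences in a non-positional (anyEmpty)
# ordered sequence. Assumptions: every child is optional, each has a literal
# initiator, and ',' is the separator. A one-field initiator check stands in
# for full speculative parsing.
def parse_with_initiators(data, elements, sep=","):
    """elements: ordered list of (name, initiator) pairs."""
    fields = data.split(sep) if data else []
    result, i = {}, 0
    for name, initiator in elements:
        # Speculate that this element is present; keep it only if the
        # initiator matches, otherwise treat it as absent and move on.
        if i < len(fields) and fields[i].startswith(initiator):
            result[name] = fields[i][len(initiator):]
            i += 1
    if i != len(fields):
        raise ValueError(f"unexpected data: {fields[i:]!r}")
    return result


print(parse_with_initiators("A:1,C:3", [("a", "A:"), ("b", "B:"), ("c", "C:")]))
# {'a': '1', 'c': '3'}
```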
The problem is that the setting of dfdl:separatorSuppressionPolicy is only
examined for child elements with dfdl:occursCountKind 'implicit'. For
other dfdl:occursCountKinds, there is the concept of an 'implied'
dfdl:separatorSuppressionPolicy:
When dfdl:occursCountKind is 'fixed' then ... the implied behaviour is
'never'.
When dfdl:occursCountKind is 'expression' ... the implied behaviour is
'never'.
When dfdl:occursCountKind is 'parsed' ... the implied behaviour is
'anyEmpty'.
When dfdl:occursCountKind is 'stopValue' ... the implied behaviour is
'anyEmpty'.
So if a Positional sequence as defined above contains children with
dfdl:occursCountKind 'parsed' or 'stopValue' then surely it is no longer a
Positional sequence.
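The conflict described above can be sketched by computing each child's effective policy under the implied-SSP wording quoted from the spec. In this Python sketch, 'implicit' children follow the sequence's SSP and the other OCKs take their implied policy. It considers only optional and array children, and effective_policies and is_mixed are hypothetical names.

```python
# Sketch of the problem above, using the implied SSPs quoted from the spec:
# 'implicit' children follow the sequence's SSP, the other OCKs carry their
# own implied policy. Only optional/array children are considered.
IMPLIED_SSP = {
    "fixed": "never",
    "expression": "never",
    "parsed": "anyEmpty",
    "stopValue": "anyEmpty",
}
POSITIONAL_SSPS = {"never", "trailingEmpty", "trailingEmptyStrict"}


def effective_policies(sequence_ssp, child_ocks):
    return [sequence_ssp if ock == "implicit" else IMPLIED_SSP[ock] for ock in child_ocks]


def is_mixed(sequence_ssp, child_ocks):
    """True when some children behave positionally and others do not."""
    kinds = {p in POSITIONAL_SSPS for p in effective_policies(sequence_ssp, child_ocks)}
    return len(kinds) > 1


print(is_mixed("trailingEmpty", ["implicit", "parsed"]))  # True
print(is_mixed("trailingEmpty", ["implicit", "fixed"]))   # False
```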
A solution to this is to prevent the appearance of certain values of
dfdl:occursCountKind within a Positional sequence. However, precisely
which values to outlaw is subject to interpretation of the phrase "Each
occurrence in the sequence can be identified by its position in the data".
Is this intended to mean:
a) an observer of the raw data can identify an occurrence of an element in
the sequence solely by counting separators
=> SDE if 'parsed', 'stopValue' or 'expression' ** appeared in a
Positional sequence;
** Although 'expression' would appear to be like 'fixed' it actually
breaks a) so must be included in the SDE list.
or
b) a parser does not have to speculate to identify an occurrence of an
element in the sequence
=> SDE only if 'parsed' appeared in a Positional sequence.
Note that it is possible to wrap a 'parsed' etc element in a local
sequence or another element to avoid an SDE. But this could still be seen
as a violation of a) if the separators of both are the same, as the
observer cannot count the separators. So should the rule be applied
recursively, i.e., a Positional sequence cannot contain a non-Positional
sequence unless the separators are different?
Regards
Steve Hanson
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
--
dfdl-wg mailing list
dfdl-wg at ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg