[DFDL-WG] Fw: Fw: Action 260

Mon Oct 13 10:08:31 EDT 2014

IBM has discussed this issue at some length internally, and come to the 
conclusion that the optional and array elements in a sequence should 
follow the separator suppression policy (SSP), and not have an implied SSP 
which is at odds with the sequence's SSP. In the table below, cells marked 
with a cross imply a schema definition error, and cells marked ok imply 
that there is a behaviour of the element for that OCK which is in keeping 
with the SSP of the sequence.

                                     OCK
SSP (1)                              fixed   implicit  expression  parsed  stopValue (2)
never                                ok      ok        ok          x (5)   ok (6)
trailingEmpty / trailingEmptyStrict  ok (3)  ok        ok (4)      x (5)   ok (6)
anyEmpty                             ok (3)  ok        ok (4)      ok      ok

Notes:
(1) SSP property applies only to an ordered sequence. An unordered 
sequence assumes 'anyEmpty' (as all optional/array elements must be 
'parsed')
(2) Missing restriction - for 'stopValue' the dfdl:stopValue property must 
not include empty string.
(3) maxOccurs provides count, so nothing is eligible for suppression, so 
SSP has no practical effect (same as a required element)
(4) infoset provides count, so nothing is eligible for suppression, so SSP 
has no practical effect (same as a required element)
(5) 'parsed' only makes sense with 'anyEmpty' 
(6) Because a stop value must appear, and from (2) empty string is not 
allowed, SSP has no practical effect.
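
As a rough illustration only, the table can be transcribed into a small 
lookup. This is a sketch in Python, not taken from any DFDL implementation; 
the names OCKS, SSP_OCK_TABLE and check_combination are hypothetical, and 
the entries simply copy the ok / x cells above.

```python
# Hypothetical transcription of the SSP/OCK table; not part of any DFDL processor.
OCKS = ("fixed", "implicit", "expression", "parsed", "stopValue")

# True means "ok"; False means "x", i.e. a schema definition error (SDE).
SSP_OCK_TABLE = {
    "never":               (True, True, True, False, True),
    "trailingEmpty":       (True, True, True, False, True),
    "trailingEmptyStrict": (True, True, True, False, True),
    "anyEmpty":            (True, True, True, True, True),
}


def check_combination(ssp: str, ock: str) -> bool:
    """Return True if the table marks the combination ok, False if it is an SDE."""
    return SSP_OCK_TABLE[ssp][OCKS.index(ock)]
```

For example, check_combination("never", "parsed") is False, matching the 
cell marked x (5).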

The issue of maxOccurs = '0' is discussed in a separate email.

Regards

Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 13/10/2014 14:40 -----

From:   Mike Beckerle <mbeckerle.dfdl at gmail.com>
To:     Steve Hanson/UK/IBM at IBMGB, 
Cc:     "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>
Date:   29/08/2014 21:21
Subject:        Re: [DFDL-WG] Fw: Action 260

I reviewed this. It looks good to me. 

The note at the bottom, that we don't say what happens on a zero-trip, i.e., 
a represented element where occursCount evaluates to 0, is also a useful 
clarification. 

Do we want to create an erratum for this?

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | 
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are 
subject to the OGF Intellectual Property Policy

On Thu, Aug 28, 2014 at 10:13 AM, Steve Hanson <smh at uk.ibm.com> wrote:
Please review for Tuesday's WG call ... 

Regards

Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848 
----- Forwarded by Steve Hanson/UK/IBM on 28/08/2014 15:02 ----- 

From:        Steve Hanson/UK/IBM 
To:        dfdl-wg at ogf.org, 
Date:        06/08/2014 13:50 
Subject:        Fw: [DFDL-WG] Action 260 

So my suggestion below, to wrap the array in a sequence, does not work; it 
just moves the problem down into the new sequence. 

After much deliberation, we think that the definitions of Positional 
sequence and Non-positional sequence should not be viewed as driving the 
behaviour of a sequence, but simply as the resultant characteristics of a 
sequence that has certain properties. That leaves modellers free to mix 
occursCountKinds, as in Tim's example. No need for any new SDE scenarios. 

Positional sequence - Each occurrence in the sequence can be identified by 
its position in the data. Typically the components of such a sequence do 
not have an initiator. In some such sequences, the separators for optional 
zero-length occurrences may or must be omitted when at the end of the 
group. In DFDL, a sequence is considered positional if it contains only 
required elements and/or optional and array elements that have 
dfdl:occursCountKind 'implicit', 'fixed' or 'expression', and it has 
dfdl:separatorSuppressionPolicy 'never', 'trailingEmptyStrict'  or 
'trailingEmpty'. 
Non-positional sequence - Occurrences in the sequence cannot be identified 
by their position in the data alone. Often the components of such a 
sequence have an initiator. Such sequences sometimes allow the separator 
to be omitted for optional zero-length occurrences anywhere in the 
sequence. Speculative parsing might need to be employed to identify 
each occurrence. In DFDL, a sequence is non-positional if it contains any 
optional or array elements that have dfdl:occursCountKind 'parsed' or 
'stopValue', and/or it has dfdl:separatorSuppressionPolicy 'anyEmpty'. 
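
The two definitions above amount to a classification that can be sketched 
as follows. This is only an illustration in Python under the assumption 
that the revised wording is adopted; is_positional and the two constant 
sets are hypothetical names, and required elements are left out of the 
input because they do not affect the result.

```python
# Hypothetical helper; the sets transcribe the revised definitions above.
POSITIONAL_OCKS = {"implicit", "fixed", "expression"}
POSITIONAL_SSPS = {"never", "trailingEmptyStrict", "trailingEmpty"}


def is_positional(ssp, child_ocks):
    """Classify a sequence from its SSP and its children's occursCountKinds.

    child_ocks lists the dfdl:occursCountKind of each optional or array
    child; required elements are omitted by the caller.
    """
    return ssp in POSITIONAL_SSPS and all(
        ock in POSITIONAL_OCKS for ock in child_ocks
    )
```

A sequence for which is_positional returns False is non-positional: it 
either has SSP 'anyEmpty' or contains a 'parsed' or 'stopValue' child.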

See parallel email for action 261 that ensures 'expression' behaves 
itself. 

One behaviour is missing from the spec. For a sequence with 
separators, what is expected in the data stream if occursCountKind = 'fixed' / 
'implicit' and maxOccurs = '0', or occursCountKind = 'expression' and 
occursCount evaluates to 0? We believe that no separator should be 
expected when parsing and none output when unparsing (same behaviour as 
inputValueCalc). 
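
A minimal sketch of the unparsing side of that suggestion, in Python. It 
assumes an infix separator and represents each child element as a list of 
its occurrences; unparse_sequence is a hypothetical name, not an API of 
any DFDL processor.

```python
def unparse_sequence(children, separator=","):
    """Join all occurrences of all children with an infix separator.

    children is a list with one entry per child element, each entry being
    the list of that child's occurrences as strings. A child with zero
    occurrences contributes nothing, not even a separator.
    """
    occurrences = [occ for child in children for occ in child]
    return separator.join(occurrences)
```

With this sketch, a zero-occurrence array between two other children 
leaves no trace in the output, which is the behaviour proposed above.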

Regards

Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848 
----- Forwarded by Steve Hanson/UK/IBM on 06/08/2014 12:42 ----- 

From:        Steve Hanson/UK/IBM 
To:        Tim Kimber/UK/IBM at IBMGB, 
Cc:        dfdl-wg at ogf.org, dfdl-wg-bounces at ogf.org 
Date:        30/06/2014 10:04 
Subject:        Re: [DFDL-WG] Action 260 

You would wrap the array and its count in a sequence. Then the 
'count+array' is treated as a single entity as far as the parent sequence 
is concerned. 

Regards

Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848 

From:        Tim Kimber/UK/IBM at IBMGB 
To:        dfdl-wg at ogf.org, 
Date:        26/06/2014 20:06 
Subject:        Re: [DFDL-WG] Action 260 
Sent by:        dfdl-wg-bounces at ogf.org 

Before we settle one way or the other, I would like the following data 
format to be taken into consideration. 

chars,5,A,B,C,D,E,integers,1,2,3 
chars,3,C,,,integers,2,10,11 

I am assuming that the occursCountKind for the arrays is 'expression' and 
the occursCount refers to the integer field that precedes the array. In 
order to represent the empty strings on the second line it is essential to 
specify SSP as 'trailingEmpty' or 'never'. If we disallow the combination 
of ock='expression' and SSP='trailingEmpty' then how would this format be 
modelled? 
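
To make the intended reading of this format concrete, here is a small 
Python sketch. It is not DFDL and not a proposal for how a processor works; 
parse_record is a hypothetical name, and the layout (a tag, then an integer 
count, then exactly that many comma-separated items, with separators kept 
for empty items) is my assumption as described above. It is exercised on 
the second sample line only.

```python
def parse_record(line):
    """Parse repeated 'tag,count,item,...' groups from one comma-separated line.

    Each array is assumed to be preceded by a tag and an integer count, and
    empty items are assumed to keep their separators.
    """
    fields = line.split(",")
    result = {}
    pos = 0
    while pos < len(fields):
        if pos + 1 >= len(fields):
            raise ValueError(f"tag {fields[pos]!r} has no count")
        name = fields[pos]
        count = int(fields[pos + 1])
        items = fields[pos + 2 : pos + 2 + count]
        if len(items) != count:
            raise ValueError(f"expected {count} items for {name!r}")
        result[name] = items
        pos += 2 + count
    return result


record = parse_record("chars,3,C,,,integers,2,10,11")
```

Here record["chars"] holds three items, the last two being empty strings, 
which is only recoverable because the separators for the empty occurrences 
are present in the data.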

regards,

Tim Kimber, 
Technical Lead for IBM Integration Bus Healthcare Pack
Hursley, UK
Internet:  kimbert at uk.ibm.com
Tel. 01962-816742  
Internal tel. 37246742

From:        Mike Beckerle <mbeckerle.dfdl at gmail.com> 
To:        Steve Hanson/UK/IBM at IBMGB, 
Cc:        "dfdl-wg at ogf.org" <dfdl-wg at ogf.org> 
Date:        25/06/2014 16:25 
Subject:        Re: [DFDL-WG] Action 260 
Sent by:        dfdl-wg-bounces at ogf.org 

I prefer choice (a) for two reasons

* It is more restrictive and therefore more conservative (preserving 
freedom to change in future if needed) 
* If a user has a positional data format, you don't want them to even have 
to understand the concept of speculation in order to model their data. So 
choice (a) allows a simpler description that doesn't need to introduce the 
notion that the parser might be speculating.

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | 
www.tresys.com 
Please note: Contributions to the DFDL Workgroup's email discussions are 
subject to the OGF Intellectual Property Policy 

On Wed, Jun 25, 2014 at 5:20 AM, Steve Hanson <smh at uk.ibm.com> wrote: 
260
Positional and non-positional sequences (All) 
10/6: Spec defines the above but also allows different occursCountKinds 
within the same sequence which may have different (implied) 
separatorSuppressionPolicy, which results in a sequence which is a mixture 
of both. Should this be allowed? If so what are the rules? Can certain 
combinations be disallowed? 
17/6: IBM have discussed internally and will submit a proposal.
In the spec we define Positional Sequence and Non-Positional Sequence: 
Positional sequence - Each occurrence in the sequence can be identified by 
its position in the data. Typically the components of such a sequence do 
not have an initiator. In some such sequences, the separators for optional 
zero-length occurrences may or must be omitted when at the end of the 
group. A positional sequence can be modelled by setting 
dfdl:separatorSuppressionPolicy to 'never', 'trailingEmptyStrict'  or 
'trailingEmpty'. 
Non-positional sequence - Occurrences in the sequence cannot be identified 
by their position in the data alone. Typically the components of such a 
sequence have an initiator. Such sequences allow the separator to be 
omitted for optional zero-length occurrences anywhere in the sequence. 
Speculative parsing is employed by the parser to identify each occurrence. 
 A non-positional sequence can be modelled by setting 
dfdl:separatorSuppressionPolicy to 'anyEmpty'. 
The problem is that the setting of dfdl:separatorSuppressionPolicy is only 
examined for child elements with dfdl:occursCountKind 'implicit'.  For 
other dfdl:occursCountKinds, there is the concept of an 'implied' 
dfdl:separatorSuppressionPolicy: 
When dfdl:occursCountKind is 'fixed' then ... the implied behaviour is 
'never'. 
When dfdl:occursCountKind is 'expression' ... the implied behaviour is 
'never'. 
When dfdl:occursCountKind is 'parsed' ... the implied behaviour is   
'anyEmpty'. 
When dfdl:occursCountKind is 'stopValue' ...the implied behaviour is 
'anyEmpty'. 
So if a Positional sequence as defined above contains children with 
dfdl:occursCountKind 'parsed' or 'stopValue' then surely it is no longer a 
Positional sequence. 
A solution to this is to prevent the appearance of certain values of 
dfdl:occursCountKind within a Positional sequence. However, precisely 
which values to outlaw is subject to interpretation of the phrase "Each 
occurrence in the sequence can be identified by its position in the data". 
Is this intended to mean: 
a) an observer of the raw data can identify an occurrence of an element in 
the sequence solely by counting separators 
=> SDE if 'parsed', 'stopValue' or 'expression' ** appeared in a 
Positional sequence; 
** Although 'expression' would appear to be like 'fixed' it actually 
breaks a) so must be included in the SDE list. 
or 
b) a parser does not have to speculate to identify an occurrence of an 
element in the sequence 
=> SDE only if 'parsed' appeared in a Positional sequence. 
Note that it is possible to wrap a 'parsed' etc element in a local 
sequence or another element to avoid an SDE. But this could still be seen 
as a violation of a) if the separators of both are the same, as the 
observer can not count the separators. So should the rule be applied 
recursively, ie, a Positional sequence can not contain a non-Positional 
sequence unless the separators are different? 
Regards

Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU 

--
 dfdl-wg mailing list
 dfdl-wg at ogf.org
 https://www.ogf.org/mailman/listinfo/dfdl-wg 