[DFDL-WG] Action 260

Wed Jun 25 05:20:53 EDT 2014

260
Positional and non-positional sequences (All)
10/6: Spec defines the above but also allows different occursCountKinds 
within the same sequence which may have different (implied) 
separatorSuppressionPolicy, which results in a sequence which is a mixture 
of both. Should this be allowed? If so what are the rules? Can certain 
combinations be disallowed?
17/6: IBM have discussed internally and will submit a proposal.

In the spec we define Positional Sequence and Non-Positional Sequence:

Positional sequence - Each occurrence in the sequence can be identified by 
its position in the data. Typically the components of such a sequence do 
not have an initiator. In some such sequences, the separators for optional 
zero-length occurrences may or must be omitted when at the end of the 
group. A positional sequence can be modelled by setting 
dfdl:separatorSuppressionPolicy to 'never', 'trailingEmptyStrict' or 
'trailingEmpty'.

Non-positional sequence - Occurrences in the sequence cannot be identified 
by their position in the data alone. Typically the components of such a 
sequence have an initiator. Such sequences allow the separator to be 
omitted for optional zero-length occurrences anywhere in the sequence. 
Speculative parsing is employed by the parser to identify each occurrence. 
A non-positional sequence can be modelled by setting 
dfdl:separatorSuppressionPolicy to 'anyEmpty'.

The problem is that the setting of dfdl:separatorSuppressionPolicy is only 
examined for child elements with dfdl:occursCountKind 'implicit'.  For 
other dfdl:occursCountKinds, there is the concept of an 'implied' 
dfdl:separatorSuppressionPolicy:
When dfdl:occursCountKind is 'fixed' then ... the implied behaviour is 
'never'.
When dfdl:occursCountKind is 'expression' ... the implied behaviour is 
'never'.
When dfdl:occursCountKind is 'parsed' ... the implied behaviour is 
'anyEmpty'. 
When dfdl:occursCountKind is 'stopValue' ... the implied behaviour is 
'anyEmpty'.
So if a Positional sequence as defined above contains children with 
dfdl:occursCountKind 'parsed' or 'stopValue' then surely it is no longer a 
Positional sequence.
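
To make the mixing problem concrete, here is a minimal sketch in Python. It is not part of the spec or of any DFDL implementation: the names IMPLIED_POLICY, effective_policy and classify are hypothetical, and the only rules it encodes are the ones quoted above (the sequence's own setting applies to 'implicit' children, and the other occursCountKinds imply a fixed behaviour).

```python
# Illustrative only: these names are hypothetical, not a DFDL API.
POSITIONAL_POLICIES = {"never", "trailingEmptyStrict", "trailingEmpty"}

# Implied dfdl:separatorSuppressionPolicy for each dfdl:occursCountKind,
# as quoted from the spec above. 'implicit' is absent because it uses the
# policy set on the sequence itself.
IMPLIED_POLICY = {
    "fixed": "never",
    "expression": "never",
    "parsed": "anyEmpty",
    "stopValue": "anyEmpty",
}


def effective_policy(sequence_policy, occurs_count_kind):
    """Return the policy that actually governs one child of the sequence."""
    if occurs_count_kind == "implicit":
        return sequence_policy
    return IMPLIED_POLICY[occurs_count_kind]


def classify(sequence_policy, child_kinds):
    """Classify a sequence as 'positional', 'non-positional' or 'mixed'."""
    flavours = set()
    for kind in child_kinds:
        policy = effective_policy(sequence_policy, kind)
        if policy in POSITIONAL_POLICIES:
            flavours.add("positional")
        else:
            flavours.add("non-positional")
    if len(flavours) > 1:
        return "mixed"
    if flavours:
        return flavours.pop()
    # No children: fall back to the sequence's own setting.
    if sequence_policy in POSITIONAL_POLICIES:
        return "positional"
    return "non-positional"


print(classify("trailingEmpty", ["implicit", "fixed"]))
print(classify("trailingEmpty", ["implicit", "parsed"]))
```

Under these assumptions the first call reports a positional sequence and the second reports a mixed one, which is the case this action is asking about.
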
A solution to this is to prevent the appearance of certain values of 
dfdl:occursCountKind within a Positional sequence. However, precisely 
which values to outlaw is subject to interpretation of the phrase "Each 
occurrence in the sequence can be identified by its position in the data". 
Is this intended to mean:
a) an observer of the raw data can identify an occurrence of an element in 
the sequence solely by counting separators 
=> Schema Definition Error (SDE) if 'parsed', 'stopValue' or 
'expression' ** appears in a Positional sequence; 
** Although 'expression' would appear to be like 'fixed', it actually 
breaks a), so it must be included in the SDE list.
or
b) a parser does not have to speculate to identify an occurrence of an 
element in the sequence
=> SDE only if 'parsed' appears in a Positional sequence.
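
The two candidate rules can be compared side by side with a small Python sketch. This is only an illustration of options a) and b) as worded above: the function name sde_kinds and the DISALLOWED table are hypothetical, and nothing here reflects a decision by the working group.

```python
# Illustrative only: these names are hypothetical, not a DFDL API.
POSITIONAL_POLICIES = {"never", "trailingEmptyStrict", "trailingEmpty"}

# occursCountKind values that would raise an SDE inside a Positional
# sequence under each interpretation discussed above.
DISALLOWED = {
    "a": {"parsed", "stopValue", "expression"},  # observer counts separators
    "b": {"parsed"},                             # parser need not speculate
}


def sde_kinds(interpretation, sequence_policy, child_kinds):
    """Return the sorted occursCountKinds that would trigger an SDE.

    A non-positional sequence ('anyEmpty') is never restricted by
    either rule, so it always yields an empty list.
    """
    if sequence_policy not in POSITIONAL_POLICIES:
        return []
    return sorted(set(child_kinds) & DISALLOWED[interpretation])


children = ["implicit", "fixed", "expression", "stopValue", "parsed"]
print(sde_kinds("a", "trailingEmpty", children))
print(sde_kinds("b", "trailingEmpty", children))
print(sde_kinds("a", "anyEmpty", children))
```

For the same set of children, interpretation a) rejects three kinds and interpretation b) rejects only 'parsed', which shows how much the choice affects existing schemas.
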
Note that it is possible to wrap an element with dfdl:occursCountKind 
'parsed' (or another disallowed value) in a local sequence or another 
element to avoid an SDE. But this could still be seen as a violation of a) 
if the inner and outer sequences use the same separator, as the observer 
cannot count the separators. So should the rule be applied recursively, 
i.e., a Positional sequence cannot contain a non-Positional sequence 
unless the separators are different?
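
One possible reading of the recursive rule is sketched below in Python. Everything in it is an assumption made for illustration: the Sequence class and find_violations are hypothetical names, only directly nested sequences are compared at each level, and wrapping via an intermediate element is not modelled.

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative only: these names are hypothetical, not a DFDL API.
POSITIONAL_POLICIES = {"never", "trailingEmptyStrict", "trailingEmpty"}


@dataclass
class Sequence:
    separator: str
    policy: str  # value of dfdl:separatorSuppressionPolicy
    children: List["Sequence"] = field(default_factory=list)  # nested sequences

    def is_positional(self):
        return self.policy in POSITIONAL_POLICIES


def find_violations(seq, path="root"):
    """Return paths of non-positional sequences nested directly inside a
    positional sequence that uses the same separator."""
    violations = []
    for index, child in enumerate(seq.children):
        child_path = f"{path}/{index}"
        if (seq.is_positional()
                and not child.is_positional()
                and child.separator == seq.separator):
            violations.append(child_path)
        # Apply the same check at every nesting level.
        violations.extend(find_violations(child, child_path))
    return violations


same_sep = Sequence(",", "trailingEmpty", [Sequence(",", "anyEmpty")])
diff_sep = Sequence(",", "trailingEmpty", [Sequence(";", "anyEmpty")])
print(find_violations(same_sep))
print(find_violations(diff_sep))
```

Whether a non-positional sequence nested more deeply, behind an intermediate sequence with a different separator, should also count as a violation is left open by this sketch, as it is by the question above.
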
Regards

Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848