[DFDL-WG] Action 260

Wed Jun 25 11:24:40 EDT 2014

I prefer choice (a) for two reasons:

* It is more restrictive and therefore more conservative (preserving
freedom to change in future if needed)
* If a user has a positional data format, you don't want them to have to
understand the concept of speculation at all in order to model their data.
Choice (a) allows a simpler description that doesn't need to introduce the
notion that the parser might be speculating.
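
To make the difference between the two readings concrete, here is a rough
sketch of the check each one implies, using the implied
separatorSuppressionPolicy values quoted from the spec below. The names
(IMPLIED_POLICY, effective_policy, check_positional) are made up for
illustration and are not taken from the spec or from any DFDL implementation.

```python
# Implied dfdl:separatorSuppressionPolicy for each dfdl:occursCountKind,
# as quoted from the spec in the message below.
IMPLIED_POLICY = {
    "fixed": "never",
    "expression": "never",
    "parsed": "anyEmpty",
    "stopValue": "anyEmpty",
}

# Policy values that model a positional sequence.
POSITIONAL_POLICIES = {"never", "trailingEmptyStrict", "trailingEmpty"}

# occursCountKind values that would raise an SDE inside a positional
# sequence under each proposed reading (hypothetical encoding of the proposal).
DISALLOWED = {
    "a": {"parsed", "stopValue", "expression"},
    "b": {"parsed"},
}


def effective_policy(sequence_policy, occurs_count_kind):
    """The sequence's own policy is only examined for 'implicit' children."""
    if occurs_count_kind == "implicit":
        return sequence_policy
    return IMPLIED_POLICY[occurs_count_kind]


def check_positional(sequence_policy, child_kinds, choice="a"):
    """Return the child occursCountKinds that would be an SDE.

    An empty list means no SDE, either because every child is allowed or
    because the sequence is not positional in the first place.
    """
    if sequence_policy not in POSITIONAL_POLICIES:
        return []
    return [kind for kind in child_kinds if kind in DISALLOWED[choice]]


if __name__ == "__main__":
    kinds = ["implicit", "fixed", "expression", "parsed"]
    print(check_positional("never", kinds, choice="a"))
    print(check_positional("never", kinds, choice="b"))
```

Under (a) the check needs no notion of speculation: it is just a list of
occursCountKinds that may not appear in a positional sequence.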

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
<http://www.ogf.org/About/abt_policies.php>

On Wed, Jun 25, 2014 at 5:20 AM, Steve Hanson <smh at uk.ibm.com> wrote:

>   *260*
> *Positional and non-positional sequences (All)*
> 10/6: Spec defines the above but also allows different occursCountKinds
> within the same sequence which may have different (implied)
> separatorSuppressionPolicy, which results in a sequence which is a mixture
> of both. Should this be allowed? If so what are the rules? Can certain
> combinations be disallowed?
> 17/6: IBM have discussed internally and will submit a proposal.
>
> In the spec we define Positional Sequence and Non-Positional Sequence:
>
> *Positional sequence* - *Each occurrence in the sequence can be
> identified by its position in the data. Typically the components of such a
> sequence do not have an initiator. In some such sequences, the separators
> for optional zero-length occurrences may or must be omitted when at the end
> of the group. A positional sequence can be modelled by setting
> dfdl:separatorSuppressionPolicy to 'never', 'trailingEmptyStrict' or
> 'trailingEmpty'.*
>
> *Non-positional sequence* - *Occurrences in the sequence cannot be
> identified by their position in the data alone. Typically the components of
> such a sequence have an initiator. Such sequences allow the separator to be
> omitted for optional zero-length occurrences anywhere in the sequence.
> Speculative parsing is employed by the parser to identify each occurrence.
> A non-positional sequence can be modelled by setting
> dfdl:separatorSuppressionPolicy to 'anyEmpty'.*
>
> The problem is that the setting of dfdl:separatorSuppressionPolicy is only
> examined for child elements with dfdl:occursCountKind 'implicit'.  For
> other dfdl:occursCountKinds, there is the concept of an 'implied'
> dfdl:separatorSuppressionPolicy:
>
> *When dfdl:occursCountKind is 'fixed' then ... the implied behaviour is
> 'never'.*
>
> *When dfdl:occursCountKind is 'expression' ... the implied behaviour is
> 'never'.*
>
> *When dfdl:occursCountKind is 'parsed' ... the implied behaviour is
> 'anyEmpty'.*
>
> *When dfdl:occursCountKind is 'stopValue' ... the implied behaviour is
> 'anyEmpty'.*
>
> So if a Positional sequence as defined above contains children with
> dfdl:occursCountKind 'parsed' or 'stopValue' then surely it is no longer a
> Positional sequence.
>
> A solution to this is to prevent the appearance of certain values of
> dfdl:occursCountKind within a Positional sequence. However, precisely which
> values to outlaw is subject to interpretation of the phrase "*Each
> occurrence in the sequence can be identified by its position in the data*".
> Is this intended to mean:
>
> *a) an observer of the raw data can identify an occurrence of an element
> in the sequence solely by counting separators*
>
> => SDE if 'parsed', 'stopValue' or 'expression' ** appeared in a
> Positional sequence;
>
> ** Although 'expression' would appear to be like 'fixed' it actually
> breaks a) so must be included in the SDE list.
>
> or
>
> *b) a parser does not have to speculate to identify an occurrence of an
> element in the sequence*
>
> => SDE only if 'parsed' appeared in a Positional sequence.
>
> Note that it is possible to wrap a 'parsed' (or similar) element in a local
> sequence or another element to avoid an SDE. But this could still be seen
> as a violation of a) if the separators of both are the same, as the
> observer cannot count the separators. So should the rule be applied
> recursively, i.e., a Positional sequence cannot contain a non-Positional
> sequence unless the separators are different?
>
> Regards
>
> Steve Hanson
> Architect, *IBM DFDL*
> <http://www.ibm.com/developerworks/library/se-dfdl/index.html>
> Co-Chair, *OGF DFDL Working Group* <http://www.ogf.org/dfdl/>
> IBM SWG, Hursley, UK
> *smh at uk.ibm.com* <smh at uk.ibm.com>
> tel:+44-1962-815848
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>
>
> --
>   dfdl-wg mailing list
>   dfdl-wg at ogf.org
>   https://www.ogf.org/mailman/listinfo/dfdl-wg
>