[DFDL-WG] Backtracking behavior for optional elements

Mon Feb 11 12:34:15 EST 2013

This sounds right.

Let me run an array scenario past you. Tell me whether you think I am
interpreting it consistently with your rules.

What you've said here is that we distinguish positional and non-positional
separators. They are very different.

Positional separators are greedy and drive the parser's decisions. Once
matched, they no longer tolerate failure to parse. So, if I have an array
with occursCountKind='parsed', then finding a positional separator means I
am NOT at the end of the physical array. There must be syntax for one more
element, and it must parse successfully, though I may suppress its value
being added to the infoset if it is optional and I get the appropriate
empty representation after the separator. Failure means the array is
broken. Success means I will look for yet another element (because
occursCountKind is 'parsed').

The above makes sense to me. This is what 'separators' means to me for the
most part, that they are a driving part of the syntax/format.

The non-positional separators case is 100% different.

In that case, the decision that a separator was found is revisited on
failure. An occursCountKind='parsed' array or optional element is ended at
that point, and the thing after it in the sequence is attempted next.
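
To make the contrast concrete, here is a minimal sketch in Python. It is
not taken from any DFDL implementation: the names parse_int and
parse_array, the boolean 'positional' flag, and the comma separator are
all hypothetical. It assumes the array follows an earlier member of the
sequence, so every occurrence is introduced by a separator, and it models
empty representation as zero-length content.

```python
class ParseError(Exception):
    """Processing error raised when an element fails to parse."""


def parse_int(data, pos):
    """Parse one or more digits at pos; return (value, new_pos)."""
    end = pos
    while end < len(data) and data[end].isdigit():
        end += 1
    if end == pos:
        raise ParseError(f"expected digits at offset {pos}")
    return int(data[pos:end]), end


def parse_array(data, pos, positional, sep=","):
    """Parse an occursCountKind='parsed' array of integers.

    Hypothetical model: 'positional' stands in for a
    separatorSuppressionPolicy other than 'anyEmpty'.
    Returns (items, pos), where pos is where the array ended.
    """
    items = []
    while data.startswith(sep, pos):
        after_sep = pos + len(sep)
        # Empty representation: nothing between this separator and the
        # next one (or end of data). The syntax is consumed, but no
        # value is added to the infoset.
        if after_sep == len(data) or data.startswith(sep, after_sep):
            pos = after_sep
            continue
        try:
            value, new_pos = parse_int(data, after_sep)
        except ParseError:
            if positional:
                # The separator committed us to one more element,
                # so the failure is a hard error.
                raise
            # Non-positional: revisit the decision that a separator
            # was found. The array ends before the separator.
            return items, pos
        items.append(value)
        pos = new_pos
    return items, pos
```

With the data "HDR,1,2,x" and the array starting at offset 3, the
positional call raises ParseError at the 'x', while the non-positional
call returns ([1, 2], 7), leaving ",x" for the next member of the
sequence to attempt.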

This makes sense. I almost wish we didn't have to call it 'separator', but
I think it is certainly a useful behavior, and the right interpretation of
the properties we have in the spec and 140 stuff today.

On Mon, Feb 11, 2013 at 9:56 AM, Steve Hanson <smh at uk.ibm.com> wrote:

> If a processing error occurs for an optional element in a sequence, the
> speculative behaviour of the DFDL parser says that the optional element is
> assumed not to be present, and the next alternative in the sequence is
> tried. That is fine when there are no separators involved, but we need to
> be clear on what happens when there are separators.
>
> 1) Positional separators (separatorSuppressionPolicy is 'never',
> 'trailingEmpty' and 'trailingEmptyStrict').
> The key point about positional separators is that they are expected in the
> data, so if an error occurs while parsing the optional element, it does not
> make sense to backtrack to the start offset of the element and try to match
> the next element. Yes, there's a point of uncertainty in the sense that the
> element is either there or it has empty representation, but if an error
> occurs I think it must be treated as a hard error, and not cause
> backtracking.
>
> 2) Non-positional separators (separatorSuppressionPolicy is 'anyEmpty').
> This behaves like the non-separator case and the next alternative in the
> sequence is tried from the start offset. However, because 'anyEmpty'
> behavior is lax, it is possible that the next thing in the data is a
> separator, so the parser must cater for that when the element is found to
> have empty representation. But if an error occurs establishing
> representation, I think the parser should just backtrack and try to match
> the next element.
>
> Does that sound correct?
>
> Regards
>
> Steve Hanson
> Architect, Data Format Description Language (DFDL)
> Co-Chair, OGF DFDL Working Group <http://www.ogf.org/dfdl/>
> IBM SWG, Hursley, UK
> smh at uk.ibm.com
> tel:+44-1962-815848
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>
> --
>   dfdl-wg mailing list
>   dfdl-wg at ogf.org
>   https://www.ogf.org/mailman/listinfo/dfdl-wg
>

-- 
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com