[DFDL-WG] clarification question on terminators vs. enclosing group separators/terminators

Mike Beckerle mbeckerle.dfdl at gmail.com
Wed Aug 16 11:55:11 EDT 2017


So the use case that drives the question is syslogd format.

Part of the syntax is a whitespace separated list of pairs like so:

foo="stuff with spaces" bar="more stuff with spaces and equal = signs"

The spaces separate the pairs. The quotation marks are required, not
optional, so they're not escapeBlockStart/End; they're initiator and
terminator.

There's a sequence with space separator here.
Inside that is recurring "pairs" containing name and content separated by
"=". Zero or more pairs.
Content has an initiator and terminator which are double quotes.

The spaces inside the string content are *not* escaped. Nor equal signs.

emptyValueDelimiterPolicy is 'both', and the element is non-nillable, so
nilValueDelimiterPolicy is not relevant.

Seems to me a parser for this does not need escaping of the spaces or =
that appear inside the content, but the DFDL spec can only express parsing
these if those escapes are provided.
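
To make the claim concrete, here is a minimal sketch of the kind of
hand-written parser I have in mind, in Python. The name parse_pairs is
hypothetical and this is not how any DFDL processor works; it only shows
that looking for the closing quote alone is enough to recover the content,
assuming the content itself never contains a double quote.

```python
def parse_pairs(text):
    """Parse space-separated name="content" pairs.

    Only the closing double quote ends the content, so spaces and "="
    inside the quotes need no escaping. Hypothetical helper for illustration.
    """
    pairs = []
    pos = 0
    while pos < len(text):
        eq = text.index("=", pos)            # name runs up to the "=" separator
        name = text[pos:eq]
        if text[eq + 1:eq + 2] != '"':       # the initiator is required
            raise ValueError(f"expected opening quote at offset {eq + 1}")
        end = text.index('"', eq + 2)        # scan for the terminator only
        pairs.append((name, text[eq + 2:end]))
        pos = end + 1
        if pos < len(text):                  # a space must separate the pairs
            if text[pos] != " ":
                raise ValueError(f"expected space separator at offset {pos}")
            pos += 1
    return pairs
```

Run on the sample line above, this should return the two pairs with their
content intact, including the embedded spaces and the "=" sign.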

Am I interpreting the spec correctly in this case? That is, because the
surrounding groups have space and = separators, must the content escape
these characters if they appear?
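
Here is a simplified Python model of the reading I am asking about, where
the scan for a delimited element stops at the earliest occurrence of any
in-scope delimiter. The function scan_delimited and its arguments are
hypothetical, and the model ignores escape schemes, longest-match rules and
everything else a real implementation handles.

```python
def scan_delimited(text, start, delimiters):
    """Return (content, delimiter) for the earliest in-scope delimiter.

    Simplified model only: no escape schemes, no longest-match handling.
    If no delimiter is found, the content runs to the end of the data
    and the returned delimiter is None.
    """
    best_pos, best_delim = len(text), None
    for delim in delimiters:
        pos = text.find(delim, start)
        if pos != -1 and pos < best_pos:
            best_pos, best_delim = pos, delim
    return text[start:best_pos], best_delim


data = 'foo="stuff with spaces" bar="more stuff"'
content_start = len('foo="')  # just past the name, "=" and the initiator

# Own terminator plus the enclosing space and "=" separators in scope.
all_in_scope = scan_delimited(data, content_start, ['"', " ", "="])

# Only the element's own terminator in scope.
terminator_only = scan_delimited(data, content_start, ['"'])
```

With all three delimiters in scope, the scan stops at the first space and
the content is cut short, so the required closing quote is not found where
expected. With only the terminator in scope, the whole string is recovered.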


Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
<http://www.ogf.org/About/abt_policies.php>


On Wed, Aug 16, 2017 at 11:28 AM, Steve Hanson <smh at uk.ibm.com> wrote:

> In general, an enclosing construct's delimiters are also relevant. When
> scanning for the value of an element with a terminator, there are some
> circumstances where there might not be a terminator:
> - nil value delimiter policy says there is no terminator
> - empty value delimiter policy says there is no terminator
> - element is optional, so if you find an enclosing construct's delimiter
> as the first character, the element is missing
>
> So you *could* design a wholly delimited format where enclosing construct
> delimiters never needed escaping but it would be a bit restrictive in
> practice.
> Formats that I have seen where enclosing construct delimiters are not
> escaped usually have fixed length fields.
>
> Regards
>
> Steve Hanson
>
> IBM Hybrid Integration, Hursley, UK
> Architect, IBM DFDL
> <http://www.ibm.com/developerworks/library/se-dfdl/index.html>
> Co-Chair, OGF DFDL Working Group <http://www.ogf.org/dfdl/>
> smh at uk.ibm.com
> tel: +44-1962-815848
> mob: +44-7717-378890
>
>
>
> From: Mike Beckerle <mbeckerle.dfdl at gmail.com>
> To: "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>
> Date: 16/08/2017 15:48
> Subject: [DFDL-WG] clarification question on terminators vs.
> enclosing group separators/terminators
> Sent by: "dfdl-wg" <dfdl-wg-bounces at ogf.org>
> ------------------------------
>
>
>
> The DFDL Spec says:
>
> 12.3.2    *dfdl:lengthKind 'delimited'*
>
> On parsing, the length of an element with dfdl:lengthKind 'delimited' is
> determined by scanning the datastream for the delimiter.
>
> The data stream is scanned for any of
>
> - the element's terminator (if specified)
> - an enclosing construct's separator or terminator
> - the end of an enclosing element designated by its known length
> - the end of the data stream
>
>
> So if an element has a terminator, are the enclosing constructs' separator
> or terminator also relevant? Or is ONLY the element's own terminator
> relevant for scanning, and hence only the element's own terminator must be
> escaped if it appears in the content?
>
> For example, in a space-separated group, an enclosed element has a
> terminator ";". When parsing that element, do spaces have to be escaped if
> they appear in the content, or does only the terminator ";" have to be
> escaped?
>
> Strictly speaking it seems enclosing delimiters shouldn't have to be
> escaped, because the data must have the ";", and spaces are only
> significant as separators after finding the ";" terminator.