[DFDL-WG] clarification question on terminators vs. enclosing group separators/terminators

Steve Hanson smh at uk.ibm.com
Thu Aug 17 04:02:35 EDT 2017


Mike

You are interpreting the spec correctly. I would model this with the 
quotes as escapeBlockStart/End and generateEscapeBlock="always". The 
reason why the format is parse-able is precisely because the quotes are 
being used to escape the content. 
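
To illustrate the point, here is a minimal Python sketch (not DFDL, and not
how any particular DFDL processor is implemented). The function names
scan_delimited and parse_pairs are hypothetical, and the escape block
handling is a simplification: it has no escapeEscapeCharacter and does not
check that a delimiter follows the closing quote. It only shows where a
delimiter scan would stop with and without the quotes acting as an escape
block.

```python
def scan_delimited(data, pos, delimiters, block_start=None, block_end=None):
    """Scan from pos and return (value, index where the scan stopped).

    If block_start/block_end are given and the value begins with block_start,
    delimiters inside the block are ignored and the block markers are
    stripped. Otherwise the scan stops at the earliest delimiter, or at the
    end of the data if none is found.
    """
    if block_start and data.startswith(block_start, pos):
        body_start = pos + len(block_start)
        end = data.find(block_end, body_start)
        if end == -1:
            raise ValueError("unterminated escape block")
        return data[body_start:end], end + len(block_end)
    hits = [i for i in (data.find(d, pos) for d in delimiters) if i != -1]
    end = min(hits) if hits else len(data)
    return data[pos:end], end


def parse_pairs(line, use_escape_block):
    """Parse space-separated name=value pairs (assumed layout, see lead-in)."""
    quote = '"' if use_escape_block else None
    pairs, pos = [], 0
    while pos < len(line):
        # The name ends at "=", but the enclosing space separator is also in scope.
        name, pos = scan_delimited(line, pos, ["=", " "])
        pos += 1  # step over the one-character delimiter that ended the name
        value, pos = scan_delimited(line, pos, [" "], quote, quote)
        pairs.append((name, value))
        pos += 1  # step over the space separator (or past the end of data)
    return pairs


if __name__ == "__main__":
    line = 'foo="stuff with spaces" bar="more stuff with spaces and equal = signs"'
    print(parse_pairs(line, use_escape_block=True))
    print(parse_pairs(line, use_escape_block=False))
```

With the escape block enabled, the spaces and "=" inside the quotes are not
seen by the scan. Without it, the first value is cut short at the first
space, which is the behaviour the question is about.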

Regards
 
Steve Hanson
IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh at uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890



From:   Mike Beckerle <mbeckerle.dfdl at gmail.com>
To:     Steve Hanson <smh at uk.ibm.com>
Cc:     "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>
Date:   16/08/2017 16:55
Subject:        Re: [DFDL-WG] clarification question on terminators vs. 
enclosing group separators/terminators



So the use case that drives the question is syslogd format. 

Part of the syntax is a whitespace-separated list of pairs like so:

foo="stuff with spaces" bar="more stuff with spaces and equal = signs"

The spaces separate the pairs. The quotation marks are required, not 
optional, so they're not escapeBlockStart/End; they're initiator and 
terminator. 

There's a sequence with a space separator here.
Inside that are recurring "pairs" (zero or more), each containing a name 
and content separated by "=".
The content has an initiator and a terminator, both of which are double 
quotes. 

The spaces inside the string content are *not* escaped. Nor equal signs.

emptyValueDelimiterPolicy is 'both', non-nillable, so 
nilValueDelimiterPolicy is not relevant.

Seems to me a parser for this does not need escaping of the spaces or = 
that appear inside the content, but the DFDL spec can only express parsing 
these if those escapes are provided.

Am I interpreting the spec correctly in this case? Is it true that, because 
the surrounding groups have space and = separators, the content must 
escape these if they appear?  


Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | 
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are 
subject to the OGF Intellectual Property Policy


On Wed, Aug 16, 2017 at 11:28 AM, Steve Hanson <smh at uk.ibm.com> wrote:
In general, an enclosing construct's delimiters are also relevant. When 
scanning for the value of an element with a terminator, there are some 
circumstances where there might not be a terminator: 
- the nil value delimiter policy says there is no terminator 
- the empty value delimiter policy says there is no terminator 
- the element is optional, so if you find an enclosing construct's 
delimiter as the first character, the element is missing 

So you *could* design a wholly delimited format where enclosing construct 
delimiters never needed escaping but it would be a bit restrictive in 
practice. 
Formats that I have seen where enclosing construct delimiters are not 
escaped usually have fixed length fields. 

Regards
 
Steve Hanson 



From:        Mike Beckerle <mbeckerle.dfdl at gmail.com> 
To:        "dfdl-wg at ogf.org" <dfdl-wg at ogf.org> 
Date:        16/08/2017 15:48 
Subject:        [DFDL-WG] clarification question on terminators vs. 
enclosing group separators/terminators 
Sent by:        "dfdl-wg" <dfdl-wg-bounces at ogf.org> 



The DFDL Spec says:

12.3.2 dfdl:lengthKind 'delimited' 
On parsing, the length of an element with dfdl:lengthKind 'delimited' is 
determined by scanning the datastream for the delimiter. 
The data stream is scanned for any of 
- the element's terminator (if specified) 
- an enclosing construct's separator or terminator 
- the end of an enclosing element designated by its known length 
- the end of the data stream 

So if an element has a terminator, are the enclosing constructs' separator 
or terminator also relevant? Or is ONLY the element's own terminator 
relevant for scanning, so that only the element's own terminator must 
be escaped if it appears in the content?

For example, in a space-separated group, an enclosed element has a 
terminator ";". When parsing that element, do spaces have to be escaped if 
they appear in the content, or does only the terminator ";" have to be 
escaped?

Strictly speaking it seems enclosing delimiters shouldn't have to be 
escaped, because the data must have the ";", and spaces are only 
significant as separators after finding the ";" terminator.
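
The question can be made concrete with a small Python sketch of the
scanning rule quoted above, read literally. This is an illustration under
stated assumptions, not the behaviour of any specific DFDL implementation:
the name scan_for_end is hypothetical, delimiters are plain strings, and
ties are broken by candidate order rather than by longest match.

```python
def scan_for_end(data, pos, own_terminator=None, enclosing_delimiters=(),
                 known_end=None):
    """Return (end_index, what_stopped_the_scan) for a delimited element.

    The scan considers the element's own terminator (if any), the enclosing
    constructs' separators/terminators, the end of an enclosing element of
    known length, and the end of the data, and stops at whichever is first.
    """
    if known_end is not None and known_end < len(data):
        limit, reason = known_end, "known length"
    else:
        limit, reason = len(data), "end of data"

    candidates = []
    if own_terminator:
        candidates.append(own_terminator)
    candidates.extend(enclosing_delimiters)

    best = (limit, reason)
    for delim in candidates:
        i = data.find(delim, pos, limit)
        if i != -1 and i < best[0]:
            best = (i, delim)
    return best


if __name__ == "__main__":
    data = "some text;next"
    # Element terminator ";" inside a space-separated group.
    print(scan_for_end(data, 0, ";", [" "]))
    # Same data if only the element's own terminator were in scope.
    print(scan_for_end(data, 0, ";"))
```

Under the literal reading, the unescaped space is found before the ";", so
the two calls stop at different positions. That difference is what the
question above is asking about.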





Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

