[DFDL-WG] data where inactive escape character is to be retained

Steve Hanson smh at uk.ibm.com
Wed Jun 4 03:55:17 EDT 2014


WG call 3rd June: The generalised use case is perhaps speculative, so it 
was agreed not to change the DFDL spec to handle this unless a concrete 
use case emerges.

Regards
 
Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848



From:   "Cranford, Jonathan W." <jcranford at mitre.org>
To:     Mike Beckerle <mbeckerle.dfdl at gmail.com>, "dfdl-wg at ogf.org" 
<dfdl-wg at ogf.org>, 
Date:   20/05/2014 18:11
Subject:        Re: [DFDL-WG] data where inactive escape character is to 
be retained
Sent by:        dfdl-wg-bounces at ogf.org



All,
 
I’ll chime in with an observation and a few more details on how Roger 
Costello got around a similar problem.
 
Observation 
An escape block allows the escapeBlockEnd to be escaped with the 
escapeEscapeCharacter, while allowing the escapeEscapeCharacter itself to 
appear in the data without any special semantics as long as it is NOT 
followed by escapeBlockEnd. (From section 13.2.1 of the spec: “On parsing 
the dfdl:escapeBlockStart is removed from the beginning of the data and 
dfdl:escapeBlockEnd is removed from end of the data and any 
dfdl:escapeEscapeCharacters are removed when they precede a 
dfdl:escapeBlockEnd.”)
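
To make the quoted rule concrete, here is a minimal Python sketch of the
parse-side behaviour. It is only an illustration, not Daffodil or any
other DFDL implementation: the function name unescape_block and its
parameters are hypothetical, and it assumes the field boundaries have
already been found, so it only strips the block delimiters and drops the
escapeEscapeCharacter where it directly precedes escapeBlockEnd.

```python
def unescape_block(raw, block_start='"', block_end='"', escape_escape="\\"):
    """Illustrative only: apply the escape block parse rule to one field.

    Assumes `raw` is exactly one escaped field, delimiters included.
    """
    if not (raw.startswith(block_start) and raw.endswith(block_end)):
        raise ValueError("field is not wrapped in an escape block")
    # Remove escapeBlockStart from the front and escapeBlockEnd from the end.
    inner = raw[len(block_start):len(raw) - len(block_end)]
    # Drop escapeEscapeCharacter only where it precedes escapeBlockEnd;
    # any other occurrence is left in the data untouched.
    return inner.replace(escape_escape + block_end, block_end)


# The backslash before a quote is removed, the one before "n" is retained.
print(unescape_block('"say \\"hi\\" \\n now"'))
```

The backslash in front of n survives because it is not followed by
escapeBlockEnd, which is the property the rest of this note relies on.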
 
This is really similar to the problem that Mike posed to the group below, 
where an escape character is sometimes an escape character but sometimes 
isn’t. 
 
While an escape block might not be suitable in all circumstances, the 
original problem that sparked Mike’s post was amenable to using an escape 
block, and that is how Roger Costello got around the problem.
 
Some more details
Roger Costello was using a quotation mark (") as the initiator and 
terminator for quoted values in a data format.  In this format, quotation 
marks can be escaped with a backslash (\); however, within a quoted 
string, the data could have a backslash as a normal data character (e.g. 
\n, representing two characters, not a single newline character).
 
Roger posed his challenge to the Daffodil team, and then Mike created the 
example below to demonstrate the problem to the DFDL WG.  In contrast to 
Mike’s example, Roger was having the problem with initiators and 
terminators, not a separator.  At the time, we thought that an escape 
block couldn’t be applied to the data format in question, so Mike may have 
altered the problem in order to prevent an escape block from clouding the 
issue as posed to the WG.
 
It turns out, after more analysis, that an escape block could be used, and 
that solved the problem:
escapeBlockStart='"' escapeBlockEnd='"' 
escapeEscapeCharacter='\'
 
Closing Observations
In general, DFDL supports two different escape schemes with different 
behavior for the escape character.
* When escapeKind="escapeCharacter", the escape character is always an 
escape character.
* When escapeKind="escapeBlock", the escape character 
(escapeEscapeCharacter) is only an escape character in front of 
escapeBlockEnd.
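
For contrast with the escape block case, the first bullet can be sketched
in Python as follows. This is a simplified illustration under stated
assumptions, not the behaviour of any particular DFDL processor: the
function name unescape_character is hypothetical, the escape character is
a single character, and no separate escapeEscapeCharacter is modelled.

```python
def unescape_character(raw, escape="\\"):
    """Illustrative only: treat every escape character as active.

    The escape character is dropped and the character after it is kept
    literally, whether or not that character is a delimiter.
    """
    out = []
    i = 0
    while i < len(raw):
        if raw[i] == escape and i + 1 < len(raw):
            out.append(raw[i + 1])  # keep the escaped character
            i += 2                  # skip the escape character itself
        else:
            out.append(raw[i])      # ordinary data (or a trailing escape)
            i += 1
    return "".join(out)


print(unescape_character("abcd \\; efgh"))  # the separator is unescaped
print(unescape_character("abcd \\n efgh"))  # the backslash is lost here too
```

With this scheme the backslash in front of n is consumed as well, which
is why the escapeCharacter scheme does not fit data where an inactive
backslash must be retained.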
 
In this case, we were able to use an escape block to model the data 
format.  While there may be a data format that has a character that is 
sometimes an escape character and sometimes isn’t, without a real-world 
example, I echo Mike’s hesitance to add this feature to DFDL.
 
HTH,
 
Jonathan Cranford 
 
 
From: dfdl-wg-bounces at ogf.org [mailto:dfdl-wg-bounces at ogf.org] On Behalf 
Of Mike Beckerle
Sent: Friday, May 02, 2014 2:47 PM
To: dfdl-wg at ogf.org
Subject: [DFDL-WG] data where inactive escape character is to be retained
 
 
We have data that has ; as the separator and has fields that look like this:

abcd \; efgh
 
That's a single field.
The backslash escapes the ; so that the data is abcd ; efgh.
This same data set also has

abcd \n efgh
Here the backslash precedes an ordinary non-delimiter. The data is 
supposed to be abcd \n efgh. That is, this data set requires the backslash 
to be retained in the data when it is not preceding the start of a 
delimiter. 
Am I missing something or is it impossible to model this?
 
It would seem there needs to be a flag to indicate whether the escape 
characters that don't actually escape a delimiter are to be retained or 
not. 
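
As a rough illustration of the behaviour being asked for (not something
DFDL's escapeCharacter scheme provides, per this thread), the Python
sketch below splits a record on the separator and removes a backslash
only when it directly precedes the separator. The function name and its
parameters are hypothetical.

```python
def split_retaining_inactive_escapes(record, separator=";", escape="\\"):
    """Illustrative only: split on `separator`, honouring `escape` solely
    when it precedes the separator and retaining it everywhere else."""
    fields = []
    current = []
    i = 0
    while i < len(record):
        if record.startswith(escape + separator, i):
            # Active escape: drop the escape, keep the separator as data.
            current.append(separator)
            i += len(escape) + len(separator)
        elif record.startswith(separator, i):
            # Unescaped separator: close the current field.
            fields.append("".join(current))
            current = []
            i += len(separator)
        else:
            # Ordinary data, including an escape that escapes nothing.
            current.append(record[i])
            i += 1
    fields.append("".join(current))
    return fields


print(split_retaining_inactive_escapes("abcd \\; efgh;abcd \\n efgh"))
```

The first field comes out as abcd ; efgh and the second keeps its
backslash as abcd \n efgh, matching the two cases described above.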
 
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | 
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are 
subject to the OGF Intellectual Property Policy
