[DFDL-WG] A gap in the behaviour for when escapeKind=escapeBlock

Andrew Edwards andy.edwards at uk.ibm.com
Mon Sep 8 12:05:36 EDT 2014


While working in the escapeBlock area, I seem to have found a gap in the 
definition for DFDL's behaviour when using escape blocks.

From section 13.2.1, the relevant extracts are:
When 'escapeBlock': On unparsing the entire data are escaped by adding 
dfdl:escapeBlockStart to the beginning and dfdl:escapeBlockEnd to the end 
of the data. The data is either always escaped or escaped when needed as 
specified by dfdl:generateEscapeBlock. If the data is escaped and contains 
the dfdl:escapeBlockEnd then first character of each appearance of the 
dfdl:escapeBlockEnd is escaped by the dfdl:escapeEscapeCharacter. 
and
On parsing the dfdl:escapeBlockStart string must be the first characters 
in the (trimmed) data in order to activate the escape scheme. The 
dfdl:escapeBlockStart string is removed from the beginning of the data. 
Until a matching dfdl:escapeBlockEnd string (that is, one not preceded by 
the dfdl:escapeEscapeCharacter) is found in the data, any in-scope 
terminating delimiter encountered in the data is not interpreted as such, 
and any dfdl:escapeEscapeCharacters are removed when they precede an 
dfdl:escapeBlockEnd string.

Now consider a model where:
escapeBlockStart="start"
escapeBlockEnd="end"
escapeEscapeCharacter="#"

Then take a logical value of:
A hash is a #

When we serialize the value, we wrap it with the escapeBlockStart 
and escapeBlockEnd, and we precede any instance of the escapeBlockEnd 
within the data with an escapeEscapeCharacter. The value contains no 
escapeBlockEnd, so nothing inside it is escaped, and this gives us the 
physical value "startA hash is a #end". If we were to parse that data, we 
would see the "#end" as an escaped escapeBlockEnd and report that there is 
no escapeBlockEnd.
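
To make the round trip concrete, here is a minimal Python sketch of the two rules quoted above, as I read them. The function names unparse and parse are made up for illustration, the scheme values are the ones from the example model, and trimming, generateEscapeBlock and in-scope delimiters are ignored. It is not taken from any DFDL implementation.

```python
# Scheme values from the example model above.
START, END, ESC = "start", "end", "#"


def unparse(value: str) -> str:
    """Wrap the value in the escape block, escaping each escapeBlockEnd.

    Only occurrences of escapeBlockEnd are escaped; the quoted rule says
    nothing about escapeEscapeCharacter itself, so it is left untouched.
    """
    return START + value.replace(END, ESC + END) + END


def parse(data: str) -> str:
    """Recover the logical value from escape-block data (simplified)."""
    if not data.startswith(START):
        return data  # escape scheme not activated
    out = []
    i = len(START)
    while i < len(data):
        if data.startswith(ESC + END, i):
            # Escaped escapeBlockEnd: drop the escape character, keep the text.
            out.append(END)
            i += len(ESC) + len(END)
        elif data.startswith(END, i):
            # Unescaped escapeBlockEnd closes the block.
            return "".join(out)
        else:
            out.append(data[i])
            i += 1
    raise ValueError("no unescaped escapeBlockEnd found")


if __name__ == "__main__":
    print(parse(unparse("the end is near")))  # round-trips
    physical = unparse("A hash is a #")
    print(physical)
    try:
        parse(physical)
    except ValueError as exc:
        print("parse failed:", exc)
```

A value such as "the end is near" round-trips under this sketch, but "A hash is a #" does not: the trailing "#" sits directly in front of the closing escapeBlockEnd and is read as escaping it.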

The gap in the behavioural definition seems to be that the specification 
says nothing about escaping an instance of the escapeEscapeCharacter itself 
when serializing. There is nothing to catch the case of an 
escapeEscapeCharacter that is not meant to escape an escapeBlockEnd but 
ends up doing so by circumstance.



Andy 
Andy Edwards - IBM Integration Bus - DFDL


Email:
andy.edwards at uk.ibm.com
Snail Mail: 
MP211, Hursley park, Hursley, WINCHESTER, Hants, SO21 2JN
Tel int:
247222
Tel ext:
+44 (0)1962 817222
Desk:
DE3 V17

The Feynman problem solving Algorithm
  1) Write down the problem
  2) Think real hard
  3) Write down the answer
 -- Murray Gell-Mann in the NY Times
