[DFDL-WG] Escaping %NL; and separators with same suffix

Steve Hanson smh at uk.ibm.com
Wed Feb 4 12:12:52 EST 2015


Hi Steve

The spec says that a DFDL escape character escapes only the single 
character that follows it. So in your example only the CR is escaped. 
%NL; matches CRLF, CR or LF (plus others). Because the CR is escaped, 
neither CRLF nor CR can match at that position, but the LF that follows 
is not escaped and so does match %NL; as a separator. 

The use of %NL; is not compatible with the data. You need to say 
dfdl:separator=", %CR;%LF;" so that only the full CRLF sequence is a 
separator.

Same for your more general case.
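
As a rough illustration of the rule above, here is a minimal Python sketch. 
It is not DFDL processor code: the function name split_fields is 
hypothetical, %NL; is modelled as only CRLF, CR and LF (the real entity 
allows more), and longest-match-first ordering of separators is an 
assumption of the sketch.

```python
def split_fields(data, separators, escape="\\"):
    """Split data on any separator. The escape character protects only
    the single character that follows it (hypothetical helper)."""
    # Assumption: try longer separators first, so CRLF wins over CR or LF.
    ordered = sorted(separators, key=len, reverse=True)
    fields, current, i = [], [], 0
    while i < len(data):
        ch = data[i]
        if ch == escape and i + 1 < len(data):
            # The escaped character becomes literal data; the escape is dropped.
            current.append(data[i + 1])
            i += 2
            continue
        match = next((s for s in ordered if data.startswith(s, i)), None)
        if match is not None:
            # An unescaped separator ends the current field.
            fields.append("".join(current))
            current = []
            i += len(match)
        else:
            current.append(ch)
            i += 1
    fields.append("".join(current))
    return fields


data = "abc,de\\\r\nfg,hij"  # abc,de\CRLFfg,hij

# separator=", %NL;" with %NL; approximated as CRLF, CR, LF
with_nl = split_fields(data, [",", "\r\n", "\r", "\n"])

# separator=", %CR;%LF;"
with_crlf = split_fields(data, [",", "\r\n"])
```

Under these assumptions, with_nl contains four fields (the second one ends 
in a literal CR), while with_crlf contains three fields and keeps the CRLF 
inside the second one.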

Regards
 
Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848



From:   Steve Lawrence <slawrence at tresys.com>
To:     DFDL-WG <dfdl-wg at ogf.org>
Date:   04/02/2015 16:35
Subject:        [DFDL-WG] Escaping %NL; and separators with same suffix
Sent by:        dfdl-wg-bounces at ogf.org



Assume we have a schema with separator=", %NL;" and escapeCharacter="\"
and the following data:

  abc,de\CRLFfg,hij

Where CRLF is the Windows-style line ending.

How does the escape character escape the CRLF?

One interpretation is that the escape character only escapes the
following character, which means CRLF will not match %NL;, but the LF
does. So you might end up with an infoset like this:

<seq>
  <e>abc</e>
  <e>deCR</e>
  <e>fg</e>
  <e>hij</e>
</seq>

Alternatively, one might think the escape character should escape the
entire CRLF, so the resulting infoset might look like this:

<seq>
  <e>abc</e>
  <e>deCRLFfg</e>
  <e>hij</e>
</seq>

More generally, what happens when one separator is a suffix of another?
For example:

separator="XXYY YY" escapeCharacter="\"

data: abc,de\XXYYfg,hij

Does the escape character escape the entire XXYY, and YY is not
considered as a delimiter? Does this change at all if a separator is
also a prefix of another, e.g. separator="XXYY XX YY", which is very
similar to %NL;?

- Steve


