[DFDL-WG] Escaping %NL; and separators with same suffix

Steve Lawrence slawrence at tresys.com
Wed Feb 4 12:13:46 EST 2015


Ok, thank you for the clarification.
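
For anyone reading the archive, here is a minimal sketch of the
single-character escaping rule described in the reply quoted below. It is
not a DFDL implementation. The function name split_escaped and the two
separator lists are hypothetical, %NL; is approximated by only CRLF, CR
and LF (the real entity allows others), and trying longer separators
first is an assumption of this sketch.

```python
def split_escaped(data, separators, escape="\\"):
    """Split data on any separator; the escape makes only the next character literal."""
    # Assumption of this sketch: try longer separators first, so CRLF wins over a bare CR.
    ordered = sorted(separators, key=len, reverse=True)
    fields, current, i = [], [], 0
    while i < len(data):
        if data[i] == escape and i + 1 < len(data):
            # Only the single character after the escape is kept as literal data.
            current.append(data[i + 1])
            i += 2
            continue
        match = next((s for s in ordered if data.startswith(s, i)), None)
        if match is not None:
            # An unescaped separator ends the current field.
            fields.append("".join(current))
            current = []
            i += len(match)
        else:
            current.append(data[i])
            i += 1
    fields.append("".join(current))
    return fields


# Hypothetical stand-ins for the two separator settings discussed in the thread.
NL_SEPARATORS = [",", "\r\n", "\r", "\n"]  # approximates separator=", %NL;"
CRLF_SEPARATORS = [",", "\r\n"]            # approximates separator=", %CR;%LF;"

DATA = "abc,de\\\r\nfg,hij"  # abc,de\CRLFfg,hij

print(split_escaped(DATA, NL_SEPARATORS))    # ['abc', 'de\r', 'fg', 'hij']
print(split_escaped(DATA, CRLF_SEPARATORS))  # ['abc', 'de\r\nfg', 'hij']
```

With the %NL; stand-in, the unescaped LF still terminates the field, which
gives the first infoset from my original question. With only CRLF as the
newline separator, the escaped CR breaks up the CRLF and the line ending
stays in the data.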

On 02/04/2015 12:12 PM, Steve Hanson wrote:
> Hi Steve
> 
> The spec says that a DFDL escape character escapes only the character
> that follows it. So in your example only the CR is escaped. %NL; allows
> CRLF, CR, LF (plus others), so it won't match the CR or the CRLF, but it
> will match the LF, as that is not escaped.
> 
> The use of %NL; is not compatible with the data - you need to say 
> dfdl:separator=", %CR;%LF;"
> 
> Same for your more general case.
> 
> Regards
>  
> Steve Hanson
> Architect, IBM DFDL
> Co-Chair, OGF DFDL Working Group
> IBM SWG, Hursley, UK
> smh at uk.ibm.com
> tel:+44-1962-815848
> 
> 
> 
> From:   Steve Lawrence <slawrence at tresys.com>
> To:     DFDL-WG <dfdl-wg at ogf.org>
> Date:   04/02/2015 16:35
> Subject:        [DFDL-WG] Escaping %NL; and separators with same suffix
> Sent by:        dfdl-wg-bounces at ogf.org
> 
> 
> 
> Assume we have a schema with separator=", %NL;" and escapeCharacter="\"
> and the following data:
> 
>   abc,de\CRLFfg,hij
> 
> Where CRLF is the Windows-style line ending.
> 
> How does the escape character escape the CRLF?
> 
> One interpretation is that the escape character only escapes the
> following character, which means CRLF will not match %NL;, but the LF
> does. So you might end up with an infoset like this:
> 
> <seq>
>   <e>abc</e>
>   <e>deCR</e>
>   <e>fg</e>
>   <e>hij</e>
> </seq>
> 
> Alternatively, one might think the escape character should escape the
> entire CRLF, so the resulting infoset might look like this:
> 
> <seq>
>   <e>abc</e>
>   <e>deCRLFfg</e>
>   <e>hij</e>
> </seq>
> 
> More generally, what happens when one separator is a suffix of another?
> For example:
> 
> separator="XXYY YY" escapeCharacter="\"
> 
> data: abc,de\XXYYfg,hij
> 
> Does the escape character escape the entire XXYY, so that YY is not
> considered a delimiter? Does this change at all if a separator is
> also a prefix of another, e.g. separator="XXYY XX YY", which is very
> similar to %NL;?
> 
> - Steve


