[DFDL-WG] Clarification on escape schemes - lookahead distance past an escape character

Steve Hanson smh at uk.ibm.com
Mon Apr 14 13:04:42 EDT 2014


Mike

The description of escapeCharacter in section 13.2.1 says "Specifies one 
character that escapes the subsequent character." It doesn't say that the 
next character has to be a delimiter or the start of a delimiter, so I 
would expect to see <x>foo/bar</x> in the infoset.

On output, the modeller can list whatever characters they like to be 
escaped, using extraEscapedCharacters. These don't have to be delimiters. 
To re-parse correctly, the parser therefore has to obey the escape 
character wherever it finds it.
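
A minimal sketch of that round-trip argument, in Python rather than DFDL. 
The function names and the extra_escaped parameter are hypothetical, and 
the sketch assumes the escape character also escapes itself, which is a 
simplification of what a real escape scheme allows. It only shows that 
output which escapes arbitrary non-delimiter characters re-parses to the 
original value when the parser honours the escape character everywhere.

```python
def escape_on_output(value, escape_char, extra_escaped):
    """Prefix every listed character (and the escape char itself) with escape_char."""
    out = []
    for ch in value:
        if ch == escape_char or ch in extra_escaped:
            out.append(escape_char)
        out.append(ch)
    return "".join(out)


def unescape_on_input(data, escape_char):
    """Drop each escape_char and keep the single character that follows it."""
    out = []
    i = 0
    while i < len(data):
        if data[i] == escape_char and i + 1 < len(data):
            i += 1  # skip the escape character itself
        out.append(data[i])  # the (possibly escaped) character is taken literally
        i += 1
    return "".join(out)


# ';' and '#' are not delimiters here; they are escaped only because they are listed.
original = "a;b#c\\d"
written = escape_on_output(original, "\\", {";", "#"})
reparsed = unescape_on_input(written, "\\")
```

If the parser only honoured the escape character in front of delimiters, 
the escape characters written before ';' and '#' would stay in the 
re-parsed value and the round trip would not be lossless.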

Regards
 
Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848



From:   Mike Beckerle <mbeckerle.dfdl at gmail.com>
To:     "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>, 
Date:   14/04/2014 17:23
Subject:        [DFDL-WG] Clarification on escape schemes - lookahead distance past an escape character
Sent by:        dfdl-wg-bounces at ogf.org




Consider this infix separator situation:

<sequence dfdl:separator="/%WSP*;/">
   <element name="x" type="xs:string"/>
   <element name="y" type="xs:string" minOccurs="0"/>
</sequence>

Length kind is delimited
Suppose the escape character is "/"
Suppose the data is "foo//bar"


Should the above be 
(a) <x>foo/bar</x> or 
(b) <x>foobar</x>

The problem is this: to produce <x>foobar</x> you have to recognize that 
the second / isn't in fact the start of a delimiter. That requires 
lookahead for the entire possible length of the delimiter, which is 
unbounded because of the %WSP*; in it. 

I believe the semantics of escape characters should not require looking at 
more than the next character after the escape character, but this will 
result in the escape character behaving as if it escapes any single 
character that follows it, not only the first character of a delimiter. 
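
A rough sketch of the one-character-lookahead behaviour described above, 
written in Python purely for illustration. scan_delimited is a hypothetical 
helper, the regular expression /\s*/ is only an approximation of the 
/%WSP*;/ separator, and checking for the escape character before trying the 
separator is an assumption of this sketch, not something the spec text 
quoted in this thread states.

```python
import re


def scan_delimited(data, escape_char, separator_regex):
    """Scan one delimited field and return (value, remaining_data).

    The escape character always consumes exactly one following character,
    whether or not that character could begin a delimiter.
    """
    sep = re.compile(separator_regex)
    out = []
    i = 0
    while i < len(data):
        if data[i] == escape_char and i + 1 < len(data):
            out.append(data[i + 1])  # escaped character is kept literally
            i += 2
            continue
        m = sep.match(data, i)  # unescaped position: try the separator here
        if m:
            return "".join(out), data[m.end():]
        out.append(data[i])
        i += 1
    return "".join(out), ""


# The case from this message: escape character "/" and data "foo//bar".
value, rest = scan_delimited("foo//bar", "/", r"/\s*/")
```

With these assumptions the scanner never looks further than one character 
past the escape character, and the example data yields the value foo/bar, 
which is option (a).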

Is the right behavior here clear?

...mikeb 

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | 
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are 
subject to the OGF Intellectual Property Policy