[DFDL-WG] Clarification on escape schemes - lookahead distance past an escape character

Mike Beckerle mbeckerle.dfdl at gmail.com
Mon Apr 14 11:52:11 EDT 2014


Consider this infix separator situation:

<xs:sequence dfdl:separator="/%WSP*;/">
   <xs:element name="x" type="xs:string"/>
   <xs:element name="y" type="xs:string" minOccurs="0"/>
</xs:sequence>

The length kind (dfdl:lengthKind) is "delimited".
Suppose the escape character is "/".
Suppose the data is "foo//bar".


Should the above parse as
(a) <x>foo/bar</x> or
(b) <x>foobar</x>

The problem is this: to produce <x>foobar</x> you have to recognize that
the second / is not in fact the start of a delimiter. That requires
lookahead over the entire possible length of the delimiter, which is
unbounded because of the %WSP*; in it.
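To make the lookahead point concrete, here is a minimal Python sketch of a matcher for the separator /%WSP*;/. The function name match_separator and the approximation of DFDL's %WSP; class by str.isspace() are assumptions for illustration only, not the API of any DFDL implementation. The function reports how many characters it had to examine before it could decide; that count grows with the amount of whitespace after the first "/", so no fixed lookahead bound is enough.

```python
from typing import Optional, Tuple


def match_separator(data: str, pos: int) -> Tuple[Optional[int], int]:
    """Try to match the separator "/%WSP*;/" starting at data[pos].

    Returns (match_length, chars_examined). match_length is None when the
    separator does not match; chars_examined is how far the matcher had to
    look ahead before it could decide.
    """
    examined = 0
    i = pos
    # Leading "/" of the separator.
    if i >= len(data):
        return None, examined
    examined += 1
    if data[i] != "/":
        return None, examined
    i += 1
    # %WSP*; : zero or more whitespace characters. DFDL's WSP class is
    # approximated by str.isspace() (an assumption for this sketch).
    while i < len(data):
        examined += 1
        if not data[i].isspace():
            break
        i += 1
    # Trailing "/" of the separator; data[i] (if any) was already examined.
    if i < len(data) and data[i] == "/":
        return i + 1 - pos, examined
    return None, examined
```

For example, match_separator("foo//bar", 4) examines two characters and reports no match, while for "foo/" followed by 1000 spaces and then "x", match_separator(data, 3) must examine 1002 characters before it can report that no separator starts there.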

I believe the semantics of escape characters should not require looking at
more than the next character after the escape character, but this will
result in the escape character behaving as if it escapes any single
character that follows it, not only the first character of a delimiter.
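The one-character-lookahead semantics proposed above can be sketched as a simple field scanner. This is a minimal sketch under stated assumptions, not the algorithm the DFDL specification prescribes: scan_field and SEPARATOR are hypothetical names, DFDL's %WSP; class is approximated by the regex \s, and at each position the escape character is checked before the separator is tried. Under those assumptions the escape character consumes whatever single character follows it, so the scanner never looks more than one character past it.

```python
import re
from typing import Tuple

# Hypothetical matcher for the separator "/%WSP*;/"; DFDL's %WSP; class is
# approximated here by Python's \s (an assumption for this sketch).
SEPARATOR = re.compile(r"/\s*/")


def scan_field(data: str, pos: int = 0, escape: str = "/") -> Tuple[str, int]:
    """Scan one delimited field starting at data[pos].

    Assumed semantics: an escape character consumes exactly one following
    character, whatever it is, so no more than one character of lookahead
    past the escape character is ever needed.
    Returns (field value, index where scanning stopped).
    """
    out = []
    i = pos
    while i < len(data):
        ch = data[i]
        if ch == escape and i + 1 < len(data):
            # Escape is checked before delimiter matching: drop the escape
            # character and keep the next character literally.
            out.append(data[i + 1])
            i += 2
        elif SEPARATOR.match(data, i):
            break  # an unescaped separator ends the field
        else:
            out.append(ch)
            i += 1
    return "".join(out), i
```

With the data from the example, scan_field("foo//bar") returns ("foo/bar", 8), which corresponds to option (a). With a different escape character, e.g. scan_field("foo//bar", escape="\\"), the "//" is recognized as the separator and the field is "foo". Because the sketch checks the escape character first, in the schema above (where the escape character is also the separator's first character) an unescaped separator can never be recognized; that follows from the sketch's precedence choice.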

Is the right behavior here clear?

...mikeb

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property
Policy <http://www.ogf.org/About/abt_policies.php>