[DFDL-WG] Clarification: priority of delimiters vs. escape chars

Mike Beckerle mbeckerle.dfdl at gmail.com
Mon May 3 15:23:27 EDT 2021


Consider two elements in a sequence with dfdl:separator="/ // ///", and an
escape scheme with dfdl:escapeCharacter="/" and
dfdl:escapeEscapeCharacter="/".

I did not spot language in the spec that makes it clear which takes
priority: interpreting a character as an escape character or escape-escape
character, or interpreting it as part of a delimiter.

Consider data "foo///bar".

   1. I could interpret that as escapeEscape, escape, and minimum length
   separator "/"
   2. Or I could interpret that as "///" maximum length separator, with no
   escaping.
   3. Or it could be an SDE.
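
A minimal sketch of how readings 1 and 2 fall out of the same data. This is
not Daffodil or any DFDL processor's algorithm; split_once and its "policy"
argument are hypothetical names made up for illustration, and the separator
match is assumed to be longest-first:

```python
def split_once(data, separators, esc, esc_esc, policy):
    """Return (before, separator, after) for the first separator found.

    policy is a hypothetical knob, not a DFDL property:
      "escape_first":    an escapeEscape+escape pair wins over a separator
      "delimiter_first": a separator wins over any escape interpretation
    """
    # Try longer separators before shorter ones (assumed longest match).
    seps = sorted(separators, key=len, reverse=True)
    text = []
    i = 0
    while i < len(data):
        pair = data.startswith(esc_esc + esc, i)
        match = next((s for s in seps if data.startswith(s, i)), None)
        if pair and (policy == "escape_first" or match is None):
            text.append(esc)  # the escaped escape character is literal data
            i += len(esc_esc) + len(esc)
            continue
        if match is not None:
            return "".join(text), match, data[i + len(match):]
        if data.startswith(esc, i) and i + len(esc) < len(data):
            i += len(esc)  # drop the escape character, keep what follows
        text.append(data[i])
        i += 1
    return "".join(text), None, ""


seps = ["/", "//", "///"]
print(split_once("foo///bar", seps, "/", "/", "escape_first"))
print(split_once("foo///bar", seps, "/", "/", "delimiter_first"))
```

Under "escape_first" the first element is "foo/" and the separator is "/";
under "delimiter_first" the first element is "foo" and the separator is
"///". Both are plausible parses of the same bytes.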


To me, we'd be best off if it were a schema definition error (SDE) for the
escapeCharacter to be the same as the first character of any in-scope
terminating delimiter. We're not doing anyone any favors by allowing this.
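
As a rough illustration of what such a check might look like (a sketch
only; escape_char_conflicts is a hypothetical helper, and it assumes the
escape character is a single character and the in-scope terminating
delimiters have already been collected into a list):

```python
def escape_char_conflicts(escape_char, terminators):
    """Return the terminating delimiters that start with escape_char.

    A non-empty result is what the proposed rule would report as an SDE.
    """
    return [t for t in terminators if t.startswith(escape_char)]


# The example from this message: every separator alternative conflicts.
print(escape_char_conflicts("/", ["/", "//", "///"]))
# A backslash escape character does not conflict with these separators.
print(escape_char_conflicts("\\", ["/", "//", "///"]))
```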

Likely a similar restriction would be needed for escapeBlockEnd: the value
of this property could not be a prefix of any in-scope terminating
delimiter, and escapeEscapeCharacter could not be the same as the first
character of the escapeBlockEnd.
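
The two block-scheme restrictions could be checked along these lines. Again
this is only a sketch under assumptions: escape_block_conflicts is a
hypothetical helper, the delimiters are assumed to be plain strings, and an
empty escapeEscapeCharacter is taken to mean "not set":

```python
def escape_block_conflicts(block_end, escape_escape_char, terminators):
    """Return messages describing violations of the proposed restrictions."""
    problems = []
    # Rule 1: escapeBlockEnd must not be a prefix of a terminating delimiter.
    for t in terminators:
        if t.startswith(block_end):
            problems.append(
                f"escapeBlockEnd {block_end!r} is a prefix of delimiter {t!r}"
            )
    # Rule 2: escapeEscapeCharacter must not equal the first character
    # of escapeBlockEnd.
    if escape_escape_char and block_end.startswith(escape_escape_char):
        problems.append(
            f"escapeEscapeCharacter {escape_escape_char!r} starts "
            f"escapeBlockEnd {block_end!r}"
        )
    return problems


for message in escape_block_conflicts("/", "/", ["/", "//", "///"]):
    print(message)
```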

E.g., dfdl:escapeBlockStart="/" dfdl:escapeBlockEnd="/"
dfdl:separator="/ // ///"

With data "/foo///bar"

Is that

   1. escapeBlockStart, foo, escapeBlockEnd, separator "//", bar?
   2. Or escapeBlockStart, foo/, escapeBlockEnd, separator "/", bar?
   3. Or SDE?
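
The first two readings can be enumerated mechanically. The sketch below is
an illustration, not how any DFDL implementation scans: block_readings is a
hypothetical helper that tries every position where the block end could
match and, as an assumption, takes the longest separator that follows it:

```python
def block_readings(data, block_start, block_end, separators):
    """List (content, separator, rest) for each candidate block end."""
    if not data.startswith(block_start):
        return []
    # Assumed longest-match among the separator alternatives.
    seps = sorted(separators, key=len, reverse=True)
    body_start = len(block_start)
    readings = []
    for j in range(body_start, len(data)):
        if not data.startswith(block_end, j):
            continue
        after = data[j + len(block_end):]
        sep = next((s for s in seps if after.startswith(s)), None)
        if sep is not None:
            readings.append((data[body_start:j], sep, after[len(sep):]))
    return readings


for reading in block_readings("/foo///bar", "/", "/", ["/", "//", "///"]):
    print(reading)
```

For "/foo///bar" this yields block content "foo" with separator "//", and
block content "foo/" with separator "/", which is exactly the ambiguity in
question.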

Comments?



Mike Beckerle | OGF DFDL Workgroup Co-Chair | Owl Cyber Defense |
www.owlcyberdefense.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
<http://www.ogf.org/About/abt_policies.php>