[DFDL-WG] Clarification: priority of delimiters vs. escape chars

Steve Hanson smh at uk.ibm.com
Tue May 4 05:12:15 EDT 2021


Mike

I ran a test with IBM DFDL using the dfdl:separator, dfdl:escapeCharacter 
and dfdl:escapeEscapeCharacter values from your example. For each element 
in the sequence I received:

CTDV1466E : DFDL properties 'separator' ('/') and 'escapeCharacter' ('/') cannot include the same value.
CTDV1467E : DFDL properties 'separator' ('/') and 'escapeEscapeCharacter' ('/') cannot include the same value.

Changing the escape scheme to be escapeBlock as per your example, I get 
for each element:

CTDV1467E : DFDL properties 'separator' ('/') and 'escapeEscapeCharacter' ('/') cannot include the same value.

So we must have discussed this in the past and concluded that it's an SDE 
(Schema Definition Error).


I don't get an error for dfdl:escapeBlockEnd itself, though. I assume that 
is because once inside an escape block we are no longer looking for 
delimiters.

Regards
Steve Hanson
IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh at uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
Note: I work Tuesday to Friday 



From:   Mike Beckerle <mbeckerle.dfdl at gmail.com>
To:     DFDL-WG <dfdl-wg at ogf.org>
Date:   03/05/2021 20:23
Subject:        [EXTERNAL] [DFDL-WG] Clarification: priority of delimiters vs. escape chars
Sent by:        "dfdl-wg" <dfdl-wg-bounces at ogf.org>



Consider two elements in a sequence with dfdl:separator="/ // ///", 
dfdl:escapeCharacter="/" and dfdl:escapeEscapeCharacter="/".

I did not spot language in the spec that makes clear which gets priority: 
interpreting a character as an escape or escape-escape character, or 
interpreting it as a delimiter.

Consider data "foo///bar". 
1. I could interpret that as escapeEscape, escape, and the minimum-length 
   separator "/".
2. Or I could interpret that as the maximum-length separator "///", with 
   no escaping.
3. Or it could be an SDE.
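
The first two readings above can be reproduced with a small scanner whose 
only variable is the order in which the competing rules are tried. This is 
a minimal sketch for illustration, not the behaviour of any DFDL 
implementation: the function split_fields, its rule names, and the idea of 
a "priority" list are all assumptions made for this example.

```python
# Hypothetical sketch: the same data parsed under different rule priorities.
SEPARATORS = ["/", "//", "///"]   # dfdl:separator="/ // ///"
ESCAPE = "/"                      # dfdl:escapeCharacter
ESCAPE_ESCAPE = "/"               # dfdl:escapeEscapeCharacter


def split_fields(data, priority):
    """Split data into fields, trying the rules in the given priority order.

    priority is a list drawn from "separator", "escape_escape" and "escape".
    At each position the first rule that matches wins; if none matches,
    the character is ordinary field content.
    """
    fields, current, i = [], [], 0
    while i < len(data):
        for rule in priority:
            if rule == "escape_escape" and data.startswith(ESCAPE_ESCAPE + ESCAPE, i):
                # Escape-escape followed by escape: a literal escape character.
                current.append(ESCAPE)
                i += 2
                break
            if rule == "escape" and data.startswith(ESCAPE, i) and i + 1 < len(data):
                # Escape character: the next character is taken literally.
                current.append(data[i + 1])
                i += 2
                break
            if rule == "separator":
                # Longest separator that matches at this position, if any.
                match = max((s for s in SEPARATORS if data.startswith(s, i)),
                            key=len, default=None)
                if match is not None:
                    fields.append("".join(current))
                    current = []
                    i += len(match)
                    break
        else:
            # No rule matched: ordinary content.
            current.append(data[i])
            i += 1
    fields.append("".join(current))
    return fields


# Reading 2: delimiters win, longest match, no escaping.
print(split_fields("foo///bar", ["separator", "escape_escape", "escape"]))
# Reading 1: escapeEscape + escape, then the minimum-length separator.
print(split_fields("foo///bar", ["escape_escape", "separator", "escape"]))
```

Under these assumptions the first call yields the fields "foo" and "bar", 
and the second yields "foo/" and "bar". Nothing in the data itself selects 
one order over the other, which is the ambiguity in question.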

To me, we'd be best off making it an SDE for the escapeCharacter to be the 
same as the first character of any in-scope terminating delimiter. We're 
not doing anyone any favors by allowing this.

Likely a similar restriction would be needed for escapeBlockEnd: the value 
of this property could not be a prefix of any in-scope terminating 
delimiter, and escapeEscapeCharacter could not be the same as the first 
character of the escapeBlockEnd.
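
The proposed restrictions could be expressed as a schema-compile-time 
check along the following lines. This is only a sketch of the proposal as 
worded above, not text from the spec and not how any implementation 
validates schemas: the function check_escape_scheme, its parameters, and 
its message wording are hypothetical, and property values are assumed to 
be already split into individual non-empty literal strings.

```python
# Hypothetical sketch of the proposed SDE checks.
def check_escape_scheme(terminating_delimiters, escape_character=None,
                        escape_escape_character=None, escape_block_end=None):
    """Return a list of SDE messages; an empty list means no conflict found."""
    errors = []
    if escape_character:
        # escapeCharacter must not be the first character of any
        # in-scope terminating delimiter.
        for delim in terminating_delimiters:
            if delim.startswith(escape_character):
                errors.append(
                    f"escapeCharacter {escape_character!r} is the first "
                    f"character of delimiter {delim!r}")
    if escape_block_end:
        # escapeBlockEnd must not be a prefix of any in-scope
        # terminating delimiter.
        for delim in terminating_delimiters:
            if delim.startswith(escape_block_end):
                errors.append(
                    f"escapeBlockEnd {escape_block_end!r} is a prefix of "
                    f"delimiter {delim!r}")
        # escapeEscapeCharacter must not be the first character of
        # escapeBlockEnd.
        if escape_escape_character and escape_block_end.startswith(
                escape_escape_character):
            errors.append(
                f"escapeEscapeCharacter {escape_escape_character!r} is the "
                f"first character of escapeBlockEnd {escape_block_end!r}")
    return errors


# The escape-character example from this thread: every separator conflicts.
for message in check_escape_scheme(["/", "//", "///"], escape_character="/"):
    print("SDE:", message)
```

A scheme such as separator "," with escapeCharacter "\" produces no 
messages under this check.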

E.g., dfdl:escapeBlockStart="/" escapeBlockEnd="/" dfdl:separator="/ // ///"

With data "/foo///bar" 

Is that
1. escapeBlockStart, foo, escapeBlockEnd, separator "//", bar?
2. Or escapeBlockStart, foo/, separator "/", bar?
3. Or an SDE?

Comments?



Mike Beckerle | OGF DFDL Workgroup Co-Chair | Owl Cyber Defense | 
www.owlcyberdefense.com
Please note: Contributions to the DFDL Workgroup's email discussions are 
subject to the OGF Intellectual Property Policy


