[DFDL-WG] Clarification: priority of delimiters vs. escape chars

Mike Beckerle mbeckerle.dfdl at gmail.com
Tue May 4 12:11:00 EDT 2021


Ok, sanity restored :-)

So I would concur that this should be an SDE.

We need to specify this in the DFDL spec. It's an omission.

I suppose this is our first real erratum since the v1.0 spec was finalized.

I don't think there is a lot of urgency to this because, well, no real
format does anything this insane. It came up in corner-case testing of
Daffodil.
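
For illustration, the kind of schema-compile-time check being discussed
could be sketched roughly as below. This is only a Python sketch under
stated assumptions, not Daffodil or IBM DFDL code: the function name, the
property mapping and the message wording are all hypothetical, the
separator value is treated as a whitespace-separated list as in the
example, and DFDL entities and character classes are ignored. It applies
the stricter "prefix of any in-scope terminating delimiter" rule proposed
in the original post rather than plain equality.

```python
def find_escape_delimiter_conflicts(separator, escape_properties):
    """Return (property, value, delimiter) triples that would be flagged.

    separator: raw dfdl:separator value, e.g. "/ // ///" (whitespace-separated).
    escape_properties: hypothetical mapping of property name to value, e.g.
        {"escapeCharacter": "/", "escapeEscapeCharacter": "/"}.
    """
    delimiters = separator.split()
    conflicts = []
    for name, value in escape_properties.items():
        if not value:
            continue  # property not set, nothing to check
        for delimiter in delimiters:
            # Flag the case where the escape value is a prefix of the
            # delimiter; this covers both "same value" and "same first
            # character" for a single-character escape.
            if delimiter.startswith(value):
                conflicts.append((name, value, delimiter))
    return conflicts


if __name__ == "__main__":
    props = {"escapeCharacter": "/", "escapeEscapeCharacter": "/"}
    for name, value, delimiter in find_escape_delimiter_conflicts("/ // ///", props):
        print(f"SDE: {name} '{value}' conflicts with separator '{delimiter}'")
```

With the values from the example, every separator alternative is flagged
for both properties, so the schema would be rejected before any data such
as "foo///bar" is ever parsed.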




Mike Beckerle | OGF DFDL Workgroup Co-Chair | Owl Cyber Defense |
www.owlcyberdefense.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
<http://www.ogf.org/About/abt_policies.php>



On Tue, May 4, 2021 at 5:12 AM Steve Hanson <smh at uk.ibm.com> wrote:

> Mike
>
> I ran a test with IBM DFDL using the dfdl:separator, dfdl:escapeCharacter
> and dfdl:escapeEscapeCharacter in your example.  For each element in the
> sequence I received ...
>
> CTDV1466E : DFDL properties 'separator' ('/') and 'escapeCharacter' ('/')
> cannot include the same value.
> CTDV1467E : DFDL properties 'separator' ('/') and 'escapeEscapeCharacter'
> ('/') cannot include the same value.
>
> Changing the escape scheme to be escapeBlock as per your example, I get
> for each element:
>
> CTDV1467E : DFDL properties 'separator' ('/') and 'escapeEscapeCharacter'
> ('/') cannot include the same value.
>
> So we must have discussed this in the past and concluded that it's an SDE.
>
> I don't get an error for dfdl:escapeBlockEnd itself, though. I assume that
> is because once inside an escape block we are no longer looking for
> delimiters.
>
> Regards
> Steve Hanson
>
> IBM Hybrid Integration, Hursley, UK
> Architect, IBM DFDL
> <http://www.ibm.com/developerworks/library/se-dfdl/index.html>
> Co-Chair, OGF DFDL Working Group <http://www.ogf.org/dfdl/>
> smh at uk.ibm.com
> tel:+44-1962-815848
> mob:+44-7717-378890
> Note: I work Tuesday to Friday
>
>
>
> From:        Mike Beckerle <mbeckerle.dfdl at gmail.com>
> To:        DFDL-WG <dfdl-wg at ogf.org>
> Date:        03/05/2021 20:23
> Subject:        [EXTERNAL] [DFDL-WG] Clarification: priority of
> delimiters vs. escape chars
> Sent by:        "dfdl-wg" <dfdl-wg-bounces at ogf.org>
> ------------------------------
>
>
>
> Consider two elements in a sequence with dfdl:separator="/ // ///",
> dfdl:escapeCharacter="/" and dfdl:escapeEscapeCharacter="/".
>
> I did not spot language in the spec that makes it clear what gets
> priority: interpreting a character as an escape char or escape-escape char,
> or interpreting it as a delimiter.
>
> Consider data "foo///bar".
> 1. I could interpret that as escapeEscape, escape, and minimum-length
>    separator "/".
> 2. Or I could interpret that as the maximum-length separator "///", with
>    no escaping.
> 3. Or it could be an SDE.
>
> To me, we'd be best off if it were an SDE for the escapeCharacter to be the
> same as the first character of any in-scope terminating delimiter.
> We're not doing anyone any favors by allowing this.
>
> A similar restriction would likely be needed for escapeBlockEnd: the value
> of this property could not be a prefix of any in-scope terminating
> delimiter, and escapeEscapeCharacter could not be the same as the first
> character of the escapeBlockEnd.
>
> E.g., dfdl:escapeBlockStart="/" dfdl:escapeBlockEnd="/"
> dfdl:separator="/ // ///"
>
> With data "/foo///bar"
>
> Is that
> 1. escapeBlockStart, foo, escapeBlockEnd, separator "//", bar?
> 2. Or escapeBlockStart, foo/, separator "/", bar?
> 3. Or an SDE?
>
> Comments?
>