[DFDL-WG] Suggest new lengthKind 'patternMatch' where failure to match is a parse error

Steve Hanson smh at uk.ibm.com
Tue Apr 12 12:56:17 EDT 2022


Mike

I think this behaviour was intended to ensure that we could get back a zero-length result, which then enables optional and default processing etc., without the hassle of having to provide an explicit zero-length match component in the regex. I've not heard any complaints about this from IBM DFDL users, though. For DFDL 2.0 I agree that we can improve things, but rather than adding a whole new lengthKind, we could just add a new property that says what to do when there is no match and the regex does not itself allow a zero-length match.

Regards

Steve Hanson

IBM Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh at uk.ibm.com
tel:+44-7717-378890
Note: I work Tuesday to Friday

-----Original Message-----
From: Mike Beckerle <mbeckerle at apache.org>
Reply-To: mbeckerle at apache.org
To: DFDL-WG <dfdl-wg at ogf.org>
Subject: [EXTERNAL] [DFDL-WG] Suggest new lengthKind 'patternMatch' where failure to match is a parse error
Date: Thu, 07 Apr 2022 18:39:11 -0400

Apache Daffodil users have had quite a lot of trouble understanding and properly using dfdl:lengthKind 'pattern'.

This is because a failure to match does *not* cause a parse error; instead it yields a successful parse with a length of zero. People generally find this unintuitive: if they wanted a zero-length match, they could have written their regex to allow one.
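To make the behaviour concrete, here is a minimal Python sketch of it. It is only an approximation: Python's re module stands in for the DFDL regex dialect, and the function name pattern_length is hypothetical, not part of Daffodil or any DFDL implementation.

```python
import re


def pattern_length(regex: str, data: str) -> int:
    """Approximate lengthKind 'pattern': return the length of the match
    at the start of the data, or 0 when the regex does not match at all."""
    m = re.match(regex, data)  # re.match anchors at the start of data
    return m.end() if m else 0


# The regex demands at least one digit, yet a failed match still "succeeds"
# with length zero.
print(pattern_length(r"[0-9]+", "abc"))     # 0 (no match, silently zero)
# A schema author who wants zero length can already say so in the regex.
print(pattern_length(r"[0-9]*", "abc"))     # 0 (genuine zero-length match)
print(pattern_length(r"[0-9]+", "123abc"))  # 3
```

The first two calls are indistinguishable to the caller, which is the source of the confusion.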

I have made this mistake repeatedly myself when creating DFDL schemas, and supposedly I'm an expert in DFDL.

This has been so problematic that I suggest we add an additional lengthKind enum value, 'patternMatch' or maybe 'patternMatchRequired' (I'm open to suggestions for the best name here), which is the same as 'pattern' except that failure to match results in a parse error instead of a zero-length success.
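As a rough illustration of the proposed semantics, the following Python sketch raises an error when the regex fails to match. Everything here is an assumption for illustration only: Python's re module approximates the DFDL regex dialect, and ParseError and pattern_match_length are hypothetical names, not an existing API.

```python
import re


class ParseError(Exception):
    """Hypothetical stand-in for a DFDL parse error."""


def pattern_match_length(regex: str, data: str) -> int:
    """Sketch of the proposed 'patternMatch': a failed match is a parse
    error; a zero-length result is returned only if the regex allows it."""
    m = re.match(regex, data)  # anchored at the start of the data
    if m is None:
        raise ParseError(f"pattern {regex!r} did not match")
    return m.end()


print(pattern_match_length(r"[0-9]*", "abc"))     # 0 (regex permits empty)
print(pattern_match_length(r"[0-9]+", "123abc"))  # 3
try:
    pattern_match_length(r"[0-9]+", "abc")
except ParseError as e:
    print("parse error:", e)
```

With this behaviour a zero-length element has to be asked for explicitly in the regex, so a mistyped or too-strict pattern surfaces as a parse error rather than as an empty value.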

I would argue that the existing 'pattern' behavior is badly designed, but it is too late to change it for DFDL v1.0.

Rather, for DFDL v2.0 we should add the new, correct behavior under the name 'patternMatch', and then we can deprecate the existing lengthKind 'pattern'.

Has anyone else had similarly difficult experiences with lengthKind 'pattern'?

Mike Beckerle
Apache Daffodil PMC | http://daffodil.apache.org/
OGF DFDL Workgroup Co-Chair | http://www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
Owl Cyber Defense | http://www.owlcyberdefense.com/


