[DFDL-WG] Suggest new lengthKind 'patternMatch' where failure to match is a parse error

Mike Beckerle mbeckerle at apache.org
Thu Apr 7 18:39:11 EDT 2022


Apache Daffodil users have had a lot of trouble understanding and correctly
using dfdl:lengthKind 'pattern'.

This is because a failure to match does *not* cause a parse error; instead
it produces a successful parse with a length of zero. People generally find
this unintuitive: if they wanted a zero-length match to succeed, they could
have written their regex to allow one.

I have made this mistake repeatedly myself when creating DFDL schemas, and
supposedly I'm an expert in DFDL.

This has been so problematic that I suggest we add another enum value for
lengthKind, 'patternMatch' or maybe 'patternMatchRequired' (I'm open to
suggestions for the best name), which is the same as 'pattern' except that
failure to match results in a parse error instead of a zero-length success.
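To make the difference concrete, here is a rough Python model of the two
behaviors. It is only a sketch: the function names and the ParseError class
are hypothetical (not Daffodil's API), the data is treated as a plain string,
and the regex is assumed to be matched starting at the current position.

```python
import re


class ParseError(Exception):
    """Hypothetical stand-in for a DFDL processor's parse error."""


def length_kind_pattern(data: str, regex: str) -> int:
    """Model of lengthKind 'pattern': no match yields length 0, not an error."""
    m = re.match(regex, data)  # match anchored at the current position
    return m.end() if m else 0


def length_kind_pattern_match(data: str, regex: str) -> int:
    """Model of the proposed 'patternMatch': no match is a parse error."""
    m = re.match(regex, data)  # same anchored match as above
    if m is None:
        raise ParseError(f"pattern {regex!r} did not match")
    return m.end()
```

With regex [0-9]+ and data "ABC123", the first function quietly succeeds with
length 0 while the second raises ParseError. Anyone who really wants a
zero-length success can still get it under 'patternMatch' by writing a regex
such as [0-9]* that can match empty.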

I would argue that the existing 'pattern' behavior is badly designed, but
it is too late to change it for DFDL v1.0.

Rather, for DFDL v2.0 we should add the new, correct behavior as
'patternMatch', and then we can deprecate the existing lengthKind
'pattern'.

Has anyone else had similarly difficult experiences with lengthKind 'pattern'?

Mike Beckerle
Apache Daffodil PMC | daffodil.apache.org
OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
Owl Cyber Defense | www.owlcyberdefense.com