[DFDL-WG] Future feature? allow pattern facet on numbers when textNumberRep="standard" and representation="text"

Mike Beckerle mbeckerle at apache.org
Tue Dec 19 13:56:28 PST 2023


It has come up often now that DFDL cannot be strict enough about text
number formats, because our ICU-based textNumberPattern is neither strict
enough nor expressive enough to capture subtle syntax variations.

I suggest this could be fixed by just allowing the XSD pattern facet to be
used on numeric types when they are known to have representation="text" and
textNumberRep="standard" (not zoned).

For example dfdl:textNumberPattern="00.####" will allow the number "99." to
be accepted. There's currently no way to say "when it's an integer, there
cannot be a decimal point".

People are resistant to the notion that this requires a complex type with a
bunch of different elements, each with a different textNumberPattern, so
that you have an <int>99</int> or a <dec>99.9</dec> element. They really don't want
there to be different paths to this value in the infoset just because of
this format issue about the decimal point. It's a painful loss of
polymorphism in these path expressions. Instead of a simple path expression
to obtain such a value you end up with

if (fn:exists(path/int)) then path/int else path/dec

Note that DFDL's expression language has no let expression, so in the above,
if "path" is actually "a/b/c/d/e/f/g", i.e., a typical deep path (and real
paths commonly have much longer steps than my single letters), then that
path is going to be repeated 3 times in the expression. This is pretty
unpleasant.

Rather than come up with a bunch of ICU mods to tighten up all the places
it is lax, and to add features for suppressed decimal points, etc. we could
just allow the pattern facet on textual numbers.

Then the pattern facet could be "\d\d|\d\d\.\d{1,4}" which would achieve
the goal of enforcing the precise pattern desired if you validate after
parsing and before unparsing. It would not prevent conversion of the text
to the corresponding numeric type, but it would allow an additional tighter
check on what the text was.
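
As a quick sanity check of the pattern above, here is a minimal Python
sketch. It is only an approximation: Python's re module is not the XSD
regular expression dialect, and the helper name matches_facet is made up
for this illustration. XSD pattern facets match the whole lexical value,
so re.fullmatch is used to mimic that implicit anchoring.

```python
import re

# Pattern taken from the message. Python's `re` only approximates the
# XSD regex dialect, but this particular pattern uses syntax common to both.
PATTERN = re.compile(r"\d\d|\d\d\.\d{1,4}")


def matches_facet(text: str) -> bool:
    """Hypothetical helper: True if the whole text matches the pattern.

    XSD pattern facets are implicitly anchored at both ends, so
    fullmatch (not search or match) is the closest Python equivalent.
    """
    return PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["99", "99.9", "99.1234", "99.", "9", "99.12345"]:
        print(repr(sample), matches_facet(sample))
```

Under these assumptions "99" and "99.9" are accepted, while "99." is
rejected, which is exactly the case textNumberPattern="00.####" lets
through.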

Regular XML Schema allows the pattern facet on all the numeric types, so we
would be eliminating what is currently a DFDL restriction, but only when
the numeric types have standard text representation.

Thoughts?

Mike Beckerle
Apache Daffodil PMC | daffodil.apache.org
OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
Owl Cyber Defense | www.owlcyberdefense.com