[DFDL-WG] Future feature? allow pattern facet on numbers when textNumberRep="standard" and representation="text"

Brutzman, Donald (Don) (CIV) brutzman at nps.edu
Tue Dec 19 20:18:11 PST 2023


We've handled a really wide range of floats and integers in X3D graphics
models, and have found that XML Schema (xsd) types are very useful.  Unusual
edge cases (for advanced error detection) can be handled with patterns (in
the case of X3D, we have regex).

 

The only limitation with this approach is that you typically have to pick
one or the other, since regex within XML Schema only applies to xs:string
types.  Sometimes using XML Schema as primary with separate regex evaluation
is useful in a tool.  You may have more flexibility about hybrid approaches
in DFDL.

 

*	X3D Regular Expressions (regexes) are used to validate the
correctness of string and numeric array values in an X3D scene.
*	https://www.web3d.org/specifications/X3dRegularExpressions.html

 

Opinion: the worst errors are the ones that remain undetected.

 

Season's Greetings!  8)

 

all the best, Don

-- 

Don Brutzman  Naval Postgraduate School, Code USW/Br        brutzman at nps.edu

Watkins 270,  MOVES Institute, Monterey CA 93943-5000 USA    +1.831.656.2149

X3D graphics, virtual worlds, navy robotics https://faculty.nps.edu/brutzman

 

From: dfdl-wg <dfdl-wg-bounces at lists.ogf.org> On Behalf Of Mike Beckerle
Sent: Tuesday, December 19, 2023 1:56 PM
To: DFDL-WG <dfdl-wg at ogf.org>
Subject: [DFDL-WG] Future feature? allow pattern facet on numbers when
textNumberRep="standard" and representation="text"

 

It has come up often now that DFDL cannot be strict enough about text number
formats because our ICU-based textNumberPattern isn't strict enough, or
expressive enough, to capture subtle syntax variations.

 

I suggest this could be fixed by just allowing the XSD pattern facet to be
used on numeric types when they are known to be textual and standard (not
zoned).

 

For example, dfdl:textNumberPattern="00.####" will allow the number "99." to
be accepted. There's currently no way to say "when it's an integer, there
cannot be a decimal point".

 

People are resistant to the notion that this requires a complex type with a
bunch of different elements with different textNumberFormats so that you
have either an <int>99</int> or a <dec>99.9</dec> element. They really don't
want there to be different paths to this value in the infoset just because
of this format issue about the decimal point. It's a painful loss of
polymorphism in these path expressions. Instead of a simple path expression
to obtain such a value you end up with

 

if (fn:exists(path/int)) then path/int else path/dec

 

Note that DFDL's expression language has no let statement, so if "path" in
the above is actually "a/b/c/d/e/f/g", i.e., a typical deep path (whose
steps are commonly much longer than my single letters), then that path is
going to be repeated 3 times in the expression. This is pretty unpleasant.

 

Rather than come up with a bunch of ICU mods to tighten up all the places it
is lax, and to add features for suppressed decimal points, etc., we could
just allow the pattern facet on textual numbers.

 

Then the pattern facet could be "\d\d|\d\d\.\d{1,4}" which would achieve the
goal of enforcing the precise pattern desired if you validate after parsing
and before unparsing. It would not prevent conversion of the text to the
corresponding numeric type, but it would allow an additional tighter check
on what the text was. 
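
As a rough illustration of that check (not DFDL or Daffodil code), here is a
minimal Python sketch. It assumes that Python's re module treats this simple
pattern the same way an XSD regex engine would, and it uses re.fullmatch
because XSD pattern facets match the whole lexical value. The function name
check_text_number is hypothetical.

```python
import re
from decimal import Decimal

# The pattern facet proposed above. XSD patterns are implicitly anchored to
# the whole value, which re.fullmatch emulates here (an assumption of this
# sketch, not a statement about any DFDL implementation).
PATTERN = re.compile(r"\d\d|\d\d\.\d{1,4}")


def check_text_number(text):
    """Return the Decimal value if text satisfies the pattern, else None.

    The pattern check is an additional, tighter test on the text; it does
    not replace the text-to-number conversion.
    """
    if PATTERN.fullmatch(text) is None:
        return None
    return Decimal(text)


if __name__ == "__main__":
    for sample in ["99", "99.9", "99.1234", "99.", "9", "99.12345"]:
        print(repr(sample), "->", check_text_number(sample))
```

With this pattern "99" and "99.9" pass, while "99." is rejected because the
second alternative requires at least one digit after the decimal point.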

 

Regular XML Schema allows the pattern facet on all the numeric types, so we
would be eliminating what is currently a DFDL restriction, but only when the
numeric types have standard text representation.

 

Thoughts? 




Mike Beckerle 

Apache Daffodil PMC | daffodil.apache.org

OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl

Owl Cyber Defense | www.owlcyberdefense.com

 

 
