[DFDL-WG] Future feature? allow pattern facet on numbers when textNumberRep="standard" and representation="text"

Steve Hanson smhdfdl at gmail.com
Wed Dec 20 03:57:12 PST 2023


Unless I am misunderstanding (quite possible!), I don't think the proposal
will work, because by the time validation is applied the DFDL parser is
working with the logical value from the infoset, not the original lexical
representation. That's why spec section 5.3.4 Pattern includes the following:

"Note: in XSD, pattern is about the lexical representation of the data, and
since all is text there, everything has a lexical representation. In DFDL
only strings are guaranteed to have a lexical and logical value that is
identical."
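
To make the lexical/logical distinction concrete, here is a minimal Python
sketch (not DFDL or Daffodil code). It assumes the pattern facet from the
original proposal, uses Python's Decimal as a stand-in for the parser's
text-to-number conversion, and uses str() as a stand-in for whatever form of
the value a validator would see. The function names are made up for
illustration.

```python
import re
from decimal import Decimal

# Pattern facet from the original proposal. XSD patterns must match the
# whole value, so fullmatch() is used below.
FACET = re.compile(r"\d\d|\d\d\.\d{1,4}")


def matches_lexical(text):
    """Check the facet against the original text from the data stream."""
    return FACET.fullmatch(text) is not None


def matches_logical(text):
    """Check the facet against a rendering of the parsed value.

    Decimal() and str() are stand-ins (assumptions) for the parser's
    conversion and for the form a validator would see in the infoset.
    """
    value = Decimal(text)
    return FACET.fullmatch(str(value)) is not None


for text in ["99", "99.5", "99."]:
    print(repr(text), matches_lexical(text), matches_logical(text))
```

For "99." the lexical check rejects the text, but the check against the
parsed value accepts it, because the trailing decimal point no longer exists
once the value is a number.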

I'd prefer exploring whether the dfdl:textNumberPattern property could be a
list. For your example, the property value would be "00 00.0###". On parsing,
the patterns are tried in order. Unparsing does the same, and it's an error
only if none of them works. My example uses a space as the list item
separator, which I think works because I don't believe a space character is
allowed as part of a number pattern. Is it possible to prototype this in
Apache Daffodil to see whether ICU fails when we think it should?
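
As a rough illustration of the list semantics only, here is a Python sketch.
It does not call ICU or Daffodil. The hypothetical strict_regex() helper
translates a tiny subset of number patterns (only '0', '#' and '.') into a
regex that encodes the strict behavior we would like to see, which is exactly
the part that would need checking against real ICU. All names are invented
for the example.

```python
import re
from decimal import Decimal


def strict_regex(pattern):
    """Translate a tiny pattern subset into a strict regex (hypothetical).

    Supported: an integer part made only of '0', optionally followed by '.'
    and a fraction part of '0's (required digits) then '#'s (optional ones).
    This is an assumption about the desired strictness, not ICU behavior.
    """
    int_part, dot, frac_part = pattern.partition(".")
    if not int_part or set(int_part) != {"0"}:
        raise ValueError("unsupported integer part: %r" % pattern)
    regex = r"\d{%d}" % len(int_part)
    if dot:
        required = len(frac_part) - len(frac_part.lstrip("0"))
        optional = frac_part[required:]
        if not frac_part or set(optional) - {"#"}:
            raise ValueError("unsupported fraction part: %r" % pattern)
        regex += r"\.\d{%d,%d}" % (required, len(frac_part))
    return re.compile(regex)


def parse_with_pattern_list(text, pattern_list):
    """Try each space-separated pattern in order; fail only if none match."""
    for pattern in pattern_list.split():
        if strict_regex(pattern).fullmatch(text):
            return Decimal(text), pattern
    raise ValueError("no pattern in %r accepts %r" % (pattern_list, text))


for text in ["99", "99.5", "99."]:
    try:
        print(repr(text), parse_with_pattern_list(text, "00 00.0###"))
    except ValueError as err:
        print(repr(text), "rejected:", err)
```

Under these assumptions "99" is accepted by the first pattern, "99.5" by the
second, and "99." by neither. Whether ICU itself rejects "99." for "00" is
the open question the prototype would answer.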

Regards
Steve



On Wed, Dec 20, 2023 at 4:18 AM Brutzman, Donald (Don) (CIV) <
brutzman at nps.edu> wrote:

> We’ve handled a really wide range of floats and integers in X3D graphics
> models, and have found that xsd:schema types are very useful.  Unusual edge
> cases (for advanced error detection) can be handled with patterns (in the
> case of X3D, we have regex).
>
>
>
> Only limitation with this approach is that you typically have to pick one
> or the other, since regex within XSD Schema only applies to xs:string
> types.  Sometimes using xs:schema as primary with separate regex evaluation
> is useful in a tool.  You may have more flexibility about hybrid approaches
> in DFDL.
>
>
>
>    - X3D Regular Expressions (regexes) are used to validate the
>    correctness of string and numeric array values in an X3D scene.
>    - https://www.web3d.org/specifications/X3dRegularExpressions.html
>
>
>
> Opinion: the worst errors are the ones that remain undetected.
>
>
>
> Season’s Greetings!  8)
>
>
>
> all the best, Don
>
> --
>
> Don Brutzman  Naval Postgraduate School, Code USW/Br
> brutzman at nps.edu
>
> Watkins 270,  MOVES Institute, Monterey CA 93943-5000 USA
> +1.831.656.2149
>
> X3D graphics, virtual worlds, navy robotics
> https://faculty.nps.edu/brutzman
>
>
>
> *From:* dfdl-wg <dfdl-wg-bounces at lists.ogf.org> *On Behalf Of *Mike
> Beckerle
> *Sent:* Tuesday, December 19, 2023 1:56 PM
> *To:* DFDL-WG <dfdl-wg at ogf.org>
> *Subject:* [DFDL-WG] Future feature? allow pattern facet on numbers when
> textNumberRep="standard" and representation="text"
>
>
>
> It has come up often now that DFDL cannot be strict enough about text
> number formats, because our ICU-based textNumberPattern isn't strict
> enough, or expressive enough, to capture subtle syntax variations.
>
>
>
> I suggest this could be fixed by just allowing the XSD pattern facet to be
> used on numeric types when they are known textual and standard (not zoned).
>
>
>
> For example dfdl:textNumberPattern="00.####" will allow the number "99."
> to be accepted. There's currently no way to say "when it's an integer,
> there cannot be a decimal point".
>
>
>
> People are resistant to the notion that this requires a complex type with
> a bunch of different elements with different textNumberFormats so that you
> have an <int>99</int> or a <dec>99.9</dec> element. They really don't want
> there to be different paths to this value in the infoset just because of
> this format issue about the decimal point. It's a painful loss of
> polymorphism in these path expressions. Instead of a simple path expression
> to obtain such a value you end up with
>
>
>
> if (fn:exists(path/int)) then path/int else path/dec
>
>
>
> Note that DFDL's expression language has no let statement, so in the above,
> if "path" is actually "a/b/c/d/e/f/g", i.e., a typical deep path (which
> commonly has much longer path steps than my single letters), then that
> path is going to be repeated 3 times in the expression. This is pretty
> unpleasant.
>
>
>
> Rather than come up with a bunch of ICU mods to tighten up all the places
> it is lax, and to add features for suppressed decimal points, etc. we could
> just allow the pattern facet on textual numbers.
>
>
>
> Then the pattern facet could be "\d\d|\d\d\.\d{1,4}" which would achieve
> the goal of enforcing the precise pattern desired if you validate after
> parsing and before unparsing. It would not prevent conversion of the text
> to the corresponding numeric type, but it would allow an additional tighter
> check on what the text was.
>
>
>
> Regular XML Schema allows the pattern facet on all the numeric types, so
> we would be lifting what is currently a DFDL restriction, but only when
> the numeric types have a standard text representation.
>
>
>
> Thoughts?
>
>
> Mike Beckerle
>
> Apache Daffodil PMC | daffodil.apache.org
> <http://daffodil.apache.org/>
>
> OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
> <http://www.ogf.org/ogf/doku.php/standards/dfdl/dfdl>
>
> Owl Cyber Defense | www.owlcyberdefense.com
> <http://www.owlcyberdefense.com/>
>
>
>
>
> --
>   dfdl-wg mailing list
>   dfdl-wg at lists.ogf.org
>   https://lists.ogf.org/mailman/listinfo/dfdl-wg
>


-- 
Regards
Steve