[DFDL-WG] Action 313: Plus '+' sign and lax textNumberCheckPolicy

Mike Beckerle mbeckerle.dfdl at gmail.com
Thu Nov 14 11:22:11 EST 2019


Our team has observed that the upgrade to a newer ICU was actually done to
fix some other bugs, so backing out to a prior revision may be trading one
set of bugs for another. I cannot recall exactly which bugs, though.

Since ICU is on GitHub, we do have the option to actually fix the bug (by
adding a compatibility flag that selects the older/preferred behavior)
and issue a pull request.
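
To make the idea concrete, here is a toy Python model (not ICU code) of what
such a compatibility flag might select between. The names parse_number,
pattern_has_plus and legacy_compat are hypothetical. It assumes, as this
thread suggests, that the older behavior accepts a leading '+' only when the
pattern specifies one, while the newer lenient behavior always accepts it.

```python
import re
from itertools import product
from typing import Optional


def parse_number(text: str, pattern_has_plus: bool = False,
                 lenient: bool = True,
                 legacy_compat: bool = False) -> Optional[int]:
    """Toy model of the '+' handling discussed in this thread; NOT ICU code.

    Returns the parsed integer, or None if the text is rejected.
    legacy_compat is a hypothetical flag standing in for the proposed
    compatibility switch.
    """
    body = text
    if body.startswith("+"):
        if pattern_has_plus:
            allowed = True               # the pattern asks for a plus sign
        elif not lenient:
            allowed = False              # strict: sign must be in the pattern
        else:
            allowed = not legacy_compat  # lenient: newer behavior accepts it
        if not allowed:
            return None
        body = body[1:]
    # Assumption: unsigned data is accepted whatever the pattern says.
    # What ICU really does when the pattern has a sign and the data
    # does not is left open in this thread.
    if re.fullmatch(r"[0-9]+", body) is None:
        return None
    return int(body)


if __name__ == "__main__":
    # Enumerate every flag combination for the input "+123".
    for has_plus, lenient, legacy in product([False, True], repeat=3):
        result = parse_number("+123", has_plus, lenient, legacy)
        print(f"pattern_has_plus={has_plus!s:5} lenient={lenient!s:5} "
              f"legacy_compat={legacy!s:5} -> {result}")
```

Under these assumptions, the flag only changes the outcome for lenient
parsing with no '+' in the pattern. Whether a real ICU change would look
anything like this is a question for the ICU maintainers.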

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
<http://www.ogf.org/About/abt_policies.php>



On Wed, Nov 13, 2019 at 12:31 PM Steve Hanson <smh at uk.ibm.com> wrote:

> Issue raised: https://unicode-org.atlassian.net/browse/ICU-20896
>
> I still think we need to pin DFDL 1.0 to a specific ICU release (or
> releases).
>
> Regards
>
> Steve Hanson
>
> IBM Hybrid Integration, Hursley, UK
> Architect, *IBM DFDL*
> <http://www.ibm.com/developerworks/library/se-dfdl/index.html>
> Co-Chair, *OGF DFDL Working Group* <http://www.ogf.org/dfdl/>
> *smh at uk.ibm.com* <smh at uk.ibm.com>
> tel:+44-1962-815848
> mob:+44-7717-378890
> Note: I work Tuesday to Friday
>
>
>
> From:        Steve Hanson/UK/IBM
> To:        Mike Beckerle <mbeckerle.dfdl at gmail.com>, slawrence at apache.org
> Cc:        DFDL-WG <dfdl-wg at ogf.org>, Liam O'Neill/UK/IBM at IBMGB
> Date:        30/08/2019 15:48
> Subject:        Re: [DFDL-WG] Action 313: Plus '+' sign and lax
> textNumberCheckPolicy
> ------------------------------
>
>
> ICU changing behaviour in an incompatible way is not good.
>
> IBM DFDL is way behind, and is still on ICU 51.2.  We are limited in what
> we can do as we try to keep the same level as IBM Integration Bus & WTX as
> we have had C namespacing issues in the past.
>
> Looking at the links, there are other changes that have crept in when
> lenient.
>
> - The string must contain a complete prefix and suffix.
> For example, if the pattern is "{#};(#)", then "{123}" or "(123)" would
> match, but "{123", "123}", and "123" would all fail.
> (The latter strings would be accepted in lenient mode.)
> - Minus and plus signs can only appear if specified in the pattern.
> In lenient mode, a plus or minus sign can always precede a number.
>
>
> In typical ICU fashion, even this is not complete. It says nothing about
> what happens if the pattern has a sign and the data doesn't.
>
> I suggest you test all the combos with Daffodil and establish the truth.
>
> Then we need to decide what to do. If there is no way of controlling this
> (e.g., a parameter or env var) then the safest option is to back Daffodil
> off to the latest ICU release that matches the DFDL 1.0 spec, and change
> the spec so that the link to ICU is specific rather than the generic link
> which is in the spec today (
> http://www.icu-project.org/apiref/icu4c/classDecimalFormat.html#_details)
> and which floats to the latest release. We can't have a moving target.
>
> Regards
>
> Steve Hanson
>
>
>
>
>
> From:        Mike Beckerle <mbeckerle.dfdl at gmail.com>
> To:        DFDL-WG <dfdl-wg at ogf.org>
> Date:        29/08/2019 19:49
> Subject:        [DFDL-WG] Action 313: Plus '+' sign and lax
> textNumberCheckPolicy
> Sent by:        "dfdl-wg" <dfdl-wg-bounces at ogf.org>
> ------------------------------
>
>
>
> Looks like ICU changed behavior....
>
> From: Steve Lawrence <slawrence at apache.org>
> Sent: Thursday, August 29, 2019 1:30 PM
> To: users at daffodil.apache.org
> Subject: Re: Plus '+' sign and lax textNumberCheckPolicy - was: Re: How to
> model a fixed-length integer that may be padded with space on the left?
>
> I think this is a difference in ICU version?
>
> After a little grepping through the ICU source, I found a change [1] to
> their number parsing logic from Dec 2017:
>
> +        if (!isStrict) {
> +            parser.addMatcher(WhitespaceMatcher.getInstance());
> +            parser.addMatcher(new PlusSignMatcher());
> +        }
>
> That looks to me like a change to make it so plus signs are always
> matched in lax/lenient mode regardless of the pattern (Daffodil's current
> behavior). A couple of minor changes have been made to that section, but
> nothing that allows you to turn it off if lenient is on.
>
> It's hard to tell in the git history what release that was in, but it
> looks like around version 61, which is relatively new (Daffodil is on
> version 62).
>
> Also, the latest version of DecimalFormatProperties.java (it looks to be
> an internal implementation, so there are no online javadocs) has javadocs
> stating that plus signs are always allowed in lenient/lax mode [2].
>
> I think this is a change in ICU behavior in newer versions.
>
> - Steve
>
> [1] https://github.com/unicode-org/icu/commit/68340c8464bd988477d6c88f46f9dfe4562a6d02#diff-565b07c255337881b4e06f766691667cR119-R122
> [2] https://github.com/unicode-org/icu/blob/master/icu4j/main/classes/core/src/com/ibm/icu/impl/number/DecimalFormatProperties.java#L53-L54
>
> --
>  dfdl-wg mailing list
>  dfdl-wg at ogf.org
>  https://www.ogf.org/mailman/listinfo/dfdl-wg
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>
>
>