[DFDL-WG] Action 313: Plus '+' sign and lax textNumberCheckPolicy

Steve Hanson smh at uk.ibm.com
Fri Aug 30 10:56:19 EDT 2019


ICU changing behaviour in an incompatible way is not good. 

IBM DFDL is way behind and is still on ICU 51.2. We are limited in what 
we can do, because we try to stay on the same ICU level as IBM Integration 
Bus and WTX, having had C namespacing issues in the past.

Looking at the links, other changes to lenient parsing have crept in as 
well. The rules given for strict parsing are: 

- The string must contain a complete prefix and suffix. 
For example, if the pattern is "{#};(#)", then "{123}" or "(123)" would 
match, but "{123", "123}", and "123" would all fail. 
(The latter strings would be accepted in lenient mode.)

- Minus and plus signs can only appear if specified in the pattern. 
In lenient mode, a plus or minus sign can always precede a number.
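
To make the first rule concrete, here is a minimal toy model of it in Java. 
It is not ICU code and does not call ICU. The class name AffixRuleSketch and 
its methods are hypothetical, and the lenient branch is only an assumed 
simplification (prefix and suffix each optional), chosen so that it agrees 
with the five example strings quoted above. 

```java
/** Toy model of the affix rule quoted above. This is NOT ICU code. */
final class AffixRuleSketch {

    // Literal prefix/suffix pairs for the example pattern "{#};(#)".
    static final String[][] AFFIXES = { {"{", "}"}, {"(", ")"} };

    /** Strict: text must be a complete prefix, then digits, then a complete suffix. */
    static boolean strictAccepts(String text) {
        for (String[] a : AFFIXES) {
            if (text.startsWith(a[0]) && text.endsWith(a[1])
                    && text.length() > a[0].length() + a[1].length()) {
                String body = text.substring(a[0].length(), text.length() - a[1].length());
                if (isDigits(body)) return true;
            }
        }
        return false;
    }

    /** Lenient (assumed simplification): prefix and suffix are each optional. */
    static boolean lenientAccepts(String text) {
        for (String[] a : AFFIXES) {
            String body = text;
            if (body.startsWith(a[0])) body = body.substring(a[0].length());
            if (body.endsWith(a[1])) body = body.substring(0, body.length() - a[1].length());
            if (isDigits(body)) return true;
        }
        return false;
    }

    /** True if s is a non-empty run of ASCII digits. */
    static boolean isDigits(String s) {
        return !s.isEmpty() && s.chars().allMatch(c -> c >= '0' && c <= '9');
    }

    public static void main(String[] args) {
        for (String s : new String[] {"{123}", "(123)", "{123", "123}", "123"}) {
            System.out.println(s + " strict=" + strictAccepts(s)
                    + " lenient=" + lenientAccepts(s));
        }
    }
}
```

What a given ICU release really does with these inputs still has to be 
checked against that release. 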




In typical ICU fashion, even this is not complete. It says nothing about 
what happens if the pattern has a sign and the data doesn't.

I suggest you test all the combos with Daffodil and establish the truth.
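
As a starting point for that test matrix, a small Java sketch can enumerate 
the combinations. The class name SignComboMatrix, the two sample patterns 
and the three sample data strings are assumptions made for illustration; 
"strict" and "lax" are the textNumberCheckPolicy values. The result column 
is deliberately left as "?" because it has to come from actually running 
each case through Daffodil. 

```java
import java.util.ArrayList;
import java.util.List;

/** Enumerates policy x pattern x data cases to run by hand; no parsing is done here. */
final class SignComboMatrix {

    static final String[] POLICIES = {"strict", "lax"};
    // Illustrative patterns: one without a sign, one with an explicit plus sign.
    static final String[] PATTERNS = {"#", "+#"};
    // Illustrative data: no sign, plus sign, minus sign.
    static final String[] DATA = {"123", "+123", "-123"};

    /** Returns every {policy, pattern, data} triple, policy varying slowest. */
    static List<String[]> allCombos() {
        List<String[]> rows = new ArrayList<>();
        for (String policy : POLICIES) {
            for (String pattern : PATTERNS) {
                for (String data : DATA) {
                    rows.add(new String[] {policy, pattern, data});
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String[] r : allCombos()) {
            // "?" marks the observed outcome, to be filled in per ICU version.
            System.out.printf("policy=%-6s pattern=%-3s data=%-5s result=?%n",
                    r[0], r[1], r[2]);
        }
    }
}
```

The matrix includes the case the ICU text is silent on: a pattern with a 
sign ("+#") and data without one ("123"), under both policies. 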

Then we need to decide what to do. If there is no way of controlling this 
(e.g., a parameter or environment variable), then the safest option is to 
back Daffodil off to the latest ICU release that matches the DFDL 1.0 spec, 
and to change the spec so that the link to ICU is specific rather than the 
generic link that is in the spec today 
(http://www.icu-project.org/apiref/icu4c/classDecimalFormat.html#_details), 
which floats to the latest release. We can't have a moving target.

Regards
 
Steve Hanson
IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh at uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
Note: I work Tuesday to Friday 



From:   Mike Beckerle <mbeckerle.dfdl at gmail.com>
To:     DFDL-WG <dfdl-wg at ogf.org>
Date:   29/08/2019 19:49
Subject:        [DFDL-WG] Action 313: Plus '+' sign and lax 
textNumberCheckPolicy
Sent by:        "dfdl-wg" <dfdl-wg-bounces at ogf.org>



Looks like ICU changed behavior....

From: Steve Lawrence <slawrence at apache.org>
Sent: Thursday, August 29, 2019 1:30 PM
To: users at daffodil.apache.org
Subject: Re: Plus '+' sign and lax textNumberCheckPolicy - was: Re: How to 
model a fixed-length integer that may be padded with space on the left?

I think this is a difference in ICU version?

After a little grepping through the ICU source, I found a change [1] to
their number parsing logic from Dec 2017:

+        if (!isStrict) {
+            parser.addMatcher(WhitespaceMatcher.getInstance());
+            parser.addMatcher(new PlusSignMatcher());
+        }

That looks to me like a change to make it so plus signs are always
matched in lax/lenient mode regardless of the pattern (Daffodil's current
behavior). A couple of minor changes have been made to that section, but
nothing that allows you to turn it off if lenient is on.

It's hard to tell in the git history what release that was in, but it
looks like around version 61, which is relatively new (Daffodil is on
version 62).

Also, the latest version of DecimalFormatProperties.java (looks to be an
internal implementation, so no online javadocs) has javadocs that
state that plus signs are always allowed in lenient/lax mode [2].

I think this is a change in ICU behavior in newer versions.

- Steve

[1]
https://github.com/unicode-org/icu/commit/68340c8464bd988477d6c88f46f9dfe4562a6d02#diff-565b07c255337881b4e06f766691667cR119-R122

[2]
https://github.com/unicode-org/icu/blob/master/icu4j/main/classes/core/src/com/ibm/icu/impl/number/DecimalFormatProperties.java#L53-L54

