[DFDL-WG] clarification needed: textNumberCheckPolicy lax includes lax-ness about plus signs.

Steve Hanson smh at uk.ibm.com
Thu Aug 29 06:36:17 EDT 2019


The ICU reference is at 
http://icu-project.org/apiref/icu4j/com/ibm/icu/text/DecimalFormat.html#setParseStrict-boolean- 
and gives an example where a + in the data but not in the pattern gives 
an error when parsing is strict. The implication would be that this is not 
an error when lax, but testing with IBM DFDL does not bear this out: IBM 
DFDL behaviour matches the DFDL spec. Looking at our code, we do some 
pre-processing before passing the data and pattern to ICU, but no plus-sign 
checking, so this is ICU's behaviour. 

Data \ Pattern    +000 & +###    000 & ###
+123              Parsed         Failed
123               Failed         Parsed
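As a rough illustration of the results in the table, here is a toy model in Python. The function `toy_parse` is hypothetical: it is not ICU's DecimalFormat and not IBM DFDL code. It assumes only that a '+' at the start of the pattern acts as a required literal prefix, and that with no '+' in the pattern a '+' in the data is rejected (the model has no strict/lax switch, since per the message even lax mode rejected the mismatches). It reproduces the four outcomes above and nothing more.

```python
import re
from typing import Optional


def toy_parse(data: str, pattern: str) -> Optional[int]:
    """Hypothetical toy model (not ICU): return the integer, or None on failure."""
    # Assumption: a '+' at the start of the pattern is a literal positive
    # prefix that the data must carry.
    prefix = "+" if pattern.startswith("+") else ""
    if not data.startswith(prefix):
        return None  # pattern requires '+', data has none
    digits = data[len(prefix):]
    # Assumption: with no '+' in the pattern, a leftover '+' in the data
    # is not accepted, so anything other than plain digits fails.
    if not re.fullmatch(r"[0-9]+", digits):
        return None
    return int(digits)


if __name__ == "__main__":
    for data in ("+123", "123"):
        for pattern in ("+000", "+###", "000", "###"):
            outcome = "Parsed" if toy_parse(data, pattern) is not None else "Failed"
            print(f"{data:>5} {pattern:>5} {outcome}")
```

The point of the model is that the sign is treated as part of the pattern's prefix rather than as something a lax check skips over, which is why the diagonal of the table fails.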

I'm pretty sure I've hit this with EDIFACT in the past. A particular field 
had an explicit sign and needed to be modelled with a pattern that 
included the sign.

That said, a field that sometimes includes a + and sometimes doesn't 
feels like it should be a common occurrence ... 

Regards
 
Steve Hanson
IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh at uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
Note: I work Tuesday to Friday 



From:   Mike Beckerle <mbeckerle.dfdl at gmail.com>
To:     DFDL-WG <dfdl-wg at ogf.org>
Date:   27/08/2019 23:52
Subject:        [DFDL-WG] clarification needed: textNumberCheckPolicy lax 
includes lax-ness about plus signs.
Sent by:        "dfdl-wg" <dfdl-wg-bounces at ogf.org>



An excerpt from the daffodil users mailing list indicates that we need to 
clarify how "lax" textNumberCheckPolicy="lax" is with respect to plus signs 
on numbers. 


 
> If you set textNumberCheckPolicy="lax", then 
> we do ignore leading plus signs in the data
 
The DFDL specification doesn't seem to say that a leading plus sign is 
ignored. Here's what it says:
 
If 'lax' and dfdl:textNumberRep is 'standard' then grouping separators are 
ignored, leading and trailing whitespace is ignored, leading zeros are 
ignored and quoted characters may be omitted.
 
Nothing about ignoring plus signs in that.
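Read literally, the quoted rule amounts to something like the Python sketch below. The function name `lax_standard_parse`, the ',' grouping separator, the '-' negative prefix and the `ignore_leading_plus` switch are all assumptions for illustration. The switch models the Daffodil behaviour quoted above and is off by default, because the spec text does not mention plus signs. Omission of quoted characters is not modelled.

```python
import re
from typing import Optional


def lax_standard_parse(data: str, grouping_sep: str = ",",
                       ignore_leading_plus: bool = False) -> Optional[int]:
    """Hypothetical sketch of the quoted 'lax' rule for textNumberRep 'standard'.

    Returns the integer, or None where a parse error would be raised.
    """
    text = data.strip()                    # leading/trailing whitespace ignored
    text = text.replace(grouping_sep, "")  # grouping separators ignored
    # Assumption: '-' is the only sign accepted unless the Daffodil-style
    # switch is on; the spec wording gives no licence to skip a '+'.
    sign = r"[+-]?" if ignore_leading_plus else r"-?"
    if not re.fullmatch(sign + r"[0-9]+", text):
        return None
    return int(text)                       # int() ignores leading zeros
```

With the switch off, "+123" is rejected, which is the reading of the spec argued for in this thread; turning it on gives the behaviour described in the Daffodil reply.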
 

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | 
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are 
subject to the OGF Intellectual Property Policy
--
  dfdl-wg mailing list
  dfdl-wg at ogf.org
  
https://www.ogf.org/mailman/listinfo/dfdl-wg



