[DFDL-WG] ICU Parsing Behaviour

Richard Schofield richard_schofield at uk.ibm.com
Tue Sep 11 10:14:30 EDT 2012


Below is a brief summary of ICU's number parsing behaviour (data 
collected using ICU4C 49.1.2).

The following data will be parsed successfully regardless of the pattern 
or strict/lenient parsing mode:

Digits 0-9
Decimal Separator
Exponent Separator


Grouping separators are permitted in the data only if they are specified 
in the pattern.


When in STRICT mode, the following will cause a parse failure (using a 
pattern of "#,##0.#"):

Leading or doubled grouping separators
        ',123' and '1,,234' fail
Groups of incorrect length when grouping is used
        '1,23' and '1234,567' fail, but '1234' passes
Grouping separators used in numbers followed by exponents
        '1,234E5' fails, but '1234E5' and '1,234E' pass ('E' is not an 
        exponent when not followed by a number)
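
To make the strict-mode observations above easier to check against other implementations, here is a rough Python sketch that models them. It is only a toy model of the cases listed in this message for the pattern "#,##0.#"; it does not call ICU and is not how ICU implements parsing. The function name strict_accepts is hypothetical, and the sketch assumes ',' as the grouping separator, '.' as the decimal separator and 'E' as the exponent separator. Only the integer part is checked.

```python
import re

# Toy model only: mirrors the strict-mode observations reported in this
# message for the pattern "#,##0.#". It does not call or reproduce ICU.
# Assumed separators: ',' grouping, '.' decimal, 'E' exponent.
_EXPONENT = re.compile(r"E[+-]?[0-9]+$")
_GROUPED = re.compile(r"[0-9]{1,3}(,[0-9]{3})+")
_PLAIN = re.compile(r"[0-9]+")


def strict_accepts(text: str) -> bool:
    """Hypothetical helper: True if `text` passes the rules listed above."""
    has_exponent = _EXPONENT.search(text) is not None
    # Strip a real exponent (an 'E' followed by digits) from the end.
    mantissa = _EXPONENT.sub("", text)
    # A bare trailing 'E' is not an exponent, so the number ends before it.
    if mantissa.endswith("E"):
        mantissa = mantissa[:-1]
    # Only the integer part (before the decimal separator) is checked.
    integer_part = mantissa.split(".", 1)[0]
    if "," not in integer_part:
        return _PLAIN.fullmatch(integer_part) is not None
    if has_exponent:
        # Grouping separators combined with a real exponent are rejected.
        return False
    # Rejects leading/doubled separators and groups of the wrong length.
    return _GROUPED.fullmatch(integer_part) is not None
```

Running the examples from the list through this model gives the same pass/fail split: ',123', '1,,234', '1,23', '1234,567' and '1,234E5' are rejected, while '1234', '1234E5' and '1,234E' are accepted.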

'Other' characters in the pattern must be present in the data. For 
example, with a pattern of 'z'## :-
        z12 will parse successfully; 12 will parse successfully only in 
        lenient mode.
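
The 'z'## example can be modelled in the same spirit. The Python sketch below is a toy model of the behaviour described, not ICU code; the function name parses and its parameters are hypothetical, and it assumes the literal character appears as a prefix and that the remaining data is plain digits.

```python
def parses(data: str, literal_prefix: str, lenient: bool) -> bool:
    """Hypothetical model of a pattern with a literal prefix, e.g. 'z'##."""
    if data.startswith(literal_prefix):
        # The literal from the pattern is present: consume it.
        digits = data[len(literal_prefix):]
    elif lenient:
        # Lenient mode tolerates the missing literal.
        digits = data
    else:
        # Strict mode requires the literal to be present in the data.
        return False
    # Assumption: what remains must be one or more ASCII digits.
    return digits.isascii() and digits.isdigit()
```

With this model, parses("z12", "z", lenient=False) is accepted, while parses("12", "z", lenient=False) is rejected and parses("12", "z", lenient=True) is accepted.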

Regards

Richard