[DFDL-WG] Fw: DFDL ICU Challenges for Implementation
Steve Hanson
smh at uk.ibm.com
Wed Aug 21 07:29:14 EDT 2013
Regarding Mike's second point:
Here's a link to an ICU ticket on this subject:
http://bugs.icu-project.org/trac/ticket/9659 - currently targeted for ICU 52.1
Here are some more details from IBM's calls with ICU on this subject:
- Exponent character / ignoreCase : Exponent char is not case sensitive.
Is this intentional?
* Priority : Medium
ICU see two options for this:
Option 1: Provide an API call to set a flag on the DecimalFormat
object.
Option 2: Make it a global policy settable via a config switch. This
would allow other 'site policies' to be made settable using the same
mechanism.
There would be one set of policy flags, including this flag, per
address space.
There are differences in date/time processing between C and
Java that could be dealt with using this mechanism.
DFDL needs some of these flags to be configurable at
runtime.
2012/10/19 Hit an issue where case handling was inconsistent. Fix
needs care to avoid changing default behaviour and thus breaking existing
users of the API.
Currency and prefix/suffix may need a separate switch, so a
global switch for the DecimalFormat is not appropriate.
Could provide a patch to ICU for setting exponent char for
now.
2013/1/17 - ICU external ticket #9659
ICU had an issue with the new API being specific to just case
sensitivity of exponent (and not other regions).
DFDL clarified the requirement is for an API to change global case
sensitivity (not just exponent).
This is targeted at ICU 51.
Regards
Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 14/08/2013 16:55 -----
From: Mike Beckerle <mbeckerle.dfdl at gmail.com>
To: "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>,
Date: 14/08/2013 14:14
Subject: [DFDL-WG] DFDL ICU Challenges for Implementation
Sent by: dfdl-wg-bounces at ogf.org
There are a couple of features in DFDL that ICU doesn't support, even though
ICU supports all or nearly all of the related functionality. Perhaps
these aspects of the spec can be revisited?
1) List of Decimal Separators
The textStandardDecimalSeparator property is a list of characters.
However, ICU only supports a single character.
I see lots of potential for error here, confusing diagnostics, etc. It is
also not consistent with textStandardGroupingSeparator, which allows only a
single character.
Is there a use case where we know we need more than one decimal separator?
The only thing I can think of is a blend of, say, classic European-style
decimal numbers like "1 234 567,89" and USA-style "1,234,567.89", but ICU
won't deal with different grouping separators either.
In any case, if there are multiple decimal and grouping separators, we
really don't have these properties right in DFDL. We should require them
to be specified not as two separate lists, but as a list of pairs, because
grouping separators match up with specific decimal separator values in a
format.
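If DFDL did keep multiple separators, the list-of-pairs idea could be implemented by trying each (decimal, grouping) pair in turn. The Java sketch below is hypothetical: it uses the JDK's java.text.DecimalFormat as a stand-in for ICU's DecimalFormat, and the names SeparatorPair and SeparatorPairParser are illustrative only. A pair is accepted only when its parse consumes the whole field, which is what rejects "1,234,567.89" under the comma-as-decimal pair.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParsePosition;
import java.util.List;
import java.util.Locale;
import java.util.OptionalDouble;

final class SeparatorPairParser {
    private SeparatorPairParser() {}

    /** Hypothetical: one decimal separator plus the grouping separator used with it. */
    static final class SeparatorPair {
        final char decimal;
        final char grouping;

        SeparatorPair(char decimal, char grouping) {
            this.decimal = decimal;
            this.grouping = grouping;
        }
    }

    /**
     * Tries each pair in the order given and returns the value from the first
     * pair whose parse consumes the entire input; empty if none does.
     */
    static OptionalDouble parse(String text, List<SeparatorPair> pairs) {
        for (SeparatorPair pair : pairs) {
            DecimalFormatSymbols symbols = new DecimalFormatSymbols(Locale.ROOT);
            symbols.setDecimalSeparator(pair.decimal);
            symbols.setGroupingSeparator(pair.grouping);
            // Pattern characters are locale-independent: ',' marks grouping, '.' the decimal point.
            DecimalFormat format = new DecimalFormat("#,##0.###", symbols);
            ParsePosition pos = new ParsePosition(0);
            Number value = format.parse(text, pos);
            // Reject partial matches such as "1,234" out of "1,234,567.89".
            if (value != null && pos.getIndex() == text.length()) {
                return OptionalDouble.of(value.doubleValue());
            }
        }
        return OptionalDouble.empty();
    }
}
```

Order matters whenever an input is valid under more than one pair: with the comma-decimal pair listed first, "1,234" would be read as 1.234, which is exactly the kind of ambiguity that makes multiple separators error-prone.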
2) Case Insensitivity
Some properties that we use to configure ICU are affected by
ignoreCase="yes", but ICU does not support case insensitivity. The
properties are:
textStandardExponentRepCharacter
textStandardInfinityRep
textStandardNaNRep
I can certainly imagine a need for case insensitivity here, and even for
multiple values for these (though we allow only one for Infinity and NaN).
For the Infinity and NaN reps that isn't so problematic, as one can easily
do a pre-check before calling ICU, but the exponent rep is
needed down in the detailed number format parsing. I can see no reliable
algorithm other than creating separate number format parsers for each
exponent rep character, in both the provided case and the opposite case, and
then trying them one by one until one parses successfully.
Is this ok or do we consider this a mistake?
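A minimal sketch of both workarounds described above, again using the JDK's java.text.DecimalFormat as a stand-in for ICU (DecimalFormatSymbols.setExponentSeparator plays the role of textStandardExponentRepCharacter). The class and method names are hypothetical. It assumes unsigned Infinity/NaN reps, handled by a whole-field case-insensitive pre-check, and it uses upper and lower case as the "provided" and "opposite" cases, which is equivalent for a single character.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParsePosition;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.OptionalDouble;
import java.util.Set;

final class CaseInsensitiveNumberParser {
    private CaseInsensitiveNumberParser() {}

    static OptionalDouble parse(String text, String exponentRep, String infinityRep, String nanRep) {
        // Pre-check: Infinity and NaN reps cover the whole field, so compare directly.
        if (text.equalsIgnoreCase(infinityRep)) {
            return OptionalDouble.of(Double.POSITIVE_INFINITY);
        }
        if (text.equalsIgnoreCase(nanRep)) {
            return OptionalDouble.of(Double.NaN);
        }
        // The exponent rep sits inside the number, so build one parser per case variant.
        // LinkedHashSet keeps the provided case first and drops duplicates.
        Set<String> variants = new LinkedHashSet<>();
        variants.add(exponentRep);
        variants.add(exponentRep.toUpperCase(Locale.ROOT));
        variants.add(exponentRep.toLowerCase(Locale.ROOT));
        for (String variant : variants) {
            DecimalFormatSymbols symbols = new DecimalFormatSymbols(Locale.ROOT);
            symbols.setExponentSeparator(variant);
            DecimalFormat format = new DecimalFormat("0.###E0", symbols);
            ParsePosition pos = new ParsePosition(0);
            Number value = format.parse(text, pos);
            // Accept only a parse that consumes the entire field; this works whether or
            // not the underlying parser matches the separator case-sensitively.
            if (value != null && pos.getIndex() == text.length()) {
                return OptionalDouble.of(value.doubleValue());
            }
        }
        return OptionalDouble.empty();
    }
}
```

Constructing a parser per variant on every call is wasteful; an implementation would presumably build the variants once per property value and reuse them.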
3) Inconsistency Between Properties
We are not very consistent across these properties.
We allow multiple textStandardZeroRep values, but only a single
textStandardInfinityRep, and only a single textStandardNaNRep.
We allow multiple textStandardExponentRepCharacter, and multiple
textStandardDecimalSeparator, but only a single
textStandardGroupingSeparator.
This kind of inconsistency is always problematic for users.
Comments?
--
dfdl-wg mailing list
dfdl-wg at ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU