[DFDL-WG] DFDL ICU Challenges for Implementation

Wed Aug 14 09:08:36 EDT 2013

There are a couple of DFDL features that ICU doesn't support, even though
ICU supports all or nearly all of the related functionality. Perhaps these
aspects of the spec can be revisited?

1) List of Decimal Separators

The textStandardDecimalSeparator property is a list of characters.
However, ICU only supports a single character.

I see lots of potential for error here, confusing diagnostics, etc. It is
also not consistent with textStandardGroupingSeparator, which allows only a
single character.

Is there a use case where we know we need more than one decimal separator?

The only thing I can think of is a blend of, say, classic European-style
decimal numbers like "1 234 567,89" and USA-style "1,234,567.89", but ICU
won't deal with different grouping separators either.

In any case if there are multiple decimal and grouping separators we really
don't have these properties right in DFDL. We should require them to be
specified not as two separate lists, but as a list of pairs, because
grouping separators match up with specific decimal separator values in a
format.
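To make the list-of-pairs idea concrete, here is a minimal sketch in Python.
It does not use ICU or any DFDL implementation: SeparatorPair,
parse_with_pair, and parse_with_pairs are hypothetical names, and the
stand-in parser only handles plain unsigned decimal numbers with groups of
three digits. Each pair is tried in turn, roughly as one would have to do
with one single-separator formatter per pair.

```python
from decimal import Decimal
from typing import NamedTuple, Optional, Sequence


class SeparatorPair(NamedTuple):
    """Hypothetical pairing of a decimal separator with its grouping separator."""
    decimal: str
    grouping: str


def _is_digits(s: str) -> bool:
    """True only for a non-empty run of ASCII digits."""
    return s.isascii() and s.isdigit()


def parse_with_pair(text: str, pair: SeparatorPair) -> Optional[Decimal]:
    """Stand-in for a parser that knows exactly one decimal/grouping pair."""
    integer_part, sep, fraction_part = text.partition(pair.decimal)
    groups = integer_part.split(pair.grouping)
    if not all(_is_digits(g) for g in groups):
        return None
    # When grouping is used: first group has 1-3 digits, the rest exactly 3.
    if len(groups) > 1 and (len(groups[0]) > 3
                            or any(len(g) != 3 for g in groups[1:])):
        return None
    # A decimal separator, if present, must be followed by digits.
    if sep and not _is_digits(fraction_part):
        return None
    return Decimal("".join(groups) + "." + (fraction_part or "0"))


def parse_with_pairs(text: str,
                     pairs: Sequence[SeparatorPair]) -> Optional[Decimal]:
    """Try each pair in order; return the first successful parse."""
    for pair in pairs:
        value = parse_with_pair(text, pair)
        if value is not None:
            return value
    return None


# Example pairs for the two styles mentioned above (assumed, not from DFDL).
PAIRS = [SeparatorPair(decimal=",", grouping=" "),
         SeparatorPair(decimal=".", grouping=",")]
```

Even with pairs, an input such as "1,234" is accepted by both pairs in this
sketch with different values, so the order of the list decides the result.
That is the kind of ambiguity I am worried about.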

2) Case Insensitivity

Some properties that we use to configure ICU are affected by
ignoreCase="yes", but ICU does not support case insensitivity. The
properties are:

   textStandardExponentRepCharacter
   textStandardInfinityRep
   textStandardNaNRep

I can certainly imagine a need for case insensitivity here, and even for
multiple values for these (though we allow only one for Infinity and NaN).
For the infinity and NaN reps that isn't so problematic, as one can easily
do a pre-check before calling ICU, but the exponent rep is needed down in
the detailed number format parsing. The only reliable algorithm I can see
is to create a separate number format parser for each exponent rep
character, in the provided case and in the opposite case, and then to try
them one by one until a parse succeeds.

Is this ok or do we consider this a mistake?
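For reference, the workaround described above could look roughly like the
following Python sketch. It does not call ICU: parse_case_sensitive is a
hypothetical stand-in for one case-sensitive number parser configured with
a single exponent rep, and all function names here are made up for
illustration.

```python
import re
from decimal import Decimal
from typing import List, Optional, Sequence

_MANTISSA = re.compile(r"[+-]?\d+(?:\.\d+)?", re.ASCII)
_EXPONENT = re.compile(r"[+-]?\d+", re.ASCII)


def precheck_special(text: str, infinity_rep: str,
                     nan_rep: str) -> Optional[Decimal]:
    """Case-insensitive pre-check done before the number parser is called."""
    folded = text.casefold()
    if folded == infinity_rep.casefold():
        return Decimal("Infinity")
    if folded == nan_rep.casefold():
        return Decimal("NaN")
    return None


def case_variants(exponent_rep: str) -> List[str]:
    """The rep as provided, then its opposite-case form if that differs."""
    variants = [exponent_rep]
    swapped = exponent_rep.swapcase()
    if swapped != exponent_rep:
        variants.append(swapped)
    return variants


def parse_case_sensitive(text: str, exponent_rep: str) -> Optional[Decimal]:
    """Stand-in for one case-sensitive parser with a single exponent rep."""
    mantissa, sep, exponent = text.partition(exponent_rep)
    if not _MANTISSA.fullmatch(mantissa):
        return None
    if not sep:
        return Decimal(mantissa)
    if not _EXPONENT.fullmatch(exponent):
        return None
    return Decimal(mantissa + "E" + exponent)


def parse_ignoring_exponent_case(text: str,
                                 exponent_reps: Sequence[str]) -> Optional[Decimal]:
    """Try one parser per rep and per case variant until one succeeds."""
    for rep in exponent_reps:
        for variant in case_variants(rep):
            value = parse_case_sensitive(text, variant)
            if value is not None:
                return value
    return None
```

Note that this only covers the provided case and the fully opposite case. A
mixed-case form of a multi-character exponent rep would still not match, and
the number of parsers doubles for every rep in the list.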

3) Inconsistency Across Properties

We are not very consistent in these properties.

We allow multiple textStandardZeroRep values, but only a single
textStandardInfinityRep, and only a single textStandardNaNRep.

We allow multiple textStandardExponentRepCharacter, and multiple
textStandardDecimalSeparator, but only a single
textStandardGroupingSeparator.

This kind of inconsistency is always problematic for users.

Comments?