[DFDL-WG] Fw: date & time and latest ICU possible issues/conflicts

Steve Hanson smh at uk.ibm.com
Thu Jan 17 05:10:04 EST 2013


The ICU ticket has been answered, with reference to the following 
document: 
http://cldr.unicode.org/development/development-process/design-proposals/time-zone-offset-patterns

'x'/'X' symbols
In a not-too-distant ICU there will be new symbols 'x' and 'X' to handle Z 
as a time zone for ISO8601 date/times, effectively replacing 'ZZZZZ'. Both 
'x' and 'X' will tolerate Z or +00:00 (or variants) on parsing, and on 
formatting X will result in Z and x will result in +00:00 (or variants). 
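
As a rough illustration of the behaviour described above (this is a toy model 
in Python, not ICU code; the function names are hypothetical, and only the 
+HH:MM variant of the offset is modelled), the asymmetry between formatting 
and parsing could be sketched as:

```python
# Toy model of the 'x'/'X' semantics described in this email.
# Not ICU code: function names and the single +HH:MM variant are assumptions.

def format_utc_offset(symbol, offset_minutes):
    """Format an offset: 'X' writes Z for UTC, 'x' writes +00:00."""
    if symbol not in ("x", "X"):
        raise ValueError("symbol must be 'x' or 'X'")
    if offset_minutes == 0 and symbol == "X":
        return "Z"
    sign = "-" if offset_minutes < 0 else "+"
    hours, minutes = divmod(abs(offset_minutes), 60)
    return f"{sign}{hours:02d}:{minutes:02d}"


def parse_utc_offset(text):
    """Parse an offset into minutes; both symbols tolerate Z or +HH:MM."""
    if text == "Z":
        return 0
    if len(text) != 6 or text[0] not in "+-" or text[3] != ":":
        raise ValueError(f"unrecognised offset: {text!r}")
    total = int(text[1:3]) * 60 + int(text[4:6])
    return -total if text[0] == "-" else total
```

The point of the sketch is that the two symbols differ only when formatting a 
zero offset; on parsing, the model treats them identically.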

DFDL's use of 'U' is wider than this, as we allow 'U' to appear with any 
number of 'Z's, meaning that Z is accepted with non-ISO8601 date/times. 
DFDL also adds the use of 'I' symbol on its own to mean any ISO8601 
compliant date/time, and again we allow 'U' to appear with 'I'. 

However the motivating use case for adding 'U' was IBM MRM which supports 
this today. But it does so primarily for XML use cases, in particular 
ISO8601. I am not personally aware of an actual non-XML use case. 

I suggest we drop the DFDL-specific use of the 'U' symbol in conjunction 
with 'Z' and 'I' symbols from the DFDL specification via errata, and allow 
the use of 'ZZZZZ' instead, which at least will accept Z when parsing. 
When 'x'/'X' support appears in ICU, we can take a future errata to 
support it or leave until DFDL 2.0.

IBM DFDL already supports 'U', but I am OK with deprecating it as I don't 
believe it is being used in practice.

'V' symbol
In a not-too-distant ICU there will be new symbols 'VV' and 'VVV' to 
handle time zones expressed as Time Zone Ids and localized locations, 
respectively. We can add that via errata in the future, or leave until 
DFDL 2.0. 

However, at the same time the meaning of V is changed slightly. DFDL 
supports 'V'. I have asked ICU for a clarification.

'O' symbol
In a not-too-distant ICU there will be new symbol 'O' to handle localized 
GMT format variants. We can add that via errata in the future, or leave 
until DFDL 2.0.

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 17/01/2013 08:58 -----

From:   Steve Hanson/UK/IBM
To:     Mike Beckerle <mbeckerle.dfdl at gmail.com>, 
Cc:     dfdl-wg at ogf.org
Date:   16/01/2013 18:20
Subject:        Re: [DFDL-WG] date & time and latest ICU possible 
issues/conflicts


ICU ticket raised as the help does not give an example.

https://icu.sanjose.ibm.com/gcoctrac/ticket/469#ticket

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848



From:   Mike Beckerle <mbeckerle.dfdl at gmail.com>
To:     dfdl-wg at ogf.org, 
Date:   16/01/2013 15:11
Subject:        [DFDL-WG] date & time and latest ICU possible 
issues/conflicts
Sent by:        dfdl-wg-bounces at ogf.org




Steve Lawrence is on the Daffodil Open Source DFDL team (on CC), and he 
has dug into date/time types.

He raised some concerns to me that I really haven't been tracking at all, 
so I wanted to put in front of the rest of the group.

The date time format syntax for the latest version of icu4j contains a 'U' 
character, which means "cyclic year name". However, the daffodil 
spec says the 'U' character, following a Z makes it so a timezone of UTC 
is represented as Z instead of +00:00.

This seems to be a conflict, and would prevent us from ever upgrading to 
the newest version of ICU (which might be a good idea).

I will point out that the latest version of ICU supports ZZZZZ (5 Z's), 
which is the ISO8601 timezone format. This doesn't add all the 
functionality that the DFDL 'U' gives. My question is, is this enough? 
Are there cases where the ZU, ZZU, etc. are necessary? I'm just 
concerned that the U is going to add quite a bit more complexity, and want 
to make sure the updates to the latest ICU don't already address the DFDL-WG 
concerns.

And if we still need the 'U', maybe it should change to a different 
letter to prevent conflicts with the latest ICU4J?

I would point out that the ICU pattern language cannot deal with 
dual-purpose letters very well, i.e., ambiguities are introduced if the 
same letter both introduces a format on its own and, when it follows 
another format string, modifies that string's behavior. E.g., does ZU mean 
Z modified by U, or Z first, and then U? So it seems pretty unfortunate if 
the ICU libraries added a conflicting use of letter U.
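
To make the ambiguity concrete, here is a minimal sketch in Python. It is not 
ICU or Daffodil code: the function names are hypothetical, and it assumes the 
usual convention that a run of one repeated letter forms a single field, 
ignoring quoted literals and punctuation.

```python
from itertools import groupby

# Toy pattern reader, not ICU or Daffodil code. Names are hypothetical.

def tokenize(pattern):
    """Split a pattern into runs of the same letter, e.g. 'ZZU' -> ['ZZ', 'U']."""
    return ["".join(run) for _, run in groupby(pattern)]


def interpret(pattern, u_is_modifier):
    """Return (field, modified_by_U) pairs under one of the two readings.

    u_is_modifier=True  : a single 'U' after a Z or I run modifies that run.
    u_is_modifier=False : 'U' is always a field of its own.
    """
    fields = []
    for token in tokenize(pattern):
        follows_z_or_i = bool(fields) and fields[-1][0][0] in "ZI"
        if u_is_modifier and token == "U" and follows_z_or_i:
            fields[-1] = (fields[-1][0], True)   # fold U into the previous field
        else:
            fields.append((token, False))
    return fields
```

Under this model, interpret("ZU", True) yields one modified Z field, while 
interpret("ZU", False) yields a Z field followed by a separate U field. The 
pattern text alone cannot say which reading was intended.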

I believe the point of DFDL's use of the U modifier for letters I and Z 
was to be absolutely clear on the GMT timezone 'Z' issue, i.e., to 
indicate that 'Z' is to be used, and +00:00 is not to be output, nor 
accepted when parsing. The ICU specification ZZZZZ says ISO format, but 
that allows either 'Z' or +00:00 to be used for GMT timezone, and it's not 
clear which of the two is produced on output. 

-- 
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | 
www.tresys.com
--
  dfdl-wg mailing list
  dfdl-wg at ogf.org
  https://www.ogf.org/mailman/listinfo/dfdl-wg

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
