[DFDL-WG] Action 204: Establish strict versus lax behaviour for ICU calendar patterns

Steve Hanson smh at uk.ibm.com
Wed Aug 14 08:58:34 EDT 2013


For the subset of ICU symbols that DFDL supports, here is the behaviour that ICU claims:

1) Lenient parsing behaviour when in 'strict' mode: 
a) case insensitive matching for text fields
b) MMM, MMMM, MMMMM all accept either short or long form of Month
c) E, EE, EEE, EEEE, EEEEE **, EEEEEE *** all accept any of the abbreviated, 
full, narrow and short forms of Day of Week
d) accept truncated leftmost numeric field (eg, pattern "HHmmss" allows 
"123456" (12:34:56) and "23456" (2:34:56) but not "3456")

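To make items 1a, 1b and 1d concrete, here is a rough Python sketch of the kind of matching described. It does not call ICU and is not how ICU implements it; the names match_month and parse_hhmmss are made up for illustration, and only English month names are assumed.

```python
# Hypothetical illustration only; this is not ICU code.
LONG_MONTHS = ["January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November",
               "December"]

# Map both the long form and the three-letter short form to the month number.
MONTHS = {}
for number, name in enumerate(LONG_MONTHS, start=1):
    MONTHS[name.lower()] = number
    MONTHS[name[:3].lower()] = number


def match_month(text):
    """Return the month number for a short or long name, ignoring case
    (items 1a and 1b), or None if the text is neither form."""
    return MONTHS.get(text.strip().lower())


def parse_hhmmss(text):
    """Parse data for pattern "HHmmss", letting the leftmost field (HH)
    be one digit short (item 1d). Returns (hour, minute, second) or None."""
    if not (text.isascii() and text.isdigit()) or len(text) not in (5, 6):
        return None
    # The rightmost four digits are always mm and ss; what is left is HH.
    hour, minute, second = int(text[:-4]), int(text[-4:-2]), int(text[-2:])
    if hour > 23 or minute > 59 or second > 59:
        return None
    return hour, minute, second
```

With this sketch, parse_hhmmss accepts "123456" and "23456" but rejects "3456", matching the example in 1d.
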
2) Additional lenient parsing behaviour when in 'lax' mode:
a) values outside valid ranges are normalized (eg, "March 32 1996" is 
treated as "April 1 1996")
b) ignoring a trailing dot after a non-numeric field 
c) leading and trailing whitespace in the data but not in the pattern is 
accepted ****
d) whitespace in the pattern can be missing in the data 
e) partial matching on literal strings (eg, data "20130621d" allowed for 
pattern "yyyyMMdd'date' ") ****
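
The range normalization in 2a can be sketched in Python as follows. This is only an illustration of the rollover arithmetic, not ICU's implementation; normalize_date and check_date are hypothetical names, and the lax flag merely stands in for the two modes.

```python
from datetime import date, timedelta


def normalize_date(year, month, day):
    """Roll an out-of-range month or day forward into a valid date."""
    # Carry excess months into the year (month is 1-based).
    year += (month - 1) // 12
    month = (month - 1) % 12 + 1
    # Start at the first of the month and add the day offset.
    return date(year, month, 1) + timedelta(days=day - 1)


def check_date(year, month, day, lax):
    """Return a date, or None when the fields are rejected."""
    if lax:
        return normalize_date(year, month, day)
    try:
        return date(year, month, day)   # raises ValueError if out of range
    except ValueError:
        return None
```

Here check_date(1996, 3, 32, lax=True) gives 1 April 1996, as in the example in 2a, while the same fields with lax=False are rejected.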

** Bug found when testing this - EEEEE 'narrow' form completely broken - 
ICU ticket raised.
*** EEEEEE and eeeeee are new and support a 2 char version of 'short' form 
- eg Tu or Mo. Not currently allowed by DFDL, we should consider allowing 
it.
**** Only currently in ICU4C. ICU4J will be changed to match ICU4C.

Note: IBM is in discussion with ICU to provide a 'really strict' mode 
(name tbd) which has no leniency at all. We need to decide whether to 
reflect all three variants in the dfdl:calendarCheckPolicy, or whether to 
remap our 'strict' to the new 'really strict' mode when it appears. Given 
where we are, I think this is a DFDL 2.0 item. 

Regards

Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848