[DFDL-WG] Action 204: Establish strict versus lax behaviour for ICU calendar patterns

Mike Beckerle mbeckerle.dfdl at gmail.com
Fri Aug 16 15:59:19 EDT 2013


Adding these as Errata 2.150 (the six-letter EEEEEE and eeeeee patterns added)
and 2.151 (clarification of strict/lax for dfdl:calendarCheckPolicy).


On Wed, Aug 14, 2013 at 12:20 PM, Steve Hanson <smh at uk.ibm.com> wrote:

> Agreed on the call to add these descriptions, minus the footnotes.
> Errata will be raised to add EEEEEE and eeeeee.
> There are several bugs in ICU, all of which should ideally be documented
> in the release notes for a DFDL implementation. The broken EEEEE behaviour
> and the ICU4C vs. ICU4J differences both fall into this category.
>
> Regards
>
> Steve Hanson
> Architect, IBM Data Format Description Language (DFDL)
> Co-Chair, OGF DFDL Working Group <http://www.ogf.org/dfdl/>
> IBM SWG, Hursley, UK
> smh at uk.ibm.com
> tel:+44-1962-815848
>
>
>
> From:        Mike Beckerle <mbeckerle.dfdl at gmail.com>
> To:        Steve Hanson/UK/IBM at IBMGB,
> Cc:        "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>
> Date:        14/08/2013 14:23
> Subject:        Re: [DFDL-WG] Action 204: Establish strict versus lax
> behaviour for ICU calendar patterns
> ------------------------------
>
>
>
> This is helpful.
>
> Given where we are, let's just put this in as documentation of what strict
> and lax mean.
>
> I'm in favor of adding the variations of EEEE... and eeee... that ICU
> supports. This is upward compatible and avoids the need for a special
> check to exclude them.
>
> The broken EEEEE form is just a bug; I'd treat it as a release-note item
> for products providing DFDL, unless ICU fixes it 'real soon now'.
>
>
> On Wed, Aug 14, 2013 at 8:58 AM, Steve Hanson <smh at uk.ibm.com>
> wrote:
> For the subset of ICU symbols that DFDL supports, here is what ICU claim:
>
> 1) Lenient parsing behaviour when in 'strict' mode:
> a) case-insensitive matching for text fields
> b) MMM, MMMM, MMMMM all accept either the short or long form of the month
> c) E, EE, EEE, EEEE, EEEEE **, EEEEEE *** all accept the abbreviated,
> full, narrow, or short form of the day of week
> d) a truncated leftmost numeric field is accepted (e.g., pattern "HHmmss"
> allows "123456" (12:34:56) and "23456" (2:34:56) but not "3456")
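Item d) above can be modelled in a few lines. The following is a minimal Python sketch under stated assumptions, not ICU's code: the helper parse_adjacent_numeric and its list-of-field-widths interface are hypothetical, the pattern "HHmmss" is represented as widths [2, 2, 2], and value range checks (such as hours 0-23) are left out.

```python
from typing import List


def parse_adjacent_numeric(data: str, widths: List[int]) -> List[int]:
    """Split a run of digits into adjacent numeric fields (hypothetical helper).

    Every field except the leftmost must use exactly its pattern width; the
    leftmost field takes whatever digits are left, which must be between 1
    and its own width. Range checks (e.g. hours 0-23) are not done here.
    """
    if not (data.isascii() and data.isdigit()):
        raise ValueError(f"expected ASCII digits, got {data!r}")
    fixed = sum(widths[1:])        # digits consumed by the non-leftmost fields
    lead_len = len(data) - fixed   # digits left over for the leftmost field
    if not 1 <= lead_len <= widths[0]:
        raise ValueError(f"{data!r} does not fit field widths {widths}")
    values = [int(data[:lead_len])]
    pos = lead_len
    for width in widths[1:]:
        values.append(int(data[pos:pos + width]))
        pos += width
    return values


# Pattern "HHmmss" corresponds to widths [2, 2, 2].
print(parse_adjacent_numeric("123456", [2, 2, 2]))  # [12, 34, 56]
print(parse_adjacent_numeric("23456", [2, 2, 2]))   # [2, 34, 56]
try:
    parse_adjacent_numeric("3456", [2, 2, 2])
except ValueError as err:
    print("rejected:", err)
```

The design choice here is that only the leftmost field may shrink, which is what makes "23456" unambiguous while "3456" has no valid split.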
>
> 2) Additional lenient parsing behaviour when in 'lax' mode:
> a) values outside valid ranges are normalized (e.g., "March 32 1996" is
> treated as "April 1 1996")
> b) a trailing dot after a non-numeric field is ignored
> c) leading and trailing whitespace in the data but not in the pattern is
> accepted ****
> d) whitespace in the pattern can be missing in the data
> e) partial matching on literal strings (e.g., data "20130621d" is allowed
> for pattern "yyyyMMdd'date'") ****
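The 'lax' normalization in item a) can be pictured as carrying surplus units forward. This Python sketch is illustrative only: lax_date is a hypothetical helper (not an ICU or DFDL API), and it assumes that out-of-range months roll into the year and out-of-range days roll into neighbouring months, which reproduces the "March 32 1996" example.

```python
from datetime import date, timedelta


def lax_date(year: int, month: int, day: int) -> date:
    """Build a date, normalizing out-of-range month/day values (hypothetical).

    Surplus months are carried into the year and surplus days into the
    following months, so (1996, 3, 32) becomes 1 April 1996.
    """
    # divmod floors, so month 13 -> January next year, month 0 -> December previous year.
    carry, month0 = divmod(month - 1, 12)
    first_of_month = date(year + carry, month0 + 1, 1)
    # Count days forward (or backward, for day <= 0) from the 1st of the month.
    return first_of_month + timedelta(days=day - 1)


print(lax_date(1996, 3, 32))  # 1996-04-01
print(lax_date(1996, 3, 0))   # 1996-02-29

# Python's own date constructor does not normalize and rejects the value.
try:
    date(1996, 3, 32)
except ValueError as err:
    print("rejected:", err)
```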
>
> ** Bug found when testing this: the EEEEE 'narrow' form is completely
> broken. ICU ticket raised.
>
> *** EEEEEE and eeeeee are new and support a two-character version of the
> 'short' form, e.g. Tu or Mo. They are not currently allowed by DFDL; we
> should consider allowing them.
>
> **** Currently only in ICU4C. ICU4J will be changed to match ICU4C.
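The day-of-week leniency in item c) of the strict list, together with the short form described in footnote ***, can be sketched as a lookup across all four name forms. Assumptions: English names only, and DAY_FORMS and match_day_of_week are hypothetical names. In ICU's pattern symbol table, E..EEE select the abbreviated form, EEEE the full (wide) form, EEEEE the narrow form and EEEEEE the short form. Because narrow forms are ambiguous (T is Tuesday or Thursday), the sketch returns every weekday that matches instead of picking one.

```python
from typing import Dict, List

# Hypothetical English day-of-week names, index 0 = Sunday.
# ICU pattern widths: E..EEE abbreviated, EEEE full, EEEEE narrow, EEEEEE short.
DAY_FORMS: Dict[str, List[str]] = {
    "abbreviated": ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"],
    "full": ["Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"],
    "narrow": ["S", "M", "T", "W", "T", "F", "S"],
    "short": ["Su", "Mo", "Tu", "We", "Th", "Fr", "Sa"],
}


def match_day_of_week(token: str) -> List[int]:
    """Return every weekday index whose name in any form matches token.

    Matching ignores case and ignores which E width the pattern used,
    mirroring items a) and c) of the 'strict' list.
    """
    wanted = token.casefold()
    hits = {
        index
        for names in DAY_FORMS.values()
        for index, name in enumerate(names)
        if name.casefold() == wanted
    }
    return sorted(hits)


print(match_day_of_week("tu"))       # [2]
print(match_day_of_week("TUESDAY"))  # [2]
print(match_day_of_week("T"))        # [2, 4]
```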
>
> Note: IBM is in discussion with ICU to provide a 'really strict' mode
> (name TBD) which has no leniency at all. We need to decide whether to
> reflect all three variants in dfdl:calendarCheckPolicy, or whether to
> remap our 'strict' to the new 'really strict' mode when it appears. Given
> where we are, I think this is a DFDL 2.0 item.
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>
> --
>   dfdl-wg mailing list
>   dfdl-wg at ogf.org
>   https://www.ogf.org/mailman/listinfo/dfdl-wg
>
>
>
> --
> Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
> www.tresys.com
>
>


