[DFDL-WG] DFDL regular expressions and Unicode

Cranford, Jonathan W. jcranford at mitre.org
Mon Jul 8 14:56:35 EDT 2013


Ok, thanks Steve.

I'll try to start dialing into the weekly meetings to join in the conversation.

-Jonathan


>-----Original Message-----
>From: Steve Hanson [mailto:smh at uk.ibm.com]
>Sent: Monday, July 08, 2013 4:11 AM
>To: Cranford, Jonathan W.
>Cc: dfdl-wg at ogf.org; dfdl-wg-bounces at ogf.org; Andrew Edwards
>Subject: Re: [DFDL-WG] DFDL regular expressions and Unicode
>
>Jonathan
>
>I've copied Andy, who recently added regex support to IBM DFDL. He might
>have an idea of the effort involved in stating conformance.
>
>We will discuss your other two emails on the next DFDL-WG call or so.
>
>Regards
>
>Steve Hanson
>Architect, IBM Data Format Description Language (DFDL)
>Co-Chair, OGF DFDL Working Group <http://www.ogf.org/dfdl/>
>IBM SWG, Hursley, UK
>smh at uk.ibm.com
>tel:+44-1962-815848
>
>
>
>From:        "Cranford, Jonathan W." <jcranford at mitre.org>
>To:        "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>,
>Date:        06/07/2013 00:56
>Subject:        Re: [DFDL-WG] DFDL regular expressions and Unicode
>Sent by:        dfdl-wg-bounces at ogf.org
>
>________________________________
>
>
>
>
>Update: I just found errata 3.29, which answers this question, I think.
>
>From the description in the errata, and looking at the documentation for Java 7
>regular expressions, it looks like DFDL regular expressions conform to Level 1 of
>Unicode Regular Expressions (UTS #18).
>
>I still think there would be value in stating such conformance in the DFDL spec,
>but I suppose it would take some legwork for someone to actually confirm that
>ICU and Java 7 conform to Level 1.
>
>Very respectfully,
>
>-- Jonathan Cranford
>
>
>>-----Original Message-----
>>From: Cranford, Jonathan W.
>>Sent: Friday, July 05, 2013 1:36 PM
>>To: dfdl-wg at ogf.org
>>Subject: DFDL regular expressions and Unicode
>>
>>I've been going through the spec recently, and I have a few questions about
>>DFDL regular expressions.
>>
>>Rather than put them into one long email, I'll break them up into separate
>>emails.
>>
>>First question:  What level of conformance to Unicode Technical Standard #18,
>>Unicode Regular Expressions, do DFDL regular expressions claim?
>>
>>    For example,
>>    * XML Schema regular expressions are "targeted at support of 'Level 1'
>>      features"
>>        (http://www.w3.org/TR/xmlschema-2/#dt-ccesN)
>>    * Java 1.4 regular expressions "implement its second level of support"
>>        (http://docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html)
>>    * Perl 5.18 seems to implement most of Level 1
>>        (http://perldoc.perl.org/perlunicode.html#Unicode-Regular-Expression-Support-Level)
>>
>>    I think the conformance level should be specified in the DFDL spec so that
>>    it is clear to schema designers what a regular expression would really
>>    match against.  Details like case conversion and canonical equivalence
>>    make a difference when matching against a Unicode string.
>>
>>Thanks in advance,
>>
>>--
>>Jonathan W. Cranford <jcranford at mitre.org>
>>Senior Information Systems Engineer
>>The MITRE Corporation (http://www.mitre.org)
>
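
For anyone reading this thread later: the point in the original question about case conversion and canonical equivalence can be illustrated with a minimal sketch against java.util.regex. This is only an illustration of how those details change what a pattern matches in Java; it says nothing about what any DFDL implementation actually does. The class name UnicodeRegexSketch and the helper matchesExactly are hypothetical names made up for this example. The flags Pattern.CANON_EQ, Pattern.CASE_INSENSITIVE and Pattern.UNICODE_CASE are the standard ones documented for java.util.regex.Pattern.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of DFDL or of any DFDL implementation.
final class UnicodeRegexSketch {

    // Returns true when the entire input matches the regex under the given flags.
    static boolean matchesExactly(String regex, String input, int flags) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Two canonically equivalent spellings of "a with ring above":
        String decomposed = "a\u030A";  // letter a + combining ring above
        String precomposed = "\u00E5";  // single precomposed code point

        // Without CANON_EQ the comparison is code point by code point.
        System.out.println("default:      "
                + matchesExactly(decomposed, precomposed, 0));
        // With CANON_EQ, canonically equivalent forms are treated as equal.
        System.out.println("CANON_EQ:     "
                + matchesExactly(decomposed, precomposed, Pattern.CANON_EQ));

        String lower = "\u00E9";  // e with acute, lowercase
        String upper = "\u00C9";  // e with acute, uppercase

        // CASE_INSENSITIVE alone only folds US-ASCII letters.
        System.out.println("ASCII case:   "
                + matchesExactly(lower, upper, Pattern.CASE_INSENSITIVE));
        // Adding UNICODE_CASE extends case folding to non-ASCII letters.
        System.out.println("Unicode case: "
                + matchesExactly(lower, upper,
                        Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE));
    }
}
```

The same pattern and the same input give different answers depending on the flags, which is why a stated conformance level (or stated defaults) would help schema designers predict what a DFDL regular expression matches.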


