[DFDL-WG] DFDL regular expressions and Unicode

Steve Hanson smh at uk.ibm.com
Mon Jul 8 06:10:53 EDT 2013


Jonathan

I've copied Andy, who recently added regex support to IBM DFDL. He might 
have an idea of the effort involved in stating conformance.

We will discuss your other two emails on one of the next DFDL-WG calls.

Regards

Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848



From:   "Cranford, Jonathan W." <jcranford at mitre.org>
To:     "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>, 
Date:   06/07/2013 00:56
Subject:        Re: [DFDL-WG] DFDL regular expressions and Unicode
Sent by:        dfdl-wg-bounces at ogf.org



Update: I just found errata 3.29, which answers this question, I think.

From the description in the errata, and looking at the documentation for 
Java 7 regular expressions, it looks like DFDL regular expressions conform 
to Level 1 of Unicode Regular Expressions (UTS #18).

I still think there would be value in stating such conformance in the DFDL 
spec, but I suppose it would take some legwork for someone to actually 
confirm the conformance of ICU and Java 7 to Level 1.
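
As a minimal illustration of why the conformance level matters to a schema
author, the Java sketch below shows how canonical equivalence and Unicode-aware
case folding change what a pattern matches. It is not taken from the DFDL spec
or from any DFDL implementation; it only uses the standard java.util.regex
flags CANON_EQ, CASE_INSENSITIVE and UNICODE_CASE, and the class and method
names are hypothetical. Whether a DFDL processor enables any of these flags is
exactly the kind of detail a conformance statement would settle.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class UnicodeRegexSketch {

    // Compiles the regex with the given flags and tests the whole input.
    static boolean matches(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Pattern: "a" followed by COMBINING RING ABOVE (U+0061 U+030A).
        // Input:   the precomposed letter a-with-ring (U+00E5).
        // The two are canonically equivalent but differ code point by code point.
        String decomposed = "a\u030A";
        String precomposed = "\u00E5";

        // Default matching compares code points, so the forms do not match.
        System.out.println(matches(decomposed, 0, precomposed));                // false
        // With CANON_EQ the canonically equivalent forms match.
        System.out.println(matches(decomposed, Pattern.CANON_EQ, precomposed)); // true

        // Case-insensitive matching is ASCII-only unless UNICODE_CASE is set.
        String lowerEAcute = "\u00E9"; // e-acute, lower case
        String upperEAcute = "\u00C9"; // E-acute, upper case
        System.out.println(matches(lowerEAcute, Pattern.CASE_INSENSITIVE, upperEAcute)); // false
        System.out.println(matches(lowerEAcute,
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE, upperEAcute));          // true
    }
}
```

The same pattern and data give different answers depending on flags that a
schema designer never sees, which is why a stated conformance level would help.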

Very respectfully,

-- Jonathan Cranford


>-----Original Message-----
>From: Cranford, Jonathan W.
>Sent: Friday, July 05, 2013 1:36 PM
>To: dfdl-wg at ogf.org
>Subject: DFDL regular expressions and Unicode
>
>I've been going through the spec recently, and I have a few questions about
>DFDL regular expressions.
>
>Rather than put them into one long email, I'll break them up into separate
>emails.
>
>First question:  What level of conformance to Unicode Technical Standard #18
>    UNICODE REGULAR EXPRESSIONS do DFDL regular expressions claim?
>
>    For example,
>    * XML Schema regular expressions are "targeted at support of 'Level 1'
>      features"
>        (http://www.w3.org/TR/xmlschema-2/#dt-ccesN)
>    * Java 1.4 regular expressions "implement its second level of support"
>        (http://docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html)
>    * Perl 5.18 seems to implement most of Level 1
>        (http://perldoc.perl.org/perlunicode.html#Unicode-Regular-Expression-Support-Level)
>
>    I think the conformance level should be specified in the DFDL spec so
>    that it is clear to schema designers what a regular expression would
>    really match against. Details like case conversion and canonical
>    equivalence make a difference when matching against a Unicode string.
>
>Thanks in advance,
>
>--
>Jonathan W. Cranford <jcranford at mitre.org>
>Senior Information Systems Engineer
>The MITRE Corporation (http://www.mitre.org)

