[DFDL-WG] DFDL regular expressions and Unicode
Cranford, Jonathan W.
jcranford at mitre.org
Fri Jul 5 19:55:00 EDT 2013
Update: I just found errata 3.29, which I think answers this question.
From the description in the errata, and from the Java 7 regular expression documentation, it looks like DFDL regular expressions conform to Level 1 of Unicode Regular Expressions (UTS #18).
I still think there would be value in stating that conformance level in the DFDL spec, but someone would first have to do the legwork of actually confirming that ICU and Java 7 both conform to Level 1.
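As a rough illustration of what part of that legwork might look like on the Java side, here is a minimal sketch that spot-checks a few Level 1 items (RL1.1 hex notation, RL1.2 properties, RL1.7 supplementary code points) against java.util.regex. It assumes Java 7 or later. The class and method names are hypothetical, and passing these few checks does not by itself establish Level 1 conformance; ICU would need its own equivalent checks.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: spot-checks a few UTS #18 Level 1 features in
 * java.util.regex (Java 7 or later). This is NOT a conformance test suite.
 */
public class Level1SpotCheck {

    // RL1.2 Properties: general category L (any letter).
    static boolean generalCategory() {
        // e-acute (U+00E9) followed by Greek small alpha (U+03B1)
        return Pattern.matches("\\p{L}+", "\u00E9\u03B1");
    }

    // RL1.2 Properties: script property using the "Is" prefix (added in Java 7).
    static boolean scriptProperty() {
        // Greek small alpha (U+03B1) and beta (U+03B2)
        return Pattern.matches("\\p{IsGreek}+", "\u03B1\u03B2");
    }

    // RL1.1 Hex notation: \x{...} names any code point (added in Java 7).
    static boolean hexNotation() {
        // U+1F600 is encoded in UTF-16 as the surrogate pair D83D DE00
        return Pattern.matches("\\x{1F600}", "\uD83D\uDE00");
    }

    // RL1.7 Supplementary code points: '.' consumes one code point,
    // not one UTF-16 code unit.
    static boolean supplementaryDot() {
        return Pattern.matches(".", "\uD83D\uDE00");
    }

    public static void main(String[] args) {
        System.out.println(generalCategory());  // true
        System.out.println(scriptProperty());   // true
        System.out.println(hexNotation());      // true
        System.out.println(supplementaryDot()); // true
    }
}
```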
Very respectfully,
-- Jonathan Cranford
>-----Original Message-----
>From: Cranford, Jonathan W.
>Sent: Friday, July 05, 2013 1:36 PM
>To: dfdl-wg at ogf.org
>Subject: DFDL regular expressions and Unicode
>
>I've been going through the spec recently, and I have a few questions about DFDL
>regular expressions.
>
>Rather than put them into one long email, I'll break them up into separate emails.
>
>First question: What level of conformance to Unicode Technical Standard #18,
>Unicode Regular Expressions, do DFDL regular expressions claim?
>
> For example,
> * XML Schema regular expressions are "targeted at support of 'Level 1'
>   features" (http://www.w3.org/TR/xmlschema-2/#dt-ccesN)
> * Java 1.4 regular expressions "implement its second level of support"
>   (http://docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html)
> * Perl 5.18 seems to implement most of Level 1
>   (http://perldoc.perl.org/perlunicode.html#Unicode-Regular-Expression-Support-Level)
>
> I think the conformance level should be specified in the DFDL spec so that it is
> clear to schema designers what a regular expression would really match against.
> Details like case conversion and canonical equivalence make a difference when
> matching against a Unicode string.
>
>Thanks in advance,
>
>--
>Jonathan W. Cranford <jcranford at mitre.org>
>Senior Information Systems Engineer
>The MITRE Corporation (http://www.mitre.org)
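To make the point about case conversion and canonical equivalence concrete, here is a minimal sketch assuming java.util.regex is the engine evaluating the pattern. In that engine the same pattern matches or fails depending on flags: CASE_INSENSITIVE alone folds only US-ASCII letters, adding UNICODE_CASE extends case-insensitive matching to other letters such as é/É, and CANON_EQ lets a decomposed sequence ("a" plus a combining ring above) match the precomposed "å". Full case folding (ß against "SS") is not applied. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: how java.util.regex flags change what a pattern
 * matches against Unicode text (case conversion, canonical equivalence).
 */
public class UnicodeRegexDemo {

    // Case-insensitive match; without UNICODE_CASE only US-ASCII letters are folded.
    static boolean caseInsensitive(String regex, String input, boolean unicodeCase) {
        int flags = Pattern.CASE_INSENSITIVE;
        if (unicodeCase) {
            flags |= Pattern.UNICODE_CASE; // fold non-ASCII letters too
        }
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    // Whole-input match with or without canonical equivalence (CANON_EQ).
    static boolean canonical(String regex, String input, boolean canonEq) {
        int flags = canonEq ? Pattern.CANON_EQ : 0;
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // U+00E9 (small e acute) against U+00C9 (capital E acute)
        System.out.println(caseInsensitive("\u00E9", "\u00C9", false)); // false
        System.out.println(caseInsensitive("\u00E9", "\u00C9", true));  // true

        // "a" + U+030A (combining ring above) against precomposed U+00E5
        System.out.println(canonical("a\u030A", "\u00E5", false)); // false
        System.out.println(canonical("a\u030A", "\u00E5", true));  // true

        // U+00DF (sharp s) is not expanded to "SS" by case-insensitive matching
        System.out.println(caseInsensitive("\u00DF", "SS", true)); // false
    }
}
```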