[DFDL-WG] DFDL regular expressions and Unicode
Cranford, Jonathan W.
jcranford at mitre.org
Fri Jul 5 15:36:14 EDT 2013
I've been going through the spec recently, and I have a few questions about DFDL regular expressions.
Rather than put them into one long email, I'll break them up into separate emails.
First question: What level of conformance to Unicode Technical Standard #18,
"Unicode Regular Expressions", do DFDL regular expressions claim?
For example,
* XML Schema regular expressions are "targeted at support of 'Level 1' features"
(http://www.w3.org/TR/xmlschema-2/#dt-ccesN)
* Java 1.4 regular expressions "implement its second level of support"
(http://docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html)
* Perl 5.18 seems to implement most of Level 1
(http://perldoc.perl.org/perlunicode.html#Unicode-Regular-Expression-Support-Level)
I think the conformance level should be specified in the DFDL spec so that it is
clear to schema designers what a regular expression will actually match. Details
like case conversion and canonical equivalence make a difference when matching
against a Unicode string.
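To make the concern concrete, here is a rough sketch in Python (not DFDL) of how those two details change a match result. It uses only the standard `re` and `unicodedata` modules. Python's `re` is just a stand-in engine here, and the variable names are my own; a DFDL processor may behave differently, which is exactly why the spec should say.

```python
import re
import unicodedata

composed = "\u00e9"      # "é" as a single code point (U+00E9)
decomposed = "e\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT

# Canonical equivalence: the two strings render identically, but an engine
# that compares code points does not treat them as the same text.
raw_match = re.fullmatch(composed, decomposed) is not None

# Normalizing the input to NFC first makes the same pattern match.
nfc_input = unicodedata.normalize("NFC", decomposed)
nfc_match = re.fullmatch(composed, nfc_input) is not None

# Case conversion: case-insensitive matching of "é" against "É" depends on
# whether the engine applies Unicode or ASCII-only case rules.
upper = "\u00c9"         # "É"
unicode_ci = re.fullmatch(composed, upper, re.IGNORECASE) is not None
ascii_ci = re.fullmatch(composed, upper, re.IGNORECASE | re.ASCII) is not None

print(raw_match, nfc_match, unicode_ci, ascii_ci)
# prints: False True True False
```

The same pattern and (visually) the same data give different answers depending on normalization and case rules, so a schema author cannot predict the result without knowing which behavior the spec requires.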
Thanks in advance,
--
Jonathan W. Cranford <jcranford at mitre.org>
Senior Information Systems Engineer
The MITRE Corporation (http://www.mitre.org)