[DFDL-WG] DFDL regular expressions and Unicode

Cranford, Jonathan W. jcranford at mitre.org
Fri Jul 5 15:36:14 EDT 2013


I've been going through the spec recently, and I have a few questions about DFDL regular expressions.

Rather than put them into one long email, I'll break them up into separate emails.

First question:  What level of conformance to Unicode Technical Standard #18,
    "Unicode Regular Expressions", do DFDL regular expressions claim?
    
    For example, 
    * XML Schema regular expressions are "targeted at support of 'Level 1' features"
        (http://www.w3.org/TR/xmlschema-2/#dt-ccesN)
    * Java 1.4 regular expressions "implement its second level of support"
        (http://docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html)
    * Perl 5.18 seems to implement most of Level 1 
        (http://perldoc.perl.org/perlunicode.html#Unicode-Regular-Expression-Support-Level)
    
    I think the DFDL spec should state the conformance level explicitly, so that
    schema designers know exactly what a regular expression will match.  Details
    like case conversion and canonical equivalence make a difference when
    matching against a Unicode string.
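
    To illustrate the kind of difference I mean, here is a minimal sketch.  It uses
    Python's re module purely as a convenient stand-in; it says nothing about how a
    DFDL processor behaves, and the sample strings are my own.  As far as I know,
    Python's re compares code points and does not apply canonical equivalence, so
    the normalization step below is done by hand with unicodedata.

```python
import re
import unicodedata

# Two canonically equivalent spellings of "é" (illustrative sample data).
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT


def nfc(text: str) -> str:
    """Normalize to NFC so canonically equivalent strings compare equal."""
    return unicodedata.normalize("NFC", text)


# Code-point matching: the precomposed pattern misses the decomposed input.
raw_match = re.fullmatch(precomposed, decomposed) is not None

# Normalizing both pattern and input by hand makes the two forms match.
normalized_match = re.fullmatch(nfc(precomposed), nfc(decomposed)) is not None

# Case-insensitive matching covers one-to-one case pairs such as "É" and "é" ...
simple_case_match = re.fullmatch("\u00e9", "\u00c9", re.IGNORECASE) is not None

# ... but not full case folding, where "ß" would have to fold to "ss".
full_case_match = re.fullmatch("stra\u00dfe", "STRASSE", re.IGNORECASE) is not None

print(raw_match, normalized_match, simple_case_match, full_case_match)
```

    The same pattern and the same input can therefore match or fail to match
    depending on which of these behaviours the engine provides, which is why I
    would like the spec to pin the level down.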
    
Thanks in advance,

--
Jonathan W. Cranford <jcranford at mitre.org>
Senior Information Systems Engineer
The MITRE Corporation (http://www.mitre.org)


