[DFDL-WG] Clarification needed: regular expressions - does '.' match newlines by default?

Mike Beckerle mbeckerle.dfdl at gmail.com
Wed Nov 14 13:18:03 EST 2012


I was in a meeting the other day where a number of people said they believe
the regex capabilities offered in XML Schema are not sufficient.

I am not exactly sure what XML Schema leaves out, but I have many examples
making use of look-ahead/look-behind features, and I suspect those may be
an issue.
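
To make the look-ahead/look-behind point concrete, here is a minimal sketch
using java.util.regex. The class name, method names, and sample strings are
made up for illustration; they are not taken from any DFDL schema. The
XML Schema regex grammar has no (?=...) or (?<=...) constructs, so patterns
like these would not be accepted as xsd:pattern values.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: class and method names are hypothetical.
class LookaroundSketch {
    // Look-ahead: find the first run of digits immediately followed by "kg",
    // without including "kg" in the matched text.
    static String digitsBeforeKg(String input) {
        Matcher m = Pattern.compile("\\d+(?=kg)").matcher(input);
        return m.find() ? m.group() : null;
    }

    // Look-behind: find the first run of digits immediately preceded by '$',
    // without including '$' in the matched text.
    static String digitsAfterDollar(String input) {
        Matcher m = Pattern.compile("(?<=\\$)\\d+").matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(digitsBeforeKg("weight 42kg, height 180cm"));
        System.out.println(digitsAfterDollar("price $15 for 3 items"));
    }
}
```

Whether DFDL should allow such constructs is exactly the open dialect
question; the sketch only shows what the Java library accepts.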

...mike

On Wed, Nov 14, 2012 at 12:59 PM, Suman Kalia <kalia at ca.ibm.com> wrote:

> I came across this issue a couple of weeks ago. The regular expression
> syntax used in XML Schema is stricter than what Java regular expressions
> support. DFDL regular expression syntax and restrictions should match
> the XML Schema specification.
>
> Here is an example for which an APAR has been opened; we will be supplying
> a fix in the WMB toolkit to make regular expressions comply with the XML
> Schema spec.
>
> The following line causes the XML Schema compiler to fail:
>
>
> <xsd:pattern value="([a-zA-Z0-9 ]|\-|\.|_|\(|\)|\\|\/|.&|\')*"/>
>
> Here the customer has escaped the forward slash and single quote
> characters. Instead of \/ it should be /, and instead of \' it should be '.
>
>
> The following is accepted by the XML Schema compiler:
>
> <xsd:pattern value="([a-zA-Z0-9 ]|\-|\.|_|\(|\)|\\|/|.&|')*"/>
>
>
>
>
>
> Suman Kalia
> IBM Canada Lab
> WMB Toolkit Architect and Development Lead
> Tel: 905-413-3923 T/L 313-3923
> Email: kalia at ca.ibm.com
>
> For info on Message broker
>
> http://www.ibm.com/developerworks/websphere/zones/businessintegration/wmb.html
>
>
>
>
>
> From:        Mike Beckerle <mbeckerle.dfdl at gmail.com>
> To:        Tim Kimber <KIMBERT at uk.ibm.com>,
> Cc:        dfdl-wg at ogf.org, dfdl-wg-bounces at ogf.org
> Date:        11/14/2012 12:46 PM
> Subject:        Re: [DFDL-WG] Clarification needed: regular expressions -
> does '.' match newlines by default?
> Sent by:        dfdl-wg-bounces at ogf.org
> ------------------------------
>
>
>
> I agree with Tim's opinion, but I would add that this is *NOT* the default
> behavior of the Java regex library we're currently using in Daffodil. I
> believe one must prefix every regex with (?s) to get the non-default
> behavior, in which '.' also matches line endings.
>
> On Wed, Nov 14, 2012 at 11:15 AM, Tim Kimber <KIMBERT at uk.ibm.com>
> wrote:
> I would vote for this feature to be switched off by default in DFDL
> processors. It is mainly useful when dealing with lines of text, but DFDL
> formats are not always lines of text.
> So to be 100% clear, I think the '.' wildcard should match all characters,
> including line endings.
>
> regards,
>
> Tim Kimber, DFDL Team,
> Hursley, UK
> Internet:  kimbert at uk.ibm.com
> Tel. 01962-816742
> Internal tel. 37246742
>
>
>
>
> From:        Mike Beckerle <mbeckerle.dfdl at gmail.com>
> To:        dfdl-wg at ogf.org,
> Date:        14/11/2012 12:53
> Subject:        [DFDL-WG] Clarification needed: regular expressions -
> does '.' match newlines by default?
> Sent by:        dfdl-wg-bounces at ogf.org
>  ------------------------------
>
>
>
>
>
> A key behavior distinction in regular expressions is whether the '.'
> wildcard matches line endings or not.
>
> Regular expression libraries can usually be configured either way,
> typically through some sort of expression modifier, so that '.' either
> does or does not match a line ending.
>
> The question is: how is it configured by default in DFDL regular
> expressions?
>
> This is part of the overall issue of tightening up regular expressions as
> part of DFDL. That is, what exactly is the regex dialect, and how is it
> configured by default?
>
> ...mike
>
> --
> Mike Beckerle | OGF DFDL WG Co-Chair
> Tel:  781-330-0412
> --
>  dfdl-wg mailing list
>  dfdl-wg at ogf.org
>  https://www.ogf.org/mailman/listinfo/dfdl-wg
>
>
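
The '.' behavior discussed in the quoted messages can be checked directly
against java.util.regex. This is a minimal sketch, not Daffodil code: the
class name, helper method, and sample data are made up for illustration.
It assumes the stock Java library, where '.' skips line terminators unless
DOTALL is enabled, either inline with (?s) or via the Pattern.DOTALL flag.

```java
import java.util.regex.Pattern;

// Illustrative only: class and method names are hypothetical.
class DotAllSketch {
    // True when the regex matches the entire input string.
    static boolean matchesWhole(String regex, String data) {
        return Pattern.compile(regex).matcher(data).matches();
    }

    public static void main(String[] args) {
        String data = "AB\nCD"; // two "lines" separated by a newline

        // Java default: '.' does not match the newline, so this fails.
        System.out.println(matchesWhole("AB.CD", data));

        // Inline (?s) turns on DOTALL: '.' now matches the newline.
        System.out.println(matchesWhole("(?s)AB.CD", data));

        // Same effect using the compile-time flag instead of the prefix.
        System.out.println(
            Pattern.compile("AB.CD", Pattern.DOTALL).matcher(data).matches());
    }
}
```

If DFDL adopts "'.' matches everything" as its default, a processor built
on this library would presumably have to apply one of these two forms to
every pattern on the user's behalf.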



-- 
Mike Beckerle | OGF DFDL WG Co-Chair
Tel:  781-330-0412