[DFDL-WG] regex free-spacing mode

Steve Hanson smh at uk.ibm.com
Thu Jun 27 05:26:29 EDT 2013


Mike, I believe that is the case, but I have copied Andy Edwards, the 
person on the IBM DFDL team who added our regex support.

Regards

Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848



From:   Mike Beckerle <mbeckerle.dfdl at gmail.com>
To:     dfdl-wg at ogf.org
Date:   26/06/2013 18:56
Subject:        Re: [DFDL-WG] regex free-spacing mode
Sent by:        dfdl-wg-bounces at ogf.org



To clarify, errata v13 has this in the table for erratum 3.29 in the list 
of non-portables:

  (?imsx-imsx:X)   X, as a non-capturing group with the    Java 7 only
                   given flags. Note that the flags
                   i,s,m,x are valid, but appending :X
                   to the flag is not.

I interpret this as meaning that only the so-called modifier-span notation 
(the :X suffix) is disallowed, and that plain (?x) is still allowed, but I 
wanted to be sure that is the correct interpretation.
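
For comparison, here is a minimal Java sketch of the two notations in 
question: the plain inline flag (?x) and the modifier-span form (?x:X). It 
shows only how java.util.regex treats them, not what DFDL permits. The class 
and method names are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of any DFDL implementation.
final class FlagFormsDemo {
    // Inline flag: (?x) turns on free-spacing mode for the rest of the pattern,
    // so unescaped whitespace and the trailing # comment are ignored.
    static final Pattern INLINE =
        Pattern.compile("(?x) a \\s b   # whitespace and comments are ignored");

    // Modifier-span: the x flag applies only inside this non-capturing group.
    static final Pattern SPAN = Pattern.compile("(?x: a \\s b )");

    static boolean matchesInline(String s) {
        return INLINE.matcher(s).matches();
    }

    static boolean matchesSpan(String s) {
        return SPAN.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesInline("a b"));
        System.out.println(matchesSpan("a b"));
        System.out.println(matchesInline("ab"));
    }
}
```

Both forms compile in Java, so the erratum's restriction on the :X suffix 
is about portability across regex engines rather than about Java itself.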


On Wed, Jun 26, 2013 at 1:13 PM, Mike Beckerle <mbeckerle.dfdl at gmail.com> 
wrote:

I wrote this complicated regex today and it works in Daffodil. 

The question is this: is (?x), which turns on regex free-spacing mode, 
officially supported in DFDL?

You can see from the example below that it is VERY desirable that it works.

  <xs:simpleType name="frontMatterType">
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <dfdl:simpleType lengthKind="pattern" terminator="%FF;">

          <dfdl:property name="lengthPattern"><![CDATA[(?x) # regex free spacing mode
            #
            # match the front matter of the document
            #
            .{1,8192}?                # up to 8K of front matter content
            #
            # front matter ends at the first message description page
            #
            (?=                       # lookahead (followed by but not including...)
              \f                      # a formfeed character
              (?> \s | \x08 ){1,100}? # whitespace or backspace (x08)
              MESSAGE\ DESCRIPTION\r  # this literal text
              \s{1,100}?              # up to 100 whitespaces
              -{19}\r                 # exactly 19 hyphens and a CR
            )                         # end lookahead 
            ]]></dfdl:property>

        </dfdl:simpleType>
      </xs:appinfo>
    </xs:annotation>
    <xs:restriction base="xs:string" />
  </xs:simpleType>
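
As a rough way to try the pattern outside a DFDL processor, the sketch below 
compiles the same free-spacing regex with java.util.regex and applies it to a 
made-up sample. The class name, the helper names, and the sample data are 
assumptions for illustration, and using lookingAt() to anchor the match at 
the start of the data is only an approximation of what a DFDL processor does 
for lengthKind="pattern".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical test harness; the sample data is invented.
final class FrontMatterDemo {
    // Same pattern as in the schema, one regex line per string.
    static final Pattern FRONT_MATTER = Pattern.compile(String.join("\n",
        "(?x)                       # regex free-spacing mode",
        ".{1,8192}?                 # up to 8K of front matter content",
        "(?=                        # lookahead",
        "  \\f                      # a formfeed character",
        "  (?> \\s | \\x08 ){1,100}? # whitespace or backspace (x08)",
        "  MESSAGE\\ DESCRIPTION\\r  # this literal text",
        "  \\s{1,100}?              # up to 100 whitespaces",
        "  -{19}\\r                 # exactly 19 hyphens and a CR",
        ")                          # end lookahead"));

    // A made-up "message description page": formfeed, some whitespace and a
    // backspace, the heading, then a line of 19 hyphens.
    static String samplePage() {
        String hyphens = new String(new char[19]).replace('\0', '-');
        return "\f \b MESSAGE DESCRIPTION\r\n" + hyphens + "\r";
    }

    // Returns the front matter, or null if no description page follows.
    static String frontMatter(String data) {
        Matcher m = FRONT_MATTER.matcher(data);
        return m.lookingAt() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(frontMatter("FRONT MATTER" + samplePage()));
        System.out.println(frontMatter("FRONT MATTER only"));
    }
}
```

Note that in Java the "." does not match line terminators such as CR or LF 
unless the s (DOTALL) flag is also set, so the sample front matter here is 
kept to a single line.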

-- 
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | 
www.tresys.com




