[DFDL-WG] optional whitespace entity

Garriss Jr., James P. jgarriss at mitre.org
Thu Jun 13 08:18:02 EDT 2013


> can you show us the scenario where you want to apply this?

Sure.

Consider the Date header.  Note that the day of the month is *always* 2 digits (that is, it’s 04 instead of just 4):

Date: Fri, 04 Feb 2013 08:54:52 -0500

Now consider the Received header, which ends with a date.  When the day is 10 or higher, the day of the month is 2 digits:

Received: by mail-wi0-f178.google.com with SMTP id hj6so339193wib.11
  for <jgarriss at mitre.org>; Thu, 30 May 2013 22:28:57 -0700 (PDT)

Sometimes it is 2 characters, but instead of a leading 0 (as in the Date header above), a blank space precedes the digit.  If you look closely at this example, you will see that there are 2 spaces between “Tue,” and “4 Jun”:

Received: from 131.28.34.56 ([131.28.34.56]) by
VFOHMLAO03.Enterprise.afmc.ds.af.mil ([131.28.34.43]) via Exchange Front-End
Server webmail.afmc.af.mil ([131.28.34.85]) with Microsoft Exchange Server
HTTP-DAV ; Tue,  4 Jun 2013 18:02:13 +0000

And sometimes it is 1 digit:

Received: from smtpksrv1.mitre.org (129.83.31.51) by IMCCAS03.MITRE.ORG
(129.83.29.80) with Microsoft SMTP Server id 14.2.342.3; Tue, 4 Jun 2013
09:31:47 -0400

The problem is how to model the Day element when it’s part of the Received header.  If the length is simply set to 2 characters, the value can be “ 4”, and Daffodil complains that it can’t convert that into an integer.  So I settled on this:

    <!-- Day uses lengthKind="delimited" instead of "explicit" (with length = 2) because the day in Received can be 1 digit, 2 digits, or a space followed by 1 digit -->
    <xsd:element name="Day" dfdl:lengthKind="delimited" dfdl:initiator="%WSP*;">
        <xsd:annotation>
            <xsd:appinfo source="http://www.ogf.org/dfdl/dfdl-1.0/">
                <dfdl:assert test="{ dfdl:checkConstraints(.) }" message="There cannot be more than 31 days in a month"/>
            </xsd:appinfo>
        </xsd:annotation>
        <xsd:simpleType>
            <xsd:restriction base="xsd:unsignedInt">
                <xsd:maxInclusive value="31"/>
            </xsd:restriction>
        </xsd:simpleType>
    </xsd:element>

(I have a different version of the Day element for the Date header.)
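To make the difference concrete, here is a rough Python model of the two approaches, run on the day-of-month portions of the three Received samples above. It is an illustration only, not Daffodil's implementation: the function names are hypothetical, whitespace is approximated with Python's str.isspace(), and it assumes each input starts just after the single separator space that follows the comma ("Thu, " / "Tue, ") and that a space terminates the Day field.

```python
import re
from typing import Tuple

# Rough model only; not Daffodil's code. All names here are hypothetical.


def skip_wsp_star(data: str, pos: int) -> int:
    """Model of dfdl:initiator="%WSP*;": consume zero or more whitespace
    characters (approximated by str.isspace()). It never fails."""
    while pos < len(data) and data[pos].isspace():
        pos += 1
    return pos


def parse_day_explicit(data: str, pos: int) -> int:
    """Model of lengthKind="explicit" with length 2: take exactly two
    characters and require both to be digits, mirroring the conversion
    error reported in the thread for a space-padded day."""
    text = data[pos:pos + 2]
    if not re.fullmatch(r"[0-9]{2}", text):
        raise ValueError(f"cannot convert {text!r} to xsd:unsignedInt")
    return int(text)


def parse_day_delimited(data: str, pos: int, terminator: str = " ") -> Tuple[int, int]:
    """Model of lengthKind="delimited" with the %WSP*; initiator.
    Returns (day, position of the terminator)."""
    pos = skip_wsp_star(data, pos)          # initiator may match nothing
    end = data.find(terminator, pos)        # scan to the (assumed) terminator
    if end == -1:
        end = len(data)
    text = data[pos:end]
    if not re.fullmatch(r"[0-9]+", text):
        raise ValueError(f"cannot convert {text!r} to xsd:unsignedInt")
    day = int(text)
    if day > 31:                            # xsd:maxInclusive value="31"
        raise ValueError("There cannot be more than 31 days in a month")
    return day, end


samples = ["30 May 2013", " 4 Jun 2013", "4 Jun 2013"]
for s in samples:
    day, _ = parse_day_delimited(s, 0)
    try:
        fixed = parse_day_explicit(s, 0)
    except ValueError as err:
        fixed = f"error: {err}"
    print(f"{s!r}: delimited={day}, explicit={fixed}")
```

In this model the explicit version rejects both " 4" and "4 ", while the delimited version returns 30, 4 and 4, which matches the behaviour described above.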

I started this thread because I wanted to make sure that this initiator would work whether the extra space is present or absent.  BTW, happy ending:  it works as expected in Daffodil 0.10.

> I am wondering whether %WSP*; on its own should be allowed as a delimiter?

If not, then please suggest another solution to the problem above.

From: Steve Hanson [mailto:smh at uk.ibm.com]
Sent: Thursday, June 13, 2013 5:30 AM
To: Garriss Jr., James P.
Cc: dfdl-wg at ogf.org; dfdl-wg-bounces at ogf.org
Subject: Re: [DFDL-WG] optional whitespace entity

James, please can you show us the scenario where you want to apply this?

I ask because I think it is the only example in DFDL where you can specify a DFDL delimiter and for there to be nothing for that delimiter in the data. I suspect this might have some ramifications for things like delimiter scanning and dfdl:initiatedContent. It clearly solves a problem for James, but I am wondering whether %WSP*; on its own should be allowed as a delimiter?
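A toy model may help picture the dfdl:initiatedContent concern. It assumes, as a simplification, that the branches of a choice are tried in schema order and that the first branch whose initiator matches is selected; the branch names and the literal initiator "X" are hypothetical. Under those assumptions, an initiator that can match zero characters always "succeeds", so it shadows every branch after it.

```python
from typing import Callable, List, Optional, Tuple

# An initiator matcher returns the position after the match, or None.
Matcher = Callable[[str, int], Optional[int]]


def wsp_star(data: str, pos: int) -> Optional[int]:
    """%WSP*; modelled as zero or more whitespace characters; never None."""
    while pos < len(data) and data[pos].isspace():
        pos += 1
    return pos


def literal(text: str) -> Matcher:
    """A plain literal initiator such as "X" (hypothetical)."""
    def match(data: str, pos: int) -> Optional[int]:
        return pos + len(text) if data.startswith(text, pos) else None
    return match


def choose_branch(branches: List[Tuple[str, Matcher]], data: str,
                  pos: int) -> Optional[Tuple[str, int]]:
    """Toy initiated-content choice: the first branch whose initiator
    matches is selected (the initiator acts as the discriminator)."""
    for name, initiator in branches:
        end = initiator(data, pos)
        if end is not None:
            return name, end
    return None


branches = [("DayBranch", wsp_star), ("OtherBranch", literal("X"))]
print(choose_branch(branches, "X17", 0))                  # ('DayBranch', 0): zero-length match wins
print(choose_branch(list(reversed(branches)), "X17", 0))  # ('OtherBranch', 1): literal tried first
```

The real DFDL rules for choices and discriminators are more involved than this; the sketch only shows why a delimiter that can be empty in the data is unusual.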

Regards

Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group<http://www.ogf.org/dfdl/>
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848



From:        "Garriss Jr., James P." <jgarriss at mitre.org>
To:        "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>
Date:        11/06/2013 16:19
Subject:        Re: [DFDL-WG] optional whitespace entity
Sent by:        dfdl-wg-bounces at ogf.org
________________________________



You know, Tim, that is what I meant, but I just copied the syntax directly from Table 4.  Upon further review, I see that this table doesn’t give the complete syntax.  I wonder how many other people will just cut-and-paste directly from these tables.  It might be a good idea to put the complete syntax there.

In any case, thanks for the explanation.  That’s what I hoped it would do.

From: dfdl-wg-bounces at ogf.org [mailto:dfdl-wg-bounces at ogf.org] On Behalf Of Tim Kimber
Sent: Tuesday, June 11, 2013 11:02 AM
To: dfdl-wg at ogf.org
Subject: Re: [DFDL-WG] optional whitespace entity

I assume that you meant to write dfdl:initiator="%WSP*;"

That will match zero or more whitespace characters. It will match and consume any leading white space before the element, and it will never fail to match.
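A small Python sketch of the matching rules described here, for comparison with the other whitespace entities. It assumes the DFDL meanings of %WSP; (exactly one), %WSP+; (one or more) and %WSP*; (zero or more whitespace characters), approximates the whitespace set with str.isspace(), and treats a string containing no % entity, such as the bare WSP* from the original question, as literal characters. The function name is hypothetical, and other DFDL entities are deliberately not modelled.

```python
from typing import Optional


def match_delimiter(delim: str, data: str, pos: int) -> Optional[int]:
    """Return the position after the match, or None if delim does not match.

    Models only the three whitespace entities and plain literals; other DFDL
    entities (for example %NL; or %%) are not handled in this sketch.
    """
    if delim in ("%WSP;", "%WSP+;", "%WSP*;"):
        end = pos
        while end < len(data) and data[end].isspace():
            end += 1
        count = end - pos
        if delim == "%WSP;":                # exactly one whitespace character
            return pos + 1 if count >= 1 else None
        if delim == "%WSP+;":               # one or more
            return end if count >= 1 else None
        return end                          # %WSP*;: zero or more, never fails
    if "%" in delim:
        raise ValueError(f"entity not modelled in this sketch: {delim!r}")
    # No entity: the characters are matched literally, so "WSP*" means the
    # four characters W, S, P and *.
    return pos + len(delim) if data.startswith(delim, pos) else None


for delim in ("%WSP*;", "%WSP+;", "%WSP;", "WSP*"):
    print(delim, match_delimiter(delim, "  4 Jun", 0), match_delimiter(delim, "4 Jun", 0))
```

Here %WSP*; matches both inputs (consuming 2 and 0 characters), %WSP+; and %WSP; fail when there is no leading space, and the literal WSP* never matches this data.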

regards,

Tim Kimber, DFDL Team,
Hursley, UK
Internet:  kimbert at uk.ibm.com
Tel. 01962-816742
Internal tel. 37246742




From:        "Garriss Jr., James P." <jgarriss at mitre.org>
To:        "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>
Date:        11/06/2013 15:48
Subject:        [DFDL-WG] optional whitespace entity
Sent by:        dfdl-wg-bounces at ogf.org
________________________________




<xsd:element name="Day" dfdl:lengthKind="delimited" dfdl:initiator="WSP*">

Will this element match if the initiator is not found (that is, even if there is no space before the element)?  TIA
--
dfdl-wg mailing list
dfdl-wg at ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU