[DFDL-WG] optional whitespace entity

Steve Hanson smh at uk.ibm.com
Thu Jun 13 10:59:45 EDT 2013


Or, assuming you have dfdl:separator '%SP;' on the parent sequence, change 
that to '%SP;%WSP*;'.
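A minimal sketch of the separator approach follows. It assumes the date fields sit in a sequence whose separator is currently '%SP;'. The element names DayOfWeek, Month and Year, the ',' terminator and the types are hypothetical. Other required properties (encoding, number patterns and so on) are assumed to come from a dfdl:format default that is not shown. With '%SP;%WSP*;', the run of spaces in "Tue,  4 Jun" would be consumed as one separator, so Day would presumably no longer need its own '%WSP*;' initiator. Unlike '%WSP*;' on its own, this separator always requires at least one space in the data.

```xml
<!-- Hypothetical fragment: names and every property other than dfdl:separator
     are assumptions; remaining defaults are expected from dfdl:format. -->
<xsd:sequence dfdl:separator="%SP;%WSP*;">
  <!-- "Tue," : the comma is modelled as a terminator (assumption) -->
  <xsd:element name="DayOfWeek" type="xsd:string"
               dfdl:lengthKind="delimited" dfdl:terminator=","/>
  <!-- "4" or "30": the separator has already consumed one space
       plus any further whitespace, so no initiator is needed here -->
  <xsd:element name="Day" type="xsd:unsignedInt"
               dfdl:lengthKind="delimited"/>
  <xsd:element name="Month" type="xsd:string"
               dfdl:lengthKind="delimited"/>
  <xsd:element name="Year" type="xsd:unsignedInt"
               dfdl:lengthKind="delimited"/>
  <!-- time and zone elements would follow here -->
</xsd:sequence>
```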

Regards

Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848



From:   Steve Hanson/UK/IBM
To:     "Garriss Jr., James P." <jgarriss at mitre.org>, 
Cc:     "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>, dfdl-wg-bounces at ogf.org
Date:   13/06/2013 13:50
Subject:        Re: [DFDL-WG] optional whitespace entity


You could use dfdl:textTrimKind 'padChar', 
dfdl:textStringPadCharacter '%SP;' and dfdl:textStringJustification 
'right' to trim off the excess space.  I'm guessing that your pad char 
is a '0' right now, but the '0' is harmless.
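A sketch of the trimming approach follows, with the rest of the Day element kept as in James's schema. The message names the dfdl:textString* properties. Because Day is an xsd:unsignedInt, this sketch assumes the number counterparts, dfdl:textNumberPadCharacter and dfdl:textNumberJustification, are the ones that take effect. With right justification the padding sits on the left, so " 4" should be trimmed to "4" before conversion. The explicit length of 2 is also an assumption. On its own it would likely not handle the single-digit "Tue, 4 Jun 2013" case, where the two characters read would be "4 ".

```xml
<!-- Hypothetical sketch: a 2-character day field whose leading space
     padding is trimmed before the text is converted to xsd:unsignedInt.
     dfdl:lengthUnits="characters" and other defaults are assumed
     to come from a dfdl:format annotation. -->
<xsd:element name="Day" dfdl:lengthKind="explicit" dfdl:length="2"
             dfdl:textTrimKind="padChar"
             dfdl:textNumberPadCharacter="%SP;"
             dfdl:textNumberJustification="right">
  <xsd:simpleType>
    <xsd:restriction base="xsd:unsignedInt">
      <xsd:maxInclusive value="31"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:element>
```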

Regards

Steve Hanson




From:   "Garriss Jr., James P." <jgarriss at mitre.org>
To:     "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>, 
Date:   13/06/2013 13:20
Subject:        Re: [DFDL-WG] optional whitespace entity
Sent by:        dfdl-wg-bounces at ogf.org



> can you show us the scenario where you want to apply this?
 
Sure.
 
Consider the Date header.  Note that the day of the month is *always* 2 
digits (that is, it’s 04 instead of just 4):
 
Date: Fri, 04 Feb 2013 08:54:52 -0500
 
Now consider the Received header, which finishes with a date.  Sometimes 
the day of the month is 2 digits, when the day is 10 or higher:
 
Received: by mail-wi0-f178.google.com with SMTP id hj6so339193wib.11
  for <jgarriss at mitre.org>; Thu, 30 May 2013 22:28:57 -0700 (PDT)
 
Sometimes it is 2 characters but instead of a leading 0 (like the Date 
header above), there is a blank space preceding the day.  If you look 
closely in this example, you will see that there are 2 spaces between 
“Tue,” and “4 Jun”:
 
Received: from 131.28.34.56 ([131.28.34.56]) by
VFOHMLAO03.Enterprise.afmc.ds.af.mil ([131.28.34.43]) via Exchange Front-End
Server webmail.afmc.af.mil ([131.28.34.85]) with Microsoft Exchange Server
HTTP-DAV ; Tue,  4 Jun 2013 18:02:13 +0000
 
And sometimes it is 1 digit:
 
Received: from smtpksrv1.mitre.org (129.83.31.51) by IMCCAS03.MITRE.ORG
(129.83.29.80) with Microsoft SMTP Server id 14.2.342.3; Tue, 4 Jun 2013
09:31:47 -0400
 
The problem is how to model the Day element when it’s part of the Received 
header.  If the length is merely set to 2 characters, then the value can 
be “ 4”, and Daffodil complains that it can’t convert that into an 
integer.  So I settled on this:
 
    <!-- Day is set to delimited instead of explicit (w/ length = 2)
         b/c the date in Received can be 1 character -->
    <xsd:element name="Day" dfdl:lengthKind="delimited" dfdl:initiator="%WSP*;">
      <xsd:annotation>
        <xsd:appinfo source="http://www.ogf.org/dfdl/dfdl-1.0/">
          <dfdl:assert test="{ dfdl:checkConstraints(.) }"
                       message="There cannot be more than 31 days in a month"/>
        </xsd:appinfo>
      </xsd:annotation>
      <xsd:simpleType>
        <xsd:restriction base="xsd:unsignedInt">
          <xsd:maxInclusive value="31"/>
        </xsd:restriction>
      </xsd:simpleType>
    </xsd:element>
 
(I have a different version of the Day element for the Date header.) 
 
I started this thread because I wanted to make sure that this initiator 
would work whether the extra space is present or absent.  BTW, happy 
ending:  it works as expected in Daffodil 0.10.
 
> I am wondering whether %WSP*; on its own should be allowed as a 
> delimiter?
 
If not, then please provide some solution for the above problem.
 
From: Steve Hanson [mailto:smh at uk.ibm.com] 
Sent: Thursday, June 13, 2013 5:30 AM
To: Garriss Jr., James P.
Cc: dfdl-wg at ogf.org; dfdl-wg-bounces at ogf.org
Subject: Re: [DFDL-WG] optional whitespace entity
 
James, please can you show us the scenario where you want to apply this? 

I ask because I think it is the only example in DFDL where you can specify 
a DFDL delimiter and yet have nothing for that delimiter in the data. I 
suspect this might have some ramifications for things like delimiter 
scanning and dfdl:initiatedContent. It clearly solves a problem for 
James, but I am wondering whether %WSP*; on its own should be allowed as 
a delimiter? 

Regards

Steve Hanson



From:        "Garriss Jr., James P." <jgarriss at mitre.org> 
To:        "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>, 
Date:        11/06/2013 16:19 
Subject:        Re: [DFDL-WG] optional whitespace entity 
Sent by:        dfdl-wg-bounces at ogf.org 




You know, Tim, that is what I meant, but I just copied the syntax directly 
from Table 4.  Upon further review, I see that this table doesn’t give the 
complete syntax.  I wonder how many other people will just cut-and-paste 
directly from these tables.  It might be a good idea to put the complete 
syntax there. 
  
In any case, thanks for the explanation.  That’s what I hoped it would do. 

  
From: dfdl-wg-bounces at ogf.org [mailto:dfdl-wg-bounces at ogf.org] On Behalf 
Of Tim Kimber
Sent: Tuesday, June 11, 2013 11:02 AM
To: dfdl-wg at ogf.org
Subject: Re: [DFDL-WG] optional whitespace entity 
  
I assume that you meant to write dfdl:initiator="%WSP*;" 

That will match zero or more whitespace characters. It will match and 
consume any leading white space before the element, and it will never fail 
to match. 

regards,

Tim Kimber, DFDL Team,
Hursley, UK
Internet:  kimbert at uk.ibm.com
Tel. 01962-816742 
Internal tel. 37246742




From:        "Garriss Jr., James P." <jgarriss at mitre.org> 
To:        "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>, 
Date:        11/06/2013 15:48 
Subject:        [DFDL-WG] optional whitespace entity 
Sent by:        dfdl-wg-bounces at ogf.org 





<xsd:element name="Day" dfdl:lengthKind="delimited" dfdl:initiator="WSP*"> 

 
Will this element match if the initiator is not found (that is, even if 
there is not a space before the element)?  TIA
--
dfdl-wg mailing list
dfdl-wg at ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg 

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU