[DFDL-WG] optional whitespace entity
Steve Hanson
smh at uk.ibm.com
Thu Jun 13 10:59:45 EDT 2013
Or, assuming you have dfdl:separator '%SP;' on the parent sequence, change
that to '%SP;%WSP*;'.
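As a rough sketch of that suggestion (not taken from James's actual schema): the element names DayName, Day and Month below are hypothetical, and the fragment assumes the xsd and dfdl prefixes are bound and that a default format supplies the remaining text properties. The single separator literal '%SP;%WSP*;' reads as one space followed by zero or more further whitespace characters, so both "Tue, 4 Jun" and "Tue,  4 Jun" should split the same way.

```xml
<!-- Hypothetical parent sequence; only the separator is the point here. -->
<xsd:sequence dfdl:separator="%SP;%WSP*;"
              dfdl:separatorPosition="infix">
  <!-- One mandatory space, then any extra whitespace, between fields -->
  <xsd:element name="DayName" type="xsd:string"
               dfdl:lengthKind="delimited"/>
  <xsd:element name="Day" type="xsd:unsignedInt"
               dfdl:lengthKind="delimited"/>
  <xsd:element name="Month" type="xsd:string"
               dfdl:lengthKind="delimited"/>
</xsd:sequence>
```

With this arrangement the Day element would no longer need its own dfdl:initiator, since the separator absorbs the optional extra space.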
Regards
Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848
From: Steve Hanson/UK/IBM
To: "Garriss Jr., James P." <jgarriss at mitre.org>,
Cc: "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>, dfdl-wg-bounces at ogf.org
Date: 13/06/2013 13:50
Subject: Re: [DFDL-WG] optional whitespace entity
You could use dfdl:textTrimKind 'padChar',
dfdl:textStringPadCharacter '%SP;' and dfdl:textStringJustification
'right', and trim off the excess space. I'm guessing that your pad char
is a '0' right now but the '0' is harmless.
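A minimal sketch of what that might look like for the fixed two-character case James describes (where the value arrives as " 4"). Two assumptions are made here and should be checked against the specification and your processor: because Day is an xsd:unsignedInt, the fragment uses the number counterparts dfdl:textNumberPadCharacter and dfdl:textNumberJustification rather than the textString properties named above, and it assumes the xsd and dfdl prefixes are bound with a default format supplying everything else.

```xml
<!-- Assumed fixed-width variant: always read 2 characters, then trim -->
<xsd:element name="Day" type="xsd:unsignedInt"
             dfdl:lengthKind="explicit"
             dfdl:length="2"
             dfdl:lengthUnits="characters"
             dfdl:textTrimKind="padChar"
             dfdl:textNumberPadCharacter="%SP;"
             dfdl:textNumberJustification="right"/>
<!-- Right justification means pad characters are trimmed from the left,
     so " 4" would be presented to the number conversion as "4" -->
```

A fixed length of 2 does not by itself cover the Received variant where the day is a single character with no padding.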
Regards
Steve Hanson
From: "Garriss Jr., James P." <jgarriss at mitre.org>
To: "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>,
Date: 13/06/2013 13:20
Subject: Re: [DFDL-WG] optional whitespace entity
Sent by: dfdl-wg-bounces at ogf.org
> can you show us the scenario where you want to apply this?
Sure.
Consider the Date header. Note that the day of the month is *always* 2
digits (that is, it’s 04 instead of just 4):
Date: Fri, 04 Feb 2013 08:54:52 -0500
Now consider the Received header, which finishes with a date. Sometimes
the day of the month is 2 digits, when the day is 10 or higher:
Received: by mail-wi0-f178.google.com with SMTP id hj6so339193wib.11
for <jgarriss at mitre.org>; Thu, 30 May 2013 22:28:57 -0700 (PDT)
Sometimes it is 2 characters but instead of a leading 0 (like the Date
header above), there is a blank space preceding the day. If you look
closely in this example, you will see that there are 2 spaces between
“Tue,” and “4 Jun”:
Received: from 131.28.34.56 ([131.28.34.56]) by
VFOHMLAO03.Enterprise.afmc.ds.af.mil ([131.28.34.43]) via Exchange Front-End
Server webmail.afmc.af.mil ([131.28.34.85]) with Microsoft Exchange Server
HTTP-DAV ; Tue,  4 Jun 2013 18:02:13 +0000
And sometimes it is 1 digit:
Received: from smtpksrv1.mitre.org (129.83.31.51) by IMCCAS03.MITRE.ORG
(129.83.29.80) with Microsoft SMTP Server id 14.2.342.3; Tue, 4 Jun 2013
09:31:47 -0400
The problem is how to model the Day element when it’s part of the Received
header. If the length is simply set to 2 characters, then the value can
be “ 4”, and Daffodil complains that it can’t convert that into an
integer. So I settled on this:
<!-- Day is set to delimited instead of explicit (w/ length = 2)
     b/c the date in Received can be 1 character -->
<xsd:element name="Day" dfdl:lengthKind="delimited"
             dfdl:initiator="%WSP*;">
  <xsd:annotation>
    <xsd:appinfo source="http://www.ogf.org/dfdl/dfdl-1.0/">
      <dfdl:assert test="{ dfdl:checkConstraints(.) }"
                   message="There cannot be more than 31 days in a month"/>
    </xsd:appinfo>
  </xsd:annotation>
  <xsd:simpleType>
    <xsd:restriction base="xsd:unsignedInt">
      <xsd:maxInclusive value="31"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:element>
(I have a different version of the Day element for the Date header.)
I started this thread because I wanted to make sure that this initiator
would work whether the extra space is present or absent. BTW, happy
ending: it works as expected in Daffodil 0.10.
> I am wondering whether %WSP*; on its own should be allowed as a
> delimiter?
If not, then please provide some solution for the above problem.
From: Steve Hanson [mailto:smh at uk.ibm.com]
Sent: Thursday, June 13, 2013 5:30 AM
To: Garriss Jr., James P.
Cc: dfdl-wg at ogf.org; dfdl-wg-bounces at ogf.org
Subject: Re: [DFDL-WG] optional whitespace entity
James, please can you show us the scenario where you want to apply this?
I ask because I think it is the only case in DFDL where you can specify
a delimiter and yet have nothing for that delimiter in the
data. I suspect this might have some ramifications for things like
delimiter scanning and dfdl:initiatedContent. It clearly solves a problem
for James, but I am wondering whether %WSP*; on its own should be
allowed as a delimiter?
Regards
Steve Hanson
From: "Garriss Jr., James P." <jgarriss at mitre.org>
To: "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>,
Date: 11/06/2013 16:19
Subject: Re: [DFDL-WG] optional whitespace entity
Sent by: dfdl-wg-bounces at ogf.org
You know, Tim, that is what I meant, but I just copied the syntax directly
from Table 4. Upon further review, I see that this table doesn’t give the
complete syntax. I wonder how many other people will just cut-and-paste
directly from these tables. It might be a good idea to put the complete
syntax there.
In any case, thanks for the explanation. That’s what I hoped it would do.
From: dfdl-wg-bounces at ogf.org [mailto:dfdl-wg-bounces at ogf.org] On Behalf
Of Tim Kimber
Sent: Tuesday, June 11, 2013 11:02 AM
To: dfdl-wg at ogf.org
Subject: Re: [DFDL-WG] optional whitespace entity
I assume that you meant to write dfdl:initiator="%WSP*;"
That will match zero or more whitespace characters. It will match and
consume any leading white space before the element, and it will never fail
to match.
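For reference, a small sketch of the entity forms involved, using the Day element from this thread. The type attribute and the assumption that the xsd and dfdl prefixes are bound are added here for illustration. Note that whether %WSP*; should be accepted as the sole literal of a delimiter is exactly the question raised elsewhere in this thread, so treat this as a sketch rather than guaranteed-portable usage.

```xml
<!-- Whitespace entities in DFDL string literals:
       %WSP;   exactly one whitespace character
       %WSP+;  one or more whitespace characters
       %WSP*;  zero or more whitespace characters (may match nothing) -->
<xsd:element name="Day" type="xsd:unsignedInt"
             dfdl:lengthKind="delimited"
             dfdl:initiator="%WSP*;"/>
<!-- Without the leading % and trailing ; the value "WSP*" would be
     read as the four literal characters W, S, P, * -->
```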
regards,
Tim Kimber, DFDL Team,
Hursley, UK
Internet: kimbert at uk.ibm.com
Tel. 01962-816742
Internal tel. 37246742
From: "Garriss Jr., James P." <jgarriss at mitre.org>
To: "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>,
Date: 11/06/2013 15:48
Subject: [DFDL-WG] optional whitespace entity
Sent by: dfdl-wg-bounces at ogf.org
<xsd:element name="Day" dfdl:lengthKind="delimited" dfdl:initiator="WSP*">
Will this element match if the initiator is not found (that is, even if
there is no space before the element)? TIA--
dfdl-wg mailing list
dfdl-wg at ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU