[DFDL-WG] Ignore extraneous CRLF w/ space? - correction

Steve Hanson smh at uk.ibm.com
Wed Jun 5 13:06:33 EDT 2013


Correction below.

Received: from smtpksrv1.mitre.org (localhost.localdomain [127.0.0.1]) 
 by localhost (Postfix) via Exchange Front-End Server webmail.afmc.af.mil 
 ([131.28.34.85]) with SMTP id 0A8791F116E for <jgarriss at mitre.org>; Tue, 
  4 Jun 2013 14:03:24 -0400 (EDT) 

<xs:element name="Received_Header" dfdl:initiator="Received:%WSP*;"
            dfdl:terminator="%CR;%LF;">
  <xs:complexType>
    <xs:sequence dfdl:separator="%CR;%LF;%SP;"
                 dfdl:separatorPosition="infix">
      <xs:element name="data" type="xs:string" maxOccurs="unbounded"
                  dfdl:lengthKind="delimited" />
    </xs:sequence>
  </xs:complexType>
</xs:element>

DFDL consumes the initiator, then processes the content of the header as 
an array of data lines. Each CR+LF+SP is consumed as the separator, 
because that is the longest match. The final CR+LF (with no SP after it) 
is consumed as the terminator of the header. Clearly that only works if 
there is no SP straight after the CR+LF that ends the last line of a 
header. So you don't need a discriminator. 
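
To make the longest-match behaviour concrete, here is a minimal Python sketch that mimics what the schema above asks the parser to do. It is not a DFDL processor; the function name parse_received and its return shape are hypothetical, and %WSP*; is simplified to "zero or more spaces or tabs".

```python
CRLF = "\r\n"
INITIATOR = "Received:"
SEPARATOR = CRLF + " "   # stands in for %CR;%LF;%SP;
TERMINATOR = CRLF        # stands in for %CR;%LF;


def parse_received(text):
    """Return (data_items, remaining_text) for one Received header.

    Hypothetical helper: it imitates delimiter scanning with
    longest-match, it does not call any real DFDL implementation.
    """
    if not text.startswith(INITIATOR):
        raise ValueError("initiator not found")
    # Simplified %WSP*;: skip optional spaces/tabs after the colon.
    pos = len(INITIATOR)
    while pos < len(text) and text[pos] in " \t":
        pos += 1

    items = []
    start = pos
    while True:
        hit = text.find(CRLF, pos)
        if hit == -1:
            raise ValueError("terminator not found")
        items.append(text[start:hit])
        if text.startswith(SEPARATOR, hit):
            # Longest match: CR+LF+SP beats CR+LF, so this is a separator.
            pos = start = hit + len(SEPARATOR)
        else:
            # Bare CR+LF: this terminates the whole header.
            return items, text[hit + len(TERMINATOR):]
```

For example, parse_received("Received: from a\r\n by b\r\nSubject: x\r\n") splits the header into the two data lines and leaves the Subject line unconsumed. Note that only one SP is consumed per separator, so a continuation line indented with two spaces keeps its second space in the data.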

You will have to stitch the data together post-parse. I guess you could 
make the sequence hidden and get DFDL to stitch together the data lines 
into one long string via an element with dfdl:inputValueCalc. 
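
If the stitching is done post-parse in application code, it could be as small as the following Python sketch. The function name stitch is hypothetical, and rejoining with a single space is an assumption: the separator consumed exactly one SP per wrap, so putting one space back restores the unfolded text.

```python
def stitch(data_items, joiner=" "):
    """Rejoin the parsed 'data' occurrences into one header value.

    Assumption: each wrap consumed CR+LF plus one SP, so a single
    space is put back between adjacent items.
    """
    return joiner.join(data_items)


# Illustrative items, shortened from the Received header above.
example = stitch(["from smtpksrv1.mitre.org", "by localhost (Postfix)"])
```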

Ah, I think I see where Mike's earlier post to the mailing list was 
coming from.

Regards

Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848



From:   "Garriss Jr., James P." <jgarriss at mitre.org>
To:     "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>, 
Date:   05/06/2013 16:25
Subject:        Re: [DFDL-WG] Ignore extraneous CRLF w/ space?
Sent by:        dfdl-wg-bounces at ogf.org



> Is the problem that the dfdl:terminator '%CR;%LF;' for the end of the 
> header record is firing prematurely when it encounters the CRLF in the 
> data?
 
Exactly.
 
> I would model the data as unbounded repeating records, and use a 
> discriminator to distinguish the repeats from the next header.
 
Uh, could you repeat that in English?  Maybe with a small example?  I 
freely admit that I don’t understand what you just said.  Thanks!
 
From: Steve Hanson [mailto:smh at uk.ibm.com] 
Sent: Wednesday, June 05, 2013 5:21 AM
To: Garriss Jr., James P.
Cc: dfdl-wg at ogf.org; dfdl-wg-bounces at ogf.org
Subject: Re: [DFDL-WG] Ignore extraneous CRLF w/ space?
 
James 

Is the problem that the dfdl:terminator '%CR;%LF;' for the end of the 
header record is firing prematurely when it encounters the CRLF in the 
data? 

If so then I'm not sure that DFDL can ignore the extra %CR;%LF; without 
using an escape scheme - but there isn't an escape scheme to use. 

I would model the data as unbounded repeating records, and use a 
discriminator to distinguish the repeats from the next header. 

Regards

Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848 



From:        "Garriss Jr., James P." <jgarriss at mitre.org> 
To:        "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>, 
Date:        04/06/2013 19:56 
Subject:        [DFDL-WG] Ignore extraneous CRLF w/ space? 
Sent by:        dfdl-wg-bounces at ogf.org 




Long IMF headers, such as Received, can be wrapped onto the next line by 
using a CRLF and then a space.  This example has 3 such wrappings: 
  
Received: from smtpksrv1.mitre.org (localhost.localdomain [127.0.0.1]) 
 by localhost (Postfix) via Exchange Front-End Server webmail.afmc.af.mil 
 ([131.28.34.85]) with SMTP id 0A8791F116E for <jgarriss at mitre.org>; Tue, 
  4 Jun 2013 14:03:24 -0400 (EDT) 
  
How do I get DFDL to ignore these wrappings?  For most of the header, it’s 
not an issue, because I can use a lengthPattern to lookahead to the ; 
before the date starts.  But once the date starts, I have no way of 
knowing when it ends, so I need to ignore any CRLF with a space. 
  
TIA 
  
 --
 dfdl-wg mailing list
 dfdl-wg at ogf.org
 https://www.ogf.org/mailman/listinfo/dfdl-wg 

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
