[DFDL-WG] Action 259 - Consider allowing more flexible escapeBlock schemes

Steve Hanson smh at uk.ibm.com
Tue May 20 12:58:12 EDT 2014


As discussed on the call, there is an import case that is not covered in 
the table, namely where quotes surround a delimiter but the opening quote 
is not at the start of the data. I imported the following text string into 
Excel:

        This is "," two separate fields

And indeed two columns were created, meaning the comma was treated as a 
delimiter and not escaped. This matches DFDL behaviour, which is good. 

Interestingly, the first column was as expected...

        This is "

...but the second was not:

        two separate fields

Notice the leading quote was removed without error, meaning that the 
absence of the closing quote is permitted!
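
The behaviour observed above can be sketched in a few lines of Python. This 
is only an illustration of the rule as I read it from the Excel result, not 
how Excel or any DFDL processor is implemented: a quote opens an escape 
block only when it is the first character of a field, and a missing closing 
quote is tolerated. The function name split_lenient and its parameters are 
hypothetical. The sketch keeps the space that follows the removed quote; the 
result shown above does not make clear whether Excel does the same.

```python
def split_lenient(record, delim=",", quote='"'):
    """Split one record into fields, Excel-style (a sketch, see assumptions).

    Assumptions: single-character delimiter and quote; a quote starts an
    escape block only at the very start of a field; a doubled quote inside
    a block is a literal quote; an unclosed block simply runs to the end.
    """
    fields = []
    buf = []
    i = 0
    at_field_start = True
    in_block = False
    while i < len(record):
        ch = record[i]
        if at_field_start and ch == quote:
            # Opening quote at the start of a field: begin an escape block.
            in_block = True
            at_field_start = False
            i += 1
            continue
        at_field_start = False
        if in_block and ch == quote:
            if record[i + 1:i + 2] == quote:
                # Doubled quote inside a block is one literal quote.
                buf.append(quote)
                i += 2
                continue
            # Closing quote: leave the block, keep reading the field.
            in_block = False
            i += 1
            continue
        if ch == delim and not in_block:
            # Unescaped delimiter: the current field is complete.
            fields.append("".join(buf))
            buf = []
            at_field_start = True
            i += 1
            continue
        # Any other character, including a quote in mid-field, is data.
        buf.append(ch)
        i += 1
    fields.append("".join(buf))
    return fields


if __name__ == "__main__":
    print(split_lenient('This is "," two separate fields'))
```

The quote in the middle of the first field is kept as data, so the comma that 
follows it is still a delimiter, and the quote that starts the second field 
is dropped even though it is never closed.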

Regards
 
Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848



From:   Tim Kimber/UK/IBM at IBMGB
To:     dfdl-wg at ogf.org, 
Date:   13/05/2014 15:37
Subject:        Re: [DFDL-WG] Action 259 - Consider allowing more flexible 
escapeBlock schemes
Sent by:        dfdl-wg-bounces at ogf.org



That looks fairly conclusive to me. DFDL should fall into line with 
established practice. 

regards,

Tim Kimber, 
IBM Integration Bus Development (Industry Packs)
Hursley, UK
Internet:  kimbert at uk.ibm.com
Tel. 01962-816742 
Internal tel. 37246742




From:        Steve Hanson/UK/IBM at IBMGB 
To:        dfdl-wg at ogf.org, 
Date:        13/05/2014 11:50 
Subject:        [DFDL-WG] Action 259 - Consider allowing more flexible 
escapeBlock schemes 
Sent by:        dfdl-wg-bounces at ogf.org 



Action 259 was raised last call to decide what to do about the following, 
as minuted: 

Steve has an example of an escape block where the escape block end is not 
at the end of the un-trimmed data. This gives a processing error. Another 
IBM product accepts this usage. Should DFDL allow this? Or should there be 
a new escapeKind that allows escapeBlockStart/End anywhere? 

I tried importing the values below from a CSV file into an Excel 
spreadsheet and a Symphony spreadsheet (i.e. the successor to Lotus 1-2-3), 
and also accessing them via ODBC using a Microsoft driver, to compare with 
IBM DFDL and IBM Cast Iron behaviour. 
Test  Data                   IBM DFDL           IBM Cast Iron      MS Excel           Lotus Symphony     ODBC
1     This is normal         This is normal     This is normal     This is normal     This is normal     This is normal
2     "This is OK"           This is OK         This is OK         This is OK         This is OK         This is OK
3     "This| is expected"    This| is expected  This| is expected  This| is expected  This| is expected  This| is expected
4     This too "is OK"       This too "is OK"   This too "is OK"   This too "is OK"   This too "is OK"   This too
5     Even "this" is OK      Even "this" is OK  Even "this" is OK  Even "this" is OK  Even "this" is OK  Even
6     "This" is NOT OK       PARSE FAILED       This is NOT OK     This is NOT OK     This is NOT OK     This
7     "This"" is still OK"   This" is still OK  This" is still OK  This" is still OK  This" is still OK  This" is still OK



The data under discussion is test 6. It looks like DFDL is out of step with 
the behaviour of Excel / Symphony spreadsheets, and Cast Iron has adopted 
that behaviour too. 
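
To make the difference in test 6 concrete, here is a minimal Python sketch of 
the two behaviours for a single field that has already been separated from 
its neighbours. It is an illustration only, not the implementation of any of 
the products tested. The names unescape_field, ParseFailed and the strict 
flag are hypothetical. With strict=True, data after the closing quote is an 
error, as IBM DFDL reports for test 6; with strict=False it is appended to 
the result, as Excel, Symphony and Cast Iron do. The handling of a quote 
that is never closed is an assumption, since none of the seven tests cover 
it.

```python
QUOTE = '"'


class ParseFailed(Exception):
    """Raised in strict mode when data follows the closing quote."""


def unescape_field(field, strict=False):
    """Remove an escape block from one field (a sketch, see assumptions)."""
    # A field that does not start with a quote is returned untouched,
    # so quotes in mid-field are plain data (tests 1, 4 and 5).
    if not field.startswith(QUOTE):
        return field
    out = []
    i = 1  # skip the opening quote
    while i < len(field):
        ch = field[i]
        if ch == QUOTE:
            if field[i + 1:i + 2] == QUOTE:
                # Doubled quote is one literal quote (test 7).
                out.append(QUOTE)
                i += 2
                continue
            # Closing quote found; anything after it is trailing data.
            rest = field[i + 1:]
            if rest and strict:
                raise ParseFailed("data after escape block end: %r" % rest)
            return "".join(out) + rest
        out.append(ch)
        i += 1
    # Assumption: an unclosed block is accepted and runs to the end.
    return "".join(out)


if __name__ == "__main__":
    print(unescape_field('"This" is NOT OK'))
    try:
        unescape_field('"This" is NOT OK', strict=True)
    except ParseFailed as exc:
        print("PARSE FAILED:", exc)
```

The ODBC column is not modelled here; for tests 4 to 6 that driver appears to 
truncate the value instead.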

Out of interest I also checked the output behaviour from Excel. That 
escaped all instances of embedded quotes in the same way as DFDL, so no 
issues there. 

Regards

Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
--
 dfdl-wg mailing list
 dfdl-wg at ogf.org
 https://www.ogf.org/mailman/listinfo/dfdl-wg 