[DFDL-WG] Simplified Escape Scheme V3

Mike Beckerle mbeckerle.dfdl at gmail.com
Wed Apr 22 07:14:00 CDT 2009


 
My views on these 3 weighty issues:
 
1. Should data containing the escapeEscapeCharacter cause escaping to be
used, and if so, how should it be escaped?
 
No. I think the EEC (escapeEscapeCharacter) alone isn't an active
character; it has to be followed by the EC (escape character) to be
interpreted at all. That said, if the pair EEC EC appears in the data, then
yes, we must escape the EC with another EEC to avoid it being misinterpreted
at read time, resulting in EEC EEC EC in the final data stream that we
output. When we read it back, the first EEC is not followed by an EC, so it
is literal; the second EEC is followed by the EC, so the pair yields a
literal EC. We get back EEC EC, as in the original data. 
 
Trick: if the EEC and EC are the same character, then you have to escape
that character with itself... er, ah. So, taking "\" as an example: if "\"
is in the data item, then we must output "\\", and if "\\" is in the data
item, then we must output "\\\\" (which for some reason Microsoft Outlook
keeps removing my surrounding quotes from... must be some sort of escape
sequence for them!)
 
The rule is consistent, though. The above "trick" isn't really a special
case. Just apply the rule uniformly: if you find the EC in the data, you
must precede it with the EEC on output. 
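 
A minimal Python sketch of the uniform rule above, assuming single-character EC and EEC and covering only the escaping of the EC itself (escaping of delimiters and other markup is deliberately left out). The function names escape_ec and unescape_ec, and the choice of "/" as a distinct EEC, are hypothetical illustrations, not DFDL property names or an agreed algorithm.

```python
def escape_ec(data: str, ec: str, eec: str) -> str:
    """Output side: precede every EC in the data with the EEC.

    A lone EEC in the data is not an active character, so it is written
    as-is. When ec == eec, this simply doubles the character.
    """
    return data.replace(ec, eec + ec)


def unescape_ec(stream: str, ec: str, eec: str) -> str:
    """Read side: an EEC immediately followed by an EC yields a literal EC;
    every other character, including an EEC not followed by an EC, is literal.
    """
    out = []
    i = 0
    while i < len(stream):
        if stream.startswith(eec + ec, i):
            out.append(ec)             # EEC EC -> literal EC
            i += len(eec) + len(ec)
        else:
            out.append(stream[i])      # anything else is taken literally
            i += 1
    return "".join(out)


if __name__ == "__main__":
    # Distinct characters (hypothetical): EC = "\", EEC = "/".
    data = "a/\\b"                         # data contains the pair EEC EC
    written = escape_ec(data, ec="\\", eec="/")
    print(written)                         # a//\b  (EEC EEC EC)
    assert unescape_ec(written, ec="\\", eec="/") == data

    # Same character for EC and EEC, as in the "\" example.
    print(escape_ec("\\", "\\", "\\"))     # \\
    print(escape_ec("\\\\", "\\", "\\"))   # \\\\
```

Because every EC in the output is immediately preceded by an inserted EEC, a data EEC is never directly followed by an EC in the output stream, which is why the reader can treat the pair EEC EC as the only active sequence.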
 
2. Should we only look for escapeStartString at the beginning of the data?
 
I'd prefer that we respect the escapeStartString anywhere in the data, but
the canonical form, when generated, puts it at the beginning of the data.
However, if we want to be more restrictive/conservative for v1.0, I'm fine
with that.
 
3. Property names (everyone has their own favourite, so let's just pick one.) 
 
Don't care. (Recall - I wanted to call these things quoting schemes....) 


Alan Powell

MP 211, IBM UK Labs, Hursley,  Winchester, SO21 2JN, England
Notes Id: Alan Powell/UK/IBM     email: alan_powell at uk.ibm.com  
Tel: +44 (0)1962 815073                  Fax: +44 (0)1962 816898




From: 	Steve Hanson/UK/IBM 

To: 	Alan Powell/UK/IBM at IBMGB 

Cc: 	dfdl-wg at ogf.org, dfdl-wg-bounces at ogf.org 

Date: 	19/04/2009 12:24 

Subject: 	Re: [DFDL-WG] Simplified Escape Scheme V3

  _____  



Alan 

Comments: 

- I think escapeBlockStart and escapeBlockEnd are better names; that way you
can immediately see they are for use with escapeBlock. 

- escapeKind. Clarification to escapeBlock parsing behaviour: "On parsing,
the escapeStartString is removed from the beginning of the data, the
escapeEndString is removed from the end of the data, and any
escapeEscapeCharacters are removed when they precede any other occurrences
of the escapeEndString in the data." 

- extraEscapedCharacters. Clarification: "A space-separated list of single
characters that must be escaped in addition to in-scope markup." 

- generateEscape. The behaviour when escapeKind = escapeCharacter and value
is 'always' is not defined. I would prefer that: 
a) The descriptions of 'whenNeeded' behaviour are moved into the escapeKind
property to keep all the rules in one place. 
b) generateEscape is renamed generateEscapeBlock and only applies to
escapeKind = escapeBlock, as that is the only case in which it has an effect. 
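
The escapeBlock behaviour in the comments above can be sketched in Python as follows. This is a hypothetical illustration, not proposed spec text. It assumes the field's extent is already known (so the block start/end strings can simply be stripped from its ends, as in the parsing clarification), uses the proposed names escapeBlockStart, escapeBlockEnd and generateEscapeBlock, and assumes that 'whenNeeded' means "the data contains an in-scope delimiter or one of the extraEscapedCharacters, or already begins with escapeBlockStart". The function names are made up.

```python
from typing import Sequence


def parse_escape_block(field: str, start: str, end: str, eec: str) -> str:
    """Parse side: strip escapeBlockStart from the beginning and
    escapeBlockEnd from the end, then drop any EEC that precedes another
    occurrence of escapeBlockEnd. A field that is not wrapped in a block
    is returned unchanged."""
    wrapped = (len(field) >= len(start) + len(end)
               and field.startswith(start) and field.endswith(end))
    if not wrapped:
        return field
    inner = field[len(start):len(field) - len(end)]
    return inner.replace(eec + end, end)


def generate_escape_block(data: str, start: str, end: str, eec: str,
                          delimiters: Sequence[str], extra_escaped: str,
                          mode: str = "whenNeeded") -> str:
    """Unparse side for the proposed generateEscapeBlock values.

    extra_escaped is a space-separated list of single characters, as in the
    extraEscapedCharacters clarification. ASSUMPTION: 'whenNeeded' means the
    data contains an in-scope delimiter or an extra escaped character, or
    already begins with escapeBlockStart.
    """
    if mode not in ("always", "whenNeeded"):
        raise ValueError(f"unknown generateEscapeBlock value: {mode!r}")
    markup = list(delimiters) + extra_escaped.split()
    needed = data.startswith(start) or any(m in data for m in markup)
    if mode == "whenNeeded" and not needed:
        return data
    # Precede each escapeBlockEnd in the data with the EEC so that the
    # parser does not take it as the end of the block.
    return start + data.replace(end, eec + end) + end


if __name__ == "__main__":
    args = dict(start="[", end="]", eec="\\")
    out = generate_escape_block("a,b]c", delimiters=[","],
                                extra_escaped="% #", **args)
    print(out)                              # [a,b\]c]
    print(parse_escape_block(out, **args))  # a,b]c
```

Treating data that already begins with escapeBlockStart as "needed" keeps the sketch round-trippable; otherwise an unescaped value such as "[ab]" would be stripped on parsing.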

Regards

Steve Hanson
Programming Model Architect
WebSphere Message Brokers
Hursley, UK
Internet: smh at uk.ibm.com
Phone (+44)/(0) 1962-815848 




From: 	Alan Powell/UK/IBM at IBMGB 

Sent by: 	dfdl-wg-bounces at ogf.org 

To: 	dfdl-wg at ogf.org 

Date: 	17/04/2009 15:22 

Subject: 	[DFDL-WG] Simplified Escape Scheme V3

	





Attached is the latest version of the escape schemes document. It
incorporates Steve's and Mike's comments (although not the property
renaming), removes escapeBlock2, and adds use cases in section 5, which you
might like to start with. 


The use cases confirm that the syntax works, with some minor
clarifications, but highlight two questions: 
1. Should data containing the escapeEscapeCharacter cause escaping to be
used, and if so, how should it be escaped? 
2. Should we only look for escapeStartString at the beginning of the
data? 

 





  _____  




Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU 






[attachment "ggf-dfdl-simplified-escape-scheme-v3.doc" deleted by Alan
Powell/UK/IBM]
--
dfdl-wg mailing list
dfdl-wg at ogf.org
http://www.ogf.org/mailman/listinfo/dfdl-wg








-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.ogf.org/pipermail/dfdl-wg/attachments/20090422/c053e8a4/attachment.html 

