[DFDL-WG] Action 14: Propose DFDL entity scheme

Ian W Parkinson PARKIW at uk.ibm.com
Tue Jan 22 09:29:36 CST 2008


Hi guys,

For reference, from Wikipedia (search "newline"):

"The Unicode standard addresses the problem by defining a large number of 
characters that conforming applications should recognize as line 
terminators:
 LF:    Line Feed, U+000A
 CR:    Carriage Return, U+000D
 CR+LF: CR followed by LF, U+000D followed by U+000A
 NEL:   Next Line, U+0085
 FF:    Form Feed, U+000C
 LS:    Line Separator, U+2028
 PS:    Paragraph Separator, U+2029"

... so I guess, during parse, any of these sequences should match %NL; 
(maybe excluding FF and PS as being more significant than a single new 
line?). I agree with Mike: for unparse we'd presumably need a new property 
to specify which sequence to write.
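
To make the parse side concrete, here is a rough sketch in Python of how 
the NL mnemonic might be matched against the sequences listed above. The 
names (NL_SEQUENCES, NL_EXTRA, match_nl, include_ff_ps) are made up for 
illustration and are not from the proposal; the only real design point is 
that CR+LF has to be tried before a bare CR.

```python
# Sequences taken from the list quoted above. CR+LF comes first so that
# it is preferred over a bare CR (longest match wins).
NL_SEQUENCES = ("\r\n", "\n", "\r", "\u0085", "\u2028")

# FF and PS, kept separate because they might be excluded from the match.
NL_EXTRA = ("\u000c", "\u2029")


def match_nl(data, pos=0, include_ff_ps=False):
    """Return the length of the line terminator at pos, or 0 if none."""
    candidates = NL_SEQUENCES + (NL_EXTRA if include_ff_ps else ())
    for seq in candidates:
        if data.startswith(seq, pos):
            return len(seq)
    return 0
```

Whether FF and PS belong in the default set is exactly the open question, 
so the sketch leaves them behind a flag.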


Again from Wikipedia, this time regarding whitespace:

"In Unicode (Unicode Character Database) the following codepoints are 
defined as whitespace:
U+0009-U+000D (Control characters, containing TAB, CR and LF)
U+0020 SPACE
U+0085 NEL
U+00A0 NBSP
U+1680 OGHAM SPACE MARK
U+180E MONGOLIAN VOWEL SEPARATOR
U+2000-U+200A (different sorts of spaces)
U+2028 LSP
U+2029 PSP
U+202F NARROW NBSP
U+205F MEDIUM MATHEMATICAL SPACE
U+3000 IDEOGRAPHIC SPACE"

... so presumably %WSP; would match any of these characters on parse. What 
should it generate on unparse?
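
As a sketch of the parse side only, the WSP mnemonic could be treated as a 
membership test against the code points quoted above. This is Python, and 
the names (WSP_CODEPOINTS, is_wsp, match_wsp_run) are hypothetical. The 
set is copied from the quoted list as it stands; later Unicode versions 
reclassified U+180E, so a real implementation would want to check the 
current Unicode Character Database rather than trust this list.

```python
# Code points copied from the quoted list (an assumption, not a
# normative definition of whitespace).
WSP_CODEPOINTS = frozenset(
    list(range(0x0009, 0x000E))      # U+0009-U+000D
    + [0x0020, 0x0085, 0x00A0, 0x1680, 0x180E]
    + list(range(0x2000, 0x200B))    # U+2000-U+200A
    + [0x2028, 0x2029, 0x202F, 0x205F, 0x3000]
)


def is_wsp(ch):
    """True if the single character ch is in the quoted whitespace set."""
    return ord(ch) in WSP_CODEPOINTS


def match_wsp_run(data, pos=0):
    """Return how many consecutive whitespace characters start at pos."""
    end = pos
    while end < len(data) and is_wsp(data[end]):
        end += 1
    return end - pos
```

This says nothing about unparse, which still needs a single concrete 
character to be chosen somehow.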

Cheers,

Ian

Ian Parkinson
WebSphere ESB Development
Mail Point 211, Hursley Park, Hursley, Winchester, SO21 2JN, UK



From:
Alan Powell/UK/IBM at IBMGB
To:
"Mike Beckerle" <mbeckerle at OCO-INC.COM>
Cc:
dfdl-wg at ogf.org, DFDL-Technical-Core%IBMGB at uk.ibm.com
Date:
22/01/2008 14:41
Subject:
Re: [DFDL-WG] Action 14: Propose DFDL entity scheme




Hi Mike 

%NL; is a single character <LF> on those target platforms where that is 
the convention, or <CR><LF> on others, etc. This is intended to make it 
easier for the same DFDL schema to parse messages from different 
platforms. I know we have avoided the notion of a target platform in DFDL, 
and I was expecting that this would cause some debate. 

This will be a good discussion for tomorrow's call 

Alan Powell

MP 211, IBM UK Labs, Hursley,  Winchester, SO21 2JN, England
Notes Id: Alan Powell/UK/IBM     email: alan_powell at uk.ibm.com 
Tel: +44 (0)1962 815073                  Fax: +44 (0)1962 816898



"Mike Beckerle" <mbeckerle at OCO-INC.COM> 
22/01/2008 13:29 


To
Alan Powell/UK/IBM at IBMGB, <DFDL-Technical-Core%IBMGB at uk.ibm.com>, 
<dfdl-wg at ogf.org> 
cc

Subject
RE: [DFDL-WG] Action 14: Propose DFDL entity scheme








  
Is the &NL; supposed to represent a single character? Or can it be a CRLF? 

  
There's no notion of "the target platform" in DFDL. We've specifically 
avoided this notion on purpose. So we need a separate property like 
newline="&CR;&LF;" or newline="&LF;" if we want &NL; to be meaningful, 
unless some other property is suitable. 
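
A rough Python sketch of how such a property might drive unparse, using 
the newline="..." values above as input. The function names and the 
BASIC_MNEMONICS table are invented for illustration and are not part of 
any DFDL draft; the point is only that the NL mnemonic gets its meaning 
from the property rather than from the platform.

```python
# Hypothetical table of the single-character mnemonics used in the
# property values above.
BASIC_MNEMONICS = {"&CR;": "\r", "&LF;": "\n"}


def resolve_newline_property(value):
    """Turn a property value such as '&CR;&LF;' into real characters."""
    for mnemonic, char in BASIC_MNEMONICS.items():
        value = value.replace(mnemonic, char)
    return value


def unparse_nl(text, newline_property):
    """Replace each &NL; in text with the configured newline sequence."""
    return text.replace("&NL;", resolve_newline_property(newline_property))
```

For example, unparse_nl("a&NL;b", "&CR;&LF;") would write CR followed by 
LF between the two letters, while newline "&LF;" would write a single LF.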
  
There are some other Unicode whitespace and Unicode line-ending 
characters. Do we want to include those in the definitions of WSP and NL ? 
I recall there are 4 line-endings in Unicode. 
  
-mikeb 
  
From: dfdl-wg-bounces at ogf.org [mailto:dfdl-wg-bounces at ogf.org] On Behalf 
Of Alan Powell
Sent: Tuesday, January 22, 2008 6:32 AM
To: DFDL-Technical-Core%IBMGB at uk.ibm.com; dfdl-wg at ogf.org
Subject: Re: [DFDL-WG] Action 14: Propose DFDL entity scheme 
  

All 

Attached is the latest proposal for DFDL 'entities' 

The main changes are: 
- No longer using XML entities as this proved to not meet all the 
requirements 
- New generic mnemonics for <NL> and others to represent the NL on the 
target platform. 



Alan Powell

MP 211, IBM UK Labs, Hursley,  Winchester, SO21 2JN, England
Notes Id: Alan Powell/UK/IBM     email: alan_powell at uk.ibm.com 
Tel: +44 (0)1962 815073                  Fax: +44 (0)1962 816898




  
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU 

























