[DFDL-WG] Action 14: Propose DFDL entity scheme v5
Alan Powell
alan_powell at uk.ibm.com
Wed Feb 6 18:01:49 CST 2008
All,
The latest proposal, incorporating comments, is attached.
One question: should DFDL support standard XML entities? I have always
assumed so, but they are not listed in the supported XML schema functions.
Alan Powell
MP 211, IBM UK Labs, Hursley, Winchester, SO21 2JN, England
Notes Id: Alan Powell/UK/IBM email: alan_powell at uk.ibm.com
Tel: +44 (0)1962 815073 Fax: +44 (0)1962 816898
Ian W Parkinson/UK/IBM
Date: 22/01/2008 15:29
To: Alan Powell/UK/IBM at IBMGB
Cc: DFDL-Technical-Core at IBMGB, dfdl-wg at ogf.org, "Mike Beckerle" <mbeckerle at OCO-INC.COM>
Subject: Re: [DFDL-WG] Action 14: Propose DFDL entity scheme
Hi guys,
For reference, from Wikipedia (search "newline"):
"The Unicode standard addresses the problem by defining a large number of
characters that conforming applications should recognize as line
terminators:
LF: Line Feed, U+000A
CR: Carriage Return, U+000D
CR+LF: CR followed by LF, U+000D followed by U+000A
NEL: Next Line, U+0085
FF: Form Feed, U+000C
LS: Line Separator, U+2028
PS: Paragraph Separator, U+2029"
... so I guess that, during parse, any of these sequences should match %NL;
(maybe excluding FF and PS as being more significant than a single new
line?). I agree with Mike: for unparse we'd presumably need a new property
to specify which sequence is written.
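As a rough illustration of the parse-side behaviour discussed here, the
following Python sketch treats each line terminator in the quoted Unicode
list as a match for %NL;. The names CORE_NL, OPTIONAL_NL and match_nl are
invented for this example and are not part of any DFDL proposal. Whether FF
and PS belong in the set is still an open question, so they sit behind a
flag.

```python
# Line terminators from the quoted Unicode list.
# CR+LF comes first so it is consumed as one terminator rather than two.
CORE_NL = ["\r\n", "\n", "\r", "\u0085", "\u2028"]
# FF and PS are kept separate because their inclusion is an open question.
OPTIONAL_NL = ["\u000c", "\u2029"]


def match_nl(data: str, pos: int, include_ff_ps: bool = True) -> int:
    """Return the length of the line terminator starting at data[pos], or 0."""
    candidates = CORE_NL + (OPTIONAL_NL if include_ff_ps else [])
    for seq in candidates:
        if data.startswith(seq, pos):
            return len(seq)
    return 0
```

Trying CR+LF before a lone CR means a Windows-style line ending counts as
one newline, which is probably what a schema author would expect.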
Again, from Wikipedia, this time regarding whitespace:
"In Unicode (Unicode Character Database) the following codepoints are
defined as whitespace:
U0009-U000D (Control characters, containing TAB, CR and LF)
U0020 SPACE
U0085 NEL
U00A0 NBSP
U1680 OGHAM SPACE MARK
U180E MONGOLIAN VOWEL SEPARATOR
U2000-U200A (different sorts of spaces)
U2028 LSP
U2029 PSP
U202F NARROW NBSP
U205F MEDIUM MATHEMATICAL SPACE
U3000 IDEOGRAPHIC SPACE"
... so presumably &WSP; would match any of these characters on parse. What
should it generate on unparse?
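A minimal Python sketch of the parse-side check, assuming the whitespace
set is exactly the list quoted above. WSP_CODEPOINTS and is_wsp are
hypothetical names used only for illustration, and the sketch says nothing
about what unparse should emit.

```python
# Code points copied from the quoted list; the ranges are inclusive.
WSP_CODEPOINTS = (
    set(range(0x0009, 0x000D + 1))      # control characters incl. TAB, LF, CR
    | {0x0020, 0x0085, 0x00A0, 0x1680, 0x180E}
    | set(range(0x2000, 0x200A + 1))    # the various width spaces
    | {0x2028, 0x2029, 0x202F, 0x205F, 0x3000}
)


def is_wsp(ch: str) -> bool:
    """Return True if the single character ch is in the quoted whitespace set."""
    return ord(ch) in WSP_CODEPOINTS
```

Keeping the set as explicit data rather than calling str.isspace() keeps the
definition tied to the quoted list, not to whatever Unicode version the
runtime ships with.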
Cheers,
Ian
Ian Parkinson
WebSphere ESB Development
Mail Point 211, Hursley Park, Hursley, Winchester, SO21 2JN, UK
From: Alan Powell/UK/IBM at IBMGB
To: "Mike Beckerle" <mbeckerle at OCO-INC.COM>
Cc: dfdl-wg at ogf.org, DFDL-Technical-Core%IBMGB at uk.ibm.com
Date: 22/01/2008 14:41
Subject: Re: [DFDL-WG] Action 14: Propose DFDL entity scheme
Hi Mike
%NL; is a single character <LF> on target platforms where that is the
convention, <CR><LF> on others, and so on. This is intended to make it
easier for the same DFDL schema to parse messages from different
platforms. I know we have avoided the notion of a target platform in DFDL,
and I was expecting this to cause some debate.
This will be a good discussion for tomorrow's call.
Alan Powell
"Mike Beckerle" <mbeckerle at OCO-INC.COM>
Date: 22/01/2008 13:29
To: Alan Powell/UK/IBM at IBMGB, <DFDL-Technical-Core%IBMGB at uk.ibm.com>, <dfdl-wg at ogf.org>
Subject: RE: [DFDL-WG] Action 14: Propose DFDL entity scheme
Is the &NL; supposed to represent a single character? Or can it be a CRLF?
There's no notion of "the target platform" in DFDL. We've specifically
avoided this notion on purpose. So we need a separate property like
newline="&CR;&LF;" or newline="&LF;" if we want &NL; to be meaningful,
unless some other property is suitable.
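To make the idea of a separate newline property concrete, here is a small
Python sketch of how an unparser might use one. Everything in it is an
assumption made for illustration: the MNEMONICS table, the function names,
and the rule that the property value is a run of &XX; mnemonics. Only the
CR and LF mnemonics come from this message.

```python
import re

# Hypothetical mnemonic table; only CR and LF appear in the discussion.
MNEMONICS = {"CR": "\r", "LF": "\n"}


def resolve_newline_property(value: str) -> str:
    """Expand a property value such as "&CR;&LF;" into literal characters.

    An unknown mnemonic raises KeyError, so a typo is not silently ignored.
    """
    return re.sub(r"&([A-Z]+);", lambda m: MNEMONICS[m.group(1)], value)


def unparse_lines(lines, newline_property: str) -> str:
    """Join lines, writing the configured newline sequence after each one."""
    nl = resolve_newline_property(newline_property)
    return "".join(line + nl for line in lines)
```

With such a property the schema, not the platform running the processor,
decides what is written for a newline on output.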
There are some other Unicode whitespace and Unicode line-ending
characters. Do we want to include those in the definitions of WSP and NL?
I recall there are 4 line endings in Unicode.
...mikeb
From: dfdl-wg-bounces at ogf.org [mailto:dfdl-wg-bounces at ogf.org] On Behalf
Of Alan Powell
Sent: Tuesday, January 22, 2008 6:32 AM
To: DFDL-Technical-Core%IBMGB at uk.ibm.com; dfdl-wg at ogf.org
Subject: Re: [DFDL-WG] Action 14: Propose DFDL entity scheme
All
Attached is the latest proposal for DFDL 'entities'.
The main changes are:
- No longer using XML entities, as these proved not to meet all the
requirements
- New generic mnemonics for <NL> and others to represent the NL on the
target platform.
Alan Powell
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
--
dfdl-wg mailing list
dfdl-wg at ogf.org
http://www.ogf.org/mailman/listinfo/dfdl-wg
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.ogf.org/pipermail/dfdl-wg/attachments/20080207/ec4ad342/attachment-0001.html
-------------- next part --------------
A non-text attachment was scrubbed...
Name: DFDL Entities v4.doc
Type: application/octet-stream
Size: 82944 bytes
Desc: not available
Url : http://www.ogf.org/pipermail/dfdl-wg/attachments/20080207/ec4ad342/attachment-0001.obj