[DFDL-WG] Proposed Errata Language: Issue - clarify that our infoset values are not exactly ISO 10646, but are a superset

Mike Beckerle mbeckerle.dfdl at gmail.com
Mon Jan 16 17:26:10 EST 2012


Proposed Errata Language:

*The definition of [dataValue] in section 4.1.2 is changed to the following
(note that the first sentence is unchanged):

[dataValue] The value in the value space (as defined by XML Schema Part 2:
Datatypes <http://www.w3.org/TR/xmlschema-2/> [XSDLV1] ) of the [datatype]
member or special value nil. In a complex element information item this
member has no value. *

* For information items of datatype xs:string, the value is an ordered
collection of unsigned 16-bit integer codepoints each having any value from
0x0000 to 0xFFFF. Where defined, these are interpreted as the ISO 10646
character codes. Codepoints disallowed by ISO 10646, such as 0xD800 to
0xDFFF, are explicitly allowed by the DFDL infoset. The codepoints of the
string are stored in 'implicit' (also known as logical), left-to-right
bidirectional ordering and orientation. DFDL's infoset represents Unicode
characters with character codes beyond 0xFFFF by way of surrogate pairs (2
adjacent codepoints) in a manner consistent with the UTF-16 encoding of ISO
10646.
*
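
To illustrate the intent of the wording above, here is a minimal sketch in
Python of a string value held as 16-bit codepoints. The function names
to_code_units and is_valid_infoset_string are hypothetical and come from
neither the DFDL specification nor any implementation. The sketch only shows
that characters beyond 0xFFFF become surrogate pairs while an isolated
surrogate such as 0xD800 is still an acceptable value.

```python
from typing import List


def to_code_units(text: str) -> List[int]:
    """Return text as a list of unsigned 16-bit codepoints (illustrative only)."""
    units = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # Character beyond 0xFFFF: split into a surrogate pair,
            # in the same way the UTF-16 encoding does.
            offset = cp - 0x10000
            units.append(0xD800 + (offset >> 10))    # high surrogate
            units.append(0xDC00 + (offset & 0x3FF))  # low surrogate
        else:
            # Everything from 0x0000 to 0xFFFF is kept as-is, including
            # isolated surrogates in the range 0xD800 to 0xDFFF.
            units.append(cp)
    return units


def is_valid_infoset_string(units: List[int]) -> bool:
    """Accept any sequence of 16-bit values; pairing of surrogates is not checked."""
    return all(0x0000 <= u <= 0xFFFF for u in units)


if __name__ == "__main__":
    print([hex(u) for u in to_code_units("A\U0001F600")])
    print(is_valid_infoset_string([0xD800, 0x0041]))
```

Under these assumptions a strict ISO 10646 validator would reject the sequence
[0xD800, 0x0041] because the surrogate is unpaired, whereas the check here
accepts it. That difference is what makes the infoset value space a superset.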

-- 
Mike Beckerle | OGF DFDL WG Co-Chair
Tel:  781-330-0412