[DFDL-WG] Clarification/Issue - is UCS-2 an encoding

Steve Hanson smh at uk.ibm.com
Wed Dec 12 11:35:02 EST 2012


Agreed to remove all references to UCS-2.

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848



From:   Mike Beckerle <mbeckerle.dfdl at gmail.com>
To:     dfdl-wg at ogf.org
Date:   04/12/2012 18:21
Subject:        [DFDL-WG] Clarification/Issue - is UCS-2 an encoding
Sent by:        dfdl-wg-bounces at ogf.org




The encoding name "UCS-2" is used several times in the specification. 

This is not an IANA encoding name. What is intended is ISO-10646-UCS-2. A 
note in the IANA character set list says that this encoding does not 
specify a byte order, even though one is needed. However, unlike UTF-16 
(which has UTF-16BE and UTF-16LE), it has no endian-specific variants. 
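
To illustrate why the byte order matters, here is a minimal Python sketch 
(not part of the spec; the helper name encode_both_orders is made up for 
this example, and 'utf-16-be' / 'utf-16-le' are Python's codec names for 
UTF-16BE and UTF-16LE). The same character yields different byte 
sequences under the two endian variants:

```python
def encode_both_orders(text: str) -> tuple[bytes, bytes]:
    """Encode text as UTF-16BE and UTF-16LE (hypothetical helper)."""
    return text.encode("utf-16-be"), text.encode("utf-16-le")


big_endian, little_endian = encode_both_orders("A")  # U+0041
print(big_endian.hex(), little_endian.hex())  # prints "0041 4100"
```

A name with no endian variants gives a reader of the data no way to say 
which of the two layouts is meant.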

I suggest we remove all mention of UCS-2 from the spec, and use UTF-16 
instead. The byte-order issues for UTF-16 are already discussed. 

This requires changing the description of the utf16Width property as follows:

Section 11: utf16Width 

Replace the paragraph beginning with "Specifies..." with this:

Specifies whether the encoding 'UTF-16' should be treated as a fixed or 
variable-width encoding. 'UTF-16' is potentially a variable-width encoding 
using either 2-byte codepoints, or matched surrogate-pair codepoints 
containing two adjacent 2-byte codepoints that are combined to compute the 
character code. However, it is historically common for users to specify 
'UTF-16' when they mean the fixed-width subset which does not allow use of 
the surrogate pairs. When utf16Width='variable', then surrogate pairs are 
expected and assumed to encode a single character. When 
utf16Width='fixed', then surrogate pair codepoints are treated as 
individual character codes. 
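
As an illustration only (not proposed spec text), the difference between 
the two settings can be sketched in Python. The function name 
count_characters and its utf16_width parameter are invented for this 
example, and big-endian input is assumed. With 'fixed', every 2-byte unit 
counts as one character; with 'variable', a high surrogate followed by a 
low surrogate counts as one character:

```python
import struct


def count_characters(data: bytes, utf16_width: str = "fixed") -> int:
    """Count characters in big-endian UTF-16 data (illustrative sketch)."""
    if len(data) % 2 != 0:
        raise ValueError("UTF-16 data must have an even number of bytes")
    if utf16_width not in ("fixed", "variable"):
        raise ValueError("utf16_width must be 'fixed' or 'variable'")

    # Split the data into unsigned 16-bit big-endian units.
    units = struct.unpack(f">{len(data) // 2}H", data)

    if utf16_width == "fixed":
        # Each 2-byte unit is its own character, surrogates included.
        return len(units)

    count = 0
    i = 0
    while i < len(units):
        is_high = 0xD800 <= units[i] <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        # A matched surrogate pair encodes a single character.
        i += 2 if (is_high and has_low) else 1
        count += 1
    return count


# 'a' followed by U+1D11E, which UTF-16 encodes as the pair D834 DD1E.
sample = "a\U0001D11E".encode("utf-16-be")
print(count_characters(sample, "fixed"))     # prints 3
print(count_characters(sample, "variable"))  # prints 2
```

How an unmatched surrogate should be handled is not addressed here; the 
sketch simply counts it as one character.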

Section 12.3.7.1.1 Character Width

Remove "(e.g., UCS-2)" from the table and replace it with "(e.g., UTF-16 
with dfdl:utf16Width='fixed')". 
Replace "variable (e.g., Shift_JIS, UTF-8, UTF-16)" with "variable (e.g., 
Shift_JIS, UTF-8, UTF-16 with dfdl:utf16Width='variable')".


-- 
Mike Beckerle | OGF DFDL WG Co-Chair | Tresys Technologies
Tel:  781-330-0412
