[DFDL-WG] Public comment 116 - Japanese CCSID 943

Steve Hanson smh at uk.ibm.com
Mon Oct 7 09:30:59 EDT 2013


Please see below discussion of the issue raised by public comment 116. 

While there is a simple workaround, the real issue is that ccsid 943 is in 
daily use in Japan for zoned decimals and DFDL does not support that.

Regards

Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 07/10/2013 14:05 -----

The ICU library treats the ccsid '943' as the ICU converter 
ibm-943_P130-1999, which is not 100% ASCII compatible because two code 
points differ: 0x5C and 0x7E.  Another ICU converter, ibm-943_P15A-2003, 
is ASCII compatible and is commonly called Shift_JIS.  The difference is 
one that you are probably familiar with: in ibm-943_P130-1999 the 
backslash at 0x5C is replaced by the Yen symbol. Here is the extract from 
the ICU converter site:



Internal Converter Name   IBM       IANA
-----------------------   -------   ------------------------------------
ibm-943_P15A-2003                   Shift_JIS, MS_Kanji, csShiftJIS,
                                    windows-31j, csWindows31J
ibm-943_P130-1999         ibm-943
 


This causes DFDL to reject the ccsid '943' when it is used on a zoned 
decimal, on the grounds that it is not ASCII compatible. Why does DFDL do 
this?  It is playing safe. So far we have identified four different 
'overpunching' schemes for ASCII zoned decimals, and there might well be 
one or two more used by less common machine architectures:

asciiStandard: ASCII characters '0123456789' (0x30-0x39) and 'pqrstuvwxy' 
(0x70-0x79) for negative sign punch.

asciiTranslatedEBCDIC: ASCII characters '{ABCDEFGHI' (0x7B, 0x41-0x49) 
for positive sign punch and '}JKLMNOPQR' (0x7D, 0x4A-0x52) for negative 
sign punch.

asciiCARealiaModified: ASCII characters '0123456789' (0x30-0x39) and 
'<SP>!"#$%&'()' (0x20-0x29) for negative sign punch.

asciiTandemModified: ASCII characters '0123456789' (0x30-0x39) and control 
characters 0x80-0x89 for negative sign punch.

In case other schemes are discovered that use a different byte range for 
the negative sign punch (e.g. 0x50 to 0x59), the DFDL specification says 
that ASCII zoned decimals must be in a 100% ASCII compatible encoding.
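
To make the four schemes concrete, here is a minimal Python sketch of 
decoding an ASCII zoned decimal under each of them. It is an illustration 
only, not how any DFDL processor is implemented. The function name 
decode_zoned is hypothetical, and the sketch assumes that the sign is 
overpunched on the last byte, that an unpunched digit (0x30-0x39) in the 
sign position means positive, and that '{ABCDEFGHI' is the positive 
overpunch set for asciiTranslatedEBCDIC.

```python
# Illustrative sketch only. Assumptions: sign is overpunched on the LAST byte,
# and an unpunched digit (0x30-0x39) in that position means positive.

# High nibble that marks a negative sign in the nibble-based schemes.
NEGATIVE_HIGH_NIBBLE = {
    "asciiStandard": 0x7,          # 0x70-0x79
    "asciiCARealiaModified": 0x2,  # 0x20-0x29
    "asciiTandemModified": 0x8,    # 0x80-0x89
}

# asciiTranslatedEBCDIC uses lookup tables rather than a nibble rule.
EBCDIC_POSITIVE = b"{ABCDEFGHI"    # 0x7B, 0x41-0x49 (assumed positive)
EBCDIC_NEGATIVE = b"}JKLMNOPQR"    # 0x7D, 0x4A-0x52


def _sign_and_digit(byte: int, scheme: str) -> tuple[int, int]:
    """Return (sign, digit) for the overpunched byte under the given scheme."""
    if 0x30 <= byte <= 0x39:
        return 1, byte & 0x0F
    if scheme == "asciiTranslatedEBCDIC":
        if byte in EBCDIC_POSITIVE:
            return 1, EBCDIC_POSITIVE.index(byte)
        if byte in EBCDIC_NEGATIVE:
            return -1, EBCDIC_NEGATIVE.index(byte)
    elif scheme in NEGATIVE_HIGH_NIBBLE:
        if byte >> 4 == NEGATIVE_HIGH_NIBBLE[scheme] and (byte & 0x0F) <= 9:
            return -1, byte & 0x0F
    raise ValueError(f"byte 0x{byte:02X} is not valid for scheme {scheme!r}")


def decode_zoned(data: bytes, scheme: str) -> int:
    """Decode a zoned decimal whose last byte carries the sign overpunch."""
    if not data:
        raise ValueError("empty zoned decimal")
    value = 0
    for byte in data[:-1]:
        if not 0x30 <= byte <= 0x39:
            raise ValueError(f"byte 0x{byte:02X} is not an ASCII digit")
        value = value * 10 + (byte & 0x0F)
    sign, digit = _sign_and_digit(data[-1], scheme)
    return sign * (value * 10 + digit)
```

For example, under these assumptions the bytes '12s' decode to -123 in 
asciiStandard, and '12L' decodes to -123 in asciiTranslatedEBCDIC.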

What we can observe is that ibm-943_P130-1999 is actually safe for 
representing ASCII zoned decimals in all the above schemes, because the 
0x5C and 0x7E bytes do not fall within any of the byte ranges used by the 
schemes. We can also observe that (apart from the special case of 
asciiTranslatedEBCDIC) the overpunching schemes simply change the 
high-order nibble of the byte and leave the digit in the low-order nibble. 
The low-order nibbles of 0x5C and 0x7E (0xC and 0xE) are not decimal 
digits, so any new scheme we discover is very unlikely to affect 0x5C or 
0x7E. 
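
The observation above can be checked mechanically. The following Python 
sketch lists the bytes each scheme uses, as given in this note, and 
confirms that none of them is 0x5C or 0x7E. The names SCHEME_BYTES, 
DIVERGENT_BYTES and conflicting_bytes are hypothetical, chosen for this 
illustration, and the plain digits 0x30-0x39 are assumed to be used by 
asciiTranslatedEBCDIC for the non-sign positions.

```python
# Bytes used by each overpunching scheme, taken from the ranges in this note.
DIGITS = set(range(0x30, 0x3A))

SCHEME_BYTES = {
    "asciiStandard": DIGITS | set(range(0x70, 0x7A)),
    # Assumption: plain digits 0x30-0x39 appear in the non-sign positions.
    "asciiTranslatedEBCDIC": DIGITS | {0x7B, 0x7D} | set(range(0x41, 0x53)),
    "asciiCARealiaModified": DIGITS | set(range(0x20, 0x2A)),
    "asciiTandemModified": DIGITS | set(range(0x80, 0x8A)),
}

# The two bytes where ibm-943_P130-1999 differs from ASCII.
DIVERGENT_BYTES = {0x5C, 0x7E}


def conflicting_bytes() -> dict[str, list[int]]:
    """Map each scheme to the divergent bytes it uses (empty list if none)."""
    return {
        name: sorted(used & DIVERGENT_BYTES)
        for name, used in SCHEME_BYTES.items()
    }


def low_nibble_is_digit(byte: int) -> bool:
    """True if the low-order nibble of the byte is a decimal digit (0-9)."""
    return (byte & 0x0F) <= 9
```

Every scheme maps to an empty list, and low_nibble_is_digit returns False 
for both 0x5C and 0x7E, which is why a nibble-based scheme cannot produce 
either byte.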

So, the DFDL specification *could* be changed to treat ibm-943_P130-1999 
as ASCII compatible for zoned decimals. 

The alternative is a workaround: in a DFDL schema that models a data 
stream in ccsid 943, the default encoding is 943 but zoned decimals 
override it and use Shift_JIS. When this workaround was discussed with a 
Japanese user of IBM DFDL, the reaction was "Why do I have to go to that 
trouble? It should just work in 943."
