[DFDL-WG] DFDL Decimal - final proposal

Wed Jul 16 09:45:02 CDT 2008

I looked at numberCheckPolicy in 032 draft.

I agree that if we're going to use this for both packed and text to control
the issue of multiple reps of zero being allowed then we need to pull it out
of the defineTextNumberFormat object.

However, this still leaves ambiguous what the unparsed representation of
zero should be. 

We can either say that the strict form is what is used for zero, or we have
to give a separate property which specifies what the rep of zero is. 

It is conservative to just use the strict form. We can add the property
later if it is needed.
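
A minimal sketch of what "use the strict form for zero" could mean when unparsing a packed decimal. It assumes the usual IBM layout (digit nibbles followed by one trailing sign nibble) and hypothetical default sign codes C and D; the function name and parameters are illustrative and not taken from the draft.

```python
def unparse_packed(value: int, num_bytes: int, plus: int = 0xC, minus: int = 0xD) -> bytes:
    """Unparse an integer as packed decimal: digit nibbles, then one sign nibble.

    Zero is always written in the strict form (digits 0...0 plus a valid
    sign nibble), never as all-zero bytes.
    """
    digits = str(abs(value))
    capacity = num_bytes * 2 - 1  # one nibble is reserved for the sign
    if len(digits) > capacity:
        raise ValueError("value does not fit in the requested length")
    nibbles = [int(d) for d in digits.zfill(capacity)]
    nibbles.append(minus if value < 0 else plus)
    # Pair the nibbles up into bytes, high nibble first.
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


print(unparse_packed(0, 3).hex())      # strict zero, sign nibble present
print(unparse_packed(-1234, 3).hex())
```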

  _____  

From: dfdl-wg-bounces at ogf.org [mailto:dfdl-wg-bounces at ogf.org] On Behalf Of
Steve Hanson
Sent: Wednesday, July 16, 2008 9:08 AM
To: dfdl-wg at ogf.org
Subject: [DFDL-WG] DFDL Decimal - final proposal

Here's the revised decimal supplement again for final approval. Please can
we discuss on the call today for inclusion in draft 33. 

This has been updated to reflect the debate below around properties
dfdl:decimalFormat and dfdl:integerFormat (because either could be used with
xs:int and xs:decimal, and at runtime the parser does not know which one to
apply). So dfdl:decimalFormat has been removed, and replaced by
dfdl:numberFormat - defined below. 

Property Name   Description

numberFormat    String

                Valid values are 'text', 'zoned', 'packed', 'BCD',
                'twosComplement'.

                When the representation is 'text' then the allowable values
                are 'text' and 'zoned'.

                When the representation is 'binary' then the allowable
                values are 'packed', 'BCD' and 'twosComplement'.
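
The dependency between the representation and numberFormat can be sketched as a simple lookup. This is only an illustration of the rule stated above; the table and function names are hypothetical and not part of any DFDL processor.

```python
# Allowed numberFormat values per representation, as listed in the proposal.
ALLOWED_NUMBER_FORMATS = {
    "text": {"text", "zoned"},
    "binary": {"packed", "BCD", "twosComplement"},
}


def check_number_format(representation: str, number_format: str) -> None:
    """Raise ValueError if number_format is not allowed for representation."""
    allowed = ALLOWED_NUMBER_FORMATS.get(representation)
    if allowed is None:
        raise ValueError(f"unknown representation: {representation!r}")
    if number_format not in allowed:
        raise ValueError(
            f"numberFormat {number_format!r} is not allowed when "
            f"representation is {representation!r}"
        )


check_number_format("text", "zoned")     # accepted
check_number_format("binary", "packed")  # accepted
```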

I'd also like to propose that we rename dfdl:defineNumberFormat to
dfdl:defineTextNumberFormat, to prevent confusion. 

The other change is around the packed decimal convention, sometimes used,
that zero is indicated by all bytes being hex zero, even though this is not
technically a valid packed decimal number. I had said that on parsing,
whether to tolerate this is governed by the numberCheckPolicy property, and
on unparsing, this convention is not used. That won't work because we are
talking about (binary) packed decimals and numberCheckPolicy is a property
within (text) dfdl:defineNumberFormat. One solution is to move
numberCheckPolicy outside of dfdl:defineNumberFormat and have it apply to
both text and binary numbers. 
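
To make the parsing side concrete, here is a rough sketch of a packed decimal parser with a strict/lax switch. The `lax` flag stands in for numberCheckPolicy, and the sign-code table is an assumption for illustration (C and F positive, D negative); neither is taken from the draft.

```python
# Hypothetical sign codes: nibble value -> sign multiplier.
SIGN_CODES = {0xC: 1, 0xF: 1, 0xD: -1}


def parse_packed(data: bytes, lax: bool = False) -> int:
    """Parse packed decimal: digit nibbles followed by one trailing sign nibble.

    In lax mode, a field whose bytes are all hex zero is accepted as zero,
    even though it has no valid sign nibble.
    """
    if not data:
        raise ValueError("empty field")
    if lax and all(b == 0 for b in data):
        return 0
    nibbles = []
    for b in data:
        nibbles.extend((b >> 4, b & 0x0F))
    *digits, sign = nibbles
    if any(d > 9 for d in digits):
        raise ValueError("invalid digit nibble")
    if sign not in SIGN_CODES:
        raise ValueError("invalid sign nibble")
    magnitude = int("".join(str(d) for d in digits) or "0")
    return SIGN_CODES[sign] * magnitude


print(parse_packed(bytes.fromhex("01234d")))
print(parse_packed(b"\x00\x00\x00", lax=True))
```

In strict mode the all-zero field is rejected because its last nibble (0) is not a sign code.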

However it can be observed that numberCheckPolicy is getting rather bloated
and is covering several behaviours. There's yet another behaviour that could
be added - the TX team review want a dfdl:defineNumberFormat property called
numberZeroRep to handle special zero representations. That's fine - but on
parsing whether to allow just the zero rep or both the rep and '0' is a
requirement from TX - which we could accommodate by extending
numberCheckPolicy. Question is, are we overloading numberCheckPolicy, or is
it time to make it more granular? 
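
One way to picture the more granular alternative is a separate switch just for the zero representation, independent of any other strict/lax checking. The sketch below is purely illustrative: `zero_rep` stands in for numberZeroRep, and `zero_rep_only` is a hypothetical flag, not a proposed property name.

```python
from typing import Optional


def parse_text_number(text: str, zero_rep: Optional[str] = None,
                      zero_rep_only: bool = False) -> int:
    """Parse a text integer, honouring a special representation of zero.

    zero_rep       special text that means zero (for example "NIL")
    zero_rep_only  if True, an ordinary '0' is rejected when zero_rep is set
    """
    if zero_rep is not None and text == zero_rep:
        return 0
    value = int(text)
    if value == 0 and zero_rep is not None and zero_rep_only:
        raise ValueError("zero must be written using the special zero rep")
    return value


print(parse_text_number("NIL", zero_rep="NIL"))  # special rep accepted
print(parse_text_number("0", zero_rep="NIL"))    # plain '0' also accepted
```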

Regards

Steve Hanson
Programming Model Architect
WebSphere Message Brokers
Hursley, UK
Internet: smh at uk.ibm.com
Phone (+44)/(0) 1962-815848 
----- Forwarded by Steve Hanson/UK/IBM on 16/07/2008 12:15 ----- 

Steve Hanson/UK/IBM 

09/04/2008 15:44 

To: <mbeckerle.dfdl at gmail.com>
cc: dfdl-wg at ogf.org
Subject: RE: Fw: DFDL Decimal - proposal - correcting wrong attachment

Hi Mike - answers in-line below. 

Regards, Steve

Steve Hanson
Programming Model Architect
WebSphere Message Brokers
Hursley, UK
Internet: smh at uk.ibm.com
Phone (+44)/(0) 1962-815848 

"Mike Beckerle" <mbeckerle.dfdl at gmail.com> 

09/04/2008 15:05 

Please respond to: <mbeckerle.dfdl at gmail.com>
To: Steve Hanson/UK/IBM at IBMGB
cc: <dfdl-wg at ogf.org>
Subject: RE: Fw: DFDL Decimal - proposal - correcting wrong attachment

Thanks for these clarifications. 

Do we have a way to represent "unpacked" decimal numbers? This is like
zoned, except the "zones" are zero instead of "F" (in ebcdic encodings). 
<smh>No we don't. Neither MRM nor TX support that. Have you seen such an
example?  Is it encoding sensitive? 

Also, can a BCD number have a sign? 
<smh>What we are calling a BCD can not have a sign, as far as I know. That's
where packed decimal comes in. 

.mikeb 

  _____  

From: Steve Hanson [mailto:smh at uk.ibm.com] 
Sent: Wednesday, April 09, 2008 10:00 AM
To: mbeckerle.dfdl at gmail.com
Cc: 'Mike Beckerle'; Alan Powell; Ian W Parkinson
Subject: RE: Fw: DFDL Decimal - proposal - correcting wrong attachment 

Hi Mike - answers in-line below. 

Regards, Steve

Steve Hanson
Programming Model Architect
WebSphere Message Brokers
Hursley, UK
Internet: smh at uk.ibm.com
Phone (+44)/(0) 1962-815848 

"Mike Beckerle" <mbeckerle.dfdl at gmail.com> 

09/04/2008 01:43 

Please respond to: <mbeckerle.dfdl at gmail.com>
To: Steve Hanson/UK/IBM at IBMGB, Alan Powell/UK/IBM at IBMGB
cc: Ian W Parkinson/UK/IBM at IBMGB, 'Mike Beckerle'
Subject: RE: Fw: DFDL Decimal - proposal - correcting wrong attachment

I prefer one property dfdl:numberFormat, the valid values of which depend on
dfdl:representation 
<smh>The advantage of two properties is that you can set scoping for text
and binary numbers separately. 

I like the analysis that text formats are ones which depend on encoding, and
not byteOrder, and binary depend on byte order, and NOT encoding. 
<smh>Me too. 

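The text/binary distinction above can be seen directly with standard Python codecs. This is just an illustration using the value 1234; cp037 is used here as one example of an EBCDIC code page.

```python
# Text representation: the bytes change with the character encoding,
# but there is no byte order to speak of.
text_ascii = "1234".encode("ascii")
text_ebcdic = "1234".encode("cp037")

# Binary representation: the bytes change with the byte order,
# but no character encoding is involved.
binary_big = (1234).to_bytes(2, "big")
binary_little = (1234).to_bytes(2, "little")

print(text_ascii.hex(), text_ebcdic.hex())      # 31323334 f1f2f3f4
print(binary_big.hex(), binary_little.hex())    # 04d2 d204
```
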
There are also format specifiers for floating point. Should those also go on
here, be allowed only for representation="binary"? 
<smh>I did think about this, but I think we are better off keeping floats
separate. Otherwise people might think you can declare a logical float to be
rep'd by physical integer. MRM allows this, and I wish it didn't. It also
exacerbates the problem noted above - I couldn't set a default float format,
which is something that would almost certainly never vary within a data
stream. 

The rest of the proposal looks fine. I found decimalVirtualPoint an odd
name, but it is clear and obeys the conventions. 
<smh>I agree it's a bit odd. An alternative is 'decimalimpliedPlaces' which
uses TX terminology - but that doesn't match the 'V' pattern character we
are proposing in the ICU pattern (which matches COBOL) 

I was a bit unclear on how do you represent an unsigned packed decimal. This
is common. There is no sign nibble at all. It lets you do an even number of
digits. MMDDYY is commonly this, 3 unsigned packed numbers. 
<smh>What you have described is dfdl:numberFormat="BCD". An unsigned packed
decimal is dfdl:numberFormat="packed" with the sign nibble always unsigned,
so dfdl:packedDecimalSignCodes="F F F". 
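
The difference between the two cases can be sketched as follows, using an MMDDYY value as in Mike's example. The helper names are hypothetical, and the packed layout assumes a trailing sign nibble with F as the unsigned code.

```python
def pack_nibbles(nibbles):
    """Combine an even-length list of nibbles into bytes, high nibble first."""
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


def to_bcd(digits: str) -> bytes:
    """BCD: two digits per byte and no sign nibble at all."""
    if len(digits) % 2:
        digits = "0" + digits  # pad to a whole number of bytes
    return pack_nibbles([int(d) for d in digits])


def to_packed_unsigned(digits: str, sign: int = 0xF) -> bytes:
    """Packed: digits followed by a sign nibble that is always the unsigned code."""
    if len(digits) % 2 == 0:
        digits = "0" + digits  # digits plus the sign nibble must fill whole bytes
    return pack_nibbles([int(d) for d in digits] + [sign])


print(to_bcd("071608").hex())              # 071608   (3 bytes, even digit count)
print(to_packed_unsigned("071608").hex())  # 0071608f (4 bytes, sign nibble F)
```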

.mikeb 

  _____  

From: Steve Hanson [mailto:smh at uk.ibm.com] 
Sent: Wednesday, April 02, 2008 11:54 AM
To: Alan Powell
Cc: Ian W Parkinson; Mike Beckerle
Subject: Re: Fw: DFDL Decimal - proposal - correcting wrong attachment 

Alan, Ian and I reviewed this today. 

The main issue was that the loss of dfdl:representation="decimal" means that
it is no longer clear when to use dfdl:integerFormat and dfdl:decimalFormat,
because an xs:decimal can have a binary integer rep and an xs:int can have a
binary decimal rep. It was noted that both IBM models (MRM and TX type tree)
handle this by having a single property. I don't want to re-introduce
rep=decimal; I think we should stick with text (implying encoding
sensitive) and binary (potentially byte order sensitive). Options: 

a) One property dfdl:numberFormat with values "text", "zoned", "packed",
"BCD", "twosComplement", "onesComplement", "signMagnitude". 
- "text" and "zoned" when dfdl:representation="text" 
- "packed", "BCD", "twosComplement", "onesComplement", "signMagnitude" when
dfdl:representation="binary" 

Number        xs:int, xs:decimal     text   => numberFormat
              xs:float, xs:double    text   =>
              xs:int, xs:decimal     binary => numberFormat
              xs:float               binary => floatFormat

b) Two properties dfdl:textNumberFormat and dfdl:binaryNumberFormat,
allowable enums split as above. 
- this means the existing dfdl:textNumberFormat property gets renamed to
dfdl:textNumberPattern or dfdl:textNumberScheme 

Number        xs:int, xs:decimal     text   => textNumberFormat
              xs:float, xs:double    text   =>
              xs:int, xs:decimal     binary => binaryNumberFormat
              xs:float               binary => floatFormat

Other suggestions? 

Regards, Steve

Steve Hanson
WebSphere Message Brokers
Hursley, UK
Internet: smh at uk.ibm.com
Phone (+44)/(0) 1962-815848 

Alan Powell/UK/IBM 

28/03/2008 16:45 

To: Steve Hanson/UK/IBM at IBMGB
cc: Ian W Parkinson/UK/IBM at IBMGB, mbeckerle at oco-inc.com
Subject: Re: Fw: DFDL Decimal - proposal - correcting wrong attachment

Steve 

Technically seems OK. 

Need quite a bit of editorial work before it can be included in the spec
which I have started. 

Alan Powell

MP 211, IBM UK Labs, Hursley,  Winchester, SO21 2JN, England
Notes Id: Alan Powell/UK/IBM     email: alan_powell at uk.ibm.com  
Tel: +44 (0)1962 815073                  Fax: +44 (0)1962 816898 

From: Steve Hanson/UK/IBM
To: mbeckerle at oco-inc.com
Cc: Alan Powell/UK/IBM, Ian W Parkinson/UK/IBM
Date: 28/03/2008 13:59
Subject: Fw: DFDL Decimal - proposal - correcting wrong attachment

  _____  

Here's an attempt at a revised decimal supplement, that takes into account
the stuff in my mail below. 

[attachment "ggf-dfdl-supplement-advanced-decimal-properties-v1.0-003.doc"
deleted by Alan Powell/UK/IBM] 

Some discussion points: 

1) I've removed the representation 'Decimal' - a decimal is either 'Text' or
'Binary'.  Property decimalFormat says whether it is text or zoned (for
text) or packed or BCD (for binary). 

2) There's no need for a decimalSigned property, as zoned uses numberPattern
for this, BCD is always unsigned, and packed indicates this via sign code 

3) I've added VDP property for BCD and packed - zoned uses numberPattern for
this. However,  VDP property is also needed for binary integers - this is
missing from spec. COBOL PIC 99V99 COMP will create an xs:decimal with
binary integer rep, so we need to support this. I suggest we have a single
VDP property that applies to all binary reps that can be used to represent
xs:decimal. So my VDP property gets removed to main spec. 

4) The resultant properties are fewer than before. I'm not sure that a
separate supplement is justified. 

5) I would like to remove numberCheckPolicy from dfdl:DefineNumberFormat,
and make it a separate property. Two reasons: 
- I think the decision to use strict/lax checking is not an attribute of the
number format but more an attribute of the schema as a whole. 
- It means we can control packed decimal sign nibble oddities with the same
property as other strict/lax number checking.
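
As an illustration of point 3, a virtual decimal point applied to a binary integer rep might look like the sketch below. It assumes a two's complement integer and a hypothetical `virtual_point` parameter giving the number of implied fraction digits (2 for COBOL PIC 99V99 COMP); the function names are illustrative only.

```python
from decimal import Decimal


def parse_binary_decimal(data: bytes, virtual_point: int,
                         byteorder: str = "big") -> Decimal:
    """Read a binary integer and shift the decimal point left by virtual_point."""
    raw = int.from_bytes(data, byteorder, signed=True)
    return Decimal(raw).scaleb(-virtual_point)


def unparse_binary_decimal(value: Decimal, virtual_point: int, length: int,
                           byteorder: str = "big") -> bytes:
    """Shift the decimal point right by virtual_point and write a binary integer."""
    scaled = value.scaleb(virtual_point)
    if scaled != scaled.to_integral_value():
        raise ValueError("more fraction digits than the virtual decimal point allows")
    return int(scaled).to_bytes(length, byteorder, signed=True)


# The stored integer 1234 (hex 04D2) with two implied places is 12.34.
print(parse_binary_decimal(b"\x04\xd2", 2))
print(unparse_binary_decimal(Decimal("12.34"), 2, 2).hex())
```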

Let's review on next OGF WG call. 

Regards, Steve

Steve Hanson
Programming Model Architect
WebSphere Message Brokers
Hursley, UK
Internet: smh at uk.ibm.com
Phone (+44)/(0) 1962-815848 
----- Forwarded by Steve Hanson/UK/IBM on 28/03/2008 12:33 ----- 

Steve Hanson/UK/IBM 

27/03/2008 15:29 

To: Mike Beckerle (Work)
cc:
Subject: DFDL Decimal - proposal

Hi Mike 

I've finally got round to looking at the decimal supplement, and I'd like to
get your opinion on something. The WTX team have been reviewing draft 031
and had the following observation (actually they had quite a few good ones,
and when they've finished we need to discuss them all on an OGF WG call). 

"13.3. Is a zoned decimal textual or non-textual?  If all overpunched
variants result in well-known characters then the data is scannable and
therefore more like a textual field." 

It turns out that the type hierarchy in TX for decimal looks like below.
They consider Zoned as text as it always consists of reasonable characters
and is subject to encoding conversion, padding, justification, etc. There's
a lot of appeal in that. It's always bothered me a bit that MRM viewed it as
a binary type. 

Number -> Character -> Decimal (meaning text decimal)
                       Integer (meaning text integer)
                       Zoned
       -> Binary    -> Integer (meaning binary integer)
                       Float
                       Packed
                       BCD

Also, their Zoned does not have separate sign option. They point out that a
separate signed Zoned is just a Text decimal. And they are correct. We got
the separate sign thing from MRM, which after some digging turns out to have
got it from the CAM Type Descriptor model, which had no other way of
representing a text decimal number with a separate sign. 

As part of my rework of the decimal supplement, I'd like to take both these
into account. The implications are: 
- Zoned => overpunched only 
- Zoned decimal can pick up on the textNumberxxx properties, including
textNumberFormat 
     => use the numberPattern (ie, ICU pattern) property to say which end
the (overpunched) sign goes 
     => can get away without a separate pattern language for binary
decimals, which as you point out has endian-ness issues 
- Binary decimals are packed and BCD 
- There are a lot fewer properties for decimals 
- dfdl:representation = "text" can have subdivisions - that's not occurred
until now (we could think about making dfdl:representation = "xml" a
subdivision of "text"?) 

If you think there is merit in this approach then let me know by return and
I'll see if I can write something up tomorrow. 
I'm WAH on +44-1794-340899 if you want to discuss. 

Your "crazy idea" below is interesting - but I think is a tooling thought
rather than a core spec thing. 

(Sorry about call yesterday - I thought I mailed something out a couple of
calls ago about DST mismatch, but perhaps I didn't). 

Regards, Steve

Steve Hanson
Programming Model Architect
WebSphere Message Brokers
Hursley, UK
Internet: smh at uk.ibm.com
Phone (+44)/(0) 1962-815848 
----- Forwarded by Steve Hanson/UK/IBM on 27/03/2008 15:04 ----- 

Mike Beckerle/Worcester/IBM at IBMUS 

21/11/2007 15:26 

To: Steve Hanson/UK/IBM at IBMGB
cc: DFDL-Technical-Core, Suman Kalia/Toronto/IBM at IBMCA
Subject: DFDL Decimal - was Re: DFDL & length prefixes - proposal

I think decimal has signed and unsigned variants based on dfdl:decimalSigned
boolean. If this is false then it's unsigned and packedUnsignedRep specifies
the sign nibble used for unsigned. The doc doesn't specify that one can say
"" for this indicating no sign nibble at all. 

I've been rereading the decimal properties supplement and starting v002 of
it based on changes to dfdl:representation in the core spec. This needs a
general clean up. There are errors here in that there is a
decimalType="zoned", or "packed" or "BCD" and also a bcdIsPacked, and
bcdUnpackedRep="ebcdic", which is the same as zoned I think. 

We need there to be one way to express these things. Right now the bias is a
set of orthogonal flags: signed or unsigned, what's the sign nibble for
unsigned, what sign nibbles for signed, packed or unpacked, what's in the
zones - the unused nibbles -  (ebcdic, i.e., "F", ascii, i.e., "3", or zero
- but that's not enough as I've seen data with "2" in the zones - some non
IBM cobol compiler does this.). 

A better choice may be to specify decimalType as a larger enum which
includes most of these properties, so that we don't end up with too much
ability to express variants that have simply never existed. 

A list of the use cases needs to be added to the doc also. 

Here's a few: 

-1234 as expressed as bytes in hex in increasing position order, i.e., LSB
first. 

packed ibm, signed, D01234 

zoned ibm, overpunched leading sign D1F2F3F4 (are signs usually leading or
trailing.... I think trailing actually.) 

big endian zoned ascii, ascii-translated overpunched leading sign  4A323334
(yuck - so much for treating decimal as "binary" data). 
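
The two zoned examples above can be reproduced with a short sketch. It assumes a leading overpunched sign with zone C for plus and D for minus, as in the example, and uses cp037 as one EBCDIC code page; the function name is hypothetical.

```python
def zoned_ebcdic_leading(value: int, width: int) -> bytes:
    """EBCDIC zoned decimal with the sign overpunched on the first digit."""
    digits = str(abs(value)).zfill(width)
    if len(digits) > width:
        raise ValueError("value does not fit in the requested width")
    out = bytearray(0xF0 | int(d) for d in digits)  # zone F on every digit
    sign_zone = 0xD0 if value < 0 else 0xC0
    out[0] = sign_zone | int(digits[0])             # overpunch the leading digit
    return bytes(out)


ebcdic = zoned_ebcdic_leading(-1234, 4)
# Treating the field as text: ordinary EBCDIC to ASCII character translation.
as_ascii = ebcdic.decode("cp037").encode("ascii")

print(ebcdic.hex())     # d1f2f3f4
print(as_ascii.hex())   # 4a323334, i.e. the characters 'J234'
```

The overpunched byte D1 is the EBCDIC letter 'J', so a plain character-set conversion yields the ASCII form, which is the sense in which zoned data behaves like text.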

Here's a crazy idea: I believe there is a set of magic numbers which if you
give me their translations in bytes, I can determine exactly what the
encoding properties are. 

E.g., if you give me the bytes for  +0000, -1234, +789 I believe I can
determine all of the properties. 

This might be a better way to specify decimal formats. I.e., give me those
byte patterns expressed as hex, and I reverse engineer all the property
settings. 

e.g., decimalFormat="+0000=C00000 -1234=D01234 +789=C789" (signed, packed,
leading sign, padded to even number of nibbles, big endian, zero carries a
sign, "C" is plus, "D" is minus) 
or decimalFormat="+0000=00000000 -1234=D1F2F3F4 +789=C7F8F9" (ebcdic zoned,
leading overpunched sign, big endian, zero is allowed to have zero as sign
and all zero bytes, "C" is plus, "D" is minus) 

This may make more sense for the tooling than the DFDL language though.
I.e., point it at some data and it tries to guess these properties. 
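
A very small sketch of the reverse-engineering idea, limited to one property: recovering the sign codes from the sample patterns. It assumes the samples are written as space-separated `literal=hex` pairs with the sign in the leading nibble, as in the examples above; inferring the other properties (packed versus zoned, padding, byte order) is not attempted, and the function name is hypothetical.

```python
def infer_sign_codes(spec: str) -> dict:
    """Map each literal sign ('+' or '-') to the set of leading hex nibbles seen."""
    codes = {}
    for pair in spec.split():
        literal, hex_bytes = pair.split("=")
        sign = literal[0]  # '+' or '-'
        codes.setdefault(sign, set()).add(hex_bytes[0])
    return codes


packed = infer_sign_codes("+0000=C00000 -1234=D01234 +789=C789")
zoned = infer_sign_codes("+0000=00000000 -1234=D1F2F3F4 +789=C7F8F9")

print(packed)  # '+' maps to C, '-' maps to D
print(zoned)   # '+' maps to both 0 and C, showing that zero may carry a zero sign
```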

Mike Beckerle
STSM, Architect, Scalable Computing
IBM Software Group
Information Platform and Solutions
Westborough, MA 01581
direct: voice and FAX 508-599-7148
assistant: Pam Riordan   
               priordan at us.ibm.com 
               508-599-7046

  _____  

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU 

