[DFDL-WG] RE: Fw: DFDL Decimal - proposal - correcting wrong attachment

Steve Hanson smh at uk.ibm.com
Wed Apr 9 09:50:08 CDT 2008


Hi Mike - answers in-line below. 

Regards, Steve

Steve Hanson
Programming Model Architect
WebSphere Message Brokers
Hursley, UK
Internet: smh at uk.ibm.com
Phone (+44)/(0) 1962-815848



"Mike Beckerle" <mbeckerle.dfdl at gmail.com> 
09/04/2008 15:05
Please respond to
<mbeckerle.dfdl at gmail.com>


To
Steve Hanson/UK/IBM at IBMGB
cc
<dfdl-wg at ogf.org>
Subject
RE: Fw: DFDL Decimal - proposal - correcting wrong attachment






Thanks for these clarifications.
 
Do we have a way to represent "unpacked" decimal numbers? This is like 
zoned, except the "zones" are zero instead of "F" (in ebcdic encodings).
<smh>No we don't. Neither MRM nor TX support that. Have you seen such an 
example?  Is it encoding sensitive? 

Also, can a BCD number have a sign?
<smh>What we are calling a BCD can not have a sign, as far as I know. 
That's where packed decimal comes in.
 
-mikeb
 

From: Steve Hanson [mailto:smh at uk.ibm.com] 
Sent: Wednesday, April 09, 2008 10:00 AM
To: mbeckerle.dfdl at gmail.com
Cc: 'Mike Beckerle'; Alan Powell; Ian W Parkinson
Subject: RE: Fw: DFDL Decimal - proposal - correcting wrong attachment
 

Hi Mike - answers in-line below. 

Regards, Steve

Steve Hanson
Programming Model Architect
WebSphere Message Brokers
Hursley, UK
Internet: smh at uk.ibm.com
Phone (+44)/(0) 1962-815848 


"Mike Beckerle" <mbeckerle.dfdl at gmail.com> 
09/04/2008 01:43 


Please respond to
<mbeckerle.dfdl at gmail.com>



To
Steve Hanson/UK/IBM at IBMGB, Alan Powell/UK/IBM at IBMGB 
cc
Ian W Parkinson/UK/IBM at IBMGB, 'Mike Beckerle' 
Subject
RE: Fw: DFDL Decimal - proposal - correcting wrong attachment
 


 
 




I prefer one property dfdl:numberFormat, the valid values of which depend 
on dfdl:representation 
<smh>The advantage of two properties is that you can set scoping for text 
and binary numbers separately. 
  
I like the analysis that text formats are ones which depend on encoding, 
and not byteOrder, and binary depend on byte order, and NOT encoding. 
<smh>Me too. 
  
There's also format specifiers for floating point. Should those also go on 
here, be allowed only for representation="binary"? 
<smh>I did think about this, but I think we are better off keeping floats 
separate. Otherwise people might think you can declare a logical float to 
be rep'd by physical integer. MRM allows this, and I wish it didn't. It 
also exacerbates the problem noted above - I couldn't set a default float 
format, which is something that would almost certainly never vary within a 
data stream. 
  
The rest of the proposal looks fine. I found decimalVirtualPoint an odd 
name, but it is clear and obeys the conventions. 
<smh>I agree it's a bit odd. An alternative is 'decimalimpliedPlaces' 
which uses TX terminology - but that doesn't match the 'V' pattern 
character we are proposing in the ICU pattern (which matches COBOL) 
  
I was a bit unclear on how do you represent an unsigned packed decimal. 
This is common. There is no sign nibble at all. It lets you do an even 
number of digits. MMDDYY is commonly this, 3 unsigned packed numbers. 
<smh>What you have described is dfdl:numberFormat="BCD". An unsigned 
packed decimal is dfdl:numberFormat="packed" with the sign nibble always 
unsigned, so dfdl:packedDecimalSignCodes="F F F". 
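The BCD versus packed distinction above can be sketched in Python. This is only a minimal illustration, not a DFDL implementation: the function names are hypothetical, the sign nibble is assumed to be the last nibble (the usual IBM packed layout), and the sign codes are assumed to be C/A/E/F for plus or unsigned and D/B for minus.

```python
def decode_bcd(data: bytes) -> int:
    """Decode unsigned BCD: every nibble is a digit, there is no sign nibble."""
    value = 0
    for byte in data:
        for nibble in (byte >> 4, byte & 0x0F):
            if nibble > 9:
                raise ValueError(f"invalid BCD digit nibble {nibble:X}")
            value = value * 10 + nibble
    return value


def decode_packed(data: bytes, negative_codes=frozenset({0xD, 0xB})) -> int:
    """Decode packed decimal: digit nibbles plus one trailing sign nibble.

    Assumption: the sign nibble is the low nibble of the last byte.
    """
    nibbles = [n for byte in data for n in (byte >> 4, byte & 0x0F)]
    *digits, sign = nibbles
    if sign <= 9:
        raise ValueError("last nibble is a digit, not a sign code")
    value = 0
    for digit in digits:
        if digit > 9:
            raise ValueError(f"invalid packed digit nibble {digit:X}")
        value = value * 10 + digit
    return -value if sign in negative_codes else value


# MMDDYY held as three BCD bytes: an even number of digits, no sign nibble.
print(decode_bcd(bytes.fromhex("040908")))     # 40908
# Packed with an "unsigned" F sign nibble, then with a D (minus) sign nibble.
print(decode_packed(bytes.fromhex("01234F")))  # 1234
print(decode_packed(bytes.fromhex("01234D")))  # -1234
```

The same bytes "1234" are valid BCD but are rejected by decode_packed, because the final nibble is a digit rather than a sign code.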

  
-mikeb 
  
  
  
 


From: Steve Hanson [mailto:smh at uk.ibm.com] 
Sent: Wednesday, April 02, 2008 11:54 AM
To: Alan Powell
Cc: Ian W Parkinson; Mike Beckerle
Subject: Re: Fw: DFDL Decimal - proposal - correcting wrong attachment 
  

Alan, Ian and myself reviewed this today. 

The main issue was that the loss of dfdl:representation="decimal" means 
that it is no longer clear when to use dfdl:integerFormat and 
dfdl:decimalFormat, because an xs:decimal can have a binary integer rep 
and an xs:int can have a binary decimal rep. It was noted that both IBM 
models (MRM and TX type tree) handle this by having a single property. I 
don't want to re-introduce rep=decimal, I think we should stick with text 
(implying encoding sensitive) and binary (potentially byte order 
sensitive). Options: 

a) One property dfdl:numberFormat with values "text", "zoned", "packed", 
"BCD", "twosComplement", "onesComplement", "signMagnitude". 
- "text" and "zoned" when dfdl:representation="text" 
- "packed", "BCD", "twosComplement", "onesComplement", "signMagnitude" 
when dfdl:representation="binary" 

Number   xs:int, xs:decimal     text   => numberFormat
         xs:float, xs:double    text   =>
         xs:int, xs:decimal     binary => numberFormat
         xs:float               binary => floatFormat
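
Option (a) amounts to a single enum whose legal values depend on dfdl:representation. A minimal Python sketch of that check follows; the enum values are taken from the list above, while the function and table names are hypothetical and nothing here is meant as actual DFDL processor behaviour.

```python
# Allowed dfdl:numberFormat values per dfdl:representation, as listed in option (a).
ALLOWED_NUMBER_FORMATS = {
    "text": {"text", "zoned"},
    "binary": {"packed", "BCD", "twosComplement", "onesComplement", "signMagnitude"},
}


def check_number_format(representation: str, number_format: str) -> None:
    """Raise ValueError if number_format is not legal for the representation."""
    allowed = ALLOWED_NUMBER_FORMATS.get(representation)
    if allowed is None:
        raise ValueError(f"unknown representation {representation!r}")
    if number_format not in allowed:
        raise ValueError(
            f"numberFormat {number_format!r} is not valid when "
            f"representation is {representation!r}"
        )


check_number_format("text", "zoned")     # accepted
check_number_format("binary", "packed")  # accepted
```

Option (b) would split the same table into two independent enums, one per property, so the cross-check against dfdl:representation is no longer needed.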


b) Two properties dfdl:textNumberFormat and dfdl:binaryNumberFormat, 
allowable enums split as above. 
- this means the existing dfdl:textNumberFormat property gets renamed to 
dfdl:textNumberPattern or dfdl:textNumberScheme 

Number   xs:int, xs:decimal     text   => textNumberFormat
         xs:float, xs:double    text   =>
         xs:int, xs:decimal     binary => binaryNumberFormat
         xs:float               binary => floatFormat

Other suggestions? 

Regards, Steve

Steve Hanson
WebSphere Message Brokers
Hursley, UK
Internet: smh at uk.ibm.com
Phone (+44)/(0) 1962-815848 

Alan Powell/UK/IBM 
28/03/2008 16:45 
 


To
Steve Hanson/UK/IBM at IBMGB 
cc
Ian W Parkinson/UK/IBM at IBMGB, mbeckerle at oco-inc.com 
Subject
Re: Fw: DFDL Decimal - proposal - correcting wrong attachment

 
 


 
 




Steve 

Technically seems OK. 

Need quite a bit of editorial work before it can be included in the spec 
which I have started. 


Alan Powell

MP 211, IBM UK Labs, Hursley,  Winchester, SO21 2JN, England
Notes Id: Alan Powell/UK/IBM     email: alan_powell at uk.ibm.com 
Tel: +44 (0)1962 815073                  Fax: +44 (0)1962 816898

From: 
Steve Hanson/UK/IBM 
To: 
mbeckerle at oco-inc.com 
Cc: 
Alan Powell/UK/IBM, Ian W Parkinson/UK/IBM 
Date: 
28/03/2008 13:59 
Subject: 
Fw: DFDL Decimal - proposal - correcting wrong attachment

 
 




Here's an attempt at a revised decimal supplement, that takes into account 
the stuff in my mail below. 

[attachment "ggf-dfdl-supplement-advanced-decimal-properties-v1.0-003.doc" 
deleted by Alan Powell/UK/IBM] 

Some discussion points: 

1) I've removed the representation 'Decimal' - a decimal is either 'Text' 
or 'Binary'.  Property decimalFormat says whether it is text or zoned (for 
text) or packed or BCD (for binary). 

2) There's no need for a decimalSigned property, as zoned uses 
numberPattern for this, BCD is always unsigned, and packed indicates this 
via sign code 

3) I've added VDP property for BCD and packed - zoned uses numberPattern 
for this. However,  VDP property is also needed for binary integers - this 
is missing from spec. COBOL PIC 99V99 COMP will create an xs:decimal with 
binary integer rep, so we need to support this. I suggest we have a single 
VDP property that applies to all binary reps that can be used to represent 
xs:decimal. So my VDP property gets removed to main spec. 

4) The resultant properties are less than before. I'm not sure that a 
separate supplement is justified. 

5) I would like to remove numberCheckPolicy from dfdl:DefineNumberFormat, 
and make it a separate property. Two reasons: 
- I think the decision to use strict/lax checking is not an attribute of 
the number format but more an attribute of the schema as a whole. 
- It means we can control packed decimal sign nibble oddities with the 
same property as other strict/lax number checking. 
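
Point 3 (a virtual decimal point applied to a binary integer) can be illustrated with a small Python sketch. It assumes the VDP is simply the number of implied fraction digits and that the field is a big-endian binary integer; the function name and parameters are hypothetical.

```python
from decimal import Decimal


def apply_virtual_decimal_point(raw: bytes, vdp: int,
                                byte_order: str = "big",
                                signed: bool = False) -> Decimal:
    """Interpret raw as a binary integer, then shift the decimal point left by vdp."""
    unscaled = int.from_bytes(raw, byteorder=byte_order, signed=signed)
    # scaleb(-vdp) adjusts the exponent, i.e. divides by 10**vdp exactly.
    return Decimal(unscaled).scaleb(-vdp)


# A PIC 99V99 COMP style field holding 12.34 is stored as the integer 1234 (0x04D2).
print(apply_virtual_decimal_point(bytes.fromhex("04D2"), vdp=2))  # 12.34
```

The logical type is xs:decimal even though the physical bytes are a plain binary integer, which is why one VDP property could serve packed, BCD and binary integer reps alike.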

Let's review on next OGF WG call. 

Regards, Steve

Steve Hanson
Programming Model Architect
WebSphere Message Brokers
Hursley, UK
Internet: smh at uk.ibm.com
Phone (+44)/(0) 1962-815848 
----- Forwarded by Steve Hanson/UK/IBM on 28/03/2008 12:33 ----- 

Steve Hanson/UK/IBM 
27/03/2008 15:29 
 


To
Mike Beckerle (Work) 
cc
 
Subject
DFDL Decimal - proposal

 
 


 
 




Hi Mike 

I've finally got round to looking at the decimal supplement, and I'd like 
to get your opinion on something. The WTX team have been reviewing draft 
031 and had the following observation (actually they had quite a few good 
ones, and when they've finished we need to discuss them all on an OGF WG 
call). 

"13.3. Is a zoned decimal textual or non-textual?  If all overpunched 
variants result in well-known characters then the data is scannable and 
therefore more like a textual field." 

It turns out that the type hierarchy in TX for decimal looks like below. 
They consider Zoned as text as it always consists of reasonable characters 
and is subject to encoding conversion, padding, justification, etc. 
There's a lot of appeal in that. It's always bothered me a bit that MRM 
viewed it as a binary type. 

Number -> Character -> Decimal (meaning text decimal) 
                      Integer (meaning text integer) 
                      Zoned 
      -> Binary    -> Integer (meaning binary integer) 
                      Float 
                      Packed 
                      BCD 

Also, their Zoned does not have separate sign option. They point out that 
a separate signed Zoned is just a Text decimal. And they are correct. We 
got the separate sign thing from MRM, which after some digging turns out 
to have got it from the CAM Type Descriptor model, which had no other way 
of representing a text decimal number with a separate sign. 

As part of my rework of the decimal supplement, I'd like to take both 
these into account. The implications are: 
- Zoned => overpunched only 
- Zoned decimal can pick up on the textNumberxxx properties, including 
textNumberFormat 
      => use the numberPattern (ie, ICU pattern) property to say which end 
the (overpunched) sign goes 
      => can get away without a separate pattern language for binary 
decimals, which as you point out has endian-ness issues 
- Binary decimals are packed and BCD 
- There are a lot fewer properties for decimals 
- dfdl:representation = "text" can have subdivisions - that's not occurred 
until now (we could think about making dfdl:representation = "xml" a 
subdivision of "text"?) 
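
To make the "zoned is really text" argument concrete, here is a minimal Python sketch of decoding an EBCDIC zoned decimal with an overpunched sign. It assumes the sign is overpunched on the last digit (in the proposal the pattern would say which end), that zone D or B means minus, and it does not validate the zones of the other bytes. The function name is hypothetical; cp037 is Python's built-in EBCDIC codec.

```python
def decode_zoned_ebcdic(data: bytes) -> int:
    """Decode EBCDIC zoned decimal with a trailing overpunched sign."""
    if not data:
        raise ValueError("empty zoned decimal")
    value = 0
    for byte in data:
        digit = byte & 0x0F          # low nibble carries the digit
        if digit > 9:
            raise ValueError(f"invalid zoned digit nibble {digit:X}")
        value = value * 10 + digit
    sign_zone = data[-1] >> 4        # high nibble of the last byte carries the sign
    return -value if sign_zone in (0xD, 0xB) else value


raw = bytes.fromhex("F1F2F3D4")
print(decode_zoned_ebcdic(raw))  # -1234
# Every byte, including the overpunched one, is an ordinary EBCDIC character.
print(raw.decode("cp037"))       # 123M
```

Because the overpunched byte is still a printable character, the field survives the same encoding conversion, padding and justification handling as any other text field.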

If you think there is merit in this approach then let me know by return 
and I'll see if I can write something up tomorrow. 
I'm WAH on +44-1794-340899 if you want to discuss. 

Your "crazy idea" below is interesting - but I think is a tooling thought 
rather than a core spec thing. 

(Sorry about call yesterday - I thought I mailed something out a couple of 
calls ago about DST mismatch, but perhaps I didn't). 

Regards, Steve

Steve Hanson
Programming Model Architect
WebSphere Message Brokers
Hursley, UK
Internet: smh at uk.ibm.com
Phone (+44)/(0) 1962-815848 
----- Forwarded by Steve Hanson/UK/IBM on 27/03/2008 15:04 ----- 

Mike Beckerle/Worcester/IBM at IBMUS 
21/11/2007 15:26 
 


To
Steve Hanson/UK/IBM at IBMGB 
cc
DFDL-Technical-Core, Suman Kalia/Toronto/IBM at IBMCA 
Subject
DFDL Decimal - was Re: DFDL & length prefixes - proposal

 
 


 
 




I think decimal has signed and unsigned variants based on 
dfdl:decimalSigned boolean. If this is false then it's unsigned and 
packedUnsignedRep specifies the sign nibble used for unsigned. The doc 
doesn't specify that one can say "" for this indicating no sign nibble at 
all. 

I've been rereading the decimal properties supplement and starting v002 of 
it based on changes to dfdl:representation in the core spec. This needs a 
general clean up. There are errors here in that there is a 
decimalType="zoned", or "packed" or "BCD" and also a bcdIsPacked, and 
bcdUnpackedRep="ebcdic", which is the same as zoned I think. 

We need there to be one way to express these things. Right now the bias is 
a set of orthogonal flags: signed or unsigned, what's the sign nibble for 
unsigned, what sign nibbles for signed, packed or unpacked, what's in the 
zones - the unused nibbles -  (ebcdic, i.e., "F", ascii, i.e., "3", or 
zero - but that's not enough as I've seen data with "2" in the zones - 
some non IBM cobol compiler does this.). 

A better choice may be to specify decimalType as a larger enum which 
includes most of these properties, so that we don't end up with too much 
ability to express variants that have simply never existed. 

A list of the use cases needs to be added to the doc also. 

Here's a few: 

-1234 as expressed as bytes in hex in increasing position order, i.e., LSB 
first. 

packed ibm, signed, D01234 

zoned ibm, overpunched leading sign D1F2F3F4 (are signs usually leading or 
trailing.... I think trailing actually.) 

big endian zoned ascii, ascii-translated overpunched leading sign 4A323334 
(yuck - so much for treating decimal as "binary" data). 

Here's a crazy idea: I believe there is a set of magic numbers which if 
you give me their translations in bytes, I can determine exactly what the 
encoding properties are. 

E.g., if you give me the bytes for  +0000, -1234, +789 I believe I can 
determine all of the properties. 

This might be a better way to specify decimal formats. I.e., give me those 
byte patterns expressed as hex, and I reverse engineer all the property 
settings. 

e.g., decimalFormat="+0000=C00000 -1234=D01234 +789=C789" (signed, packed, 
leading sign, padded to even number of nibbles, big endian, zero carries a 
sign, "C" is plus, "D" is minus) 
or decimalFormat="+0000=00000000 -1234=D1F2F3F4 +789=C7F8F9" (ebcdic 
zoned, leading overpunched sign, big endian, zero is allowed to have zero 
as sign and all zero bytes, "C" is plus, "D" is minus) 

This may make more sense for the tooling than the DFDL language though. 
I.e., point it at some data and it tries to guess these properties. 
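
A toy Python sketch of this reverse-engineering idea, using only the -1234 sample: it looks at the nibbles and guesses packed versus zoned, which end the sign is on, and the minus code. Everything here is hypothetical (function name, result keys, heuristics); it handles only EBCDIC-style zones and would need the +0000 and +789 samples to recover the remaining properties.

```python
def guess_decimal_format(sample_hex: str, digits: str = "1234") -> dict:
    """Guess layout properties from the hex bytes of a known negative number."""
    nibbles = [int(c, 16) for c in sample_hex]
    wanted = [int(c) for c in digits]

    if len(nibbles) == 2 * len(wanted):
        # One byte per digit: treat as zoned (high nibble zone, low nibble digit).
        zones, lows = nibbles[0::2], nibbles[1::2]
        if lows != wanted:
            raise ValueError("low nibbles do not spell the expected digits")
        common = max(set(zones), key=zones.count)
        odd = [i for i, z in enumerate(zones) if z != common]
        if len(odd) != 1 or odd[0] not in (0, len(zones) - 1):
            raise ValueError("cannot locate a single overpunched sign")
        return {
            "format": "zoned",
            "zone": f"{common:X}",
            "sign": "leading" if odd[0] == 0 else "trailing",
            "minus": f"{zones[odd[0]]:X}",
        }

    # Otherwise treat as packed: exactly one non-digit nibble, at one end.
    signs = [i for i, n in enumerate(nibbles) if n > 9]
    if len(signs) != 1 or signs[0] not in (0, len(nibbles) - 1):
        raise ValueError("cannot locate a single sign nibble")
    pos = signs[0]
    rest = nibbles[:pos] + nibbles[pos + 1:]
    while len(rest) > len(wanted) and rest[0] == 0:
        rest.pop(0)                  # drop zero padding nibbles
    if rest != wanted:
        raise ValueError("digit nibbles do not spell the expected digits")
    return {
        "format": "packed",
        "sign": "leading" if pos == 0 else "trailing",
        "minus": f"{nibbles[pos]:X}",
    }


print(guess_decimal_format("D01234"))
print(guess_decimal_format("D1F2F3F4"))
```

With the two -1234 samples from the examples above, the first call reports a packed layout with a leading "D" sign and the second a zoned layout with "F" zones and a leading "D" overpunch.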

Mike Beckerle
STSM, Architect, Scalable Computing
IBM Software Group
Information Platform and Solutions
Westborough, MA 01581
direct: voice and FAX 508-599-7148
assistant: Pam Riordan 
                priordan at us.ibm.com 
                508-599-7046




 


 
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU 








 















