[DFDL-WG] Fw: Action 167: textNumberPatterns with P, V, # - allowable combinations
Steve Hanson
smh at uk.ibm.com
Thu Apr 5 13:02:46 EDT 2012
Agreed on DFDL WG call 5th April 2012:
A pattern with a V symbol must not have # symbols to the right of the V
symbol.
A pattern with P symbols at the left end must have no # symbols in the
pattern.
A pattern with P symbols at the right end has no restrictions.
ICU padding and DFDL padding/trimming are independent mechanisms and may be
used in conjunction.
When ICU padding (the * symbol) is used in a pattern, it is a schema
definition error if the pattern also contains a P or V symbol.
When ICU significant digits (the @ symbol) are used in a pattern, it is a
schema definition error if the pattern also contains a P or V symbol.
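As a rough illustration, the agreed rules could be checked along the
following lines. This is only a sketch, not how any DFDL processor
implements it: the function name check_pattern is hypothetical, only the
part of the pattern before ';' is examined, and quoted literal characters
are not handled.

```python
def check_pattern(pattern: str) -> list:
    """Return the rule violations found in a textNumberPattern-like string.

    Simplifying assumptions (not from the spec): only the part before ';'
    is examined, and quoted literals are not handled.
    """
    body = pattern.split(";", 1)[0]
    errors = []
    has_p_or_v = "P" in body or "V" in body

    # No '#' may appear to the right of a V symbol.
    if "V" in body and "#" in body[body.index("V") + 1:]:
        errors.append("# to the right of V")

    # P symbols at the left end forbid '#' anywhere in the pattern.
    # "Left end" is taken to mean the first digit-position symbol is P.
    positions = [c for c in body if c in "P0#V"]
    if positions and positions[0] == "P" and "#" in body:
        errors.append("# used with leading P")

    # ICU padding (*) and significant digits (@) exclude P and V.
    if has_p_or_v and "*" in body:
        errors.append("ICU pad symbol * used with P or V")
    if has_p_or_v and "@" in body:
        errors.append("ICU significant-digit symbol @ used with P or V")
    return errors
```

Under these assumptions, "PP000" and "#,##0PP" pass, while "PP#00" and
"000V#0" are each reported as a violation.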
Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848
From: Steve Hanson/UK/IBM
To: dfdl-wg at ogf.org
Cc: Richard Schofield/UK/IBM at IBMGB
Date: 22/03/2012 11:01
Subject: Fw: Action 167: textNumberPatterns with P,V, # -
allowable combinations
Further to discussion on the call, here is what the IBM COBOL manual says
about PIC P.
An assumed decimal scaling position. It is used to specify the location of
an assumed decimal point when the point is not within the number that
appears in the data item. The scaling position character P is not counted
in the size of the data item. Scaling position characters are counted in
determining the maximum number of digit positions (63) in numeric-edited
items or in items that appear as arithmetic operands. The scaling position
character P may appear only as a continuous string of Ps in the leftmost
or rightmost digit positions within a PICTURE character-string. Because
the scaling position character P implies an assumed decimal point (to the
left of the Ps, if the Ps are leftmost PICTURE characters; to the right of
the Ps, if the Ps are rightmost PICTURE characters), the assumed decimal
point symbol, V, is redundant as either the leftmost or rightmost
character within such a PICTURE description.
In certain operations that reference a data item whose PICTURE
character-string contains the symbol P, the algebraic value of the data
item is used rather than the actual character representation of the data
item. This algebraic value assumes the decimal point in the prescribed
location and zero in place of the digit position specified by the symbol
P. The size of the value is the number of digit positions represented by
the PICTURE character-string. These operations are any of the following:
Any operation requiring a numeric sending operand.
A MOVE statement where the sending operand is numeric and its PICTURE
character-string contains the symbol P.
A MOVE statement where the sending operand is numeric-edited and its
PICTURE character-string contains the symbol P and the receiving operand
is numeric or numeric-edited.
A comparison operation where both operands are numeric.
In all other operations the digit positions specified with the symbol P
are ignored and are not counted in the size of the operand.
This implies that the scaling should be applied as a lexical operation on
the data. In other words, two COBOL fields, one with PIC PP9 and value '2'
and one with PIC PP999 and value '002', do not result in the same logical
number (.002 versus .00002).
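A small sketch of that lexical reading, assuming the stored digits are
simply placed after the implied zeros that follow the assumed decimal
point. The helper name value_with_leading_p is hypothetical.

```python
from decimal import Decimal


def value_with_leading_p(p_count: int, text: str) -> Decimal:
    """Interpret digit text under a picture with p_count leading P symbols.

    The assumed decimal point sits to the left of the Ps, so the stored
    digits follow p_count implied zeros after the point.
    """
    return Decimal("0." + "0" * p_count + text)


pic_pp9 = value_with_leading_p(2, "2")      # PIC PP9, data '2'
pic_pp999 = value_with_leading_p(2, "002")  # PIC PP999, data '002'
```

The two results differ because the leading zeros in '002' occupy real
digit positions rather than being insignificant.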
There is an equivalence between V and P: PP999 == V99999, and 999PP ==
99999V == 99999. If we consider things in these terms, the reasoning is
simpler. To prevent # symbol zero suppression from changing the value,
rule a) must apply: there must be no # to the right of the V. That
restates our rules as:
a) A pattern with a V symbol must not have # symbols to the right of the V
symbol.
b) A pattern with P symbols at the left end must have no # symbols in the
pattern.
c) A pattern with P symbols at the right end has no restrictions.
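One way to read the motivation for rule a) is that, with no decimal point
in the data, the number of fraction digits has to be fixed by the pattern.
The sketch below is an illustration under that assumption only; the
helper names and the variable-width case are hypothetical.

```python
from decimal import Decimal


def parse_implied(text: str, frac_digits: int) -> Decimal:
    """Value of digit text when its last frac_digits digits follow an implied point."""
    return Decimal(int(text)) / (Decimal(10) ** frac_digits)


def candidates(text: str, min_frac: int, max_frac: int) -> set:
    """Every value text could mean if the fraction-digit count may vary."""
    return {parse_implied(text, n) for n in range(min_frac, max_frac + 1)}


# Fixed fraction width (two digits after V): exactly one reading.
fixed = candidates("150", 2, 2)
# If a # after V made the second fraction digit optional, the same text
# would have two readings.
variable = candidates("150", 1, 2)
```

With a fixed width the text '150' can only mean 1.5, whereas an optional
fraction digit would also admit 15.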
There is another problem though. The number can be trimmed using the pad
character from either or both ends, depending on justification, before
the number pattern is applied. If the pad character is 0, then this can
also cause 0s to be lost and result in mis-application of V and P symbols.
I'm not sure there is much we can do about this. Modelers need to be
careful when padding/trimming that they get the justification correct. For
example, we typically think of numbers as being right justified, but a
number with Ps on the left is effectively left justified and should be
modeled as such. We added erratum 2.25, which prevents trimming from
leaving an empty string. I am thinking that this erratum should actually
say that trimming must leave at least the minimum number of digits implied
by the pattern, as an extra safeguard. We mustn't disallow
trimming/padding altogether, as it is used to remove spaces.
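To make the hazard concrete, here is a sketch of trimming a PP999 field
with pad character 0 as if it were right justified, with and without the
suggested minimum-digits safeguard. The function names and the min_digits
parameter are assumptions for illustration, not part of any erratum text.

```python
from decimal import Decimal


def trim_leading(text: str, pad: str, min_digits: int = 1) -> str:
    """Strip leading pad characters but keep at least min_digits characters."""
    stripped = text.lstrip(pad)
    if len(stripped) < min_digits:
        # Fall back to the rightmost min_digits characters of the original.
        stripped = text[max(0, len(text) - min_digits):]
    return stripped


def value_pp(digits: str) -> Decimal:
    """Apply two leading P symbols: digits follow two implied zeros."""
    return Decimal("0.00" + digits)


original = value_pp("002")
# Trimming '0' from the left loses significant digit positions.
unsafe = value_pp(trim_leading("002", "0"))
# Requiring the three digits implied by PP999 preserves the value.
guarded = value_pp(trim_leading("002", "0", min_digits=3))
```

Here the unguarded trim turns .00002 into .002, while keeping the minimum
number of digits leaves the value unchanged.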
The ICU pad character symbol * is used to provide a pad character when the
data is shorter than the pattern. This is only used to pad when unparsing;
it is not used to trim. But it might be safer to disallow P and V symbols
when * is used?
Reading the ICU description of the significant digit symbol @, explicit
decimal points are disallowed. I think we should disallow P and V symbols
when the @ symbol is used. Erratum 2.28 should be updated.
----- Forwarded by Steve Hanson/UK/IBM on 22/03/2012 09:22 -----
From: Steve Hanson/UK/IBM
To: dfdl-wg at ogf.org
Date: 21/03/2012 10:34
Subject: Action 167: textNumberPatterns with P,V, # - allowable
combinations
Last week we agreed that disallowing text number patterns that contained a
# symbol and either a P or V symbol was too restrictive. Accordingly the
following rules are proposed to control when # may be used in the same
pattern as P or V to ensure an unambiguous pattern.
a) Pattern must not have # symbols to the right of the V symbol.
b) If pattern has P symbols at the left end, then there must be as many 0
symbols adjacent to the rightmost P symbol as there are P symbols.
c) If pattern has P symbols at the right end, there are no restrictions.
If a) or b) is violated, it is a schema definition error.
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU