[DFDL-WG] Trimming of a text number that's all zeros when the number pattern has a sign char at the end
Andrew Edwards
andy.edwards at uk.ibm.com
Thu Oct 30 12:43:19 EDT 2014
Hi all,
I've hit an interesting case revolving around trimming and number patterns
that doesn't seem quite sane to me.
Consider an element with the following properties:
textTrimKind='padChar'
textNumberPadCharacter='0'
textNumberPattern='0000+;0000-'
So we have the sign character at the end of the representation. Now,
imagine that the data being parsed is "0000+". The relevant rules from
the DFDL specification are:
Section 13.2 on textTrimKind
When 'padChar', the element is trimmed of the dfdl:textStringPadCharacter,
dfdl:textNumberPadCharacter, dfdl:textBooleanPadCharacter or
dfdl:textCalendarPadCharacter depending on the type of the element.
Section 13.6 on textNumberPadCharacter
When parsing, if the pad character is '0' and the SimpleContent region
consists entirely of '0' characters, then the last remaining '0' is not
trimmed and a single '0' is the result of the trimming. This rule also
applies when the pad character is a DFDL character entity equivalent to
'0'. This rule does not apply when the pad character is any other
character nor when a pad byte is specified.
Section 13.6.1
Describes all of the pattern syntax.
In our hypothetical case, the content region is not all zeros, as it ends
in '+'. This means that the rule in section 13.6 does not apply and we
only apply the rule in 13.2. This results in us trimming away all of the
zeros and ending up with '+'. This then doesn't parse as a number.
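To make the behaviour concrete, here is a minimal Python sketch of the trimming as I read the rules quoted above. It assumes the number is right-justified, so pad characters are trimmed from the left. The function name trim_current is hypothetical and is not taken from any DFDL implementation.

```python
def trim_current(content: str, pad: str = "0") -> str:
    """Trim leading pad characters (13.2) with the all-zeros exception (13.6).

    Assumption: right-justified text, so padding sits on the left.
    """
    # 13.6: pad is '0' and the region is entirely '0' -> keep a single '0'.
    if pad == "0" and content and set(content) == {"0"}:
        return "0"
    # 13.2: otherwise every leading pad character is removed.
    return content.lstrip(pad)


print(repr(trim_current("0000")))   # all zeros, exception applies
print(repr(trim_current("0012+")))  # ordinary case
print(repr(trim_current("0000+")))  # the problem case: only the sign survives
```

Under these assumptions the third call leaves just the sign character, which is the string that then fails to parse against the pattern.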
The problem seems to be that the rule in Section 13.6 doesn't take into
account that the suffix of the pattern can result in text in the content
region that isn't part of the digits of the number. Should the rule under
section 13.6 be something more like this...
When parsing, if the pad character is '0' and the SimpleContent region
consists entirely of '0' characters, or the SimpleContent region consists
of a string of '0' characters followed by non-digit characters, then the
last remaining '0' is not trimmed and a single '0' is the result of the
trimming. This rule also applies when the pad character is a DFDL
character entity equivalent to '0'. This rule does not apply when the pad
character is any other character nor when a pad byte is specified.
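As a rough illustration of the reworded rule, the following Python sketch keeps one '0' when the content is a run of zeros optionally followed by non-digit characters. It again assumes right-justified text with padding on the left. The name trim_proposed and the use of a regular expression are illustrative choices only, not anything the specification prescribes.

```python
import re


def trim_proposed(content: str, pad: str = "0") -> str:
    """Trim leading pad characters, keeping one '0' before a non-digit suffix.

    Assumption: right-justified text, so padding sits on the left.
    """
    if pad == "0":
        # One or more zeros, then zero or more non-digit characters.
        match = re.fullmatch(r"0+(\D*)", content)
        if match:
            # Keep a single '0' and whatever non-digit text follows it.
            return "0" + match.group(1)
    # Any other content is trimmed as before.
    return content.lstrip(pad)


print(repr(trim_proposed("0000+")))
print(repr(trim_proposed("0000")))
print(repr(trim_proposed("0012+")))
```

With this reading, "0000+" trims to "0+", which still carries a digit for the pattern to match, while content containing other digits is trimmed exactly as it is today.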
Thoughts?
Andy
Andy Edwards - IBM Integration Bus - DFDL
Email:
andy.edwards at uk.ibm.com
Snail Mail:
MP211, Hursley Park, Hursley, WINCHESTER, Hants, SO21 2JN
Tel int:
247222
Tel ext:
+44 (0)1962 817222
Desk:
DE3 V17
The Feynman problem solving Algorithm
1) Write down the problem
2) Think real hard
3) Write down the answer
-- Murray Gell-Mann in the NY Times
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU