[DFDL-WG] Trimming of a text number that's all zeros when the number pattern has a sign char at the end

Andrew Edwards andy.edwards at uk.ibm.com
Mon Nov 3 13:10:54 EST 2014


Hi Steve

Yep - I agree with your new definition of the rule taking into account 
justification-independence and quoted text.

I'm not sure I like the nested if-else in the new text though, as it took 
a couple of reads to understand.  How about the text below?  It splits out 
the second 'if/else', which I find easier to understand.

When parsing, if the pad character is '0' and dfdl:textTrimKind is 
'padChar' then the SimpleContent region is trimmed of the '0' characters 
as defined by the trimming rules.  If this trimming results in the next 
character in the SimpleContent region being a character other than a 
digit, the last '0' character is re-instated and not trimmed.  This rule 
also applies when the pad character is a DFDL character entity equivalent 
to '0'. This rule does not apply when the pad character is any other 
character nor when a pad byte is specified.
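
As a rough illustration of the two-step wording above, here is a minimal Python sketch. It assumes trimming from the left only (a right-justified number) and treats an empty remainder as "not a digit" so that the all-zeros case still yields a single '0'. The name trim_zero_pad is hypothetical and not taken from any DFDL implementation.

```python
import string


def trim_zero_pad(content: str, pad: str = "0") -> str:
    """Left-trim pad zeros, then re-instate one if a non-digit would lead."""
    # Step 1: ordinary trimming of the pad character (left side only here).
    trimmed = content.lstrip(pad)
    removed_any = len(trimmed) < len(content)
    # Step 2: if a '0' was removed and the next character is not a digit
    # (or nothing is left at all), put the last '0' back.
    if removed_any and (not trimmed or trimmed[0] not in string.digits):
        trimmed = pad + trimmed
    return trimmed


print(trim_zero_pad("0000+"))        # 0+
print(trim_zero_pad("000,000,123"))  # 0,000,123
print(trim_zero_pad("0000.025"))     # 0.025
print(trim_zero_pad("0042"))         # 42
```

The inputs are the ones discussed in this thread; the pad-byte and non-'0' pad character cases are deliberately left out, since the rule does not apply to them.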

Cheers,
Andy 
Andy Edwards - IBM Integration Bus - DFDL


Email:
andy.edwards at uk.ibm.com
Snail Mail: 
MP211, Hursley park, Hursley, WINCHESTER, Hants, SO21 2JN
Tel int:
247222
Tel ext:
+44 (0)1962 817222
Desk:
DE3 V17

The Feynman problem solving Algorithm
  1) Write down the problem
  2) Think real hard
  3) Write down the answer
 -- Murray Gell-mann in the NY Times




From:   Steve Hanson/UK/IBM
To:     Andrew Edwards/UK/IBM at IBMGB
Cc:     DFDL-WG <dfdl-wg at ogf.org>
Date:   03/11/2014 15:39
Subject:        Re: [DFDL-WG] Trimming of a text number that's all zeros 
when the number pattern has a sign char at the end


Andy

I agree that the existing words do not cover all the scenarios. Your 
proposed words are on the right track but only cover left trimming a right 
justified text number. We need something that is independent of 
justification and can handle patterns where there is quoted text as well 
as signs. 

One can envisage some bizarre scenarios. For example, suppose the text 
number pattern is "#0'000'", an attempt to divide by 1000 using the 
pattern. A DFDL parser would trim everything except one zero, and the 
result would not match the pattern, which expects at least 3 zeros. 
Trimming happens before the pattern is looked at, so I don't think we 
could cater for this (if we even wanted to).
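
To make that scenario concrete, here is a small Python sketch. The regular expression is only a rough stand-in for the pattern "#0'000'" (one or more digits followed by the literal text "000"), not a real number-pattern parser, and left trimming is assumed. The names PATTERN_STAND_IN and trim_keep_one_zero are hypothetical.

```python
import re

# Rough stand-in for "#0'000'": at least one digit, then the literal "000".
PATTERN_STAND_IN = re.compile(r"[0-9]+000")


def trim_keep_one_zero(content: str, pad: str = "0") -> str:
    """Left-trim pad zeros, but leave a single '0' if nothing else remains."""
    trimmed = content.lstrip(pad)
    return trimmed if trimmed else pad


data = "0000"  # the value zero followed by the quoted literal "000"
trimmed = trim_keep_one_zero(data)

print(repr(trimmed))                                      # '0'
print(PATTERN_STAND_IN.fullmatch(data) is not None)       # True
print(PATTERN_STAND_IN.fullmatch(trimmed) is not None)    # False
```

The untrimmed data would satisfy the stand-in pattern, but trimming runs first and leaves a lone '0', which no longer does.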

Perhaps we should say:

When parsing, if the pad character is '0' and dfdl:textTrimKind is 
'padChar' then if the SimpleContent region is trimmed so that the removal 
of a '0' character leaves the next character other than a digit, the last 
'0' character is re-instated and not trimmed.  This rule also applies when 
the pad character is a DFDL character entity equivalent to '0'. This rule 
does not apply when the pad character is any other character nor when a 
pad byte is specified. 

That means that "000,000,123" would end up as "0,000,123" instead of 
today's ",000,123", and "0000.025" would end up as "0.025" instead of 
today's ".025", but I think that is good. 

Regards
 
Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848




From:   Andrew Edwards/UK/IBM at IBMGB
To:     DFDL-WG <dfdl-wg at ogf.org>
Date:   30/10/2014 16:43
Subject:        [DFDL-WG] Trimming of a text number that's all zeros when 
the number pattern has a sign char at the end
Sent by:        dfdl-wg-bounces at ogf.org



Hi all, 

I've hit an interesting case revolving around trimming and number patterns 
that doesn't seem quite sane to me. 

Consider an element with the following properties: 
textTrimKind='padChar' 
textNumberPadCharacter='0' 
textNumberPattern='0000+;0000-' 

So we have the sign character at the end of the representation.  Now, 
imagine that the data being parsed is "0000+".  The relevant rules from 
the DFDL specification are: 
Section 13.2 on textTrimKind:
  When 'padChar', the element is trimmed of the dfdl:textStringPadCharacter, 
  dfdl:textNumberPadCharacter, dfdl:textBooleanPadCharacter or 
  dfdl:textCalendarPadCharacter depending on the type of the element. 

Section 13.6 on textNumberPadCharacter:
  When parsing, if the pad character is '0' and the SimpleContent region 
  consists entirely of '0' characters, then the last remaining '0' is not 
  trimmed and a single '0' is the result of the trimming.  This rule also 
  applies when the pad character is a DFDL character entity equivalent to 
  '0'. This rule does not apply when the pad character is any other 
  character nor when a pad byte is specified. 

Section 13.6.1:
  Describes all of the pattern syntax. 


In our hypothetical case, the content region is not all zeros, as it ends 
in '+'.  This means that the rule in section 13.6 does not apply and we 
only apply the rule in 13.2.  This results in us trimming away all of the 
zeros and ending up with '+'.  This then doesn't parse as a number. 
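
The behaviour described above can be sketched in a few lines of Python. This is only an illustration of how the two existing rules combine, assuming trimming from the left; the function name trim_current_rules is hypothetical.

```python
def trim_current_rules(content: str, pad: str = "0") -> str:
    """Apply the existing 13.6 all-zeros exception, else plain 13.2 trimming."""
    # Section 13.6: a region made up entirely of '0' keeps a single '0'.
    if content and set(content) == {pad}:
        return pad
    # Section 13.2: otherwise every leading pad character is trimmed.
    return content.lstrip(pad)


print(repr(trim_current_rules("0000")))   # '0'
print(repr(trim_current_rules("0000+")))  # '+'

# Nothing numeric survives in the second case, so it cannot parse as a number.
print(any(ch.isdigit() for ch in trim_current_rules("0000+")))  # False
```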

The problem seems to be that the rule in Section 13.6 doesn't take into 
account that the suffix of the pattern can result in text in the content 
region that isn't part of the digits of the number.  Should the rule under 
section 13.6 be something more like this... 
When parsing, if the pad character is '0' and the SimpleContent region 
consists entirely of '0' characters, or the SimpleContent region consists 
of a string of '0' characters followed by non-digit characters, then the 
last remaining '0' is not trimmed and a single '0' is the result of the 
trimming.  This rule also applies when the pad character is a DFDL 
character entity equivalent to '0'. This rule does not apply when the pad 
character is any other character nor when a pad byte is specified. 

Thoughts? 

Andy 


Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
--
  dfdl-wg mailing list
  dfdl-wg at ogf.org
  https://www.ogf.org/mailman/listinfo/dfdl-wg
