[DFDL-WG] Action 284 - agenda item on ICU 'S' symbol

Steve Hanson smh at uk.ibm.com
Tue Aug 25 12:56:09 EDT 2015


Closed.  https://redmine.ogf.org/issues/297. Agreed that the maximum number 
of "S" symbols is implementation-defined.

Paragraph revised to:

"The maximum number of "S" symbols that may appear in the pattern is 
implementation-defined, but must be at least three. The stored accuracy 
for fractional seconds is also implementation-defined, but must be at 
least millisecond accuracy. When the number of "S" symbols in a pattern 
exceeds the supported accuracy, excess fractional seconds are truncated 
from the right (not rounded) when parsing, and zeros are added to the 
right when unparsing. For example, a DFDL processor allows up to six "S" 
symbols and has millisecond accuracy; for pattern "ss.SSSSSS", data 
"12.345678" would be parsed into infoset xs:time "00:00:12.345", which 
would be unparsed into data "12.345000"."
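
The truncate-on-parse and pad-on-unparse rule in the paragraph can be 
sketched as below. This is only an illustration, not how any DFDL 
processor is implemented; the function names and the "accuracy" 
parameter are hypothetical, and the fractional digits are handled as a 
plain string.

```python
def parse_fraction(digits: str, accuracy: int = 3) -> str:
    """Keep at most `accuracy` fractional digits, truncating (not rounding)."""
    return digits[:accuracy]


def unparse_fraction(stored: str, symbol_count: int) -> str:
    """Emit exactly `symbol_count` digits, adding zeros on the right."""
    return stored[:symbol_count].ljust(symbol_count, "0")


# Pattern "ss.SSSSSS" with millisecond accuracy (assumed accuracy = 3).
stored = parse_fraction("345678")
print(stored)                       # 345
print(unparse_fraction(stored, 6))  # 345000
```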

Regards
 
Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848



From:   Steve Hanson/UK/IBM
To:     Mike Beckerle <mbeckerle.dfdl at gmail.com>
Cc:     DFDL-WG <dfdl-wg at ogf.org>
Date:   25/08/2015 14:24
Subject:        Re: [DFDL-WG] Action 284 - agenda item on ICU 'S' symbol


When testing I found that the data was corrupted when I got to > 9 'S' 
symbols, due to ICU's use of int32 to store the value. I raised a ticket 
and it has been accepted as a defect.  But it shows that normal use does 
not go beyond 9.
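
The arithmetic behind the limit of 9 can be checked with a short sketch. 
It assumes only that the fractional digits are held as a single signed 
32-bit integer, as stated above; it does not reproduce ICU's actual code, 
and the helper name is hypothetical.

```python
INT32_MAX = 2**31 - 1  # 2147483647, the largest signed 32-bit value


def fits_in_int32(digits: str) -> bool:
    """Return True if the digit string, read as an integer, fits in int32."""
    return int(digits) <= INT32_MAX


print(fits_in_int32("9" * 9))   # True: 999999999 fits
print(fits_in_int32("9" * 10))  # False: 9999999999 does not fit
```

Any 9-digit value fits, whereas a 10-digit value may not, which is 
consistent with corruption starting at the tenth 'S'.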

Regards
 
Steve Hanson




From:   Mike Beckerle <mbeckerle.dfdl at gmail.com>
To:     Steve Hanson/UK/IBM at IBMGB
Cc:     Andrew Edwards/UK/IBM at IBMGB, DFDL-WG <dfdl-wg at ogf.org>
Date:   25/08/2015 14:16
Subject:        Re: [DFDL-WG] Action 284 - agenda item on ICU 'S' symbol



Do we really want to allow "any number of S" ?

According to quantum mechanics, based on Planck's constant and the speed 
of light, the smallest unit of time is about 3.3x10^-44 seconds, so 
there's never going to be a need for more than 45 S's in this universe, 
at least until time-travel is discovered. 
(http://www.physlink.com/Education/AskExperts/ae598.cfm)

This is a place where an "implementation specific maximum" makes sense to 
me, though I'd be happy to put a floor under it like not less than 6 "S".


Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | 
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are 
subject to the OGF Intellectual Property Policy


On Tue, Aug 25, 2015 at 6:54 AM, Steve Hanson <smh at uk.ibm.com> wrote:
Section 13.11.1 to be updated as follows: 

Symbol  Meaning                         Presentation  Pattern  Example
S       fractional second (see note 1)  Number        S        2
                                                      SS       23
                                                      SSS      235


The count of pattern letters determines the format as indicated in the 
table.   <---- moved earlier 

When numeric fields abut one another directly, with no intervening 
delimiter characters, they constitute a run of abutting numeric fields. 
Such runs are parsed specially as described at [ICUDateTime]. 

Unlike other fields, fractional seconds "S" are padded on the right with 
zero.  <--- moved earlier 

Any number of "S" symbols may be specified in the pattern. Implementations 
must accept any number of "S" symbols and must support at least 
millisecond accuracy. When the number of "S" symbols exceeds the supported 
accuracy, excess fractional seconds are truncated from the right (not 
rounded) when parsing, and zeros are added to the right when unparsing. 
For example, for xs:time with dfdl:calendarPattern "ss.SSSS" and 
millisecond accuracy, parsing data "12.3456" creates infoset value 
"00:00:12.345", which when unparsing creates data "12.3450". 

Regards
 
Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848 



From:        Steve Hanson/UK/IBM 
To:        Andrew Edwards/UK/IBM at IBMGB 
Cc:        DFDL-WG <dfdl-wg at ogf.org> 
Date:        12/08/2015 09:06 
Subject:        Re: [DFDL-WG] OGF DFDL WG Call Agenda 2015-08-11 - agenda 
item on ICU 'S' symbol 


Andy - yes that is the behaviour I am seeing.  

Regards
 
Steve Hanson




From:        Andrew Edwards/UK/IBM 
To:        Steve Hanson/UK/IBM at IBMGB 
Cc:        DFDL-WG <dfdl-wg at ogf.org> 
Date:        11/08/2015 17:28 
Subject:        Re: [DFDL-WG] OGF DFDL WG Call Agenda 2015-08-11 - agenda 
item on ICU 'S' symbol 


Following on from today's call, the relevant piece of documentation is in 
http://icu-project.org/apiref/icu4c/classicu_1_1SimpleDateFormat.html 

When numeric fields abut one another directly, with no intervening 
delimiter characters, they constitute a run of abutting numeric fields. 
Such runs are parsed specially. For example, the format "HHmmss" parses 
the input text "123456" to 12:34:56, parses the input text "12345" to 
1:23:45, and fails to parse "1234". In other words, the leftmost field of 
the run is flexible, while the others keep a fixed width. If the parse 
fails anywhere in the run, then the leftmost field is shortened by one 
character, and the entire run is parsed again. This is repeated until 
either the parse succeeds or the leftmost field is one character in 
length. If the parse still fails at that point, the parse of the run 
fails. 
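
The retry rule quoted above can be sketched as follows. This is a 
simplified illustration of the described behaviour, not ICU's 
implementation: the function name is hypothetical, fields are given only 
by their pattern widths, and no range checks (for example hours up to 23) 
are applied.

```python
from typing import List, Optional, Tuple


def parse_abutting_run(text: str, widths: List[int]) -> Optional[Tuple[List[int], int]]:
    """Parse a run of abutting numeric fields.

    `widths` holds the pattern width of each field, e.g. [2, 2, 2] for
    "HHmmss". Returns (field values, characters consumed), or None if the
    run cannot be parsed.
    """
    rest = widths[1:]
    # Try the leftmost field at full width first, then shorten it by one
    # character at a time until it is a single character long.
    for left in range(widths[0], 0, -1):
        values = []
        pos = 0
        ok = True
        for width in [left] + rest:
            chunk = text[pos:pos + width]
            # Every field must be present at its full width and be ASCII digits.
            if len(chunk) != width or not (chunk.isascii() and chunk.isdigit()):
                ok = False
                break
            values.append(int(chunk))
            pos += width
        if ok:
            return values, pos
    return None


print(parse_abutting_run("123456", [2, 2, 2]))  # ([12, 34, 56], 6)
print(parse_abutting_run("12345", [2, 2, 2]))   # ([1, 23, 45], 5)
print(parse_abutting_run("1234", [2, 2, 2]))    # None
```

Only the leftmost width varies; the remaining fields always consume 
exactly their pattern width, which is why "1234" cannot be parsed.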

So it seems that when the 'S' is next to other numeric units in the 
pattern, it will be subject to the above behaviour.  Therefore: 
 - A pattern of HHmmssSSS with the input 112233123 will become 
11:22:33.123 but the input 1122331234 will trigger an error. 
 - If the pattern includes a '.' to become HHmmss.SSS, I think the input 
1122331234 will become 11:22:33.123 but I'll try and confirm. 

Steve - Does that description match what you were seeing? 

HTH, 
Andy 
Andy Edwards - IBM Integration Bus - DFDL 

Email: 
andy.edwards at uk.ibm.com 
Snail Mail:   
MP211, Hursley park, Hursley, WINCHESTER, Hants, SO21 2JN 
Tel int: 
247222 
Tel ext: 
+44 (0)1962 817222 
Desk: 
DE3 V17

The Feynman problem solving Algorithm
 1) Write down the problem
 2) Think real hard
 3) Write down the answer
-- Murray Gell-mann in the NY Times






From:        Steve Hanson/UK/IBM 
To:        Andrew Edwards/UK/IBM at IBMGB 
Cc:        DFDL-WG <dfdl-wg at ogf.org> 
Date:        11/08/2015 12:53 
Subject:        Re: [DFDL-WG] OGF DFDL WG Call Agenda 2015-08-11 - agenda 
item on ICU 'S' symbol 


Hi Andy 

Your internal ticket #630 gave rise to external ticket 
http://bugs.icu-project.org/trac/ticket/10962, which claims to have fixed 
the API docs to clarify the behaviour.
Symbol  Meaning                                    Pattern  Example
S       fractional second - truncates (like other  S        2
        time fields) to the count of letters when  SS       23
        formatting. Appends zeros if more than 3   SSS      235
        letters specified. Truncates at three      SSSS     2350
        significant digits when parsing.


I can't see anywhere that addresses your point about abutting versus 
non-abutting numeric symbols though? 

As far as DFDL spec is concerned, this is what we say today: 

Symbol  Meaning                         Presentation  Pattern  Example
S       fractional second (see note 1)  Number        S        2
                                                      SS       24
                                                      SSS      235


There is no 'note 1'; I think the note was made into a normal paragraph, 
which reads: 

Any number of fractional seconds "S" may be specified in the pattern and 
accepted by implementations, but an implementation is free to represent a 
limited number of fractional seconds internally. Excess fractional seconds 
are truncated, not rounded up. At least millisecond accuracy must be 
implemented. Unlike other fields, fractional seconds are padded on the 
right with zero. 

Regards
 
Steve Hanson




From:        Andrew Edwards/UK/IBM 
To:        Steve Hanson/UK/IBM at IBMGB 
Date:        11/08/2015 11:56 
Subject:        Re: [DFDL-WG] OGF DFDL WG Call Agenda 2015-08-11 


Hi Steve 

Re agenda item 2 and calendar patterns with 'S', this ICU ticket from last 
year might be relevant - https://icu.sanjose.ibm.com/gcoctrac/ticket/630. 
It seems that the error reporting may also depend on whether the pattern 
has 'S' on its own or next to other numeric pattern entities, i.e. 
'HHmmssS' is subject to length checking, but 'HHmmss S' is not, due to the 
space before the 'S'. 

HTH, 
Andy 






From:        Steve Hanson/UK/IBM at IBMGB 
To:        dfdl-wg at ogf.org 
Cc:        Mike Beckerle <mbeckerle at tresys.com>, jorge.marizan at gmail.com 
Date:        10/08/2015 18:29 
Subject:        [DFDL-WG] OGF DFDL WG Call Agenda 2015-08-11 
Sent by:        dfdl-wg-bounces at ogf.org 



Please find agenda for call on Redmine at 
https://redmine.ogf.org/dmsf_files/13489?download= 

Regards

Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848 
--
 dfdl-wg mailing list
 dfdl-wg at ogf.org
 https://www.ogf.org/mailman/listinfo/dfdl-wg 

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
