[DFDL-WG] Action 284 - agenda item on ICU 'S' symbol
Steve Hanson
smh at uk.ibm.com
Tue Aug 25 12:56:09 EDT 2015
Closed. https://redmine.ogf.org/issues/297. Agreed that the maximum number of
"S" symbols is implementation-defined.
Paragraph revised to:
"The maximum number of "S" symbols that may appear in the pattern is
implementation-defined, but must be at least three. The stored accuracy
for fractional seconds is also implementation-defined, but must be at
least millisecond accuracy. When the number of "S" symbols in a pattern
exceeds the supported accuracy, excess fractional seconds are truncated
from the right (not rounded) when parsing, and zeros are added to the
right when unparsing. For example, a DFDL processor allows up to six "S"
symbols and has millisecond accuracy; for pattern "ss.SSSSSS", data
"12.345678" would be parsed into infoset xs:time "00:00:12.345", which
would be unparsed into data "12.345000"."
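As a rough illustration of the truncate-and-pad rule in the paragraph above,
here is a small Python sketch that works on the fractional-second digits as
strings. The function names and the "accuracy" parameter are hypothetical;
they are not taken from the DFDL specification or from any DFDL processor.

```python
def parse_fraction(digits: str, accuracy: int = 3) -> str:
    """Keep at most `accuracy` fractional digits, truncating (not rounding)."""
    return digits[:accuracy]


def unparse_fraction(stored: str, s_count: int) -> str:
    """Emit exactly `s_count` digits: cut if too long, pad zeros on the right."""
    return stored[:s_count].ljust(s_count, "0")


# Pattern "ss.SSSSSS" with millisecond accuracy, data "12.345678"
stored = parse_fraction("345678")      # fractional part kept in the infoset
print(stored)                          # 345
print(unparse_fraction(stored, 6))     # 345000
```

Working on digit strings rather than floats sidesteps any rounding, which is
the behaviour the revised wording asks for.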
Regards
Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848
From: Steve Hanson/UK/IBM
To: Mike Beckerle <mbeckerle.dfdl at gmail.com>
Cc: DFDL-WG <dfdl-wg at ogf.org>
Date: 25/08/2015 14:24
Subject: Re: [DFDL-WG] Action 284 - agenda item on ICU 'S' symbol
When testing I found that the data was corrupted when I got to > 9 'S'
symbols, due to ICU's use of int32 to store the value. I raised a ticket
and it has been accepted as a defect. But it shows that normal use does
not go beyond 9.
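The limit of 9 is consistent with simple arithmetic, assuming the fractional
digits are read as one integer into a signed 32-bit value. The thread says
only that ICU uses int32; the exact storage is an assumption here, and the
helper name below is hypothetical.

```python
INT32_MAX = 2**31 - 1  # 2,147,483,647


def fits_in_int32(s_count: int) -> bool:
    """True if every s_count-digit fraction, read as an integer, fits in int32."""
    return 10**s_count - 1 <= INT32_MAX


for n in (9, 10):
    print(n, fits_in_int32(n))
```

Nine digits top out at 999,999,999, which fits; ten digits can reach
9,999,999,999, which does not.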
Regards
Steve Hanson
From: Mike Beckerle <mbeckerle.dfdl at gmail.com>
To: Steve Hanson/UK/IBM at IBMGB
Cc: Andrew Edwards/UK/IBM at IBMGB, DFDL-WG <dfdl-wg at ogf.org>
Date: 25/08/2015 14:16
Subject: Re: [DFDL-WG] Action 284 - agenda item on ICU 'S' symbol
Do we really want to allow "any number of S"?
According to quantum mechanics, the smallest meaningful unit of time,
derived from Planck's constant and the speed of light, is about 3.3x10^-44
seconds, so there's never going to be a need for more than 45 S's in this
universe, at least until time travel is discovered
(http://www.physlink.com/Education/AskExperts/ae598.cfm).
This is a place where an "implementation specific maximum" makes sense to
me, though I'd be happy to put a floor under it, like not less than 6 "S".
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
On Tue, Aug 25, 2015 at 6:54 AM, Steve Hanson <smh at uk.ibm.com> wrote:
Section 13.11.1 to be updated as follows:
Symbol  Meaning                         Presentation  Pattern  Example
S       fractional second (see note 1)  Number        S        2
                                                      SS       23
                                                      SSS      235
The count of pattern letters determines the format as indicated in the
table. <---- moved earlier
When numeric fields abut one another directly, with no intervening
delimiter characters, they constitute a run of abutting numeric fields.
Such runs are parsed specially as described at [ICUDateTime].
Unlike other fields, fractional seconds "S" are padded on the right with
zero. <--- moved earlier
Any number of "S" symbols may be specified in the pattern. Implementations
must accept any number of "S" symbols and must support at least
millisecond accuracy. When the number of "S" symbols exceeds the supported
accuracy, excess fractional seconds are truncated from the right (not
rounded) when parsing, and zeros are added to the right when unparsing.
For example, for xs:time with dfdl:calendarPattern "ss.SSSS" and
millisecond accuracy, parsing data "12.3456" creates infoset value
"00:00:12.345", which when unparsing creates data "12.3450".
Regards
Steve Hanson
From: Steve Hanson/UK/IBM
To: Andrew Edwards/UK/IBM at IBMGB
Cc: DFDL-WG <dfdl-wg at ogf.org>
Date: 12/08/2015 09:06
Subject: Re: [DFDL-WG] OGF DFDL WG Call Agenda 2015-08-11 - agenda
item on ICU 'S' symbol
Andy - yes that is the behaviour I am seeing.
Regards
Steve Hanson
From: Andrew Edwards/UK/IBM
To: Steve Hanson/UK/IBM at IBMGB
Cc: DFDL-WG <dfdl-wg at ogf.org>
Date: 11/08/2015 17:28
Subject: Re: [DFDL-WG] OGF DFDL WG Call Agenda 2015-08-11 - agenda
item on ICU 'S' symbol
Following on from today's call, the relevant piece of documentation is in
http://icu-project.org/apiref/icu4c/classicu_1_1SimpleDateFormat.html
When numeric fields abut one another directly, with no intervening
delimiter characters, they constitute a run of abutting numeric fields.
Such runs are parsed specially. For example, the format "HHmmss" parses
the input text "123456" to 12:34:56, parses the input text "12345" to
1:23:45, and fails to parse "1234". In other words, the leftmost field of
the run is flexible, while the others keep a fixed width. If the parse
fails anywhere in the run, then the leftmost field is shortened by one
character, and the entire run is parsed again. This is repeated until
either the parse succeeds or the leftmost field is one character in
length. If the parse still fails at that point, the parse of the run
fails.
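The retry rule quoted above can be sketched in a few lines of Python. This
is only an illustration of the documented behaviour, not ICU's actual code:
the function name is hypothetical, the fields are reduced to plain digit
widths, and no range checking of hours, minutes or seconds is attempted.

```python
def parse_abutting_run(text, widths):
    """Parse a run of abutting numeric fields.

    `widths` holds the pattern width of each field, e.g. [2, 2, 2] for
    "HHmmss". Returns the parsed integers, or None if the run fails.
    """
    # The leftmost field shrinks one character at a time; the others stay fixed.
    for first_width in range(widths[0], 0, -1):
        trial = [first_width] + list(widths[1:])
        pos = 0
        values = []
        for width in trial:
            chunk = text[pos:pos + width]
            if len(chunk) != width or any(c not in "0123456789" for c in chunk):
                break  # this attempt failed; shorten the leftmost field
            values.append(int(chunk))
            pos += width
        else:
            return values
    return None


print(parse_abutting_run("123456", [2, 2, 2]))  # [12, 34, 56]
print(parse_abutting_run("12345", [2, 2, 2]))   # [1, 23, 45]
print(parse_abutting_run("1234", [2, 2, 2]))    # None
```

The three calls reproduce the three "HHmmss" cases given in the ICU text.
The sketch says nothing about what happens to input left over after the
run, which is a separate question.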
So it seems that when the 'S' is next to other numeric units in the
pattern, it will be subject to the above behaviour. Therefore:
- A pattern of HHmmssSSS with the input 112233123 will become
11:22:33.123 but the input 1122331234 will trigger an error.
- If the pattern includes a '.' to become HHmmss.SSS, I think the input
1122331234 will become 11:22:33.123 but I'll try and confirm.
Steve - Does that description match what you were seeing?
HTH,
Andy
Andy Edwards - IBM Integration Bus - DFDL
Email: andy.edwards at uk.ibm.com
Snail Mail: MP211, Hursley park, Hursley, WINCHESTER, Hants, SO21 2JN
Tel int: 247222
Tel ext: +44 (0)1962 817222
Desk: DE3 V17
The Feynman problem solving Algorithm
1) Write down the problem
2) Think real hard
3) Write down the answer
-- Murray Gell-Mann in the NY Times
From: Steve Hanson/UK/IBM
To: Andrew Edwards/UK/IBM at IBMGB
Cc: DFDL-WG <dfdl-wg at ogf.org>
Date: 11/08/2015 12:53
Subject: Re: [DFDL-WG] OGF DFDL WG Call Agenda 2015-08-11 - agenda
item on ICU 'S' symbol
Hi Andy
Your internal ticket #630 gave rise to external ticket
http://bugs.icu-project.org/trac/ticket/10962, which claims to have fixed
the API docs to clarify the behaviour.
Symbol  Meaning                                         Pattern  Example
S       fractional second - truncates (like other time  S        2
        fields) to the count of letters when            SS       23
        formatting. Appends zeros if more than 3        SSS      235
        letters specified. Truncates at three           SSSS     2350
        significant digits when parsing.
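To make the formatting rows of that table concrete, here is a minimal Python
sketch that formats a millisecond value for a given count of "S" letters. It
follows the wording of the ICU doc text quoted above and is not ICU code; the
function name is hypothetical.

```python
def format_fraction(millis: int, s_count: int) -> str:
    """Format 0..999 milliseconds using `s_count` 'S' pattern letters."""
    if not 0 <= millis <= 999:
        raise ValueError("millis must be in the range 0..999")
    digits = f"{millis:03d}"  # always three digits, e.g. 7 -> "007"
    # Truncate to the letter count, or append zeros beyond three letters.
    return digits[:s_count].ljust(s_count, "0")


for pattern in ("S", "SS", "SSS", "SSSS"):
    print(pattern, format_fraction(235, len(pattern)))
```

For 235 milliseconds this prints 2, 23, 235 and 2350, matching the table.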
I can't see anything there that addresses your point about abutting versus
non-abutting numeric symbols, though.
As far as DFDL spec is concerned, this is what we say today:
Symbol  Meaning                         Presentation  Pattern  Example
S       fractional second (see note 1)  Number        S        2
                                                      SS       24
                                                      SSS      235
There is no 'note 1'; I think the note was made into a normal paragraph,
which reads:
Any number of fractional seconds "S" may be specified in the pattern and
accepted by implementations, but an implementation is free to represent a
limited number of fractional seconds internally. Excess fractional seconds
are truncated, not rounded up. At least millisecond accuracy must be
implemented. Unlike other fields, fractional seconds are padded on the
right with zero.
Regards
Steve Hanson
From: Andrew Edwards/UK/IBM
To: Steve Hanson/UK/IBM at IBMGB
Date: 11/08/2015 11:56
Subject: Re: [DFDL-WG] OGF DFDL WG Call Agenda 2015-08-11
Hi Steve
Re agenda item 2 and calendar patterns with 'S', this ICU ticket from last
year might be relevant - https://icu.sanjose.ibm.com/gcoctrac/ticket/630.
It seems that the error reporting may also depend on whether the pattern
has 'S' on its own or next to other numeric pattern entities. For example,
'HHmmssS' is subject to length checking, but 'HHmmss S' is not, due to the
space before the 'S'.
HTH,
Andy
From: Steve Hanson/UK/IBM at IBMGB
To: dfdl-wg at ogf.org
Cc: Mike Beckerle <mbeckerle at tresys.com>, jorge.marizan at gmail.com
Date: 10/08/2015 18:29
Subject: [DFDL-WG] OGF DFDL WG Call Agenda 2015-08-11
Sent by: dfdl-wg-bounces at ogf.org
Please find agenda for call on Redmine at
https://redmine.ogf.org/dmsf_files/13489?download=
Regards
Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
--
dfdl-wg mailing list
dfdl-wg at ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU