[DFDL-WG] DFDL binaryCalendarRep pattern limitations
Steve Hanson
smh at uk.ibm.com
Mon Oct 29 05:53:36 EDT 2012
The DFDL WG agreed that the behaviour described below is correct.
Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848
From: Steve Hanson/UK/IBM
To: dfdl-wg at ogf.org,
Cc: Andrew Edwards/UK/IBM at IBMGB
Date: 23/10/2012 10:57
Subject: DFDL binaryCalendarRep pattern limitations
For discussion at next DFDL WG call. See summary below but key points are:
- DFDL uses a calendar pattern to convert binary calendar 'packed', 'bcd'
and 'ibm4690Packed' reps to a schema calendar type
- No trimming/padding of binary reps takes place, the parser uses what was
extracted from the data
- A 'packed' rep will always present an odd number of digits (because of
sign nibble)
- A 'bcd' rep will always present an even number of digits
- ICU gives an error if the number of digits presented to it exceeds the
length of the calendar pattern
- Therefore the onus is on the user to ensure that the length of the
calendar pattern matches the number of digits, e.g. by adding leading
zeros to the pattern
Are we happy with that behaviour?
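The digit counts in the points above can be sketched in a few lines of
Python. This is only an illustration: the function name digits_presented
is hypothetical, and 'ibm4690Packed' is treated as presenting every nibble
as a digit, as in the forwarded example, which is an assumption rather
than a statement about any implementation.

```python
def digits_presented(rep: str, n_bytes: int) -> int:
    """Number of decimal digits a binary calendar field of n_bytes presents."""
    if rep == "packed":
        # Two nibbles per byte, minus the trailing sign nibble: always odd.
        return 2 * n_bytes - 1
    if rep in ("bcd", "ibm4690Packed"):
        # Assumption: every nibble is presented as a digit: always even.
        return 2 * n_bytes
    raise ValueError(f"unknown binaryCalendarRep: {rep}")


for rep in ("packed", "bcd", "ibm4690Packed"):
    print(rep, [digits_presented(rep, n) for n in range(1, 7)])
```

With no trimming or padding, a pattern only lines up with the data when
its length equals one of these counts.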
Regards
Steve Hanson
----- Forwarded by Steve Hanson/UK/IBM on 23/10/2012 10:49 -----
From: Andrew Edwards/UK/IBM
To: Steve Hanson/UK/IBM at IBMGB,
Date: 22/10/2012 15:36
Subject: DFDL binaryCalendarRep pattern limitations
Hi Steve - a summary of the problem is below
While adding support for the IBM 4690 packed representation for
binaryNumberRep and binaryCalendarRep, it has become apparent that we may
need to place a restriction on the calendarPattern property, depending on
the choice of binaryCalendarRep. The problem lies in reliably and
reversibly resolving a calendar value when the pattern length is
incompatible with the packing type. If we use 'ibm4690Packed' and a
pattern of odd length, we end up matching an even-length string against an
odd-length pattern, and there isn't necessarily a well-defined answer.
This is best understood with an example:
- Consider a pattern with an odd number of characters, such as
calendarPattern=yyyyMMddDDD
- A value will require a bytestream of length 6 bytes, which would be
serialised as 0x0{y}{y}{y}{y}{M}{M}{d}{d}{D}{D}{D}
- For example, 2012-10-22 (day 296) would be represented as
0x020121022296.
If a parser is presented with this value and pattern, then it will try
to match the string "020121022296" against "yyyyMMddDDD". ICU returns an
error because the string is longer than the pattern and I can't say I
blame it. Should it ignore the zero at the start, or the 6 at the end?
Without understanding the pattern, a DFDL parser cannot know. ICU can't
resolve the value either: the pattern has more than one group, so there is
no single solution. This becomes more problematic when we consider
behaviour for a pattern of "DDD" and of "SSS", as they expect padding at
different ends by default (a value of "0100" resolves as "DDD" = 100 days
and "SSS" = 0.010 seconds).
For the packed representation, the opposite problem occurs: there is
always a sign nibble in the packed form, so we will always have a value
made up of an odd number of digits. This can't match any pattern of even
length.
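Both mismatches can be made concrete with a small Python sketch. The
helper names unpack_packed and trim_candidates are hypothetical, and the
sketch assumes the usual packed-decimal layout in which the final nibble
is a sign in the range 0xA to 0xF (0xC for positive here). It shows that a
packed value always yields an odd number of digits, and that an over-long
digit string can be cut down to the pattern length in two different ways.

```python
def unpack_packed(data: bytes) -> tuple:
    """Split packed-decimal bytes into (digit string, sign nibble)."""
    nibbles = []
    for byte in data:
        nibbles.append(byte >> 4)      # high nibble
        nibbles.append(byte & 0x0F)    # low nibble
    *digit_nibbles, sign = nibbles     # the last nibble carries the sign
    if sign < 0xA or any(n > 9 for n in digit_nibbles):
        raise ValueError("not a valid packed-decimal value")
    return "".join(str(n) for n in digit_nibbles), sign


def trim_candidates(digits: str, pattern_len: int) -> dict:
    """Both ways of cutting an over-long digit string to the pattern length."""
    extra = len(digits) - pattern_len
    if extra <= 0:
        raise ValueError("digit string is not longer than the pattern")
    return {"drop_leading": digits[extra:], "drop_trailing": digits[:-extra]}


digits, sign = unpack_packed(bytes.fromhex("1234567C"))
print(digits, len(digits))          # 1234567 7
print(trim_candidates("0100", 3))   # {'drop_leading': '100', 'drop_trailing': '010'}
```

For "0100" the "DDD" reading corresponds to dropping the leading digit and
the "SSS" reading to dropping the trailing one, which is why a parser that
does not understand the pattern cannot choose between them.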
The solution that we discussed was to require the calendar pattern to have
a certain digit count depending on the choice of binaryCalendarRep, and to
allow the pattern to include literal digit characters as a form of
"padding". So the pattern in the example above would have to be changed to
calendarPattern='0'yyyyMMddDDD to force it to have an even digit count.
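A rough Python sketch of that rule follows. The names pattern_digit_count
and pad_pattern are hypothetical, and the sketch assumes a purely numeric
pattern in which every character outside the quote marks consumes exactly
one digit; escaped quotes and text fields are not handled.

```python
def pattern_digit_count(pattern: str) -> int:
    """Digits consumed by a numeric-only pattern; quote marks are delimiters."""
    return len(pattern.replace("'", ""))


def pad_pattern(pattern: str, rep: str) -> str:
    """Prefix a quoted literal '0' when the digit count has the wrong parity."""
    if rep == "packed":
        want_odd = True       # the sign nibble leaves an odd number of digits
    elif rep in ("bcd", "ibm4690Packed"):
        want_odd = False      # whole bytes of digits: an even number
    else:
        raise ValueError(f"unknown binaryCalendarRep: {rep}")
    is_odd = pattern_digit_count(pattern) % 2 == 1
    return pattern if is_odd == want_odd else "'0'" + pattern


print(pad_pattern("yyyyMMddDDD", "ibm4690Packed"))  # '0'yyyyMMddDDD
print(pad_pattern("yyyyMMddDDD", "packed"))         # yyyyMMddDDD
```

Under the agreed behaviour the pattern is supplied by the schema author, so
a helper like this would be something a user runs when writing the schema,
not something the parser does.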
If that fully explains the problem, do you want to take it to the DFDL
workgroup and check what the consensus opinion is?
Cheers,
Andy
Andy Edwards - WebSphere Message Broker - DFDL
Email:
andy.edwards at uk.ibm.com
Snail Mail:
MP211, Hursley Park, Hursley, WINCHESTER, Hants, SO21 2JN
Tel int:
247222
Tel ext:
+44 (0)1962 817222
Desk:
DE2 U20
The Feynman problem solving Algorithm
1) Write down the problem
2) Think real hard
3) Write down the answer
-- Murray Gell-Mann in the NY Times
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU