[DFDL-WG] DFDL binaryCalendarRep pattern limitations

Steve Hanson smh at uk.ibm.com
Mon Oct 29 05:53:36 EDT 2012


The DFDL WG agreed that the behaviour described below is correct.

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848



From:   Steve Hanson/UK/IBM
To:     dfdl-wg at ogf.org, 
Cc:     Andrew Edwards/UK/IBM at IBMGB
Date:   23/10/2012 10:57
Subject:        DFDL binaryCalendarRep pattern limitations


For discussion at next DFDL WG call. See summary below but key points are:

- DFDL uses a calendar pattern to convert binary calendar 'packed', 'bcd' 
and 'ibm4690Packed' reps to a schema calendar type
- No trimming/padding of binary reps takes place; the parser uses what was 
extracted from the data
- A 'packed' rep will always present an odd number of digits (because of 
sign nibble)
- A 'bcd' rep will always present an even number of digits
- ICU gives an error if the number of digits presented to it exceeds the 
length of the calendar pattern 
- Therefore the onus is on the user to ensure that the calendar pattern 
matches the number of digits, e.g. by adding leading zeros to the pattern 

Are we happy with that behaviour?

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 23/10/2012 10:49 -----

From:   Andrew Edwards/UK/IBM
To:     Steve Hanson/UK/IBM at IBMGB, 
Date:   22/10/2012 15:36
Subject:        DFDL binaryCalendarRep pattern limitations


Hi Steve - a summary of the problem is below

While adding support for IBM4690 packed representation for binaryNumberRep 
and binaryCalendarRep, it has become apparent that we may need to place a 
restriction on the calendarPattern property, depending on the choice of 
binaryCalendarRep.  The problem is that a calendar value cannot be 
reliably and reversibly parsed when the pattern length is incompatible 
with the packing type.  If we use 'ibm4690Packed' and a pattern of odd 
length, we end up matching an even-length string against an odd-length 
pattern, and there isn't necessarily a well-defined answer.

This is best understood with an example:
 - Consider a pattern with an odd number of characters, such as 
calendarPattern=yyyyMMddDDD
 - A value will require a bytestream of length 6 bytes, which would be 
serialised as 0x0{y}{y}{y}{y}{M}{M}{d}{d}{D}{D}{D}
 - For example, 2012-10-22 (day295) would be represented as 
0x020121022295.
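
To make the mismatch concrete, here is a minimal Python sketch. It is not 
taken from any DFDL implementation; the helper name bcd_digits is made up 
for illustration, and it assumes every nibble holds one decimal digit, as 
in the example above. It unpacks the six bytes into a digit string and 
compares its length with the pattern length.

```python
def bcd_digits(data: bytes) -> str:
    """Return one decimal digit per nibble, high nibble first (hypothetical helper)."""
    digits = []
    for byte in data:
        for nibble in (byte >> 4, byte & 0x0F):
            if nibble > 9:
                raise ValueError(f"nibble {nibble:#x} is not a decimal digit")
            digits.append(str(nibble))
    return "".join(digits)


pattern = "yyyyMMddDDD"
value = bytes.fromhex("020121022295")   # the six bytes from the example
text = bcd_digits(value)

# Whole bytes always yield an even digit count; the pattern here is odd.
print(text, len(text), len(pattern))    # 020121022295 12 11
```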

If a parser is presented with this value and pattern, then it will try 
to match the string "020121022295" against "yyyyMMddDDD".  ICU returns an 
error because the string is longer than the pattern and I can't say I 
blame it.  Should it ignore the zero at the start, or the 5 at the end? 
Without understanding the pattern, a DFDL parser cannot know.  ICU can't 
decide either: the pattern contains more than one field, so there is no 
single way to split the digits between them.  This becomes more 
problematic when we consider behaviour for a pattern of "DDD" and of 
"SSS", as they expect padding at different ends by default (a value of 
"0100" resolves as "DDD" = 100 days and "SSS" = 0.010 seconds).

For packed representation, the opposite problem occurs: there is always a 
sign nibble in the packed form, so we will always have a value made up of 
an odd number of digits.  This can't match any pattern of even length.
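
The odd digit count can be checked with a short Python sketch. The helper 
name packed_digits is hypothetical, and the sketch assumes the 
conventional packed-decimal layout in which the last nibble is a sign 
nibble in the range 0xA-0xF (0xC is commonly used for positive values). 
An n-byte field then always carries 2n - 1 digits.

```python
def packed_digits(data: bytes) -> str:
    """Digits of a packed-decimal field, dropping the final sign nibble
    (hypothetical helper, assumes a non-empty field)."""
    nibbles = [n for byte in data for n in (byte >> 4, byte & 0x0F)]
    *digit_nibbles, sign = nibbles
    if sign < 0xA:
        raise ValueError("last nibble is not a sign nibble")
    if any(n > 9 for n in digit_nibbles):
        raise ValueError("digit nibble is not a decimal digit")
    return "".join(str(n) for n in digit_nibbles)


# Whatever the field size, the digit count is 2 * size - 1, i.e. odd.
for size in range(1, 7):
    field = bytes([0x12] * (size - 1) + [0x3C])
    assert len(packed_digits(field)) == 2 * size - 1

print(packed_digits(bytes.fromhex("2012295C")))   # 2012295
```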

The solution that we discussed was to require the calendar pattern to have 
a certain digit count depending on the choice of binaryCalendarRep, and 
allow the pattern to include number characters as a form of "padding".  So 
for the pattern in the example above, this would have to be changed so 
that calendarPattern='0'yyyyMMddDDD to force it to have an even digit 
count.
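
A schema author could sanity-check a pattern against the chosen rep along 
these lines. This is only a sketch under stated assumptions: the names 
pattern_width, EXPECTED_PARITY and pattern_fits are invented here, every 
unquoted pattern letter is assumed to stand for exactly one digit (true 
for fixed-width numeric patterns like the one above, not in general), and 
quoted text is counted as literal characters in the usual ICU style. The 
parity for 'ibm4690Packed' is taken from the example above, where whole 
bytes of digits are presented.

```python
def pattern_width(pattern: str) -> int:
    """Count pattern characters, treating '...' as literal text and '' as one quote."""
    width = 0
    i = 0
    while i < len(pattern):
        if pattern[i] == "'":
            if i + 1 < len(pattern) and pattern[i + 1] == "'":
                width += 1          # '' is an escaped literal quote
                i += 2
                continue
            i += 1                  # a lone quote only delimits literal text
            continue
        width += 1
        i += 1
    return width


# Digit-count parity each rep presents to the pattern (0 = even, 1 = odd).
EXPECTED_PARITY = {"packed": 1, "bcd": 0, "ibm4690Packed": 0}


def pattern_fits(pattern: str, rep: str) -> bool:
    """True if the pattern width has the parity that the rep will present."""
    return pattern_width(pattern) % 2 == EXPECTED_PARITY[rep]


print(pattern_fits("yyyyMMddDDD", "ibm4690Packed"))      # False
print(pattern_fits("'0'yyyyMMddDDD", "ibm4690Packed"))   # True
```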

If that fully explains the problem, do you want to take it to the DFDL 
workgroup and check what the consensus opinion is?

Cheers,
Andy 
Andy Edwards - WebSphere Message Broker - DFDL


Email:
andy.edwards at uk.ibm.com
Snail Mail: 
MP211, Hursley park, Hursley, WINCHESTER, Hants, SO21 2JN
Tel int:
247222
Tel ext:
+44 (0)1962 817222
Desk:
DE2 U20

The Feynman problem solving Algorithm
  1) Write down the problem
  2) Think real hard
  3) Write down the answer
 -- Murray Gell-Mann in the NY Times

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

