[DFDL-WG] Fw: One email or a flock or... - re: 10.03 draft - open review items - update

Suman Kalia kalia at ca.ibm.com
Wed Oct 31 15:25:33 EDT 2012


Confirmed: the symbols P & E cannot be used together in a PIC clause.

Suman Kalia
IBM Canada Lab
WMB Toolkit Architect and Development Lead
Tel: 905-413-3923 T/L 313-3923
Email: kalia at ca.ibm.com

For info on Message broker
http://www.ibm.com/developerworks/websphere/zones/businessintegration/wmb.html


----- Forwarded by Suman Kalia/Toronto/IBM on 10/31/2012 03:24 PM -----

From:   Mahendra S Akkina/Silicon Valley/IBM at IBMUS
To:     Suman Kalia/Toronto/IBM at IBMCA, 
Date:   10/31/2012 02:47 PM
Subject:        Re: Fw: [DFDL-WG] One email or a flock or... - re: 10.03 
draft - open review items - update


Hi Suman,

You are correct. They can't be used together. 

Thanks

Have a nice day.

Mahendra S Akkina
Developer, Rational Developer for System z
Enterprise Language Importers, Rational Application Developer
Ph: (408) - 506 - 5986
Email: akkina at us.ibm.com

"The more the opposition there is the better. Does a river acquire 
velocity unless there is resistance? the newer and better a thing is the 
more opposition it will meet with the onset. It is the opposition which 
foretells success" - Swami Vivekananda



From:   Suman Kalia/Toronto/IBM at IBMCA
To:     Mahendra S Akkina/Silicon Valley/IBM at IBMUS, 
Date:   10/31/2012 11:40 AM
Subject:        Fw: [DFDL-WG] One email or a flock or... - re: 10.03 draft 
- open review items - update


Mahi - Can you please confirm that symbols P & E cannot be used together in 
a picture clause?

Suman Kalia


----- Forwarded by Suman Kalia/Toronto/IBM on 10/31/2012 02:39 PM -----

From:   Suman Kalia/Toronto/IBM
To:     Steve Hanson/UK/IBM at IBMGB, 
Cc:     Mike Beckerle <mbeckerle.dfdl at gmail.com>, Andrew 
Edwards/UK/IBM at IBMGB, dfdl-wg at ogf.org
Date:   10/31/2012 02:39 PM
Subject:        Re: [DFDL-WG] One email or a flock or... - re: 10.03 draft 
- open review items - update


Yes - based on my understanding and reading of the spec, symbols P & E cannot 
be used together in a COBOL PIC clause. However, I will double-check with the 
language folks and send a reply.

Suman Kalia






From:   Steve Hanson/UK/IBM at IBMGB
To:     Suman Kalia/Toronto/IBM at IBMCA, 
Cc:     Mike Beckerle <mbeckerle.dfdl at gmail.com>, Andrew 
Edwards/UK/IBM at IBMGB, dfdl-wg at ogf.org
Date:   10/31/2012 12:59 PM
Subject:        Re: [DFDL-WG] One email or a flock or... - re: 10.03 draft 
- open review items - update


Thanks Suman. Specifically, please can you confirm that the P symbol and 
the E symbol may not be used together in a COBOL PIC clause.

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848




From:   Suman Kalia/Toronto/IBM at IBMCA
To:     Steve Hanson/UK/IBM at IBMGB, 
Cc:     Mike Beckerle <mbeckerle.dfdl at gmail.com>, Andrew 
Edwards/UK/IBM at IBMGB, dfdl-wg at ogf.org
Date:   31/10/2012 15:29
Subject:        Re: [DFDL-WG] One email or a flock or... - re: 10.03 draft 
- open review items - update


We tried to model the representation of the picture clause in memory. 
The best way to capture it was as a string. For the following example 
(EBCDIC), this is how the memory is laid out as per the IBM compiler 
documentation:


Numeric type:        External floating point
PICTURE and USAGE:   PIC +9(2).9(2)E+99 DISPLAY

Value     Internal representation (hex)    SKK --> as text
+ 1234    4E F1 F2 4B F3 F4 C5 4E F0 F2    +12.34E+02
- 1234    60 F1 F2 4B F3 F4 C5 4E F0 F2    -12.34E+02


One can argue that the logical type for external float could be 
xsd:float/xsd:double with a textPattern associated with it, but this is not 
how the COBOL compiler lays out this type in memory. 
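
As a rough illustration of the layout shown above (not part of the original exchange), the ten bytes can be decoded as EBCDIC text and only then converted to a number. This sketch assumes Python's "cp037" codec as a stand-in for the EBCDIC code page in use; the helper name decode_external_float is hypothetical.

```python
# Illustrative only: decode the documented byte layouts as EBCDIC text.
# Assumption: code page 037 (Python codec "cp037"); the real code page may differ.

def decode_external_float(hex_bytes):
    """Return (decoded text, numeric value) for an external float given as hex."""
    text = bytes.fromhex(hex_bytes).decode("cp037")
    # The stored form is a character string; the number only exists after parsing it.
    return text, float(text)

positive = decode_external_float("4E F1 F2 4B F3 F4 C5 4E F0 F2")
negative = decode_external_float("60 F1 F2 4B F3 F4 C5 4E F0 F2")
print(positive)
print(negative)
```

The point of the sketch is that the in-memory form is a character string, and a float value is only obtained by a separate text-to-number conversion.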



 
P symbol
Because the scaling position character P implies an assumed decimal point 
(to the left of the Ps if the Ps are leftmost PICTURE characters; to the 
right of the Ps if the Ps are rightmost PICTURE characters), the assumed 
decimal point symbol V is redundant as either the leftmost or rightmost 
character within such a PICTURE description.
Based on my testing, the P symbol is allowed only on numeric types and 
COMP-3, i.e. on zoned decimal and packed decimal.

       01  a.
           05  b               pic 9(5).
           05  Compute-Result  pic -9v9(9)E-99.
           05  c               pic PPP999.
           05  d               pic 99PP comp-3.
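
To make the scaling rule quoted above concrete, here is a minimal sketch of how the P positions shift the assumed decimal point. It handles only pictures written out as a run of Ps followed or preceded by a run of 9s (as in items c and d), ignores repeat counts such as 9(5), and says nothing about how COMP-3 stores the digits. The function name scale_p_picture is hypothetical.

```python
from decimal import Decimal

def scale_p_picture(picture, digits):
    """Return the logical value of `digits` stored under a simple P/9 picture."""
    pic = picture.upper()
    n_p = pic.count("P")
    n_9 = pic.count("9")
    leading = pic == "P" * n_p + "9" * n_9
    trailing = pic == "9" * n_9 + "P" * n_p
    if not (leading or trailing):
        raise ValueError("only pictures like PPP999 or 99PP are handled here")
    if len(digits) != n_9 or not digits.isdigit():
        raise ValueError("digits must match the number of 9 positions")
    if n_p and leading:
        # Leading Ps: the assumed decimal point sits to the left of the Ps.
        return Decimal(int(digits)).scaleb(-(n_p + n_9))
    # Trailing Ps (or no Ps): the assumed decimal point sits to the right of the Ps.
    return Decimal(int(digits)).scaleb(n_p)

c_value = scale_p_picture("PPP999", "123")
d_value = scale_p_picture("99PP", "12")
print(c_value, d_value)
```

Under these assumptions, the stored digits 123 in item c stand for 0.000123, and the stored digits 12 in item d stand for 1200.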



Suman Kalia






From:   Steve Hanson/UK/IBM at IBMGB
To:     Mike Beckerle <mbeckerle.dfdl at gmail.com>, 
Cc:     Andrew Edwards/UK/IBM at IBMGB, dfdl-wg at ogf.org, Suman 
Kalia/Toronto/IBM at IBMCA
Date:   10/23/2012 07:04 AM
Subject:        Re: [DFDL-WG] One email or a flock or... - re: 10.03 draft 
- open review items - update


I've updated my comments based on yesterday's call and Mike's mail.

Suman - please can you help with item 4 below ?

Regards

Steve Hanson




From:   Mike Beckerle <mbeckerle.dfdl at gmail.com>
To:     Steve Hanson/UK/IBM at IBMGB, 
Cc:     dfdl-wg at ogf.org, Andrew Edwards/UK/IBM at IBMGB
Date:   22/10/2012 17:29
Subject:        Re: [DFDL-WG] One email or a flock or... - re: 10.03 draft 
- open review items - update



Suggested wording for issue 1 below. The change (in italics in the original 
mail) is the parenthesized sentence:

When the separator and terminator on a group have the same value, then at 
a point where either separator or terminator could be found, the separator 
is tried first. (Speculative execution may try the terminator 
subsequently.)
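
One way to read the wording above is as ordinary backtracking: at an ambiguous delimiter, the separator interpretation is attempted first, and the terminator interpretation is attempted only if the separator branch later fails. The toy parser below is a sketch of that reading, not DFDL processor code; the grammar (digit items, with one delimiter string serving as both separator and terminator) and the name parse_group are assumptions made for illustration.

```python
import re

ITEM = re.compile(r"\d+")

def parse_group(data, pos, delim=";"):
    """Parse item (delim item)* delim; return (items, position after terminator) or None."""
    match = ITEM.match(data, pos)
    if match is None:
        return None
    item, pos = match.group(), match.end()
    if not data.startswith(delim, pos):
        return None
    after = pos + len(delim)
    # Try the delimiter as a separator first: another item must follow.
    rest = parse_group(data, after, delim)
    if rest is not None:
        items, end = rest
        return [item] + items, end
    # The separator branch failed, so speculatively treat it as the terminator.
    return [item], after

result = parse_group("1;2;END", 0)
print(result)
```

With the input "1;2;END", the second ";" is first tried as a separator, that attempt fails because "END" is not an item, and the parser falls back to treating it as the terminator.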

 

On Wed, Oct 17, 2012 at 6:56 AM, Steve Hanson <smh at uk.ibm.com> wrote:
I have added some comments in-line to reflect the WG call on Tuesday, we 
will continue on Friday. 

(Andy - please see 5 below) 

Regards

Steve Hanson



From:        Mike Beckerle <mbeckerle.dfdl at gmail.com> 
To:        Steve Hanson/UK/IBM at IBMGB 
Date:        02/10/2012 18:41 
Subject:        One email or a flock or... - re: 10.03 draft - open review 
items 



Steve,

I've got the issues below left after your review pass on 10.03, minus 2 that 
I sent emails to you about separately.

Should I issue this email to the WG, or do you want me to decompose this 
into separate emails, or do you just want to list these as agenda topics 
for next call? I think it is good if people get to look at them in advance 
of a call. 

...mikeb

--------------------------------------------------------

This is a list of items left open after a review pass by SMH (on the draft 
in preparation, r010.03).

These items need specific WG discussion on a call. They may be small 
enough to resolve there, or may be escalated into action items. (A couple 
issues already clearly action-item related are not listed here.)

Note: please ignore the identifiers like SMH107 or m236 that I'm tagging 
these with. Those are just for me editing the text. (They change ...grrr... 
if someone inserts a comment into the document, so they're not good issue 
identifiers.)

1.        SMH107   Spec says: When the separator and terminator on a group 
have the same value, then at a point where either separator or terminator 
could be found, the separator is tried first. 
The issue is that this language still feels ambiguous. E.g., suppose it 
tries the separator first and finds it. Will a subsequent processing error 
cause it to backtrack, revisit this and try the terminator? Or does 
finding the separator confirm that it IS a separator and resolve that 
point of uncertainty forever? I believe the latter is what was intended 
(delimiter decisions drive parsing and are not revisited), but we need to 
state this (or do we somewhere else already?)
SMH: Mike will add the following after the sentence in question: 
"(Speculative execution may try the terminator subsequently.)"  Agreed 
that encountering a separator does not resolve a point of uncertainty.

2.        SMH169 - Some numeric types are signed, others unsigned. Some 
representations are sign-capable, some are not (BCD specifically). Right 
now spec draft says you can't have bcd as rep for signed integer types 
long, int, short, byte. But you CAN have bcd for rep of decimal, integer. 
We could allow bcd only for nonNegativeInteger type, but there is no 
nonNegativeDecimal type, so....how to resolve? I would suggest that we 
simply allow bcd as rep for both signed and unsigned types, and it's a 
processing error to unparse a negative value into bcd rep. 
SMH: Noted that for a decimal, property decimalSigned is used to indicate 
whether the logical value is signed or not. So we could disallow BCD for 
integer and for decimal when decimalSigned is 'yes'.  Interestingly, 
section 3.7.1 states "Signed numbers with dfdl:binaryNumberRep 'bcd' are 
always positive. On unparsing it is a processing error if the data is 
negative." which is admitting that BCD can be used with signed types. IBM 
DFDL currently implements the table in the description of binaryNumberRep, 
and so allows BCD for integer and decimal regardless of decimalSigned, but 
does not allow long, short, int, byte.  Agreed to leave rules as they are 
today.

3.        m229 - textStandardZeroRep - should this allow %ES; as one of 
the list of possibles? 
SMH: Decided not to allow %ES; because it adds some complexity to the 
'empty representation' processing rules, in the same way that xs:string 
and xs:hexBinary do. Can always make an element required and use default 
of 0. 

4.        m236 - are V (virtual decimal point position) and P allowed in 
the textNumberPattern for double and float types? 
SMH: Post-call investigation: Errata 2.80 says they are allowed, but not 
in conjunction with E, @ and * symbols. This is reflected in BNF as 
subpattern := prefix? ((number exponent?) | vpinteger) suffix?.  This may 
be too restrictive. COBOL supports 'external floating point' numbers which 
contain +/-, E and . characters, and so are DFDL text standard floats. See 
http://publib.boulder.ibm.com/infocenter/comphelp/v7v91/index.jsp?topic=%2Fcom.ibm.aix.cbl.doc%2Fcpari09.htm 
and its child topic for some examples of these floats.  Note that one 
example shows use of the V symbol. Also, experimentation with the IBM DFDL 
COBOL importer shows that both V & P symbols in the PIC clause are 
allowed. This suggests that the BNF should be revised. It should be 
noted, though, that the IBM DFDL COBOL importer imports these floats as 
xs:string and makes no attempt to convert to a number. Need to ask Suman 
why this is.

5.        m237 - Do we check that the various symbols used for infinity, 
digits, grouping separators, decimal separators are properly distinct to 
allow parsing? E.g., that the decimal separator and grouping separator 
aren't the same, and that the positive and negative pattern variants are 
distinguishable? The ICU library supposedly doesn't do this checking. Do we 
state that this is an SDE in DFDL? If so, is this checking required? Can we 
make it possible for implementations to not check somehow? Other grammar 
ambiguity situations, like separator and terminator being ambiguous, are 
specifically NOT checked for, because determining if a grammar is 
ambiguous is hard or undecidable, and would have to be done at runtime 
because delimiters can be run-time computed. But for the syntax components 
of text numbers, do we require checking or not? 
SMH: Post-call investigation: IBM DFDL gives an error if the decimal & 
grouping separators are the same, but does not check any of the other 
characters for uniqueness.  (Andy - please can you check what ICU does if 
you set several of the text number characters to the same value, e.g. 
decimal sep, grouping sep, exponent and (for floats) NaN and Inf reps, 
in both strict & lax modes?) 
ICU does not appear to check when setting the values for symbols if they 
overlap. It has precedence rules when matching (eg, match NaN before Inf, 
match decimal separator before grouping separator). It gives parse errors 
if the data does not work with its rules. To avoid having to understand 
these rules, it was agreed that decimal separator, group separator, 
exponent rep, Inf, NaN and zero rep must all be distinct, schema 
definition error otherwise. Noted that if decimal separator, group 
separator and exponent rep are expressions, this checking must be deferred 
until parsing/unparsing.

6.        m370 - multiple PoU resolutions: If you have initiatedContent, 
AND a choiceBranchRef, AND a discriminator all on the same element, and 
there are 3 enclosing nested PoU, which one controls which? Precedence is 
the issue. Or..... do we really need to allow this? Why don’t we just 
disallow this kind of piling-on of complexity and make the user choose 
which PoU resolution technique they want? 
SMH: a) initiatedContent 'yes' on a choice/sequence, and a discriminator 
on a child of the choice/sequence. Allowed at the moment, and it is quite 
possible that users of IBM DFDL have this combination. Agreed not to make 
this an SDE (it's a difficult check to get right anyway due to the nature 
of discriminator placement), nor to issue a warning either. 
b) initiatedContent 'yes' and choiceBranchRef together on a choice. Yes, 
one is redundant. Agreed to make the combination an SDE. Noted that this 
might force users to explicitly set initiatedContent to 'no' if 
initiatedContent 'yes' is in scope.   
c) choiceBranchRef on choice and a discriminator on a child of the choice. 
Agreed not to make this an SDE (same reasoning as a) above), nor to issue 
a warning either. 

7.        m396 - is BCD representation a mandatory feature, or optional? 
SMH: BCD calendars and BCD numbers are independent optional features. Will 
update errata document to make this clear. 

8.        m398 - portability at risk if subset processors ignore 
properties they don't implement. We relaxed this from a more rigid policy, 
and now allow subsets to not validate properties they don't implement. 
However, is there a better compromise, e.g., require a warning about all 
unimplemented/unrecognized properties? E.g., dfdl:textBiDi='no' yields SDE 
"unrecognized property 'textBiDi' with value 'no'". 
SMH: Agreed that this had been relaxed too much and a warning MUST be 
issued by implementations. 
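
The distinctness rule agreed under item 5 can be sketched as a simple pairwise comparison. This is only an illustration under stated assumptions: each symbol is a single literal string (not a list of alternatives and not an expression), "distinct" is taken to mean "not equal", and the dictionary keys, the SchemaDefinitionError class and check_distinct are hypothetical names rather than anything defined by the spec or by IBM DFDL.

```python
from itertools import combinations

class SchemaDefinitionError(Exception):
    """Hypothetical stand-in for a DFDL schema definition error (SDE)."""

def check_distinct(symbols):
    """Raise SchemaDefinitionError if any two text number symbols are equal."""
    clashes = [
        (name_a, name_b)
        for (name_a, value_a), (name_b, value_b) in combinations(symbols.items(), 2)
        if value_a == value_b
    ]
    if clashes:
        raise SchemaDefinitionError(f"text number symbols are not distinct: {clashes}")

ok = {
    "decimal separator": ".",
    "grouping separator": ",",
    "exponent rep": "E",
    "Inf rep": "INF",
    "NaN rep": "NaN",
    "zero rep": "zero",
}
check_distinct(ok)  # passes silently: all six values differ
```

Where any of the values come from expressions, the same comparison would have to run at parse or unparse time rather than when the schema is compiled, as noted under item 5.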

-- 
Mike Beckerle | OGF DFDL WG Co-Chair 
Tel:  781-330-0412






