[DFDL-WG] Action Item 049: Built-in specification description and schemas

Suman Kalia kalia at ca.ibm.com
Tue Feb 2 10:06:54 CST 2010


Thanks, Steve, for your note. Comments below.

   I agree there will be a handful of escape schemes and the opportunity for
   their reuse is very high.


Looking at the calendar format, the attributes that would vary most are
calendarPattern, followed by calendarTimeZone. calendarPatternKind goes
along with calendarPattern; it tells whether to use the calendar pattern
implied by the schema date/time type or the one given in the DFDL properties.
The rest of the attributes are likely to be the same for a particular format.
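To make that split concrete, here is a minimal Python sketch, not DFDL syntax. The property names come from the note above; the default values, the dictionary merge, and the patterns assumed for each date/time type under "implicit" are illustrative assumptions, not quoted from the specification. Each element states only calendarPattern (and occasionally calendarTimeZone), and calendarPatternKind decides whether calendarPattern is consulted at all.

```python
# Hypothetical scoped defaults, set once in a dfdl:format (values assumed).
SCOPED_CALENDAR = {
    "calendarPatternKind": "explicit",
    "calendarPattern": "yyyy-MM-dd",
    "calendarTimeZone": "UTC",
}

# Assumed patterns implied by the XSD date/time type when the kind is
# "implicit"; illustrative only, not quoted from the specification.
IMPLICIT_PATTERNS = {
    "date": "yyyy-MM-dd",
    "time": "HH:mm:ss",
    "dateTime": "yyyy-MM-dd'T'HH:mm:ss",
}


def effective_calendar(element_props: dict, xsd_type: str) -> tuple:
    """Return (pattern, time zone) after element properties override scope."""
    props = {**SCOPED_CALENDAR, **element_props}
    if props["calendarPatternKind"] == "implicit":
        # Pattern comes from the schema date/time type, not from DFDL.
        pattern = IMPLICIT_PATTERNS[xsd_type]
    else:
        pattern = props["calendarPattern"]
    return pattern, props["calendarTimeZone"]


# Each element states only what differs from the scoped format.
print(effective_calendar({"calendarPattern": "yyyyMMdd"}, "date"))
print(effective_calendar({"calendarPattern": "HHmmss", "calendarTimeZone": "EST"}, "time"))
print(effective_calendar({"calendarPatternKind": "implicit"}, "dateTime"))
```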

For consistency with textNumberFormat, I am fine with adding all the attributes
defined in calendarFormat to dfdl:element and dfdl:simpleType.


Suman Kalia
IBM Toronto Lab
WMB Toolkit Architect and Development Lead
WebSphere Business Integration Application Connectivity Tools

http://www.ibm.com/developerworks/websphere/zones/businessintegration/wmb.html


Tel : 905-413-3923  T/L  969-3923
Fax : 905-413-4850 T/L  969-4850
Internet ID : kalia at ca.ibm.com


                                                                                                                                              
  From:       Steve Hanson/UK/IBM at IBMGB
  To:         Suman Kalia/Toronto/IBM at IBMCA
  Cc:         dfdl-wg at ogf.org
  Date:       02/02/2010 04:59 AM
  Subject:    Re: Action Item 049: Built-in specification description and schemas




Thanks for highlighting this Suman.

The reason for hiving off the properties for text numbers into a separate
named annotation was reuse.  It was considered that a given data format
might have a large number of text number fields, but that they could be
described by a far smaller number of annotations, because a limited set of
'number patterns' was used. In Suman's example that's clearly not the
case, but it is an artificial one. We need to consider real world formats.
I've had a look through example COBOL copybooks, and while there is a large
variation in text number fields, reuse of 'number patterns' would be a
benefit. For example, a set of related values might be declared the same:

             15     ORIGINAL-PRICE      PIC  9(013)V99.
             15     DISCOUNTED-PRICE    PIC  9(013)V99.
             15     SALE-PRICE          PIC  9(013)V99.
             15     STAFF-PRICE         PIC  9(013)V99.
             15     TOTAL-PRICE         PIC  9(013)V99.
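As a rough illustration of why one shared definition is enough here, the Python sketch below decodes all five PIC 9(013)V99 fields with a single hypothetical PRICE description: 15 unsigned display digits, the last two after the implied decimal point (V99). It assumes ASCII digits and fields laid out back to back; the names PRICE, PRICE_FIELDS, decode_price and decode_prices are illustrative, not DFDL constructs.

```python
from decimal import Decimal

# Hypothetical shared description, playing the role of a PRICE simple type:
# PIC 9(013)V99 = 13 integer digits + 2 fraction digits, no stored point.
PRICE = {"digits": 15, "scale": 2}

PRICE_FIELDS = [
    "ORIGINAL-PRICE",
    "DISCOUNTED-PRICE",
    "SALE-PRICE",
    "STAFF-PRICE",
    "TOTAL-PRICE",
]


def decode_price(text: str, spec: dict = PRICE) -> Decimal:
    """Decode one unsigned display field with an implied decimal point."""
    if len(text) != spec["digits"] or not all(c in "0123456789" for c in text):
        raise ValueError(f"not a {spec['digits']}-digit display field: {text!r}")
    # scaleb(-2) moves the decimal point two places left: 123456 -> 1234.56
    return Decimal(int(text)).scaleb(-spec["scale"])


def decode_prices(record: str) -> dict:
    """Split a record into the five fields; every field reuses PRICE."""
    width = PRICE["digits"]
    return {
        name: decode_price(record[i * width:(i + 1) * width])
        for i, name in enumerate(PRICE_FIELDS)
    }


print(decode_price("000000000123456"))  # 1234.56
```

The point of the design is that the layout lives in one place; changing PRICE changes all five fields.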

The question then becomes what is the best way to achieve this reuse. If
you look at a dfdl:textNumberFormat annotation, it is the number pattern
that varies. Everything else would be defined once in a dfdl:format
annotation and scoped.  So it does seem overkill to have a
dfdl:textNumberFormat for every number pattern, because the contained
properties cannot be scoped and must be redeclared each time.

I suggest the best reuse mechanism for this scenario is the simple type. In
the above example I could declare a PRICE simple type and put the number
pattern on that.

I therefore agree with Suman: remove dfdl:textNumberFormat and
dfdl:defineTextNumberFormat, and add all the properties to dfdl:element and
dfdl:simpleType. In practice, most will be set in a dfdl:format and scoped;
only the number pattern will vary per element or simple type.
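A minimal Python model of that resolution order follows. It assumes, as a simplification, that a property set on an element wins over one set on its simple type, which in turn wins over the scoped dfdl:format. The property names and values are taken from Suman's example below; the dictionaries, SIGNED_TRAILING and the resolve function are hypothetical and say nothing about how a DFDL processor is actually built.

```python
from collections import ChainMap
from typing import Optional

# Set once in a dfdl:format and scoped over the whole copybook.
SCOPED_FORMAT = {
    "representation": "text",
    "lengthKind": "explicit",
    "numberCheckPolicy": "lax",
    "numberRoundingMode": "roundUp",
    "numberZonedSignStyle": "asciiStandard",
}


def resolve(element: dict, simple_type: Optional[dict] = None) -> dict:
    """Element beats simple type; simple type beats the scoped format."""
    # ChainMap looks keys up left to right, so the first mapping wins.
    return dict(ChainMap(element, simple_type or {}, SCOPED_FORMAT))


# A hypothetical simple type that carries only the number pattern, shared by
# every field declared the same way (compare the PRICE idea above).
SIGNED_TRAILING = {"numberPattern": "0000+"}

elem9 = resolve({"length": "4", "numberPattern": "0000"})
elem9_signed = resolve({"length": "4"}, SIGNED_TRAILING)
print(elem9["numberPattern"], elem9_signed["numberPattern"], elem9_signed["numberCheckPolicy"])
```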

We should also consider whether the same issue applies to
dfdl:calendarFormat and dfdl:escapeScheme. For both of these the reuse
opportunity is high. There is likely to be just one escape scheme per data
format, and a small number of calendar formats per data format (e.g., one
for a date, one for a time, one for a timestamp). In the latter case,
though, it is typically just the calendarPattern that would vary; the rest
of the properties would be set once.

I suggest that whatever we adopt for text numbers we also adopt for
calendars, for consistency.

Regards

Steve Hanson
Programming Model Architect, WebSphere Message Broker,
OGF DFDL WG Co-Chair,
Hursley, UK,
Internet: smh at uk.ibm.com,
Phone (+44)/(0) 1962-815848



                                                                                                                                              
  From:       Suman Kalia/Toronto/IBM at IBMCA
  To:         Alan Powell/UK/IBM at IBMGB, Steve Hanson/UK/IBM at IBMGB, Mike Beckerle <mbeckerle.dfdl at gmail.com>
  Cc:         dfdl-wg at ogf.org
  Date:       02/02/2010 00:21
  Subject:    Action Item 049: Built-in specification description and schemas




I am trying to create a DFDL definition for a COBOL copybook and have
run into a usability issue with TextNumberFormat, which has to be named
and referenced from dfdl:element and dfdl:simpleType annotations. Consider
the sample COBOL copybook below, where I have 3 elements with a
PIC 9999 DISPLAY clause (a.k.a. zoned decimal) and 2 with a separate sign
that are modeled as external (standard) decimal. They all have four digits;
the main difference between them is the sign, which can be absent, leading
or trailing. As per the V.38 spec, I would have to create a named
textNumberFormat for each of the picture clauses. The key difference between
the named textNumberFormats for these definitions would be numberPattern;
the rest of the attributes for standard decimal and zoned decimal are going
to be the same for a particular platform or data definition format. The
generated DFDL schema will contain many occurrences of TextNumberFormat, in
the worst case one for each element defined in the COBOL copybook. This is
not very usable, and the user would also have to choose names for these
formats carefully so that they can easily identify and distinguish them if
they want to reuse them, something like TextNumberStandardLength5SignLeading.
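To put a number on the problem, the Python sketch below counts how many named textNumberFormat annotations would be needed if every distinct combination of text-number properties required its own named definition. The numberPattern values are those of the schema attached below; the shared property values, the helper name named_formats_needed, and the simplification that only these properties matter are assumptions for illustration.

```python
# Properties shared by every numeric field (assumed, simplified).
SHARED = (("numberCheckPolicy", "lax"), ("numberRoundingMode", "roundUp"))

# numberPattern per element, as written in the attached schema.
COPYBOOK_PATTERNS = {
    "elem9": "0000",
    "elem9Signed": "0000+",
    "elem9SignedLeading": "+0000",
    "elem9SignedLeadingSeparate": "+0000;-00000",
    "elem9SignedTrailingSeparate": "0000+;00000-",
}


def named_formats_needed(patterns: dict) -> int:
    """One named annotation per distinct full property set."""
    distinct = {SHARED + (("numberPattern", p),) for p in patterns.values()}
    return len(distinct)


print(named_formats_needed(COPYBOOK_PATTERNS))  # 5: one per element
```

By contrast, five fields that are all declared the same way, like the price fields in a typical copybook, collapse to a single definition.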

           01  CobolTypes.

            * External decimal (zoned decimal)
                 05   elem9                           PIC  9999 DISPLAY.
                 05   elem9Signed                     PIC S9999 DISPLAY.
                 05   elem9SignedLeading              PIC S9999 DISPLAY
                                                      SIGN LEADING.

            * In DFDL, modeled as standard decimal
                 05   elem9SignedLeadingSeparate      PIC S9999 DISPLAY
                                                      SIGN LEADING SEPARATE.
                 05   elem9SignedTrailingSeparate     PIC S9999 DISPLAY
                                                      SIGN TRAILING SEPARATE.

            Number Format
                  When textNumberRepresentation is 'zoned', only the pattern
                  for positive numbers is used. Only the following pattern
                  characters may be used: '+' to indicate whether the
                  leading or trailing digit carries the overpunched sign,
                  'V' to indicate the location of an implied decimal point,
                  and '0' to indicate the number of digits (including the
                  overpunched digit). The number of '0' characters must match
                  the number of digits in the representation; otherwise it is
                  a schema definition error.
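The rules quoted above are mechanical enough to sketch. The Python function below is a minimal, hypothetical decoder for that positive-only zoned pattern: it rejects characters other than '+', 'V' and '0', enforces the digit-count rule, reads the overpunched sign from the leading or trailing digit, and applies the implied decimal point. The overpunch table is an assumption (an ASCII convention in which '0'-'9' are positive and 'p'-'y' negative); whether it matches exactly what the draft calls asciiStandard should be checked against the specification.

```python
from decimal import Decimal

# Assumed ASCII overpunch convention: zone nibble 0x3 ('0'-'9') is positive,
# zone nibble 0x7 ('p'-'y') is negative. Other sign styles map differently.
POSITIVE_ZONE = 0x3
NEGATIVE_ZONE = 0x7
ASCII_DIGITS = "0123456789"


def _unpunch(ch: str) -> tuple:
    """Split an overpunched character into its plain digit and its sign."""
    zone, digit = ord(ch) >> 4, ord(ch) & 0x0F
    if digit > 9 or zone not in (POSITIVE_ZONE, NEGATIVE_ZONE):
        raise ValueError(f"invalid overpunched character {ch!r}")
    return str(digit), (-1 if zone == NEGATIVE_ZONE else 1)


def decode_zoned(data: str, pattern: str) -> Decimal:
    """Decode a zoned number using a positive-only pattern of '+', 'V', '0'."""
    if set(pattern) - set("+V0") or pattern.count("+") > 1 or pattern.count("V") > 1:
        raise ValueError(f"invalid zoned pattern {pattern!r}")
    if "+" in pattern.strip("+"):
        raise ValueError("'+' must be the first or last pattern character")
    if not data or pattern.count("0") != len(data):
        # The quoted rule makes this a schema definition error.
        raise ValueError("number of '0' characters must match the digit count")
    digits, sign = list(data), 1
    if pattern.startswith("+"):      # leading digit carries the sign
        digits[0], sign = _unpunch(digits[0])
    elif pattern.endswith("+"):      # trailing digit carries the sign
        digits[-1], sign = _unpunch(digits[-1])
    if not all(d in ASCII_DIGITS for d in digits):
        raise ValueError(f"non-digit character in {data!r}")
    scale = pattern.partition("V")[2].count("0")  # digits after implied point
    return Decimal(sign * int("".join(digits))).scaleb(-scale)


print(decode_zoned("123t", "0000+"))  # -1234
```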




A better approach would be to:

          Add numberPattern to the dfdl:element and dfdl:simpleType annotations,
          and move the rest of the attributes from the TextNumberFormat block to
          either (a) dfdl:format only, or (b) both dfdl:format and dfdl:element
          and dfdl:simpleType.



Let's discuss this in the DFDL workgroup call tomorrow.



Attached below is a schema coded under assumption (a) above.

<xsd:complexType name="CobolTypes">
            <xsd:sequence>
                  <!-- External decimal (zoned decimal) -->
                  <xsd:element name="elem9" dfdl:ref="dfdlCobolFmt:CobolZonedDecimalFormat"
                        dfdl:length="4" dfdl:representation="text" dfdl:numberPattern="0000">
                        <xsd:simpleType>
                              <xsd:restriction base="xsd:short">
                                    <xsd:minInclusive value="0" />
                                    <xsd:maxInclusive value="9999" />
                              </xsd:restriction>
                        </xsd:simpleType>
                  </xsd:element>
                  <xsd:element name="elem9Signed" dfdl:ref="dfdlCobolFmt:CobolZonedDecimalFormat"
                        dfdl:length="4" dfdl:representation="text" dfdl:numberPattern="0000+">
                        <xsd:simpleType>
                              <xsd:restriction base="xsd:short">
                                    <xsd:minInclusive value="-9999" />
                                    <xsd:maxInclusive value="9999" />
                              </xsd:restriction>
                        </xsd:simpleType>
                  </xsd:element>
                  <xsd:element name="elem9SignedLeading" dfdl:ref="dfdlCobolFmt:CobolZonedDecimalFormat"
                        dfdl:length="4" dfdl:representation="text" dfdl:numberPattern="+0000">
                        <xsd:simpleType>
                              <xsd:restriction base="xsd:short">
                                    <xsd:minInclusive value="-9999" />
                                    <xsd:maxInclusive value="9999" />
                              </xsd:restriction>
                        </xsd:simpleType>
                  </xsd:element>
                  <!-- Separate sign: in DFDL, modeled as standard decimal -->
                  <xsd:element name="elem9SignedLeadingSeparate" dfdl:ref="dfdlCobolFmt:CobolStandardDecimalFormat"
                        dfdl:length="5" dfdl:representation="text" dfdl:numberPattern="+0000;-00000">
                        <xsd:simpleType>
                              <xsd:restriction base="xsd:short">
                                    <xsd:minInclusive value="-9999" />
                                    <xsd:maxInclusive value="9999" />
                              </xsd:restriction>
                        </xsd:simpleType>
                  </xsd:element>
                  <xsd:element name="elem9SignedTrailingSeparate" dfdl:ref="dfdlCobolFmt:CobolStandardDecimalFormat"
                        dfdl:length="5" dfdl:representation="text" dfdl:numberPattern="0000+;00000-">
                        <xsd:simpleType>
                              <xsd:restriction base="xsd:short">
                                    <xsd:minInclusive value="-9999" />
                                    <xsd:maxInclusive value="9999" />
                              </xsd:restriction>
                        </xsd:simpleType>
                  </xsd:element>
            </xsd:sequence>
</xsd:complexType>
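For the two SIGN SEPARATE fields, the five-byte length is four digits plus one sign character. A minimal Python sketch of that layout follows. It assumes that DFDL number patterns follow the Java/ICU DecimalFormat rule that an explicit negative subpattern only supplies the negative prefix or suffix, so the digit count comes from the positive subpattern (under that rule the extra '0' in the negative subpatterns above would not change anything). decode_separate_sign is a hypothetical helper, not part of any DFDL implementation.

```python
from decimal import Decimal

ASCII_DIGITS = "0123456789"


def decode_separate_sign(text: str, digits: int, leading: bool) -> Decimal:
    """Parse a text number whose sign occupies its own character.

    leading=True models '+0000;-...' style patterns (sign first);
    leading=False models '0000+;...-' style patterns (sign last).
    """
    if len(text) != digits + 1:
        raise ValueError(f"expected {digits + 1} characters, got {len(text)}")
    sign_char, body = (text[0], text[1:]) if leading else (text[-1], text[:-1])
    if sign_char not in ("+", "-") or not all(c in ASCII_DIGITS for c in body):
        raise ValueError(f"malformed separate-sign number {text!r}")
    value = int(body)
    return Decimal(-value if sign_char == "-" else value)


print(decode_separate_sign("-0042", 4, leading=True))    # -42
print(decode_separate_sign("1234-", 4, leading=False))   # -1234
```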


----- Data format Definitions

                  <dfdl:defineFormat name="CobolStandardDecimalFormat">
                        <dfdl:format ref="tns:BaseTextNumberStandardDecimal"
                              lengthKind="explicit" lengthUnits="bytes"
                              alignment="1" alignmentUnits="bytes"
                              leadingSkipBytes="0" trailingSkipBytes="0" />
                  </dfdl:defineFormat>

                  <dfdl:defineFormat name="CobolZonedDecimalFormat">
                        <dfdl:format ref="tns:BaseTextNumberZonedDecimal"
                              lengthKind="explicit" lengthUnits="bytes"
                              alignment="1" alignmentUnits="bytes"
                              leadingSkipBytes="0" trailingSkipBytes="0" />
                  </dfdl:defineFormat>

-- Text number formats (added here for reference, to identify the applicable
attributes for standard and zoned decimal)

                  <dfdl:defineTextNumberFormat name="ZonedDecimalNumberFormat">
                        <dfdl:textNumberFormat numberCheckPolicy="lax"
                              numberRoundingMode="roundUp"
                              numberZonedSignStyle="asciiStandard" />
                  </dfdl:defineTextNumberFormat>

                  <dfdl:defineTextNumberFormat name="StandardDecimalFormat">
                        <dfdl:textNumberFormat numberGroupingSeparator=","
                              numberDecimalSeparator="." numberExponentCharacter="E"
                              numberCheckPolicy="lax" numberInfinityRep="\u221E"
                              numberNanRep="\uFFFD" numberRoundingMode="roundUp"
                              numberZeroRep="&quot; &quot;" />
                  </dfdl:defineTextNumberFormat>
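To show how the StandardDecimalFormat properties would come into play when reading data, here is a minimal, hypothetical Python parser driven by the same property values. It assumes the &quot; &quot; zero representation means a single space, treats a leading '-' as the only sign syntax, and ignores numberCheckPolicy and numberRoundingMode; it illustrates the properties, not how any DFDL processor is required to behave.

```python
from decimal import Decimal

# Values copied from the StandardDecimalFormat block above; the zero
# representation is assumed here to be a single space.
STANDARD_DECIMAL = {
    "numberGroupingSeparator": ",",
    "numberDecimalSeparator": ".",
    "numberExponentCharacter": "E",
    "numberInfinityRep": "\u221E",
    "numberNanRep": "\uFFFD",
    "numberZeroRep": " ",
}


def parse_standard_decimal(text: str, fmt: dict = STANDARD_DECIMAL) -> Decimal:
    """Hypothetical lax parse of a standard (text) decimal value."""
    if text == fmt["numberZeroRep"]:
        return Decimal(0)
    negative = text.startswith("-")
    body = text.lstrip("+-")
    if body == fmt["numberNanRep"]:
        return Decimal("NaN")
    if body == fmt["numberInfinityRep"]:
        return Decimal("-Infinity" if negative else "Infinity")
    # Drop grouping first, then map the decimal and exponent characters
    # onto the forms that Python's Decimal constructor accepts.
    normalized = (
        text.replace(fmt["numberGroupingSeparator"], "")
        .replace(fmt["numberDecimalSeparator"], ".")
        .replace(fmt["numberExponentCharacter"], "E")
    )
    return Decimal(normalized)


print(parse_standard_decimal("1,234.5"))  # 1234.5
```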



Suman Kalia
