[DFDL-WG] Fw: Minutes from 2007-08-08 Call - comments from Steve

Suman Kalia kalia at ca.ibm.com
Thu Aug 16 12:02:44 CDT 2007


Comments in green (marked "SKK" below)

Suman Kalia
IBM Toronto Lab
WebSphere Business Integration Application Connectivity Tools 
Tel : 905-413-3923  T/L  969-3923
Fax : 905-413-4850 T/L  969-4850
Internet ID : kalia at ca.ibm.com
----- Forwarded by Suman Kalia/Toronto/IBM on 08/16/2007 12:46 PM -----

From: "Simon Parker" <simon.parker at polarlake.com>
Sent by: dfdl-wg-bounces at ogf.org
Sent: 08/16/2007 12:28 PM
To: <dfdl-wg at ogf.org>
Subject: Re: [DFDL-WG] Minutes from 2007-08-08 Call - comments from Steve

Responses embedded below
 Simon
From: Steve Hanson [mailto:smh at uk.ibm.com] 
Sent: 15 August 2007 12:23
To: Mike Beckerle
Cc: dfdl-wg at ogf.org; dfdl-wg-bounces at ogf.org; Simon Parker
Subject: [DFDL-WG] Minutes from 2007-08-08 Call - comments from Steve


I've spent today catching up with the recent DFDL spec discussions around 
Simon's comments to v0.19. Some comments of my own on the content of these 
and previous call minutes. 

- General principle: The eventual consumers of DFDL will be users, the 
majority of whom will not be data modelling experts; that's certainly the 
experience at IBM.  Most see data modelling as a black art and find it 
difficult. I think that an over-reliance on hidden elements is not going 
to go down well. I would err on the side of caution here, and only if we 
are convinced a property will be very rarely used should we remove it and 
replace it with a hidden element.
[Simon] Accepted, providing we can specify everything. Ideally we'll 
publish a rigorous, orthogonal language and a convenient, intuitive 
library with controlled redundancy.

- Leading/Trailing Skip Bytes is a property intended to handle the byte 
skipping added by compilers, over and above simple byte alignment rules. 
The formulae for the values are beyond the ken of users to apply 
manually; setting them would invariably be done using an automated 
COBOL -> DFDL translator, etc. I would not be too troubled if that went 
'hidden'.

SKK -- For complex scenarios (e.g. occurs depending on elements in 
COBOL), different compilers follow quite complicated algorithms to add 
slack bytes at the end or front of structures to properly align array 
elements. The current set of technologies for COBOL->DFDL may not be able 
to extract this information from compilers/interpreters, as it may not be 
exposed through well-defined interfaces, in which case the user may have to 
manually adjust the values for Leading/Trailing skip counts. I would not 
vote for these attributes to be hidden; they are certainly advanced 
properties used occasionally to cater to such complex scenarios.

- 'finalTerminatorCanBeMissing' property. The rules for interpreting what 
trailing markup actually means are complex and properties like this will 
almost certainly be needed. 
SKK -- I tend to agree with Steve
(Aside: For Mike's second example, though, where data of max length n is 
terminated by markup only if actual length < n, wouldn't that be better 
expressed using a regular expression?  finalTerminatorCanBeMissing is too 
general, and could lead the parser to validly parse data where the 
terminator was accidentally omitted). 
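
As an illustration only of the regular-expression idea in the aside above, here is a minimal Python sketch. It assumes a single-character terminator and a field of at most n characters; the names bounded_field_pattern and read_field are hypothetical and are not part of any DFDL proposal.

```python
import re


def bounded_field_pattern(n: int, terminator: str = ";") -> re.Pattern:
    """Build a pattern for a field of at most n characters.

    The terminator is required only when the field is shorter than n.
    """
    if n < 1 or len(terminator) != 1:
        raise ValueError("sketch assumes n >= 1 and a one-character terminator")
    t = re.escape(terminator)
    # Alternative 1: 0..n-1 non-terminator characters, then the terminator.
    # Alternative 2: exactly n non-terminator characters, no terminator.
    return re.compile(rf"(?:([^{t}]{{0,{n - 1}}}){t}|([^{t}]{{{n}}}))")


def read_field(data: str, n: int, terminator: str = ";") -> tuple[str, str]:
    """Return (field value, remaining data) or raise ValueError."""
    m = bounded_field_pattern(n, terminator).match(data)
    if m is None:
        raise ValueError("short field is missing its terminator")
    value = m.group(1) if m.group(1) is not None else m.group(2)
    return value, data[m.end():]
```

With n = 4, read_field("abc;rest", 4) consumes the terminator, read_field("abcdrest", 4) needs none, and read_field("ab", 4) is rejected, which is the stricter behaviour the aside is asking for.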

- Infix/prefix/postfix separators. I believe this should be retained. It's 
in IBM WTX (Mercator) and I frequently have to apologise for the absence 
of postfix in IBM MRM. When a user sees (eg) x,y,z it's easier for him to 
comprehend that the comma after z is a postfix separator rather than the 
terminator of the parent group. 
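
To make the three separator positions concrete, here is a small Python sketch. The function name split_items and the position strings are assumptions chosen for illustration; they are not taken from the DFDL draft, WTX, or MRM.

```python
def split_items(data: str, sep: str, position: str) -> list[str]:
    """Split data into items according to where the separator sits.

    position: "infix"   -> x,y,z
              "prefix"  -> ,x,y,z
              "postfix" -> x,y,z,
    """
    if position == "prefix":
        if not data.startswith(sep):
            raise ValueError("expected a leading separator")
        data = data[len(sep):]
    elif position == "postfix":
        if not data.endswith(sep):
            raise ValueError("expected a trailing separator")
        data = data[:-len(sep)]
    elif position != "infix":
        raise ValueError(f"unknown position: {position}")
    return data.split(sep)
```

Under "postfix", the data x,y,z, yields three items. Under "infix", the same data yields a fourth, empty item unless the final comma is modelled as something else, such as the terminator of the parent group.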

- Simon had a comment on the removal of 'applies' which I haven't seen 
discussed ("I find this cumbersome. I suggest this alternative: drop 
‘applies’ and ‘dfdl:format’, insist on ‘dfdl:sequence’ and friends 
instead, and add local variants like ‘dfdl:sequenceLocal’. For attribute 
shorthand, add boolean attributes with the same name: sequenceLocal=”true” 
(optional, default false)."). 

SKK - I am not comfortable with using names like sequenceLocal; if we go 
with this, we will quickly compound the problem with allLocal, choiceLocal, 
etc. The intent here is to specify the scope, and it is best expressed 
through one generic property with a set of enumeration values 
identifying the scope. 

I don't follow; the use of 'applies' is orthogonal to whether you use 
dfdl:format or one of the specific elements such as dfdl:sequence. 
[Simon] You're right, the ideas should be discussed separately. My hasty 
comment throws it all in together.
 
1 Replace this:
    <dfdl:format applies="hereOnly">
with this:
    <dfdl:formatLocal>
 
Why? Because 'applies' is a metaproperty that doesn't describe the 
representation, and should be prominent. Also, for brevity.
 
2 Replace this:
    <dfdl:format>
with one of these:
    <dfdl:element> <dfdl:sequence> <dfdl:complexType>...
 
Why? For ease of validation and interpretation, to make mistakes more 
obvious to human readers, and to support more rigorous specification of 
the relationship between properties and xsd constructs.
 

Regards, Steve

Steve Hanson
WebSphere Message Brokers
Hursley, UK
Internet: smh at uk.ibm.com
Phone (+44)/(0) 1962-815848 


From: Mike Beckerle <beckerle at us.ibm.com>
Sent by: dfdl-wg-bounces at ogf.org
Sent: 14/08/2007 14:23
To: dfdl-wg at ogf.org, "Simon Parker" <simon.parker at polarlake.com>
Subject: Re: [DFDL-WG] Minutes from 2007-08-08 Call

I forgot to clarify Simon's question on sp165. 

This was the 'finalTerminatorCanBeMissing' property. 

We considered the comment that this might be unnecessary. 

Use case: file of text format. Each "record" in the file is terminated by 
a CRLF so sez the user. At the top level this file contains an array of 
these records. 

The file might or might not have a CRLF at the end of the file because 
human beings might have edited the file with a text editor, and either 
inserted or neglected to insert this final CRLF. 

We want the file format to be legal with or without the final CRLF; 
however, all prior CRLFs in the file must be present. 
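
The required behaviour in this use case can be sketched in Python as follows. This is only an illustration under assumptions: parse_records and its final_terminator_can_be_missing flag are hypothetical names that mirror the property under discussion, not a DFDL processor API.

```python
def parse_records(data: str, terminator: str = "\r\n",
                  final_terminator_can_be_missing: bool = True) -> list[str]:
    """Split data into terminated records.

    Every record except possibly the last must end with the terminator.
    """
    records = []
    pos = 0
    while pos < len(data):
        end = data.find(terminator, pos)
        if end == -1:
            # End-of-data reached before a terminator was found.
            if not final_terminator_can_be_missing:
                raise ValueError("last record is missing its terminator")
            records.append(data[pos:])
            break
        records.append(data[pos:end])
        pos = end + len(terminator)
    return records
```

With the flag on, a file with or without the final CRLF gives the same records. With the flag off, only the fully terminated file is accepted.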

So how to express this: 
1) CRLF is a terminator of the record 
2) CRLF is an occursSeparator of the enclosing array, records have no 
terminator. We enclose the array in a sequence group where the array is 
followed by a hidden "optional" (minOccurs=0, maxOccurs=1) element with a 
fixed="CRLF" string value. 

Choice (1) requires that we have finalTerminatorCanBeMissing. 

Choice (2) is just modeling the behavior that is required directly via 
hidden elements. This is tantamount to saying that this keyword is not 
worth having because there is a way to model it already. This is true of 
many keywords. If we deem this one too obscure, then we need to revisit 
many others. (Leading/Trailing Skip Bytes is a good example. Trivially 
represented by a hidden element).  What are our criteria for inclusion? Up 
until now our criteria have been to include things that existing systems 
already have found a need for. However, existing systems don't have hidden 
field capability. 

Note that this same missing final terminator issue can come up not only 
with End-of-data, but with any bounded size structure. 

E.g., suppose we say that an array has occursUnits="bytes" and 
occursPath="874". Then it is 874 bytes long. The array elements can be 
terminated by a particular piece of data, e.g., a semicolon. For the same reasons as 
the CRLF example above, we want to be able to tolerate a missing final 
semicolon before the end of the 874 bytes.  In effect the 
byte-length-limit creates an implicit "end-of-data" for a sub-stream 
consisting of just those bytes. 
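
The bounded-length case can be sketched the same way. In the Python below, the declared byte length carves out a sub-stream, and the end of that sub-stream acts as the implicit end-of-data. The function name parse_bounded_array is hypothetical, and the semicolon terminator simply follows the example above.

```python
def parse_bounded_array(stream: bytes, length: int,
                        terminator: bytes = b";") -> tuple[list[bytes], bytes]:
    """Parse terminated items from the first `length` bytes of stream.

    Returns (items, remaining bytes). The final item may lack its
    terminator because the length limit ends the sub-stream.
    """
    window, rest = stream[:length], stream[length:]
    if len(window) < length:
        raise ValueError("stream is shorter than the declared length")
    items = window.split(terminator)
    # A trailing empty piece means the final terminator was present.
    if items and items[-1] == b"":
        items.pop()
    return items, rest
```

For a 6-byte array, b"ab;cd;" and b"ab;cde" both parse cleanly, and whatever follows the 6 bytes is left untouched for the rest of the parse.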

Conclusion: finalTerminatorCanBeMissing seems to be useful enough and 
comes up often enough that I think the keyword is worthwhile. 

Implication: we should create a list of keywords or enumerated values for 
properties  that we think are in the grey area where perhaps we want to 
drop them. Here are some candidates: byteOrderMarkPolicy, 
leading/trailingSkipBytes. Both these can be modeled readily as hidden 
elements. There are probably others. 

Mike Beckerle
STSM, Architect, Scalable Computing
IBM Software Group
Information Platform and Solutions
Westborough, MA 01581
direct: voice and FAX 508-599-7148
assistant: Pam Riordan 
                priordan at us.ibm.com 
                508-599-7046



From: Mike Beckerle/Worcester/IBM
Sent: 08/14/2007 08:40 AM
To: "Simon Parker" <simon.parker at polarlake.com>
Cc: dfdl-wg at ogf.org
Subject: Re: [DFDL-WG] Minutes from 2007-08-08 Call

In conjunction with the annotated document these notes are clear, except 
for 'sp165'. Perhaps someone will recapitulate the discussion briefly at 
Wednesday's conference. I think only three annotations remain: 

   sp167 Absent and missing (expanded discussion on the wiki already) 

This will be a major topic on a call. 

   sp172 separatorType="infix" 

I'm happy to drop this strange stuff about separatorType=prefix or postfix 
and just say separator means infix. However, I would note that at least 
two major integration products (IBM WebSphere Transformation Extender, 
formerly Mercator, and Microsoft BizTalk) have this concept, so we may end 
up putting it back in. Presumably MS copied the earlier Mercator style, or 
both got it from common requirements in some EDI standard. 

   sp173 defaultWhenMissing (expanded discussion on the wiki already) 

Same topic as sp167 above. Will have a call topic to discuss. 
 
I've added another contribution to the wiki discussion on 'require'. 

I think this has reached a resolution, which is that we can express this 
using assertions. The general style of using DFDL to describe what 
fixed-data syntactic constructs look like is a good one. 

However, I've amended the Wiki thread on this with a further issue for 
group consideration. See bottom of page: 
https://forge.gridforum.org/sf/wiki/do/viewPage/projects.dfdl-wg/wiki/Require?_message=1187096164776 

 
The 'length and occurs' proposal is an improvement, though I still have 
reservations to discuss; likewise the 'opaque data' proposal. 

For a call, this week or soon. I will send out an agenda. 

Mike Beckerle
STSM, Architect, Scalable Computing
IBM Software Group
Information Platform and Solutions
Westborough, MA 01581
direct: voice and FAX 508-599-7148
assistant: Pam Riordan 
                priordan at us.ibm.com 
                508-599-7046

From: "Simon Parker" <simon.parker at polarlake.com>
Sent by: dfdl-wg-bounces at ogf.org
Sent: 08/13/2007 10:56 AM
To: <dfdl-wg at ogf.org>
Subject: Re: [DFDL-WG] Minutes from 2007-08-08 Call
 
In conjunction with the annotated document these notes are clear, except 
for 'sp165'. Perhaps someone will recapitulate the discussion briefly at 
Wednesday's conference. I think only three annotations remain: 

   sp167 Absent and missing (expanded discussion on the wiki already) 
   sp172 separatorType="infix" 
   sp173 defaultWhenMissing (expanded discussion on the wiki already) 
 
I've added another contribution to the wiki discussion on 'require'. 
 
The 'length and occurs' proposal is an improvement, though I still have 
reservations to discuss; likewise the 'opaque data' proposal. 
 
Regards, 
Simon 
 

From: dfdl-wg-bounces at ogf.org [mailto:dfdl-wg-bounces at ogf.org] On Behalf 
Of Mike Beckerle
Sent: 08 August 2007 18:00
To: dfdl-wg at ogf.org
Subject: [DFDL-WG] Minutes from 2007-08-08 Call


MikeB, Geoff Judd, Alan Powell attended. 

Continued through SP's comments. 

sp37 - got it. 

sp45 - agree. This whole part to be rewritten. 

sp115 - ok. "strict" and "lax" as enums. No built-in default - we never use 
defaults in the processor itself. Only in the predefined formats. 

sp118 - ok 

sp123 - Proposal to simplify length, lengthKind, lengthUnits, and also 
occursKind, occursPath, occursPathUnits needed. (along the lines of 
byteCount, itemCount, length='delimited' enum, etc.) 

sp154 - Need a specific proposal to eliminate hexBinary and decide what to 
use for opaque data (consider also string with encoding='bytes'), or 
introduce a dfdl:byteString or dfdl:opaque type (a derived type - just a 
standard name). 


sp158 - see sp123 

sp165 - needed to have composition property for enclosing groups and/or 
end-of-data. Regexp doesn't fix this. 


Mike Beckerle
STSM, Architect, Scalable Computing
IBM Software Group
Information Platform and Solutions
Westborough, MA 01581
direct: voice and FAX 508-599-7148
assistant: Pam Riordan 
               priordan at us.ibm.com 
               508-599-7046
--
dfdl-wg mailing list
dfdl-wg at ogf.org
http://www.ogf.org/mailman/listinfo/dfdl-wg 