[DFDL-WG] Minutes from 2007-08-08 Call - comments from Steve

Simon Parker simon.parker at polarlake.com
Thu Aug 16 11:28:19 CDT 2007


Responses embedded below
 Simon


________________________________

	From: Steve Hanson [mailto:smh at uk.ibm.com] 
	Sent: 15 August 2007 12:23
	To: Mike Beckerle
	Cc: dfdl-wg at ogf.org; dfdl-wg-bounces at ogf.org; Simon Parker
	Subject: [DFDL-WG] Minutes from 2007-08-08 Call - comments from
Steve
	
	
	

	I've spent today catching up with the recent DFDL spec
discussions around Simon's comments to v0.19. Some comments of my own on
the content of these and previous call minutes. 
	
	- General principle: The eventual consumers of DFDL will be
users the majority of whom will not be data modelling experts, that's
certainly the experience at IBM.  Most see data modelling as a black art
and find it difficult. I think that an over-reliance on hidden elements
is not going to go down well. I would err on the side of caution here,
and only if we are convinced a property will be very rarely used should
we remove it and replace by a hidden element.   
	[Simon] Accepted, providing we can specify everything. Ideally
we'll publish a rigorous, orthogonal language and a convenient,
intuitive library with controlled redundancy.
	
	- Leading/Trailing Skip Bytes is a property intended to handle
the byte skipping added by compilers, over and above simple byte
alignment rules. The formulae for the values are beyond the ken of
users to work out manually; setting them would invariably be done by an
automated COBOL -> DFDL translator, etc. I would not be too troubled if
that went 'hidden'. 
	
	- 'finalTerminatorCanBeMissing' property. The rules for
interpreting what trailing markup actually means are complex and
properties like this will almost certainly be needed. (Aside: For Mike's
second example, though, where data of max length n is terminated by
markup only if actual length < n, wouldn't that be better expressed
using a regular expression?  finalTerminatorCanBeMissing is too general,
and could lead the parser to validly parse data where the terminator was
accidentally omitted). 
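
A minimal sketch of the regular-expression idea in the aside above, assuming a single-character terminator and a character-based length n. The names bounded_field_pattern and parse_field are hypothetical and are not part of any DFDL draft.

```python
import re


def bounded_field_pattern(n, terminator=";"):
    """Build a pattern for a field of at most n characters.

    The terminator is required only when the field is shorter than n.
    Assumption: the terminator is a single character.
    """
    if len(terminator) != 1:
        raise ValueError("this sketch supports single-character terminators only")
    t = re.escape(terminator)
    # Either exactly n non-terminator characters with no terminator,
    # or fewer than n non-terminator characters followed by the terminator.
    return re.compile(
        rf"(?P<full>[^{t}]{{{n}}})|(?P<short>[^{t}]{{0,{n - 1}}}){t}"
    )


def parse_field(data, n, terminator=";"):
    """Return (value, characters_consumed), or None if the data is invalid."""
    m = bounded_field_pattern(n, terminator).match(data)
    if m is None:
        return None
    value = m.group("full") if m.group("full") is not None else m.group("short")
    return value, m.end()
```

Under these assumptions a short field whose terminator was accidentally omitted is rejected rather than silently accepted, which is the distinction the aside draws against a general "can be missing" flag.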
	
	- Infix/prefix/postfix separators. I believe this should be
retained. It's in IBM WTX (Mercator) and I frequently have to apologise
for the absence of postfix in IBM MRM. When a user sees (eg) x,y,z it's
easier for him to comprehend that the comma after z is a postfix
separator rather than the terminator of the parent group. 
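
A minimal sketch of how the three separator positions differ when splitting x,y,z style data. The function name split_items and the position values are illustrative assumptions, not the spec's property names.

```python
def split_items(data, sep=",", position="infix"):
    """Split data into items according to where the separator is placed.

    infix:   x,y,z    (separator between items only)
    prefix:  ,x,y,z   (separator before every item)
    postfix: x,y,z,   (separator after every item)
    """
    if position == "prefix":
        if not data.startswith(sep):
            raise ValueError("expected a leading separator")
        data = data[len(sep):]
    elif position == "postfix":
        if not data.endswith(sep):
            raise ValueError("expected a trailing separator")
        data = data[:-len(sep)]
    elif position != "infix":
        raise ValueError("unknown separator position: " + position)
    return data.split(sep)
```

With postfix, the comma after z belongs to the list of items itself, so it does not have to be explained as a terminator of the parent group.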
	
	- Simon had a comment on the removal of 'applies' which I
haven't seen discussed ("I find this cumbersome. I suggest this
alternative: drop 'applies' and 'dfdl:format', insist on 'dfdl:sequence'
and friends instead, and add local variants like 'dfdl:sequenceLocal'.
For attribute shorthand, add boolean attributes with the same name:
sequenceLocal="true" (optional, default false)."). I don't follow: the
use of 'applies' is orthogonal to whether you use dfdl:format or one of
the specific elements such as dfdl:sequence. 
	[Simon] You're right, the ideas should be discussed separately.
My hasty comment throws it all in together.
	 
	1 Replace this:
	    <dfdl:format applies="hereOnly">
	with this:
	    <dfdl:formatLocal>
	 
	Why? Because 'applies' is a metaproperty that doesn't describe
the representation, and should be prominent. Also, for brevity.
	 
	2 Replace this:
	    <dfdl:format>
	with one of these:
	    <dfdl:element> <dfdl:sequence> <dfdl:complexType>...
	 
	Why? For ease of validation and interpretation, to make mistakes
more obvious to human readers, and to support more rigorous
specification of the relationship between properties and xsd constructs.
	 
	
	Regards, Steve
	
	Steve Hanson
	WebSphere Message Brokers
	Hursley, UK
	Internet: smh at uk.ibm.com
	Phone (+44)/(0) 1962-815848 
	
	
	
Mike Beckerle <beckerle at us.ibm.com> 
Sent by: dfdl-wg-bounces at ogf.org 

14/08/2007 14:23 

To
dfdl-wg at ogf.org, "Simon Parker" <simon.parker at polarlake.com> 
cc
Subject
Re: [DFDL-WG] Minutes from 2007-08-08 Call

	




	
	I forgot to clarify Simon's question on sp165. 
	
	This was the 'finalTerminatorCanBeMissing' property. 
	
	We considered the comment that this might be unnecessary. 
	
	Use case: file of text format. Each "record" in the file is
terminated by a CRLF so sez the user. At the top level this file
contains an array of these records. 
	
	The file might or might not have a CRLF at the end of the file
because human beings might have edited the file with a text editor, and
either inserted or neglected to insert this final CRLF. 
	
	We want the file format to be legal with or without the final
CRLF; however, all prior CRLFs in the file must be present. 
	
	So how to express this: 
	1) CRLF is a terminator of the record 
	2) CRLF is an occursSeparator of the enclosing array, records
have no terminator. We enclose the array in a sequence group where the
array is followed by a hidden "optional" (minOccurs=0, maxOccurs=1) element of
fixed="CRLF" string value. 
	
	Choice (1) requires that we have finalTerminatorCanBeMissing 
	
	Choice (2) is just modeling the behavior that is required
directly via hidden elements. This is tantamount to saying that this
keyword is not worth having because there is a way to model it already.
This is true of many keywords. If we deem this one too obscure, then we
need to revisit many others. (Leading/Trailing Skip Bytes is a good
example. Trivially represented by a hidden element).  What are our
criteria for inclusion? Up until now our criteria have been to include
things that existing systems already have found a need for. However,
existing systems don't have hidden field capability. 
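
A minimal sketch of the behaviour choice (1) asks for, assuming the whole file is available as a string. The function split_records and its flag final_terminator_can_be_missing are hypothetical names that mirror the property under discussion; this is not a DFDL processor.

```python
def split_records(text, terminator="\r\n", final_terminator_can_be_missing=False):
    """Split text into terminated records.

    Every record except possibly the last must end with the terminator.
    The last record may lack it only when the flag is set.
    """
    records = text.split(terminator)
    # str.split leaves a trailing "" when the text ends with the terminator.
    last = records.pop()
    if last != "":
        if not final_terminator_can_be_missing:
            raise ValueError("final record is not terminated")
        records.append(last)
    return records
```

Choice (2) would give the same accepted inputs by treating CRLF as a separator and then consuming one optional trailing CRLF, which is what the hidden optional element models.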
	
	Note that this same missing final terminator issue can come up
not only with End-of-data, but with any bounded size structure. 
	
	E.g., suppose we say that an array has occursUnits="bytes" and
occursPath="874". Then it is 874 bytes long. The array elements can be
terminated by a particular datum, e.g., a semicolon. For the same reasons
as the CRLF example above, we want to be able to tolerate a missing
final semicolon before the end of the 874 bytes.  In effect the
byte-length-limit creates an implicit "end-of-data" for a sub-stream
consisting of just those bytes. 
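
A minimal sketch of the bounded case, assuming the array occupies a fixed number of bytes at the start of the stream and that a missing final terminator is tolerated at the boundary. The name parse_bounded_array is hypothetical.

```python
def parse_bounded_array(stream, length, terminator=b";"):
    """Parse terminated items from the first `length` bytes of stream.

    The byte limit acts as an implicit end-of-data for the sub-stream,
    so the last item may lack its terminator.
    Returns (items, remaining_bytes).
    """
    window = stream[:length]
    if len(window) < length:
        raise ValueError("stream is shorter than the declared length")
    items = window.split(terminator)
    if items[-1] == b"":
        # The final terminator was present; drop the empty trailing piece.
        items.pop()
    return items, stream[length:]
```

For the example in the text the call would use length=874; the same function accepts the sub-stream with or without a semicolon in its last byte.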
	
	Conclusion: finalTerminatorCanBeMissing seems to be useful
enough and comes up often enough that I think the keyword is worthwhile.

	
	Implication: we should create a list of keywords or enumerated
values for properties  that we think are in the grey area where perhaps
we want to drop them. Here's some candidates: byteOrderMarkPolicy,
leading/trailingSkipBytes. Both these can be modeled readily as hidden
elements. There are probably others. 
	
	Mike Beckerle
	STSM, Architect, Scalable Computing
	IBM Software Group
	Information Platform and Solutions
	Westborough, MA 01581
	direct: voice and FAX 508-599-7148
	assistant: Pam Riordan   
	                priordan at us.ibm.com 
	                508-599-7046
	
	
	
	
Mike Beckerle/Worcester/IBM 

08/14/2007 08:40 AM 



To
"Simon Parker" <simon.parker at polarlake.com> 
cc
dfdl-wg at ogf.org 
Subject
Re: [DFDL-WG] Minutes from 2007-08-08 Call


	


	
	
	In conjunction with the annotated document these notes are
clear, except for 'sp165'. Perhaps someone will recapitulate the
discussion briefly at Wednesday's conference. I think only three
annotations remain: 
	
	   sp167 Absent and missing (expanded discussion on the wiki
already) 
	
	This will be a major topic on a call. 
	
	   sp172 separatorType="infix" 
	
	I'm happy to drop this strange stuff about separatorType=prefix
or postfix and just say separator means infix. However, I would note
that at least two major integration products (IBM WebSphere
Transformation Extender, formerly Mercator, and Microsoft BizTalk) have
this concept, so we may end up putting it back in. Presumably MS copied
the earlier Mercator style, or both got it from common requirements in
some EDI standard. 
	
	   sp173 defaultWhenMissing (expanded discussion on the wiki
already) 
	
	Same topic as sp167 above. Will have a call topic to discuss. 
	 
	I've added another contribution to the wiki discussion on
'require'. 
	
	I think this has reached a resolution, which is that we can
express this using assertions. The general style of using DFDL to
describe what fixed-data syntactic constructs look like is a good one. 
	
	However, I've amended the Wiki thread on this with a further
issue for group consideration. See bottom of page: 
	
https://forge.gridforum.org/sf/wiki/do/viewPage/projects.dfdl-wg/wiki/Require?_message=1187096164776 
	 
	The 'length and occurs' proposal is an improvement, though I
still have reservations to discuss; likewise the 'opaque data' proposal.

	
	For a call, this week or soon. I will send out an agenda. 
	
	Mike Beckerle
	
	
	
	
"Simon Parker" <simon.parker at polarlake.com> 
Sent by: dfdl-wg-bounces at ogf.org 

08/13/2007 10:56 AM 



To
<dfdl-wg at ogf.org> 
cc
Subject
Re: [DFDL-WG] Minutes from 2007-08-08 Call


	


	
	
	
	
	 
	In conjunction with the annotated document these notes are
clear, except for 'sp165'. Perhaps someone will recapitulate the
discussion briefly at Wednesday's conference. I think only three
annotations remain: 
	
	   sp167 Absent and missing (expanded discussion on the wiki
already) 
	   sp172 separatorType="infix" 
	   sp173 defaultWhenMissing (expanded discussion on the wiki
already) 
	 
	I've added another contribution to the wiki discussion on
'require'. 
	 
	The 'length and occurs' proposal is an improvement, though I
still have reservations to discuss; likewise the 'opaque data' proposal.

	 
	Regards, 
	Simon 
	 
	
	
________________________________

	From: dfdl-wg-bounces at ogf.org [mailto:dfdl-wg-bounces at ogf.org]
On Behalf Of Mike Beckerle
	Sent: 08 August 2007 18:00
	To: dfdl-wg at ogf.org
	Subject: [DFDL-WG] Minutes from 2007-08-08 Call
	
	
	MikeB, Geoff Judd, Alan Powell attended. 
	
	Continued through SP's comments. 
	
	sp37 - got it. 
	
	sp45 - agree. This whole part to be rewritten. 
	
	sp115 - ok. "strict" and "lax" as enums. No built-in default - we
never use defaults in the processor itself, only in the predefined
formats. 
	
	sp118 - ok 
	
	sp123 - Proposal to simplify length, lengthKind, lengthUnits,
and also occursKind, occursPath, occursPathUnits needed. (along the
lines of byteCount, itemCount, length='delimited' enum, etc.) 
	
	sp154 - Need a specific proposal to eliminate hexBinary and to
say what to use for opaque data instead (consider also string with
encoding='bytes'), or introduce a dfdl:byteString type or dfdl:opaque
type (a derived type - just a standard name). 
	
	
	sp158 - see sp123 
	
	sp165 - needed to have a composition property for enclosing
groups and/or end-of-data. Regexp doesn't fix this. 
	
	
	Mike Beckerle
	--
	dfdl-wg mailing list
	dfdl-wg at ogf.org
	http://www.ogf.org/mailman/listinfo/dfdl-wg 
	
	
	
	

	
	
	
	
	
	
