[DFDL-WG] Minute for OGF DFDL Working Group Call, August 11-2010

Alan Powell alan_powell at uk.ibm.com
Thu Aug 12 09:27:44 CDT 2010


Open Grid Forum: Data Format Description Language Working Group

OGF DFDL Working Group Call, August 11-2010

Attendees
Steve Hanson (IBM) 
Alan Powell (IBM) 
Stephanie Fetzer (IBM)
Tim Kimber (IBM)
Bob McGrath (National Center for Supercomputing Applications)
Alejandro Rodriguez (National Center for Supercomputing Applications)

Apologies
Mike Beckerle (Oco)
Suman Kalia (IBM)


1. Current Actions 
Updated Below 

2. Semantics of newVariableInstance and setVariable 

What should a DFDL processor (parser or serializer) do when it cannot 
evaluate the expression in a newVariableInstance or setVariable 
annotation?

Moving the setting of variable values until after the element has been 
parsed just creates other problems. A new instance must be available to 
other expressions on the same component, and to the children of a 
group/element, so it cannot be left until the end of the element.
On the other hand, there are clearly some types of setVariable / 
newVariableInstance annotations which *cannot* be evaluated until the 
element has been parsed. 

For the parser, it might be OK to
- evaluate the expression when the component (element or group) is 
started
- if it cannot be evaluated, add it to a list of annotations that must be 
processed at the end of the component
- if in the meantime any other expression attempts to access the variable 
that was being set/created, throw a processing error (because the 
result will be undefined). This will probably require the 
variable/instance to be placed into a 'not available' state until its 
expression is resolvable.
11/08: Not discussed. Action raised
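
The three parser steps proposed above can be sketched as follows. This is only an illustration of the proposal, not anything agreed by the WG or taken from the specification: the names VariableScope, field_ref, NotYetEvaluable and ProcessingError are hypothetical, the infoset is modelled as a plain dictionary, and treating an expression that is still unresolvable at the end of the component as a processing error is an assumption.

```python
class ProcessingError(Exception):
    """Raised when a variable is read while it is still 'not available'."""


class NotYetEvaluable(Exception):
    """Raised by an expression whose inputs have not been parsed yet."""


_NOT_AVAILABLE = object()  # sentinel for the 'not available' state


def field_ref(name):
    """Build a toy expression that reads one already-parsed field (hypothetical)."""
    def expression(infoset):
        if name not in infoset:
            raise NotYetEvaluable(name)
        return infoset[name]
    return expression


class VariableScope:
    """Variables set or created on one component (element or group)."""

    def __init__(self):
        self._values = {}
        self._deferred = []  # (name, expression) pairs awaiting evaluation

    def set_variable(self, name, expression, infoset):
        # Step 1: try to evaluate when the component is started.
        try:
            self._values[name] = expression(infoset)
        except NotYetEvaluable:
            # Step 2: defer to the end of the component and mark the
            # variable as 'not available' in the meantime.
            self._values[name] = _NOT_AVAILABLE
            self._deferred.append((name, expression))

    def read(self, name):
        # Step 3: any access before resolution is a processing error.
        value = self._values[name]
        if value is _NOT_AVAILABLE:
            raise ProcessingError(f"variable '{name}' is not available yet")
        return value

    def end_component(self, infoset):
        # Evaluate everything that was deferred at the start.
        pending, self._deferred = self._deferred, []
        for name, expression in pending:
            try:
                self._values[name] = expression(infoset)
            except NotYetEvaluable as exc:
                # Assumption: still unresolvable means a processing error.
                raise ProcessingError(
                    f"variable '{name}' could not be resolved"
                ) from exc
```

In this sketch, calling set_variable("count", field_ref("header_count"), {}) leaves "count" in the 'not available' state, so read("count") raises ProcessingError until end_component is called with an infoset that contains header_count.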

3. Daffodil DFDL parser implementation at National Center for 
Supercomputing Applications
Bob and Alejandro described the new implementation that they have 
developed. It is a new code base and is not based on the Defuddle 
prototype. It is written in Scala and implements approximately 80% of the 
features in the public comments draft of DFDL V1. Alejandro will send a 
list of the features not implemented.
We discussed the scenarios that motivated the development, which were to 
extract data from various sources and transform it into canonical formats.
Bob offered to make Daffodil available for the WG to assess the 
functionality. IBM WG members will seek approval from the company to allow 
them to receive Daffodil.
Bob raised the question that if Daffodil becomes the public implementation 
of DFDL then we will need to work out how that would be funded and 
managed.
It would be helpful if IBM test cases were available to Daffodil. IBM will 
investigate.

Meeting closed, 16:30

Next call: Wednesday 25 August 2010, 15:00 UK (10:00 ET)

Next action: 112
Actions raised at this meeting

No
Action 
110
Semantics of newVariableInstance and setVariable 
What should a DFDL processor (parser or serializer) do when it cannot 
evaluate the expression in a newVariableInstance or setVariable 
annotation?

Moving the setting of variable values until after the element has been 
parsed just creates other problems. A new instance must be available to 
other expressions on the same component, and to the children of a 
group/element, so it cannot be left until the end of the element.
On the other hand, there are clearly some types of setVariable / 
newVariableInstance annotations which *cannot* be evaluated until the 
element has been parsed. 

For the parser, it might be OK to
- evaluate the expression when the component (element or group) is 
started
- if it cannot be evaluated, add it to a list of annotations that must be 
processed at the end of the component
- if in the meantime any other expression attempts to access the variable 
that was being set/created, throw a processing error (because the 
result will be undefined). This will probably require the 
variable/instance to be placed into a 'not available' state until its 
expression is resolvable.
111
Daffodil DFDL parser
Bob and Alejandro described the new implementation that they have 
developed. It is a new code base and is not based on the Defuddle 
prototype. It is written in Scala and implements approximately 80% of the 
features in the public comments draft of DFDL V1. Alejandro will send a 
list of the features not implemented.
We discussed the scenarios that motivated the development, which were to 
extract data from various sources and transform it into canonical formats.
Bob offered to make Daffodil available for the WG to assess the 
functionality. IBM WG members will seek approval from the company to allow 
them to receive Daffodil.
Bob raised the question that if Daffodil becomes the public implementation 
of DFDL then we will need to work out how that would be funded and 
managed.
It would be helpful if IBM test cases were available to Daffodil. IBM will 
investigate.




Current Actions:
No
Action 
066
Investigate format for defining test cases
25/11:IBM to see if it is possible to publish its test case format.
04/12: no update
...
17/02: IBM is willing in principle to publish the test case format and 
some of the test cases. May need some time to build a 'compliance suite'
24/03: No progress
03/03: Discussions have been taking place on the subset of tests that will 
be provided.
10/03: work is progressing
17/03: work is progressing
31/03: work is progressing
14/04: An XML test case format has been defined and is being tested.
21/04: Schema for TDML defined. Need to define how this and the test cases 
will be made public.
05/05: Work still progressing
12/05: Work still progressing
02/06: Work still progressing on technical and legal considerations
...
11/08: work continues
085
ALL: publicise Public comments phase to ensure a good review.
14/04: see minutes
21/04: Press release, OMG and other standards bodies.
05/05: Alan and Steve H have contacted other standards bodies. Will ask 
them to add comments on spec
15/05: still no public comments
02/06: No public comments
16/06: Public comments period has ended with no external comments. Alan 
had posted changes made in draft 041. Steve suggested sending a note to the 
WG highlighting these changes. Steve also suggested requesting an 
extension as other IBM groups may review. We discussed whether this was 
necessary as changes will need to be made during the implementation phase 
anyway. Alan to ask OGF what the process is for changes post public 
comment.
23/06: Still no comments. Alan will contact OGF to understand the rest of 
the process.
30/06: Alan has emailed Joel asking what the process is now that the public 
comment period is over and whether we can update the published version with 
WG updates. No response yet.
07/07: No response. Alan will chase up.
14/07: No response from Joel. Sent email to Greg Newby but no response.
21/07: Still no response.
04/08: Joel has responded that it is up to the WG to decide if the changes 
are significant enough to need additional review. Alan to contact David 
Martin and Erwin Laure for guidance if we split the specification.
11/08: Received a response from Joel that the WG can decide if a repeat 
public review is necessary before becoming a 'proposed recommendation'. 
Alan responded that the WG agreed that a re-review was not necessary. The 
next stage is for the OGF review committee to approve publication.
099
Splitting the specification into simpler sections.
07/07: Steve sent a proposal but not discussed. Alan will arrange a 
separate call.
14/07: Discussed Steve's proposal and Suman's and Alan's comments.
Need to add choice, validation, facets.
Also, how does an implementation declare which subsets it supports? 
Suggested levels and/or profiles. Steve highlighted a problem: when a DFDL 
schema from an implementation of just the core functions is moved to a 
full DFDL implementation, what should happen about the missing properties? 
Does the full implementation need to be aware of subsets of functions? 
Should it raise a schema definition error for use of a function not in the 
subset?
21/07: no progress
04/08: Steve had updated the proposed groups of function 
(Subset_proposal_v2.ppt). We discussed whether it is better to have 
discrete sets of functions or expanding levels of function. 
Purpose of subsetting is:
1. Allow simpler implementations.  (main purpose)
2. Simplify tooling
3. Simplify specification. 
Steve to contact previous members of WG to check if we have the correct 
subsets
11/08: Steve sent an email to previous members of the WG asking for 
opinions on splitting the specification. Bob McGrath from the National 
Center for Supercomputing Applications responded that they had implemented 
about 80% of the function. Alejandro will send a description of the 
function they have implemented.
Action will be raised to track the Daffodil implementation
101
Semantics of 'fixed' 
21/07: Discussed whether not matching the 'fixed' value should be a 
validation error or processing error. Decided that for consistency it 
should be a validation error.
It would be useful, however, to avoid having to duplicate facet 
information in an assert, which could become unwieldy for, say, a large 
enumeration.
Suggestions
- a parser option that 'converted all validation errors to processing 
errors'
- a dfdl expression function that  'applied all facets' or 'applied 
specific facet' to a particular element.
Stephanie will produce some examples of how this could be used.
04/08: Stephanie had produced examples but they were not discussed due to 
lack of time
11/08: We started to discuss Stephanie's HIPAA example but ran out of 
time.
104
Expressions 
Discuss error behaviour when evaluating an expression in various contexts 
- All properties: 
wrong type returned : schema definition error 
exception when evaluating expression : schema definition error 
referenced variables/paths not available : schema definition error 
- Properties which allow a forward reference 
referenced variables/paths not available : no error. DFDL processor 
continues processing until the expression result is available, then acts 
on the result. 
21/07: Steve stated the current definition that returning the incorrect 
type was a schema definition error and everything else was a processing 
error.
04/08: Not discussed
11/08: Not discussed
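
The difference between the error behaviour listed in this action and the current definition Steve described on 21/07 can be written out as two small decision functions. This is a sketch of one reading of the notes above, not specification text: the enum and function names are hypothetical, and it assumes that a property allowing a forward reference still reports the other two failures as schema definition errors.

```python
from enum import Enum


class Failure(Enum):
    WRONG_TYPE = "wrong type returned"
    EVALUATION_EXCEPTION = "exception when evaluating expression"
    NOT_AVAILABLE = "referenced variables/paths not available"


class Outcome(Enum):
    SCHEMA_DEFINITION_ERROR = "schema definition error"
    PROCESSING_ERROR = "processing error"
    DEFER = "no error; continue until the result is available"


def proposed_outcome(failure, allows_forward_reference):
    """Behaviour listed in the action text for discussion."""
    if failure is Failure.NOT_AVAILABLE and allows_forward_reference:
        # The processor carries on and acts on the result once it exists.
        return Outcome.DEFER
    return Outcome.SCHEMA_DEFINITION_ERROR


def current_outcome(failure):
    """Current definition as stated on 21/07."""
    if failure is Failure.WRONG_TYPE:
        return Outcome.SCHEMA_DEFINITION_ERROR
    return Outcome.PROCESSING_ERROR
```

The two functions agree only on the wrong-type case, which is the point that still needs to be settled.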
107
teston/testoff dfdl expression functions.
Are these functions still needed? They were introduced to allow individual 
bits to be set in a byte. Steve to look at TLog and ISO 8583 formats that 
use existence flags to see if they are still required.
04/08: Not discussed
11/08: Not discussed 
108
dfdl:hidden 
There has been some discussion on whether the 'hidden' global group should 
be indicated in some way.
04/08: A lively discussion. The specification works as currently 
defined, so the question is whether changes need to be made to make tooling 
easier. There shouldn't be 'conventions' in particular tooling, as tools 
must be able to deal properly with schemas from other tools that would not 
obey those conventions. Steve stated that it is often dangerous to hide too 
much from users when they can see the underlying schema. To be continued.
109
dfdl:discriminator: the 'message' attribute
From Tim: 
I remembered the reason why I thought this was a good idea. 
Consider the situation where someone is generating their DFDL schema from 
meta-data. The model is large, and consists of many references to global 
structures. Each global structure ( e.g. an HL7 segment ) is identified in 
a particular way. Sometimes the segment is required, sometimes it is not. 
Sometimes it occurs as a child of a choice group, and sometimes not. 
Regardless, it is highly likely that the segment will be identified in the 
same way wherever it occurs. A natural decision for the modeler would be 
to create a dfdl:discriminator on all references to the segment, even if 
the ref is not under a point of uncertainty. It's harmless, and it carries 
no performance penalty. If we disallow the "message" attribute, it will 
force the modeler to put in extra logic to work out whether the ref is 
under a point of uncertainty, and generate an assert/discriminator as 
appropriate. 

I'd be interested to know what Steph thinks about this - I think I've 
heard her say that she sometimes uses discriminators where an assert would 
have done the job, just to maintain consistency throughout the model.
04/08: not discussed.
11/08: Not discussed 
110
Semantics of newVariableInstance and setVariable 
What should a DFDL processor (parser or serializer) do when it cannot 
evaluate the expression in a newVariableInstance or setVariable 
annotation?

Moving the setting of variable values until after the element has been 
parsed just creates other problems. A new instance must be available to 
other expressions on the same component, and to the children of a 
group/element, so it cannot be left until the end of the element.
On the other hand, there are clearly some types of setVariable / 
newVariableInstance annotations which *cannot* be evaluated until the 
element has been parsed. 

For the parser, it might be OK to
- evaluate the expression when the component (element or group) is 
started
- if it cannot be evaluated, add it to a list of annotations that must be 
processed at the end of the component
- if in the meantime any other expression attempts to access the variable 
that was being set/created, throw a processing error (because the 
result will be undefined). This will probably require the 
variable/instance to be placed into a 'not available' state until its 
expression is resolvable.
111
Daffodil DFDL parser
Bob and Alejandro described the new implementation that they have 
developed. It is a new code base and is not based on the Defuddle 
prototype. It is written in Scala and implements approximately 80% of the 
features in the public comments draft of DFDL V1. Alejandro will send a 
list of the features not implemented.
We discussed the scenarios that motivated the development, which were to 
extract data from various sources and transform it into canonical formats.
Bob offered to make Daffodil available for the WG to assess the 
functionality. IBM WG members will seek approval from the company to allow 
them to receive Daffodil.
Bob raised the question that if Daffodil becomes the public implementation 
of DFDL then we will need to work out how that would be funded and 
managed.
It would be helpful if IBM test cases were available to Daffodil. IBM will 
investigate.

Closed actions
No
Action 

Work items:
No
Item
target version
status
005
Improvements on property descriptions 

not started
012
Reordering the properties discussion: move representation earlier, improve 
flow of topics 

not started 
036
Update dfdl schema with change properties 
ongoing

042
Mapping of the DFDL infoset to XDM 
none
not required for V1 specification
070
Write DFDL primer 


071
Write test cases.


083
Implement RFC2116

 
Regards

 
Alan Powell
 
Development - MQSeries, Message Broker, ESB
IBM Software Group, Application and Integration Middleware Software
-------------------------------------------------------------------------------------------------------------------------------------------
IBM
MP211, Hursley Park
Hursley, SO21 2JN
United Kingdom
Phone: +44-1962-815073
e-mail: alan_powell at uk.ibm.com
