[DFDL-WG] EDIFACT schema - daffodil bug or non-bug

Steve Hanson smh at uk.ibm.com
Fri Nov 6 13:00:07 EST 2015


Mike

I think that is a bug in Daffodil. The DFDL spec says that escapeSchemeRef 
applies to simple types with text representation, so by evaluating the 
escape scheme properties on entry to a complex element, Daffodil is 
evaluating them prematurely. Note that in the UNA declaration, the simple 
elements at the start of the UNA that carry the delimiters and escape 
character all have dfdl:escapeSchemeRef="" to avoid tripping the check.

Regards
 
Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM Integration Bus, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890



From:   Mike Beckerle <mbeckerle.dfdl at gmail.com>
To:     "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>
Date:   04/11/2015 17:17
Subject:        [DFDL-WG] EDIFACT schema - daffodil bug or non-bug
Sent by:        "dfdl-wg" <dfdl-wg-bounces at ogf.org>




I'm trying to get EDIFACT working on Daffodil.

I have a somewhat interesting chicken-egg problem.

This schema sets dfdl:escapeCharacter and dfdl:escapeEscapeCharacter using 
expressions. E.g., there is a top-level dfdl:defineVariable named 
"EscapeChar" which has a default value, and the expression for the 
dfdl:escapeCharacter property is { $ibmEdiFmt:EscapeChar }.

The default format that is in effect for the root element has 
dfdl:lengthKind='delimited'.

When Daffodil starts parsing the top-level root/document element, it 
enters a parser that is for delimited elements with an escape scheme in 
effect. The first thing this parser does is get the escape scheme, which 
evaluates the expressions for escapeCharacter and escapeEscapeCharacter. 
This picks up the default values for those variables, and the variables 
are then marked as "already evaluated", as DFDL specifies that once a 
variable's default value has been used, it cannot be subsequently set via 
dfdl:setVariable.

Now, when the very first UNA is encountered, the parser reads the various 
delimiters/escapes from the data and tries to set the variables.

But the variables have already been evaluated on the way into parsing the 
"delimited" top-level element, and similarly on entry to the UNA element 
itself.

So it fails with a runtime SDE - default value has already been used.
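
To make the failure concrete, here is a minimal Python sketch of the 
variable rule described above. It is a toy model, not Daffodil code: the 
names Variable, VariableError and parse, and the "?" and "\" values, are 
all made up for illustration.

```python
class VariableError(Exception):
    """Stands in for the runtime SDE described above (hypothetical name)."""


class Variable:
    """Toy model of a DFDL variable instance that has a default value."""

    def __init__(self, name, default):
        self.name = name
        self.value = default
        self.state = "defaulted"  # neither read nor set yet

    def read(self):
        # Reading a defaulted variable pins its default value from then on.
        if self.state == "defaulted":
            self.state = "read"
        return self.value

    def set(self, value):
        # Once the value has been read (or set), it can no longer be set.
        if self.state != "defaulted":
            raise VariableError(f"{self.name}: cannot set, state is {self.state}")
        self.value = value
        self.state = "set"


def parse(eager):
    """Simulate parsing the root element and then the UNA segment."""
    escape_char = Variable("EscapeChar", "?")  # illustrative default value
    if eager:
        # Escape scheme evaluated on entry to the delimited root element.
        escape_char.read()
    # The UNA supplies the real escape character (illustrative value).
    escape_char.set("\\")
    return escape_char.read()
```

With eager=False the set succeeds and the value taken from the data is 
used; with eager=True the set raises, which corresponds to the SDE above.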

So the questions:

Is this a schema bug in the EDIFACT schema, or is there a principle at 
work here indicating that Daffodil cannot evaluate the escape scheme on 
entry to an element of length kind delimited unless delimiters are 
actually defined? 

It gets worse though. How late bound does this have to be? I can imagine 
it being so late as to be after the last child element/group has been 
parsed, when the parser unwinds the stack back up to the complex-type 
element's tier, and only at that point, when it scans for the terminating 
markup, would it then force the evaluation of the escape scheme. But that 
seems difficult to implement. However, that would allow the delimiter for 
the complex-type element to actually be stored within the children of that 
same complex type element. But is this needed?
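
As a rough illustration of that "as late as possible" option, the escape 
scheme could be held as a thunk that is forced only when scanning for the 
terminating markup. This is only a sketch under my own assumptions, not a 
proposal for how Daffodil should be structured; LazyEscapeScheme, 
parse_complex and demo are hypothetical names, and a plain dict stands in 
for the variable map.

```python
class LazyEscapeScheme:
    """Defers evaluating the escape scheme expressions until first use."""

    def __init__(self, evaluate):
        self._evaluate = evaluate  # callable that evaluates the expressions
        self._cached = None
        self.forced = False

    def get(self):
        # Evaluate at most once, on the first request.
        if not self.forced:
            self._cached = self._evaluate()
            self.forced = True
        return self._cached


def parse_complex(children, scheme):
    """Parse all children first, then force the scheme for the terminator scan."""
    results = [child() for child in children]
    forced_before_scan = scheme.forced
    escape_char = scheme.get()  # needed only now, to scan for the terminator
    return results, forced_before_scan, escape_char


def demo():
    variables = {"EscapeChar": "?"}  # illustrative default value
    scheme = LazyEscapeScheme(lambda: variables["EscapeChar"])

    def una_child():
        # A child element sets the variable from the data (illustrative value).
        variables["EscapeChar"] = "\\"
        return "UNA"

    return parse_complex([una_child], scheme)
```

Here the child gets to set the variable before the enclosing element ever 
evaluates its escape scheme, so the terminator scan sees the value that 
came from the data rather than the default.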

One could argue that the EDIFACT schema should have 
dfdl:lengthKind='implicit' on these global elements down until the UNA has 
been parsed. Though I think that makes authoring schemas harder because a 
user thinks of EDIFACT stuff as "a delimited format", and is naturally 
just going to want to stick dfdl:lengthKind="delimited" at global scope 
for all the schema components.

Thoughts?

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | 
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are 
subject to the OGF Intellectual Property Policy

