[DFDL-WG] EDIFACT schema - daffodil bug or non-bug

Mike Beckerle mbeckerle.dfdl at gmail.com
Wed Nov 4 12:16:30 EST 2015


I'm trying to get EDIFACT working on Daffodil.

I have a somewhat interesting chicken-and-egg problem.

This schema uses dfdl:escapeCharacter and dfdl:escapeEscapeCharacter as
expressions. E.g., there is a top-level dfdl:defineVariable named
"EscapeChar" which has a default value, and the expression for the
dfdl:escapeCharacter property is { $ibmEdiFmt:EscapeChar }.

The default format that is in effect for the root element has
dfdl:lengthKind='delimited'.

When Daffodil starts parsing the top-level root/document element, it enters
a parser for delimited elements with an escape scheme in effect.
The first thing this parser does is get the escape scheme, which evaluates the
expressions for escapeCharacter and escapeEscapeCharacter. This picks up
the default values of those variables, and the variables are then marked as
"already evaluated", since DFDL specifies that once a variable's default value
has been used, it cannot subsequently be set via dfdl:setVariable.

Now, when the very first UNA is encountered, the parser reads the various
delimiters/escapes from the data and tries to set the variables.

But the variables have already been evaluated on the way into parsing the
"delimited" top-level element, and again on the way into the UNA element
itself.

So it fails with a runtime SDE (schema definition error): default value has
already been used.
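
To make the ordering problem concrete, here is a toy model of it in Python.
This is only a sketch of the read-before-set rule as described above, not
Daffodil's actual implementation; the class and function names are made up
for illustration, and "?" is used as the default because it is the
conventional EDIFACT release character.

```python
class VariableError(Exception):
    """Stands in for a runtime schema definition error (SDE)."""


class Variable:
    """Toy model of a DFDL variable with a default value (hypothetical)."""

    def __init__(self, name, default):
        self.name = name
        self._value = default
        self._read = False   # becomes True once the value has been used
        self._set = False    # becomes True once set() has succeeded

    def read(self):
        # Using the value (default or not) locks the variable.
        self._read = True
        return self._value

    def set(self, value):
        if self._read:
            raise VariableError(
                f"{self.name}: default value has already been used")
        if self._set:
            raise VariableError(f"{self.name}: already set")
        self._value = value
        self._set = True


def parse_eagerly(escape_from_una):
    """Mimics the order of events described in this message."""
    escape_char = Variable("EscapeChar", default="?")
    # 1. Entering the delimited root element: the escape scheme is fetched,
    #    which evaluates the expression and reads the variable's default.
    escape_char.read()
    # 2. The UNA is parsed and tries to set the variable from the data.
    escape_char.set(escape_from_una)   # raises VariableError
```

Calling parse_eagerly("\\") raises VariableError, whereas calling set()
before any read() succeeds, which is the ordering the schema relies on.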

So the questions:

Is this a schema bug in the EDIFACT schema, or is there a principle at work
here indicating that Daffodil cannot evaluate the escape scheme on entry to
an element of length kind delimited unless delimiters are actually defined?

It gets worse, though. How late-bound does this have to be? I can imagine it
being so late as to be after the last child element/group has been parsed,
when the parser unwinds the stack back up to the complex-type element's
tier, and only at that point, when it scans for the terminating markup,
would it then force the evaluation of the escape scheme. But that seems
difficult to implement. However, that would allow the delimiter for the
complex-type element to actually be stored within the children of that same
complex type element. But is this needed?
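
As a rough illustration of what such late binding could look like, the
sketch below defers evaluation of the escape scheme until the parser
actually scans for terminating markup. It is a simplification under stated
assumptions, not a proposal for how Daffodil should do it: the names
LazyEscapeScheme and parse_root are hypothetical, variables are modelled as
a plain dict, and the UNA is assumed to carry the release (escape)
character as its 7th character, as in "UNA:+.? '".

```python
class LazyEscapeScheme:
    """Evaluates the escape-character expression only on first use."""

    def __init__(self, evaluate):
        self._evaluate = evaluate   # callable standing in for the expression
        self._cached = None
        self.forced = False

    def get(self):
        if not self.forced:
            self._cached = self._evaluate()
            self.forced = True
        return self._cached


def parse_root(data):
    """Returns the escape character in effect when scanning finally happens."""
    variables = {"EscapeChar": "?"}          # default value
    scheme = LazyEscapeScheme(lambda: variables["EscapeChar"])

    # Entering the delimited root element: the scheme exists, but nothing
    # has been evaluated yet, so the variable is still settable.
    assert not scheme.forced

    # Child element: an optional UNA sets the variable from the data.
    if data.startswith("UNA"):
        variables["EscapeChar"] = data[6]    # release character position

    # Only now, when scanning for terminating markup, force the evaluation.
    return scheme.get()
```

With this ordering, parse_root("UNA:+.\\ 'UNB...") picks up the backslash
from the data, while input without a UNA falls back to the default "?".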

One could argue that the EDIFACT schema should have
dfdl:lengthKind='implicit' on these global elements until the UNA has
been parsed. Though I think that makes authoring schemas harder, because a
user thinks of EDIFACT as "a delimited format" and is naturally just
going to want to put dfdl:lengthKind='delimited' at global scope for all
the schema components.

Thoughts?

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
<http://www.ogf.org/About/abt_policies.php>