[DFDL-WG] EDIFACT schema - daffodil bug or non-bug
Steve Hanson
smh at uk.ibm.com
Fri Nov 6 13:00:07 EST 2015
Mike
I think that is a bug in Daffodil. The DFDL spec says that escapeSchemeRef
applies to simple types with text representation, so Daffodil is
evaluating the escape scheme properties prematurely. Notice that in the
UNA declaration, the simple elements at the start of the UNA that carry
the delimiters and escape character all have dfdl:escapeSchemeRef="" to
avoid tripping the check.
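To make the ordering concrete, here is a minimal Python sketch of the two
evaluation policies. It is a toy model, not Daffodil's implementation or
API. "Eager" evaluates the escape scheme on entry to every delimited
element, complex or simple, which is the behavior described in the quoted
message. "Deferred" evaluates it only for simple elements that reference an
escape scheme, which is how the spec is read above. The Element class, the
element names (Interchange, UNA.ReleaseChar, UNB.Field), and the assumption
that the UNA sets its variables after its children are parsed are all
hypothetical simplifications.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Element:
    """Hypothetical, heavily simplified schema element (not Daffodil's model)."""
    name: str
    is_simple: bool
    has_escape_scheme: bool = True   # False models dfdl:escapeSchemeRef=""
    sets_escape_vars: bool = False   # True models the UNA setting the variables
    children: List["Element"] = field(default_factory=list)


def parse(elem: Element, eager: bool, events: List[str]) -> None:
    # Eager: read the escape variables on entry to every element that has an
    # escape scheme in effect. Deferred: only for simple (text) elements.
    if elem.has_escape_scheme and (eager or elem.is_simple):
        events.append(f"read escape vars @ {elem.name}")
    for child in elem.children:
        parse(child, eager, events)
    # Simplification (assumption): the UNA sets the variables after its
    # children have been parsed from the data.
    if elem.sets_escape_vars:
        events.append(f"set escape vars @ {elem.name}")


def build_schema() -> Element:
    # Hypothetical element names, loosely following an EDIFACT interchange.
    una = Element("UNA", is_simple=False, sets_escape_vars=True, children=[
        Element("UNA.ReleaseChar", is_simple=True, has_escape_scheme=False),
    ])
    unb = Element("UNB", is_simple=False, children=[
        Element("UNB.Field", is_simple=True),
    ])
    return Element("Interchange", is_simple=False, children=[una, unb])


def reads_before_set(events: List[str]) -> bool:
    """True if any read of the escape variables precedes the first set."""
    first_set = next(i for i, e in enumerate(events) if e.startswith("set"))
    return any(e.startswith("read") for e in events[:first_set])


for eager in (True, False):
    events: List[str] = []
    parse(build_schema(), eager, events)
    print("eager" if eager else "deferred", reads_before_set(events), events)
```

In eager mode the escape variables are read at Interchange and at UNA
before the UNA sets them; in deferred mode the first read happens at
UNB.Field, after the set.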
Regards
Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM Integration Bus, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
From: Mike Beckerle <mbeckerle.dfdl at gmail.com>
To: "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>
Date: 04/11/2015 17:17
Subject: [DFDL-WG] EDIFACT schema - daffodil bug or non-bug
Sent by: "dfdl-wg" <dfdl-wg-bounces at ogf.org>
I'm trying to get EDIFACT working on Daffodil.
I have a somewhat interesting chicken-and-egg problem.
This schema uses dfdl:escapeCharacter and dfdl:escapeEscapeCharacter as
expressions. E.g., there is a top-level dfdl:defineVariable named
"EscapeChar" which has a default value, and the expression for the
dfdl:escapeCharacter property is { $ibmEdiFmt:EscapeChar }.
The default format that is in effect for the root element has
dfdl:lengthKind='delimited'.
When Daffodil starts parsing the top-level root/document element, it
enters a parser for delimited elements with an escape scheme in effect.
The first thing this parser does is get the escape scheme, which
evaluates the expressions for escapeCharacter and escapeEscapeCharacter.
This picks up the default values for those variables, and the variables
are then marked as "already evaluated", since DFDL specifies that once a
variable's default value has been used, it cannot subsequently be set via
dfdl:setVariable.
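The variable rule just described can be sketched as a small state machine.
This is a hypothetical Python model of a single variable that covers only
the "default value already used" rule; the class and exception names are
illustrative, not Daffodil's API, and the default "?" is an assumption (it
is EDIFACT's conventional release character), not necessarily the value in
the actual schema.

```python
class VariableStateError(Exception):
    """Hypothetical stand-in for the runtime SDE raised by a DFDL processor."""


class DfdlVariable:
    """Hypothetical model of one dfdl:defineVariable instance.

    Models only one rule: once the default value has been read,
    a later setVariable is an error.
    """

    def __init__(self, name: str, default: str) -> None:
        self.name = name
        self._value = default
        self._default_read = False  # becomes True once the default is "used"
        self._was_set = False

    def read(self) -> str:
        if not self._was_set:
            self._default_read = True  # reading the default "uses" it
        return self._value

    def set(self, value: str) -> None:
        if self._default_read:
            raise VariableStateError(
                f"${self.name}: default value has already been used")
        self._value = value
        self._was_set = True


# The sequence from the message: the escape scheme is evaluated eagerly on
# entry to the root element, then the UNA tries to supply the real value.
esc = DfdlVariable("EscapeChar", "?")   # "?" is an assumed default
esc.read()
try:
    esc.set("!")                        # hypothetical value read from a UNA
except VariableStateError as err:
    print(err)
```

Had the UNA's set happened before any read, the variable would simply take
the value from the data.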
Now, when the very first UNA is encountered, it reads the various
delimiters/escapes from the data and tries to set the variables.
But the variables have already been evaluated on the way into parsing the
"delimited" top-level element, and likewise on the way into the UNA
element itself. So it fails with a runtime SDE: default value has already
been used.
So the question:
Is this a bug in the EDIFACT schema, or is there a principle at
work here indicating that Daffodil cannot evaluate the escape scheme on
entry to an element of length kind delimited unless delimiters are
actually defined?
It gets worse, though. How late-bound does this have to be? I can imagine
it being so late that it happens only after the last child element/group
has been parsed, when the parser unwinds the stack back up to the
complex-type element's tier; only at that point, when it scans for the
terminating markup, would it force evaluation of the escape scheme. That
seems difficult to implement. However, it would allow the delimiter for
the complex-type element to actually be stored within the children of that
same element. But is this needed?
One could argue that the EDIFACT schema should have
dfdl:lengthKind='implicit' on these global elements until the UNA has
been parsed. But I think that makes authoring schemas harder, because a
user thinks of EDIFACT as "a delimited format" and is naturally
going to want to put dfdl:lengthKind="delimited" at global scope
for all the schema components.
Thoughts?
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
--
dfdl-wg mailing list
dfdl-wg at ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg