[DFDL-WG] EDIFACT schema - daffodil bug or non-bug

Mike Beckerle mbeckerle.dfdl at gmail.com
Fri Nov 6 14:50:46 EST 2015


Bug taken: https://opensource.ncsa.illinois.edu/jira/browse/DFDL-1443
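
A minimal sketch of the ordering problem reported below, using a simplified model of DFDL variables in Python. The class and function names (DFDLVariable, VariableAlreadyReadError, parse_eagerly) are hypothetical and are not Daffodil's API. The default "?" is only an assumed value (EDIFACT's conventional release character). The sketch shows how evaluating the escape scheme on entry to the delimited root reads EscapeChar's default, so the UNA's later attempt to set the variable fails.

```python
class VariableAlreadyReadError(Exception):
    """Stands in for the runtime SDE: a default was used, then a set was tried."""


class DFDLVariable:
    """Hypothetical model of a dfdl:defineVariable that has a default value."""

    def __init__(self, name, default):
        self.name = name
        self.value = default
        self.read = False  # becomes True once the value has been used

    def get(self):
        # Using the value (default or set) freezes it against later sets.
        self.read = True
        return self.value

    def set(self, value):
        # Models dfdl:setVariable; illegal once the value has been used.
        if self.read:
            raise VariableAlreadyReadError(
                f"{self.name}: default value has already been used")
        self.value = value


def parse_eagerly(escape_char_var, una_escape_char):
    """Eager order: evaluate the escape scheme on entry to the delimited
    root element, then parse the UNA, which tries to set the variable."""
    escape_char = escape_char_var.get()   # { $ibmEdiFmt:EscapeChar }
    escape_char_var.set(una_escape_char)  # UNA supplies the value from data
    return escape_char


if __name__ == "__main__":
    var = DFDLVariable("EscapeChar", default="?")  # assumed default
    try:
        parse_eagerly(var, una_escape_char="?")
    except VariableAlreadyReadError as err:
        print("runtime SDE:", err)
```

Under this model, the UNA's values can only take effect if nothing reads EscapeChar before the UNA has been parsed.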


Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
<http://www.ogf.org/About/abt_policies.php>


On Fri, Nov 6, 2015 at 1:00 PM, Steve Hanson <smh at uk.ibm.com> wrote:

> Mike
>
> I think that is a bug in Daffodil. The DFDL spec says that escapeSchemeRef
> applies to simple types with text representation, so Daffodil is evaluating
> the escape scheme properties prematurely. If you look at the UNA
> declaration, you will notice that the simple elements at its start, which
> carry the delimiters and escape character, all have dfdl:escapeSchemeRef=""
> to avoid tripping the check.
>
> Regards
>
> Steve Hanson
> Architect, IBM DFDL
> <http://www.ibm.com/developerworks/library/se-dfdl/index.html>
> Co-Chair, OGF DFDL Working Group <http://www.ogf.org/dfdl/>
> IBM Integration Bus
> <http://www-03.ibm.com/software/products/en/ibm-integration-bus>,
> Hursley, UK
> smh at uk.ibm.com
> tel:+44-1962-815848
> mob:+44-7717-378890
>
>
>
> From:        Mike Beckerle <mbeckerle.dfdl at gmail.com>
> To:        "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>
> Date:        04/11/2015 17:17
> Subject:        [DFDL-WG] EDIFACT schema - daffodil bug or non-bug
> Sent by:        "dfdl-wg" <dfdl-wg-bounces at ogf.org>
> ------------------------------
>
>
>
>
> I'm trying to get EDIFACT working on Daffodil.
>
> I have a somewhat interesting chicken-and-egg problem.
>
> This schema sets dfdl:escapeCharacter and dfdl:escapeEscapeCharacter using
> expressions. For example, there is a top-level dfdl:defineVariable named
> "EscapeChar" which has a default value, and the dfdl:escapeCharacter
> property is the expression { $ibmEdiFmt:EscapeChar }.
>
> The default format that is in effect for the root element has
> dfdl:lengthKind='delimited'.
>
> When Daffodil starts parsing the top-level root/document element, it
> enters a parser for delimited elements with an escape scheme in effect.
> The first thing this parser does is get the escape scheme, which evaluates
> the expressions for escapeCharacter and escapeEscapeCharacter. This picks
> up the default values of those variables, which are then marked as
> "already evaluated", because DFDL specifies that once a variable's default
> value has been used, the variable cannot subsequently be set via
> dfdl:setVariable.
>
> Then the very first UNA is encountered. It reads the various
> delimiters/escapes from the data and tries to set the variables.
>
> But the variables have already been evaluated on the way into parsing the
> "delimited" top-level element, and likewise on the way into the UNA
> element itself.
>
> So it fails with a runtime SDE (Schema Definition Error): the default value
> has already been used.
>
> So the question is:
>
> Is this a bug in the EDIFACT schema, or is there a principle at work here
> indicating that Daffodil cannot evaluate the escape scheme on entry to an
> element with dfdl:lengthKind 'delimited' unless delimiters are actually
> defined?
>
> It gets worse, though. How late-bound does this have to be? I can imagine
> it being so late as to be after the last child element/group has been
> parsed, when the parser unwinds the stack back up to the complex-type
> element's tier; only at that point, when it scans for the terminating
> markup, would it force the evaluation of the escape scheme. That seems
> difficult to implement, although it would allow the delimiter for the
> complex-type element to be stored within the children of that same
> complex-type element. But is this needed?
>
> One could argue that the EDIFACT schema should have
> dfdl:lengthKind='implicit' on these global elements until the UNA has
> been parsed. However, I think that makes authoring schemas harder, because
> a user thinks of EDIFACT as "a delimited format" and naturally just wants
> to put dfdl:lengthKind="delimited" at global scope for all the schema
> components.
>
> Thoughts?
>
> Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
> www.tresys.com
> Please note: Contributions to the DFDL Workgroup's email discussions are
> subject to the OGF Intellectual Property Policy
> <http://www.ogf.org/About/abt_policies.php>
>
>
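
A sketch of the rule Steve cites, written as a predicate in Python. Under that reading, only simple elements with text representation and a non-empty dfdl:escapeSchemeRef ever need the escape scheme's expressions evaluated. The ElementInfo fields, the element names, and the escape scheme QName are hypothetical. Restricting the check to dfdl:lengthKind 'delimited' is an added assumption that reflects the delimited scanning discussed in this thread.

```python
from dataclasses import dataclass


@dataclass
class ElementInfo:
    """Hypothetical summary of the properties in effect for one element."""
    name: str
    is_simple: bool
    representation: str = "text"        # dfdl:representation
    length_kind: str = "delimited"      # dfdl:lengthKind
    escape_scheme_ref: str = "ex:EDIEscapeScheme"  # hypothetical; "" = none


def needs_escape_scheme(elem: ElementInfo) -> bool:
    """True only where the escape scheme's expressions must be evaluated.

    Encodes the reading that dfdl:escapeSchemeRef applies to simple types
    with text representation; the lengthKind check is an assumption.
    """
    return (
        elem.is_simple
        and elem.representation == "text"
        and elem.length_kind == "delimited"
        and elem.escape_scheme_ref != ""
    )


if __name__ == "__main__":
    elements = [
        ElementInfo("Interchange", is_simple=False),  # complex, delimited root
        ElementInfo("UNA.ComponentSeparator", is_simple=True,
                    escape_scheme_ref=""),            # UNA opts out
        ElementInfo("UNB.SyntaxIdentifier", is_simple=True),
    ]
    for e in elements:
        print(e.name, needs_escape_scheme(e))
```

With this check, neither the complex root element nor the UNA's leading simple elements (which have dfdl:escapeSchemeRef="") trigger evaluation. The UNA can therefore set the variables before the first element that actually uses the escape scheme.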