[DFDL-WG] Possible standardizing Diagnostic Codes or Categories for DFDL implementations.

Mike Beckerle mbeckerle.dfdl at gmail.com
Tue Apr 7 13:08:36 EDT 2015


I reviewed the IBM messages in the validatordescriptions.properties and
modeldescriptions.properties files with an eye toward which categories of
diagnostics might be universal across DFDL implementations.

There are some questions below. Search for ??

I was able to infer some elements of structure in the assignment of
identifiers:

Letter Codes
S - Diagnostics about the subset of XML Schema constructs used in DFDL
V - Diagnostics about validation of the DFDL schema
X - Diagnostics about DFDL schema loading

Suffixes
E = error
W = warning
D = description - extended description of error/warning sharing numeric code
A = action - suggested corrective action
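As a rough illustration of this inferred structure, here is a sketch that splits an identifier into its parts. It assumes identifiers look like the two examples quoted later in this message (CTDS1007E, CTDM2101E): a fixed "CTD" prefix, one letter code, a four-digit number, and a one-letter suffix. That pattern and the name parse_code are assumptions, not documented IBM conventions. The letter M in CTDM2101E is not among the letter codes listed above, so the sketch reports no area for it.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout (inferred, not documented): "CTD" + letter code + 4 digits + suffix.
CODE_PATTERN = re.compile(r"^CTD([A-Z])(\d{4})([EWDA])$")

LETTER_CODES = {
    "S": "subset of XML Schema constructs used in DFDL",
    "V": "validation of the DFDL schema",
    "X": "DFDL schema loading",
}

SUFFIXES = {
    "E": "error",
    "W": "warning",
    "D": "description",  # extended description sharing the numeric code
    "A": "action",       # suggested corrective action
}


class DiagnosticCode(NamedTuple):
    letter: str
    number: int
    suffix: str
    area: Optional[str]  # None when the letter code is not S, V or X
    kind: str


def parse_code(code: str) -> DiagnosticCode:
    """Split an identifier such as 'CTDS1007E' into its inferred parts."""
    m = CODE_PATTERN.match(code)
    if m is None:
        raise ValueError(f"not a recognized diagnostic code: {code!r}")
    letter, digits, suffix = m.groups()
    return DiagnosticCode(letter, int(digits), suffix,
                          LETTER_CODES.get(letter), SUFFIXES[suffix])
```

For example, parse_code("CTDS1007E") yields letter "S", number 1007 and kind "error".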

Ranges
1000-1100 - subset of XML Schema (uses letter code S, so range is redundant ??)

1100-1101 - schema loading (letter code X)
1102-1103 - ??? schema validation errors and warnings (what are these for??) (letter code X)
1104-1105 - schema loading (letter code X)

1106-1149 - DFDL "physical validation" (letter code V) ?? How are these different from 1200+ ??

1150-1159 - Implementation-specific unsupported features. (letter code V)

1160 - Internal Error of the implementation (letter code V)

1200+ - validate proper use of DFDL properties (letter code V) ??? How is this different from 1106-1149 ???

1409-1420 - escape scheme related (only 11 values here, not really enough)

1557 - Internal Error of the implementation (letter code V) - expression
related.
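The ranges above can be written down as a lookup table, which makes their overlaps easy to see. This sketch transcribes only the ranges listed (labels paraphrased), treats bounds as inclusive, and represents "1200+" as open-ended; the function name categories_for is hypothetical. It deliberately returns every matching range rather than picking one, so that the ambiguities flagged with ?? (1100 falling in two ranges, 1409-1420 and 1557 falling inside 1200+) stay visible.

```python
from typing import List, Optional, Tuple

# Ranges transcribed from the list above; bounds are inclusive.
# An upper bound of None stands for the open-ended "1200+" range.
RANGES: List[Tuple[int, Optional[int], str]] = [
    (1000, 1100, "subset of XML Schema (S)"),
    (1100, 1101, "schema loading (X)"),
    (1102, 1103, "schema validation errors and warnings? (X)"),
    (1104, 1105, "schema loading (X)"),
    (1106, 1149, "DFDL physical validation (V)"),
    (1150, 1159, "implementation-specific unsupported features (V)"),
    (1160, 1160, "internal error (V)"),
    (1200, None, "proper use of DFDL properties (V)"),
    (1409, 1420, "escape scheme related"),
    (1557, 1557, "internal error, expression related (V)"),
]


def categories_for(number: int) -> List[str]:
    """Return the label of every listed range containing number, in table order."""
    hits = []
    for low, high, label in RANGES:
        if number >= low and (high is None or number <= high):
            hits.append(label)
    return hits
```

A number in no listed range, such as 1170, simply yields an empty list.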

In the modeldescriptions properties, most messages seem indistinguishable
from the 1200+ V messages in validatordescriptions, with one exception:

CTDM2101E - DFDL namespace prefix - This is an implementation-specific
limitation. That's a different category: it is not related to
implementation-specific unsupported DFDL features, it's just an ad-hoc
implementation-specific restriction.

*Preliminary Conclusions:*

The above suggests these categories into which Schema Definition Errors can
be divided:

* Loading of DFDL schema - can't find file, file isn't a schema, etc.
* Validation of DFDL schema content with sub-categories:
** SUBSET: The subset of XML Schema allowed by DFDL
** UNSUPPORTED: Implementation-specific unsupported features of DFDL
** LIMITATION: Implementation-specific limitations/restrictions
** INTERNAL: Internal error of implementation
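One way an implementation might carry these proposed categories internally is as a small enumeration. This is a sketch only: the class name SdeCategory, its member values, and the parent grouping are illustrative choices, not anything the DFDL spec or Daffodil defines.

```python
from enum import Enum
from typing import Optional


class SdeCategory(Enum):
    """Proposed categories of Schema Definition Errors (names are illustrative)."""
    LOADING = "loading"          # can't find file, file isn't a schema, etc.
    SUBSET = "subset"            # outside the subset of XML Schema allowed by DFDL
    UNSUPPORTED = "unsupported"  # DFDL feature this implementation does not support
    LIMITATION = "limitation"    # implementation-specific limitation/restriction
    INTERNAL = "internal"        # internal error of the implementation

    @property
    def parent(self) -> Optional[str]:
        """Top-level grouping: None for loading, 'validation' for everything else."""
        return None if self is SdeCategory.LOADING else "validation"
```

Under this scheme, CTDM2101E (the namespace-prefix restriction) would be tagged SdeCategory.LIMITATION rather than UNSUPPORTED.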

The granularity of the individual messages seems too fine for inclusion in
the DFDL standard. In many cases these messages correspond 1-to-1 with
requirements one might extract from the specification, but having to
maintain them in two places (the spec and also some standardized diagnostic
message base) seems problematic. Moreover, building such a message base is
essentially the same task as extracting and uniquely identifying every
requirement in the specification, and that is a very big job.

But what about the categorization of the errors? This too seems
problematic. For example, in the Daffodil implementation, many of the
Subset-of-XML-Schema errors will be detected simply by our use of Xerces to
validate the DFDL schema against the XML Schema for DFDL schemas. Those are
not separated out from any other sort of problem: no distinction is made
between something not being in the subset of XSD and something just being
illegal. For example, both of these situations will produce a nearly
identical error message:

<complexType ... mixed='true' ...>

<complexType ...  type="foo" ...>

Both will complain about an unknown attribute that isn't allowed. 'mixed' is
not in the subset of XSD that DFDL uses, whereas 'type' is just a mistake
(it should be 'name'), yet no distinction is drawn between the two. Drawing
this distinction is hard: we would effectively have to intercept and
re-classify the diagnostic messages generated by Xerces. Given our
arm's-length relationship to the Xerces code base, this would be fragile at
best.
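To make the fragility concrete, here is a sketch of what such re-classification might involve. It assumes Xerces reports a disallowed attribute with text of the form "Attribute 'mixed' is not allowed to appear in element 'xs:complexType'." (modeled on the cvc-complex-type.3.2.2 message); the set EXCLUDED_XSD_ATTRIBUTES is a hypothetical, deliberately incomplete stand-in for the XSD attributes DFDL excludes, and reclassify is a hypothetical name. Any change in Xerces' wording silently defeats the regular expression, which is exactly the problem.

```python
import re
from typing import Optional

# Assumed message shape, modeled on Xerces' cvc-complex-type.3.2.2 wording.
NOT_ALLOWED = re.compile(
    r"Attribute '([^']+)' is not allowed to appear in element '([^']+)'")

# Hypothetical and incomplete: XSD attributes outside DFDL's subset of XSD.
EXCLUDED_XSD_ATTRIBUTES = {"mixed"}


def reclassify(message: str) -> Optional[str]:
    """Return 'SUBSET' or 'ILLEGAL' for a disallowed-attribute message, else None."""
    m = NOT_ALLOWED.search(message)
    if m is None:
        return None  # wording not recognized: cannot classify at all
    attribute = m.group(1)
    return "SUBSET" if attribute in EXCLUDED_XSD_ATTRIBUTES else "ILLEGAL"
```

With this sketch the 'mixed' case is classified as SUBSET and the misspelled 'type' case as ILLEGAL, but only as long as the message text never changes.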

In addition, in the Daffodil code base proper, the notion of "subset" is
used not in the sense of the subset of XML Schema constructs allowed in
DFDL, but in the sense of the subset of DFDL features that are unsupported
by the implementation. That is, Daffodil uses the term Subset to mean what
is called UNSUPPORTED above.

Ultimately, the question arises of what standardizing error/diagnostic
messages is intended to achieve. Greater consistency between
implementations is always desirable, but the cost of achieving this is very
high given the effort already sunk into Daffodil. So long as the diagnostic
messaging identifies the problems in the schema in a way that is useful to
the user, what more is needed? In some sense, the quality of the diagnostic
messaging is an important distinguishing characteristic of different
implementations. Standardizing this behavior rigorously seems inconsistent
with, say, Section 21 of the DFDL spec, which lists a large number of
features of the DFDL language as entirely optional. I would go so far as to
say that diagnostic messaging itself should be considered optional. One can
imagine a DFDL implementation consistent with Section 21 which, for all
Schema Definition Errors, produces a diagnostic message saying "SDE" plus a
file name and line number, with no further information. This isn't as nice
as a descriptive diagnostic, but it is conceptually aligned with the notion
that very small DFDL implementations should be possible.
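Such a minimal diagnostic needs very little machinery. The sketch below is purely illustrative: the class name MinimalSde and the "SDE file:line" output format are hypothetical, not taken from any implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MinimalSde:
    """A bare-bones Schema Definition Error: no description, just a location."""
    file_name: str
    line: int

    def __str__(self) -> str:
        # Only the fact that an SDE occurred, plus where.
        return f"SDE {self.file_name}:{self.line}"
```

For example, str(MinimalSde("example.dfdl.xsd", 42)) gives "SDE example.dfdl.xsd:42".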

The Daffodil flavor of TDML allows negative tests to express substrings
that must appear within the diagnostic messages. So instead of a test
expecting exactly the code

CTDS1007E = CTDS1007E : Schema redefines are not allowed in DFDL schemas.

we would define the test to mention "redefine" and "Schema Definition
Error". We would probably also require fragments of the schema file name to
appear in the diagnostic message. Any diagnostic message or set of
diagnostic messages that contains these strings (compared case-insensitively)
would pass the test. This can lead to false-positive passes, but it avoids
over-specifying the exact diagnostic phrases used.
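The matching rule just described can be sketched in a few lines. This is not Daffodil's TDML runner code; the function name diagnostics_match is hypothetical. It passes when each expected fragment appears, case-insensitively, in at least one of the diagnostic messages, so the fragments may be spread across several messages.

```python
from typing import Iterable


def diagnostics_match(expected: Iterable[str], messages: Iterable[str]) -> bool:
    """True if every expected fragment occurs (case-insensitively) in some message."""
    lowered = [m.lower() for m in messages]
    return all(any(fragment.lower() in m for m in lowered)
               for fragment in expected)
```

For instance, the fragments "redefine", "Schema Definition Error" and "redef.dfdl.xsd" all match a message such as "Schema Definition Error: Schema redefines are not allowed in DFDL schemas (file redef.dfdl.xsd, line 12)". The false-positive risk is visible here too: any message that happens to mention "redefine" for some other reason would also pass.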

*Conclusion:*

I think we should not attempt to standardize diagnostic messaging any
further as part of the DFDL standardization process.

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
<http://www.ogf.org/About/abt_policies.php>