[DFDL-WG] New scoping rules

Mike Beckerle mbeckerle.dfdl at gmail.com
Tue Sep 29 07:52:10 CDT 2009


Alan,
I've done some thinking on the scoping, and I think we've talked ourselves
into a bad position.

From the note on scoping:

*The proposal currently under consideration is:*

   - *Schema objects inherit DFDL properties from a lexically enclosing
   xs:complexType or xs:group declaration*
   - *DFDL properties on a referenced global schema object (except
   simpleTypes) cannot be overridden unless explicitly parameterized by the
   global object.*

*The above is problematic. This breaks referential transparency.*


   - *DFDL properties explicitly defined on an element and its referenced
   simpleType are merged into a single set. It is an error if the same property
   is defined on both the element and simpleType.*
   - *It must be possible to validate all global objects, except
   simpleTypes.*
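
To make the merge rule in the third bullet concrete, here is a minimal
Python sketch. It is only an illustration: the function name and the
property values are hypothetical, and property sets are modelled as plain
dictionaries rather than anything taken from the spec.

```python
def merge_properties(element_props, simple_type_props):
    """Merge DFDL properties from an element and its referenced simpleType.

    Raises ValueError if the same property is defined on both.
    """
    duplicates = sorted(set(element_props) & set(simple_type_props))
    if duplicates:
        raise ValueError(
            "defined on both element and simpleType: " + ", ".join(duplicates)
        )
    merged = dict(simple_type_props)
    merged.update(element_props)
    return merged


# Hypothetical property values, for illustration only.
element = {"dfdl:lengthKind": "explicit", "dfdl:length": "4"}
simple_type = {"dfdl:representation": "binary"}
merged = merge_properties(element, simple_type)
```

Defining dfdl:byteOrder on both sides, for instance, would raise the error
instead of letting one definition silently win.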

*

This last bullet is an unreasonable requirement, depending on how you define
validity. This was put in to simplify a tooling requirement of some sort
that I believe is likely not a good goal for us to accept.

Validity can mean "is consistent", but should not require property
specifications to be "complete".

This is an area of some confusion in DFDL. We have stated that a schema must
have "all required properties" specified, and that there is no defaulting of
property values by implementations. The purpose is to keep
implementation-specific or platform-specific assumptions from creeping in, so
that DFDL schemas are more likely to be portable. This statement has been
misinterpreted: some have read it as meaning that every property defined in
the DFDL spec must have a value set in order for a schema to be "valid". When
I stated the "all required properties" rule (it was adopted largely at my
insistence), that was definitely not my intention. For example, if a format
is all text and uses a single-byte character set encoding, then I claim that
dfdl:byteOrder need not be specified, as it will never be needed to interpret
the data. The point of saying there are no defaults for property values is
NOT to require dfdl:byteOrder to always be specified. It is to say that if
the format requires dfdl:byteOrder, because it contains binary multi-byte
representations or wide characters which have endianness, then
dfdl:byteOrder must be specified: directly by the schema, by an included
schema that it references, or explicitly via some external mechanism (section
21 of draft 035). An implementation cannot simply assume an unstated default
for dfdl:byteOrder based on the platform it is installed on. If an
implementation were to do that, then the schemas usable with it would not be
portable to other implementations, which is something we are trying to avoid.
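
The distinction between "required by the format" and "defined in the spec"
can be sketched in a few lines of Python. This is an illustration under
stated assumptions: the function names are hypothetical, and the mapping
from format features to required properties is deliberately tiny, not the
spec's actual rule set.

```python
def required_properties(has_text, has_wide_chars, has_multibyte_binary):
    """Return the properties a format needs, given what it contains.

    The mapping is a simplified assumption made for this sketch.
    """
    required = set()
    if has_text:
        required.add("dfdl:encoding")
    if has_wide_chars or has_multibyte_binary:
        required.add("dfdl:byteOrder")
    return required


def missing_properties(specified, required):
    """List required properties that the schema does not supply.

    Nothing is filled in from the host platform: a missing property is
    reported, never defaulted.
    """
    return sorted(required - set(specified))


# All-text format in a single-byte encoding: dfdl:byteOrder is not needed.
text_only = required_properties(True, False, False)
text_missing = missing_properties({"dfdl:encoding": "US-ASCII"}, text_only)

# Binary multi-byte data with no byte order stated: reported as missing.
binary = required_properties(False, False, True)
binary_missing = missing_properties({}, binary)
```

The text-only schema is complete without dfdl:byteOrder, while the binary
one is flagged rather than quietly given a platform default.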

The difference here is subtle but important. Section 22 of draft 035 is a
placeholder for some pre-defined include files which, when included, will
provide dfdl:defineFormat specifications for useful sets of properties. It
is important for everyone to understand that including these in a DFDL
schema is 100% optional, and is only a convenience for obtaining consistent
and meaningful sets of properties. Simple formats can be described without
including any of them at all. As another example: if a file contains only an
array of binary floating point numbers, then no dfdl:encoding property is
needed. Just a handful of properties are needed to parse/unparse such a file
format: those about binary floating point numbers and, since it is an array,
those about multiple occurrences.

Getting back to scoping and the validation of a global decl/def: the upshot
of all this is that, from the perspective of "validating" a global decl/def,
a global type or element declaration can't have conflicting DFDL properties,
but it can leave properties unspecified, to be provided by the way that
global decl/def is used.

If a top-level element declaration is incomplete in this style, then it is
unsuitable for use as the document element of a data file/stream unless
augmented by external information. That is possible, and we discuss it in
chapter 21 (version 035) of the spec without giving a specific mechanism.

If a top-level element declaration is incomplete in this style, then it can
be made complete by being used by reference from another point in the
schema, one which either surrounds it with a scope providing the needed
properties or provides the needed properties directly at the point of
reference. This preserves referential transparency, and makes the semantics
of reference just plain textual substitution, which is the semantics in XML
Schema in general.
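
As a rough Python sketch of an incomplete global declaration being
completed by its use: property sets are again plain dictionaries, and the
lookup order chosen here (the declaration itself, then the point of
reference, then the enclosing scope) is an assumption of the sketch, not
something settled in this thread. The function and variable names are
hypothetical.

```python
from collections import ChainMap


def effective_properties(declaration, at_reference, enclosing_scope):
    """Resolve properties for a global declaration used by reference.

    Lookup order (an assumption of this sketch): the declaration itself,
    then the point of reference, then the enclosing scope.
    """
    return dict(ChainMap(declaration, at_reference, enclosing_scope))


# A global element that leaves dfdl:byteOrder unstated.
global_element = {"dfdl:representation": "binary"}

# The same declaration completed differently by two hypothetical uses.
use_a = effective_properties(global_element, {}, {"dfdl:byteOrder": "bigEndian"})
use_b = effective_properties(global_element, {"dfdl:byteOrder": "littleEndian"}, {})
```

The global declaration is never modified. Each use sees what it would see if
the declaration's text had been pasted in at the point of reference.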

I believe total validity (consistency AND completeness) for global
decls/defs is not worth trying to achieve for the sake of a tooling goal.
Tooling may have to be more sophisticated, but discarding referential
transparency is not something we should do for the sake of simplifying some
goal for tooling that isn't even clearly a requirement.

A tooling "goal" might be to allow an interactive user to point at a schema
anywhere and see a list of properties in effect at that point. Total
validity (consistency and completeness) is required for a concrete answer to
this. However, why do we think this tooling goal should be a requirement?
The answer presented back to the user could be that some properties are
"unspecified", while other properties have specific values. I don't see this
as problematic.
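
What such a tool could report is easy to sketch in Python. The function
name and the short list of property names are hypothetical. The point is
only that "unspecified" is a perfectly presentable answer.

```python
def properties_in_effect(property_names, specified):
    """Report each property's value at a point in the schema.

    Properties with no value in scope are shown as "unspecified".
    """
    return {name: specified.get(name, "unspecified") for name in property_names}


# Hypothetical point in a schema where only one property is in scope.
names = ["dfdl:representation", "dfdl:byteOrder", "dfdl:encoding"]
report = properties_in_effect(names, {"dfdl:representation": "binary"})
for name, value in report.items():
    print(f"{name}: {value}")
```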

We carefully decided not to allow any lexical invocation of DFDL formats at
top level in order to eliminate the issue of lexical closure for top-level
objects. This allows ordinary referential transparency to work, i.e.,
reference semantics is exactly that of textual substitution. This is very
desirable, as it allows ordinary refactoring of DFDL schemas to share common
decls/defs to work in the expected manner.

To me this is very desirable, and is a primary composition principle which
will allow creation of complex schemas from simpler parts.
*


On Mon, Sep 7, 2009 at 11:55 AM, Alan Powell <alan_powell at uk.ibm.com> wrote:

>
> All
>
> Attached is the description of the new DFDL scoping rules.
>
> We did not discuss the rules for simpleType derivations so I have assumed
> that they use the same rules as simpleType references, namely that the
> properties are merged and there must not be any duplicate properties
> specified.
>
> I have removed most of the complicated examples as they no longer apply.
>
>
>
> Alan Powell
>
> MP 211, IBM UK Labs, Hursley,  Winchester, SO21 2JN, England
> Notes Id: Alan Powell/UK/IBM     email: alan_powell at uk.ibm.com
> Tel: +44 (0)1962 815073                  Fax: +44 (0)1962 816898