[DFDL-WG] New scoping rules
Alan Powell
alan_powell at uk.ibm.com
Tue Sep 29 12:00:45 CDT 2009
Mike
I agree with you that the new scoping rules are not workable.
I also agree with you that 'all the required dfdl properties' does not
mean 'all the dfdl properties'. Unfortunately, because the only way to turn
some properties off is to set them to the empty string, you require a lot
more properties than you might expect. 'initiator', 'terminator',
'inputValueCalc', 'outputValueCalc', etc. all fall into this category, so
they must be set for every element.
While it may be acceptable to say that global components don't have to be
complete, it must be possible to verify that a schema definition is
complete and correct, so are we back to designating the starting points?
We have discussed scoping a lot without finding an ideal solution so I am
beginning to wonder if we should give up and exclude it from DFDL V1.
Alan Powell
MP 211, IBM UK Labs, Hursley, Winchester, SO21 2JN, England
Notes Id: Alan Powell/UK/IBM email: alan_powell at uk.ibm.com
Tel: +44 (0)1962 815073 Fax: +44 (0)1962 816898
From: Mike Beckerle <mbeckerle.dfdl at gmail.com>
To: Alan Powell/UK/IBM at IBMGB
Cc: dfdl-wg at ogf.org
Date: 29/09/2009 13:52
Subject: Re: [DFDL-WG] New scoping rules
Alan,
I've done some thinking on the scoping, and I think we've talked ourselves
into a bad position.
From the note on scoping:
The proposal currently under consideration is:
- Schema objects inherit DFDL properties from a lexically enclosing
  xs:complexType or xs:group declaration.
- DFDL properties on a referenced global schema object (except simpleTypes)
  cannot be overridden unless explicitly parameterized by the global object.
The above is problematic. This breaks referential transparency.
- DFDL properties explicitly defined on an element and its referenced
  simpleType are merged into a single set. It is an error if the same
  property is defined on both the element and the simpleType.
- It must be possible to validate all global objects, except simpleTypes.
This last bullet is an unreasonable requirement, depending on how you
define validity. This was put in to simplify a tooling requirement of some
sort that I believe is likely not a good goal for us to accept.
Validity can mean "is consistent", but should not require property
specifications to be "complete".
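To make the merge rule in the third bullet concrete, here is a minimal Python sketch. It assumes properties can be modelled as plain name/value dictionaries; the function name merge_properties and the sample values are hypothetical and are not taken from the draft.

```python
def merge_properties(element_props, simple_type_props):
    """Merge DFDL properties from an element and its referenced simpleType.

    Under the proposed rule, a property defined on both is an error.
    """
    duplicates = sorted(set(element_props) & set(simple_type_props))
    if duplicates:
        raise ValueError(
            "property defined on both element and simpleType: "
            + ", ".join(duplicates)
        )
    merged = dict(simple_type_props)
    merged.update(element_props)  # no key overlaps, so nothing is overridden
    return merged


# Illustrative values only.
element = {"initiator": "", "terminator": ";"}
simple_type = {"encoding": "ascii"}
print(merge_properties(element, simple_type))
```

Note that this only checks consistency between the two property sets; it says nothing about whether the merged set is complete.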
This is an area of some confusion in DFDL. We have stated that a schema
must have "all required properties" specified, and that there is no
defaulting of property values by implementations. The purpose of this is
to prevent implementation-specific or platform-specific assumptions from
creeping in, so that DFDL schemas are more likely to be portable. This
statement has been misinterpreted: some have read it as meaning that all
properties defined in the DFDL spec must have values set in order for a
schema to be "valid". But when stating the "all required properties" rule
(largely at my insistence), this was definitely not my intention. Consider,
for example, a format that is all text and uses a single-byte character set
encoding. I claim that dfdl:byteOrder need not be specified, as it will
never be needed to interpret the data. The point of saying there are no
defaults for property values is NOT to require dfdl:byteOrder to always be
specified. It is to say that if the format requires dfdl:byteOrder, because
it has binary multi-byte representations in it or wide characters which
have endianness, then dfdl:byteOrder must be specified by the schema, either
directly or by an included schema referenced by the schema, or must be
specified explicitly via some external mechanism (section 21 of draft 035).
The point is that the implementation cannot just say "there is an unstated
default in this implementation" for dfdl:byteOrder based on the platform
you are installed on. If an implementation were to do that, then the
schemas usable with that implementation would not be portable to other
implementations, which is something we are trying to avoid.
The difference here is subtle but important. Section 22 of draft 035 is a
placeholder for some pre-defined include files which, when included,
provide dfdl:defineFormat specifications for useful sets of
properties. It is important for everyone to understand that including
these in a DFDL schema is 100% optional, and is for convenience of
obtaining consistent and meaningful sets of properties only. However,
simple formats can be described without any inclusion of these at all. As
another example: if a file contains only an array of binary floating point
numbers, then no dfdl:encoding property is needed. Just a handful of
properties are needed to parse/unparse such a file format, and those are
the ones about binary floating point numbers, and in the case of an array,
about multiple occurrences.
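The two examples above (an all-text single-byte format, and an array of binary floats) can be sketched in Python as follows. This is a deliberately simplified model that covers only encoding and byteOrder; the trait flags and function names are hypothetical, and a real processor would derive the required set from the schema itself.

```python
def required_properties(has_text, has_multibyte_binary, has_wide_chars=False):
    """Return the properties this (simplified) format actually needs."""
    required = set()
    if has_text:
        required.add("encoding")
    if has_multibyte_binary or has_wide_chars:
        required.add("byteOrder")
    return required


def missing_properties(specified, **traits):
    """Report required-but-unstated properties; never fill in a default."""
    return sorted(required_properties(**traits) - set(specified))


# All-text format with a single-byte encoding: byteOrder is never consulted.
print(missing_properties({"encoding": "ascii"},
                         has_text=True, has_multibyte_binary=False))  # []

# Array of binary floats: encoding is never consulted, but byteOrder is.
print(missing_properties({},
                         has_text=False, has_multibyte_binary=True))  # ['byteOrder']
```

The key design point is that a missing required property is reported as an error rather than silently replaced by a platform default.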
Getting back to scoping and the validation of a global decl/def: the upshot
of all this is that, from the perspective of "validating" a global
decl/def, one can't have conflicting DFDL properties in a global type or
element declaration, but properties can also be left unspecified, to
be provided by the way that global decl/def is used.
If a top-level element declaration is incomplete in this style, then it is
unsuitable for use as the document element of a data file/stream unless
augmented by external information, something which is possible and which we
discuss in chapter 21 (version 035) of the spec without giving a specific
mechanism.
If a top-level element declaration is incomplete in this style, then it
can be made complete by way of being used by reference from another point
in the schema which surrounds it with a scope providing the needed
properties, or which provides the needed properties directly at the point
of reference. This preserves referential transparency, and makes the
semantics of referential transparency be just plain textual substitution,
which is the semantics in XML Schema in general.
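A rough Python sketch of this "incomplete until used" idea follows. It assumes, purely for illustration, that the enclosing scope and the point of reference only fill in properties the global declaration leaves unstated, with the point of reference taking priority over the scope. That precedence, the function names, and the list of needed properties are all assumptions, not rules from the draft.

```python
def effective_properties(global_props, scope_props, reference_props):
    """Properties in effect when a global declaration is used by reference."""
    effective = dict(scope_props)      # supplied by the surrounding scope
    effective.update(reference_props)  # supplied at the point of reference
    effective.update(global_props)     # stated on the declaration itself
    return effective


def unspecified(effective, needed):
    """Needed properties that are still unstated at this point of use."""
    return sorted(set(needed) - set(effective))


# Hypothetical global element that states only one property.
global_elem = {"lengthKind": "delimited"}
needed = ["encoding", "lengthKind", "terminator"]

# Used directly as a document element: incomplete without external information.
print(unspecified(effective_properties(global_elem, {}, {}), needed))

# Used by reference from a context that supplies the rest: complete.
in_context = effective_properties(
    global_elem, {"encoding": "ascii"}, {"terminator": ";"}
)
print(unspecified(in_context, needed))
```

The same declaration is checked for completeness only where it is used, which is what allows it to be shared between contexts that supply different properties.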
I believe total validity (Consistency AND completeness) for global
decls/defs is not worth trying to achieve for the sake of a tooling goal.
Tooling may have to be more sophisticated, but discarding referential
transparency is not something we should do for the sake of simplifying
some goal for tooling that isn't even clearly a requirement.
A tooling "goal" might be to allow an interactive user to point at a
schema anywhere and see a list of properties in effect at that point.
Total validity (consistency and completeness) is required for a concrete
answer to this. However, why do we think this tooling goal should be a
requirement? The answer presented back to the user could be that some
properties are "unspecified", while other properties have specific values.
I don't see this as problematic.
We carefully decided not to allow any lexical invocation of DFDL formats
at top level in order to eliminate the issue of lexical closure for top
level objects. This allows ordinary textual referential transparency to work.
I.e., reference semantics is exactly that of textual substitution. This is
very desirable, as it allows ordinary refactoring of DFDL schemas to share
common decls/defs to work in the expected manner.
To me this is very desirable, and is a primary composition principle which
will allow creation of complex schemas from simpler parts.
On Mon, Sep 7, 2009 at 11:55 AM, Alan Powell <alan_powell at uk.ibm.com>
wrote:
All
Attached is the description of the new DFDL scoping rules.
We did not discuss the rules for simpleType derivations so I have assumed
that it uses the same rules as simpleType reference, namely that the
properties are merged and there must not be any duplicate properties
specified.
I have removed most of the complicated examples as they no longer apply.
Alan Powell
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
--
dfdl-wg mailing list
dfdl-wg at ogf.org
http://www.ogf.org/mailman/listinfo/dfdl-wg