[DFDL-WG] Action 059: External specification of encoding, byte order
Steve Hanson
smh at uk.ibm.com
Thu Nov 12 05:07:23 CST 2009
As discussed on the call:
For case 1), the DFDL xsd always wins and the context is ignored. If the
user wants to use the encoding/byte order from the context, then he must
be explicit about this and use case 2), described below.
Will adopt suggestion a). One question: are there any other DFDL
properties, like dfdl:encoding and dfdl:byteOrder, that are commonly
provided by the context? How about dfdl:binaryFloatRepresentation or
dfdl:outputNewLine?
Will not adopt suggestion b).
Regards
Steve Hanson
Programming Model Architect, WebSphere Message Brokers,
OGF DFDL WG Co-Chair,
Hursley, UK,
Internet: smh at uk.ibm.com,
Phone (+44)/(0) 1962-815848
From: Steve Hanson/UK/IBM
To: dfdl-wg at ogf.org
Date: 05/11/2009 14:30
Subject: Action 059: External specification of encoding, byte order
DFDL schemas can either:
1) specify fixed encoding(s)/byte order(s) for the data being described,
2) specify that the encoding/byte order is provided by the 'context' that
invokes the DFDL processor (using the dfdl:defineVariable 'external'
facility). **
For case 1), DFDL faces a problem: what happens when the 'context'
provides an encoding/byte order for the data, but the DFDL xsd specifies a
different encoding/byte order? I think DFDL must make a statement about
this situation, as there are several common scenarios where it could
occur (HTTP, MIME, MQ).
It is worth looking at the precedent set by XML in this regard. The
analogous problem for XML is where the XML document itself specifies
(using the ?xml declaration) an encoding different from the one supplied
by the context. The recommendations for XML are stated in the appendix
below; there is no universal rule.
It is more complicated with DFDL, though. A DFDL xsd can set the
encoding(s)/byte order(s) to use in several different places. Which of
those would the context override? All of them? Just the one associated
with the top-level structure?
My conclusion is therefore that for case 1) the DFDL xsd always wins, and
the context is ignored. If the user wants to use the encoding/byte order
from the context, then he must be explicit about this and use case 2)
above.
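The rule concluded above can be pictured with a small Python sketch. The names EncodingSetting, resolve, fixed, external_var and the context mapping are hypothetical. They are not taken from the DFDL specification. The sketch shows only that a value fixed in the xsd (case 1) never consults the context, while a value bound to an external variable (case 2) does.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class EncodingSetting:
    """Hypothetical model of how a DFDL xsd states an encoding or byte order.

    Exactly one of the first two fields is expected to be set:
      fixed        -- case 1: a literal value written in the schema
      external_var -- case 2: name of a variable defined with
                      dfdl:defineVariable and 'external' = 'true'
    """
    fixed: Optional[str] = None
    external_var: Optional[str] = None
    default: Optional[str] = None  # defaultValue of the external variable, if any


def resolve(setting: EncodingSetting, context: Mapping[str, str]) -> str:
    """Return the value a processor would use under the proposed rule."""
    if setting.fixed is not None:
        # Case 1: the DFDL xsd always wins; the context is ignored.
        return setting.fixed
    if setting.external_var is not None:
        # Case 2: the schema explicitly asked for the value from the context.
        if setting.external_var in context:
            return context[setting.external_var]
        if setting.default is not None:
            return setting.default
        raise LookupError(f"context did not supply ${setting.external_var}")
    raise ValueError("setting must be either fixed or external")
```

For example, with a context of {"encoding": "UTF-16"}, a schema that fixes "ISO-8859-1" still resolves to "ISO-8859-1". A schema that refers to the external variable "encoding" resolves to "UTF-16".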
There are two ways in which we could be a bit more flexible:
a) Pre-define $encoding and $byteOrder variables in the DFDL namespace.
These would implicitly have 'external' = 'true' and perhaps a
'defaultValue' as well. This simplifies the coding of a DFDL xsd for case
2).
b) State that it is an implementation decision to provide an option to use
a context encoding/byte order for case 1) instead of the ones in the DFDL
xsd. In such a case, the context MUST override all encodings/byte orders
in the system of xsds used by the DFDL processor. (In practice this is
invariably a single encoding/byte order.)
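The following is a rough Python sketch of the two options. PREDEFINED_EXTERNALS, bind_predefined and apply_context_override are hypothetical names. The default values are placeholders, because the proposal only says there might be a 'defaultValue'. Option b) is shown as all-or-nothing because the proposal requires the context to override every encoding in the set of xsds, not just some of them.

```python
from typing import Dict, List, Mapping, Optional

# Suggestion a): variables a processor would pre-define in the DFDL namespace,
# implicitly 'external'. The defaults are hypothetical placeholders only.
PREDEFINED_EXTERNALS: Dict[str, Optional[str]] = {
    "encoding": "UTF-8",       # hypothetical default
    "byteOrder": "bigEndian",  # hypothetical default
}


def bind_predefined(context: Mapping[str, str]) -> Dict[str, str]:
    """Bind $encoding and $byteOrder from the context, else use the defaults."""
    bound: Dict[str, str] = {}
    for name, default in PREDEFINED_EXTERNALS.items():
        value = context.get(name, default)
        if value is None:
            raise LookupError(f"no value supplied for ${name}")
        bound[name] = value
    return bound


def apply_context_override(schema_encodings: List[str],
                           context_encoding: Optional[str]) -> List[str]:
    """Suggestion b) (not adopted): when the implementation option is used,
    the context encoding replaces every encoding in the xsds, never only some."""
    if context_encoding is None:
        # No context value: the encodings in the xsds stand unchanged.
        return list(schema_encodings)
    return [context_encoding] * len(schema_encodings)
```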
** (This might cover more than encoding and byte order; for example, MQ
also allows the float format to be provided by the context.)
Appendix: XML
The equivalent situation for XML is where the XML document specifies its
own encoding via the ?xml declaration, and the context also provides the
encoding. There is no single rule. In summary:
- Basically, if there is a higher-level protocol, then that protocol
defines the rules.
- E.g., for MIME content-type text/xml, the context encoding is
used. If this is omitted, the XML is assumed to be US-ASCII. The ?xml
declaration encoding is not used.
- E.g., for MIME content-type application/xml, the context encoding
is used. If this is omitted, the ?xml declaration encoding is used.
- For files (where there is no context encoding), use of the ?xml
declaration encoding is recommended.
Note that in Message Broker, we always use the context encoding, as it
should always be present. We never use the ?xml declaration.
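The summary above can be condensed into a Python sketch. The function name and its parameters are hypothetical. The sketch makes three simplifications. It treats every text/* type like text/xml. It uses UTF-8 when nothing else applies, which is the XML 1.0 default for an entity with no encoding declaration. It leaves out byte-order-mark detection.

```python
from typing import Optional


def xml_document_encoding(media_type: Optional[str],
                          charset: Optional[str],
                          declared: Optional[str]) -> str:
    """Pick an XML document's encoding following the summary above.

    media_type -- MIME type from the higher-level protocol, or None for a file
    charset    -- the charset parameter supplied by that protocol, if any
    declared   -- the encoding named in the ?xml declaration, if any
    """
    if media_type is not None and media_type.lower().startswith("text/"):
        # text/xml: the context wins; the ?xml declaration is irrelevant.
        return charset or "us-ascii"
    if media_type is not None and charset:
        # application/xml and similar: charset takes precedence when present.
        return charset
    # application/xml without charset, or a file with no context:
    # use the ?xml declaration, falling back to UTF-8 when there is none.
    return declared or "UTF-8"
```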
W3C XML 1.0 spec section F.2 Priorities in the Presence of External
Encoding Information
The second possible case occurs when the XML entity is accompanied by
encoding information, as in some file systems and some network protocols.
When multiple sources of information are available, their relative
priority and the preferred method of handling conflict should be specified
as part of the higher-level protocol used to deliver XML. In particular,
please refer to [IETF RFC 3023] or its successor, which defines the
text/xml and application/xml MIME types and provides some useful guidance.
In the interests of interoperability, however, the following rule is
recommended.
If an XML entity is in a file, the Byte-Order Mark and encoding
declaration are used (if present) to determine the character encoding.
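For the file case in the rule above, the following is a simplified Python sketch of checking the Byte-Order Mark first and then the encoding declaration. The names file_encoding, _BOMS and _DECL are hypothetical. The sketch is much smaller than the full autodetection described in Appendix F.1 of the XML spec. For example, it does not recognise an encoding declaration in UTF-16 text that lacks a BOM, and it does not handle EBCDIC.

```python
import re

# Byte-order marks, checked longest first: the UTF-32LE mark begins with
# the UTF-16LE mark, so UTF-32 must be tested before UTF-16.
_BOMS = [
    (b"\x00\x00\xfe\xff", "UTF-32BE"),
    (b"\xff\xfe\x00\x00", "UTF-32LE"),
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xfe\xff", "UTF-16BE"),
    (b"\xff\xfe", "UTF-16LE"),
]

# encoding="..." or encoding='...' inside a leading ASCII-compatible ?xml declaration.
_DECL = re.compile(rb"""^<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")


def file_encoding(data: bytes) -> str:
    """Guess a file's encoding from its BOM, then its ?xml declaration."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii")
    # Neither a BOM nor an encoding declaration: XML 1.0 requires UTF-8.
    return "UTF-8"
```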
IETF RFC 3023
3.6 Summary
The following list applies to text/xml, text/xml-external-parsed-entity,
and XML-based media types under the top-level type "text" that define the
charset parameter according to this specification:
o Charset parameter is strongly recommended.
o If the charset parameter is not specified, the default is "us-ascii".
The default of "iso-8859-1" in HTTP is explicitly overridden.
o No error handling provisions.
o An encoding declaration, if present, is irrelevant, but when
saving a received resource as a file, the correct encoding
declaration SHOULD be inserted.
The next list applies to application/xml,
application/xml-external-parsed-entity, application/xml-dtd, and
XML-based media types under top-level types other than "text" that define
the charset parameter according to this specification:
o Charset parameter is strongly recommended, and if present, it
takes precedence.
o If the charset parameter is omitted, conforming XML processors
MUST follow the requirements in section 4.3.3 of [XML].
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU