[DFDL-WG] Clarification on UTF-16 and UTF-32 encoding byte order

Mike Beckerle mbeckerle.dfdl at gmail.com
Thu Apr 23 15:22:59 EDT 2020


Since we dropped the Unicode byte order mark functionality from DFDL v1.0,
the issue arises of what byte order is used when dfdl:encoding="utf-16" or
dfdl:encoding="utf-32".

We are clear that encodings define their own byte and bit order, and that
the dfdl:byteOrder property is not used.

The options are:
1) Explicitly disallow these encoding names because they do not specify a
byte order. Require utf-16BE or utf-16LE, utf-32BE or utf-32LE.
2) Specify that these are synonyms for the BE versions.
3) Specify that these are synonyms for the LE versions.
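
As a rough illustration only, the three options could be sketched as a name
resolution step like the one below. This is Python, not DFDL, and the names
Policy and resolve_encoding are hypothetical; nothing here comes from the
spec or from any implementation.

```python
from enum import Enum


class Policy(Enum):
    """Hypothetical labels for the three options listed above."""
    DISALLOW = 1   # option 1: reject names that carry no byte order
    ASSUME_BE = 2  # option 2: treat them as synonyms for the BE versions
    ASSUME_LE = 3  # option 3: treat them as synonyms for the LE versions


# Encoding names that do not state a byte order.
_AMBIGUOUS = {"utf-16", "utf-32"}


def resolve_encoding(name, policy):
    """Return the encoding name to use, or raise under option 1."""
    key = name.lower()
    if key not in _AMBIGUOUS:
        # Names such as utf-16BE or utf-8 pass through unchanged.
        return name
    if policy is Policy.DISALLOW:
        raise ValueError(
            f"encoding '{name}' does not specify a byte order; "
            f"use {key}BE or {key}LE"
        )
    suffix = "BE" if policy is Policy.ASSUME_BE else "LE"
    return key + suffix
```

Under option 1 a schema author gets an error that names the two acceptable
alternatives; under options 2 and 3 the bare name is silently mapped.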

This comes up in the definition of the dfdl:byteOrder property where the
text currently says:

This property is never used to establish the byte order for text/strings
with Unicode fixed-width encodings that do not specify the byte order
(UTF-16 and UTF-32).

Having removed the Unicode byte order mark feature, this statement leaves
us without a stipulation of how UTF-16 and UTF-32 byte order would be
determined.
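
To make the ambiguity concrete, here is a small sketch in Python (chosen only
for illustration, using its standard utf-16-be and utf-16-le codecs). With no
byte order mark in the data, the same two bytes give different characters
depending on which byte order the reader assumes.

```python
# The letter "A" (U+0041) encoded with an explicit byte order, no BOM.
be_bytes = "A".encode("utf-16-be")  # bytes 0x00 0x41
le_bytes = "A".encode("utf-16-le")  # bytes 0x41 0x00

# Reading big-endian data as little-endian does not fail; it yields a
# different, valid character (code unit 0x4100), so the mistake is silent.
misread = be_bytes.decode("utf-16-le")

print(be_bytes.hex(), le_bytes.hex(), hex(ord(misread)))
```

The misread case raises no error, which is why the byte order has to be
stipulated somewhere rather than left to the processor.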

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Owl Cyber Defense |
www.owlcyberdefense.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
<http://www.ogf.org/About/abt_policies.php>