[DFDL-WG] Clarification on UTF-16 and UTF-32 encoding byte order
Mike Beckerle
mbeckerle.dfdl at gmail.com
Thu Apr 23 15:22:59 EDT 2020
Since we dropped the Unicode byte order mark functionality from DFDL v1.0,
the issue arises of what byte order is used when dfdl:encoding="utf-16" or
dfdl:encoding="utf-32".
We are clear that encodings define their own byte and bit order; the
dfdl:byteOrder property is not used for them.
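To make the ambiguity concrete, here is a small Python sketch (Python is used
only for illustration; it is not part of DFDL). It shows that the same bytes,
with no byte order mark in front of them, decode to different characters
depending on which byte order is assumed, and that the BE and LE encoding
names produce different bytes for the same character.

```python
# Two bytes with no byte order mark in front of them.
data = bytes([0x00, 0x41])

# Decoded as big-endian, the single code unit is 0x0041, i.e. "A".
as_be = data.decode("utf-16-be")

# Decoded as little-endian, the single code unit is 0x4100, a different character.
as_le = data.decode("utf-16-le")

# The same holds on output: one character, two possible byte sequences.
utf32_be = "A".encode("utf-32-be")
utf32_le = "A".encode("utf-32-le")

print(repr(as_be), hex(ord(as_le)))
print(utf32_be.hex(), utf32_le.hex())
```

The encoding names "utf-16-be", "utf-16-le", "utf-32-be" and "utf-32-le" here
are Python codec names, not DFDL encoding names.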
There are these options:
1) Explicitly disallow these encoding names because they do not specify a
byte order. Require utf-16BE or utf-16LE, utf-32BE or utf-32LE.
2) Specify that these are synonyms for the BE versions.
3) Specify that these are synonyms for the LE versions.
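As a rough sketch of how the three options would differ in an implementation,
the following Python fragment resolves an encoding name under each one. The
names Policy and resolve_encoding are hypothetical, invented for this
illustration; they do not come from the DFDL specification or from any DFDL
implementation.

```python
from enum import Enum


class Policy(Enum):
    """Hypothetical labels for the three options listed above."""
    DISALLOW = 1   # option 1: reject names that carry no byte order
    ASSUME_BE = 2  # option 2: treat them as synonyms for the BE versions
    ASSUME_LE = 3  # option 3: treat them as synonyms for the LE versions


# Encoding names that do not state a byte order.
_AMBIGUOUS = {"utf-16", "utf-32"}


def resolve_encoding(name: str, policy: Policy) -> str:
    """Return the encoding name to use, or raise if the name is disallowed."""
    key = name.lower()
    if key not in _AMBIGUOUS:
        # Names such as utf-16BE or utf-8 already determine their byte order.
        return name
    if policy is Policy.DISALLOW:
        raise ValueError(
            f"encoding '{name}' does not specify a byte order; "
            f"use {key}BE or {key}LE"
        )
    suffix = "BE" if policy is Policy.ASSUME_BE else "LE"
    return f"{key}{suffix}"
```

Under option 1, resolve_encoding("utf-16", Policy.DISALLOW) raises an error;
under options 2 and 3 the same call returns "utf-16BE" or "utf-16LE"
respectively.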
This comes up in the definition of the dfdl:byteOrder property where the
text currently says:
"This property is never used to establish the byte order for text/strings
with Unicode fixed-width encodings that do not specify the byte order
(UTF-16 and UTF-32)."
Having removed the Unicode byte order mark feature, this statement leaves
us without a stipulation of how UTF-16 and UTF-32 byte order would be
determined.
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Owl Cyber Defense |
www.owlcyberdefense.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
<http://www.ogf.org/About/abt_policies.php>