Unix more / better UTF-8

Zenaan Harkness zen at freedbms.net
Tue Mar 8 18:37:42 PST 2016


For those who have -not- yet read the canonical UTF-8 advocacy blog
post from many years ago, and for whom doing so is relevant, it really
is a must read:
http://utf8everywhere.org/


On 3/9/16, grarpamp <grarpamp at gmail.com> wrote:
> http://www.dragonflybsd.org/release44/

When I read something like:
"We introduced "short codes", so now codes like "de_DE", "fr_FR",
"en_US", "el_GR", etc. These short-codes are generally mapped to 8-bit
character sets such as ISO-8859-x, but sometimes they are mapped to
UTF-8 if the traditional single-byte encoding doesn't adequately cover
the locale anymore (e.g. the currency is not supported)."

I think "people still haven't cottoned on - UTF-8 should be the
default, and only vary if really necessary". Now I must qualify this
statement, since I don't know BSD, nor much about locales. Debian is
my friend.


> https://news.ycombinator.com/item?id=11248847

"Xterm(1) now UTF-8 by default on OpenBSD"  - great news! Better late
than never...


> https://wiki.gentoo.org/wiki/UTF-8/it
>
> philes... not just high ASCII anymore...

Wonders never cease.


Now, if only Java could properly handle Unicode characters and had a
string class which could properly work with UTF-8:
https://zenaan.github.io/zen/javadoc/zen/lang/string.html

Note1: This was motivated by my extreme frustration with Java's
Unicode limitations (to the point of not even being able to implement
a proper string formatter), by the utf8everywhere.org website, and by
having quite a few days in a row to figure out why the problem exists
in the first place and exactly what -is- Java's problem in this
particular regard.
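
To make the complaint concrete, here is a minimal sketch of one commonly
cited aspect of it: java.lang.String is indexed by UTF-16 code units, so
length() and charAt() do not count or return whole characters once you
leave the Basic Multilingual Plane. This is an assumption about which
limitation is meant, not a summary of the linked javadoc. The class name
Utf16LengthDemo and its helper methods are made up for illustration;
only the String, Character and StandardCharsets calls are real JDK API.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: contrasts three ways of measuring the "length" of a string.
class Utf16LengthDemo {
    // Number of UTF-16 code units, which is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points (a surrogate pair counts once).
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // U+1D11E MUSICAL SYMBOL G CLEF, written as its surrogate pair.
        String clef = "\uD834\uDD1E";
        System.out.println(codeUnits(clef));   // 2
        System.out.println(codePoints(clef));  // 1
        System.out.println(utf8Bytes(clef));   // 4
        // charAt(0) yields only the high surrogate, not a whole character.
        System.out.println(Character.isHighSurrogate(clef.charAt(0))); // true
    }
}
```

Three different answers for a single character is the sort of thing
that makes column-aligned string formatting awkward to get right.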

Note2: The documentation at the top of this link is the relevant part;
the class itself is just a notepad...

Note3: I have a pretty solid CodePointCursor.java class (yet to be
uploaded), well tested by a uint and tagged string CodePointParser, if
anyone actually wants to finish a proper Java string class such as
above...
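
For anyone wondering what a code point cursor might look like in
outline, here is a rough sketch. It is NOT the CodePointCursor.java
mentioned above, which is not uploaded; the class name
SimpleCodePointCursor and its two methods are hypothetical. It assumes
the job is simply to walk a String one whole code point at a time,
using the real JDK calls String.codePointAt and Character.charCount.

```java
import java.util.NoSuchElementException;

// Hypothetical sketch; not the CodePointCursor class from the post.
class SimpleCodePointCursor {
    private final String text;
    private int index; // current position, measured in UTF-16 code units

    SimpleCodePointCursor(String text) {
        this.text = text;
    }

    // True while at least one more code point remains.
    boolean hasNext() {
        return index < text.length();
    }

    // Returns the code point at the cursor and advances past it,
    // stepping over both halves of a surrogate pair when needed.
    int next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        int cp = text.codePointAt(index);
        index += Character.charCount(cp);
        return cp;
    }

    public static void main(String[] args) {
        // 'a', U+1D11E (as a surrogate pair), 'b'
        SimpleCodePointCursor c = new SimpleCodePointCursor("a\uD834\uDD1Eb");
        while (c.hasNext()) {
            System.out.printf("U+%04X%n", c.next());
        }
    }
}
```

Run as is, this prints U+0061, U+1D11E and U+0062, one per line: three
code points from a four-char String.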


