Re: The second operating system hiding in every mobile phone

13 Nov 2013

      Reminded me of a good old article…

http://blog.mecheye.net/2012/12/bytecode/

Bytecode
Posted on December 9, 2012	

What is the most commonly used bytecode language in the world? Java
(JVM bytecode)? .NET (CLI)? Flash (AVM1/AVM2)? Nope. There are a few
that you use every day, simply by turning on your computer, tablet, or
even phone. You don’t even have to start an application or visit a
webpage.

ACPI

The most obvious is the gargantuan specification known as “ACPI”. The
“Advanced Configuration and Power Interface” specification lives up to
its name: the most recent version is a mammoth document of almost 1000
pages. And yes, operating systems are expected to implement this. The
entire thing. The bytecode part is hidden deep, in chapter 20, “ACPI
Machine Language”, which describes a semi-register VM with all the
usual operations (Add, Subtract, Multiply, Divide, the standard
inequalities and equalities) but then throws in other fun things like
ToHexString and Mid (substring). Look even further and you’ll see a
full object model, system properties, and an asynchronous signal
mechanism that notifies devices when those system properties change.
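To make the “semi-register” description a bit more concrete, here is a toy Python sketch. It is not real AML: actual AML is a byte-encoded prefix tree, and the tuple instruction format, the `ToyAmlMethod` class and the `L()` helper are all hypothetical. It only mimics the ASL-level shape of a few operators, where operands are constants or the method locals Local0 to Local7 and the result goes to an optional target. It assumes 64-bit integers (as in ACPI 2.0 and later tables) and simplifies ToHexString to plain uppercase hex digits.

```python
from __future__ import annotations

import operator

MASK64 = (1 << 64) - 1  # assumption: 64-bit AML integers (ACPI 2.0+ tables)

ARITH = {"Add": operator.add, "Subtract": operator.sub, "Multiply": operator.mul}


def L(n: int) -> tuple[str, int]:
    """Hypothetical helper naming one of the method locals Local0..Local7."""
    return ("L", n)


class ToyAmlMethod:
    """Toy evaluator for a few ASL-style operators (not real AML encoding)."""

    def __init__(self) -> None:
        self.locals: list = [0] * 8

    def read(self, x):
        # An ("L", n) tuple reads LocalN; anything else is an immediate constant.
        return self.locals[x[1]] if isinstance(x, tuple) else x

    def write(self, target, value) -> None:
        # Mimics the optional Target operand: None means "discard the result".
        if target is not None:
            self.locals[target[1]] = value

    def run(self, program):
        for op, *args in program:
            if op == "Store":
                self.write(args[1], self.read(args[0]))
            elif op in ARITH:
                a, b = self.read(args[0]), self.read(args[1])
                self.write(args[2], ARITH[op](a, b) & MASK64)
            elif op == "ToHexString":
                # Simplified: plain uppercase hex digits, no prefix or padding.
                self.write(args[1], format(self.read(args[0]), "X"))
            elif op == "Mid":
                src, start, length = (self.read(a) for a in args[:3])
                self.write(args[3], src[start:start + length])
            elif op == "Return":
                return self.read(args[0])
            else:
                raise ValueError(f"unsupported operator {op!r}")
        return None


program = [
    ("Store", 0x2A, L(0)),           # Store(0x2A, Local0)
    ("Add", L(0), 6, L(1)),          # Add(Local0, 6, Local1)
    ("ToHexString", L(1), L(2)),     # ToHexString(Local1, Local2)
    ("Mid", "battery", 0, 3, L(3)),  # Mid("battery", 0, 3, Local3)
    ("Return", L(2)),                # Return(Local2)
]
print(ToyAmlMethod().run(program))  # prints 30
```

Even this toy has to decide on integer width, string conversion and target semantics; the real specification pins down all of that and much more.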

Most devices, of course, require nothing less than a full ACPI
implementation, so all of this code lives in your kernel and runs at
early boot. It rivals the complexity of a full JavaScript environment,
with its type system and system bindings, except that the program code
is supplied by your machine’s firmware rather than by your OS. Because
the specification is so complex, Intel created an OS-independent
reference implementation, ACPICA, and that is the implementation used
in the Linux kernel, the BSDs (including Mac OS X), and the fun toy
kernels ReactOS and Haiku. I don’t know whether Windows uses it. Since
the specification’s got Microsoft’s name on it, I assume their
implementation was created long before ACPICA.

Fonts

After that, want a graphical boot loader? Simply rendering an OpenType
font (well, only OpenType fonts with CFF glyphs, but the complexities
of the OpenType font format are a subject for another day) requires
parsing the Type 2 Charstring Format, which is itself a custom bytecode
for describing glyph outlines. This one’s even more interesting: it’s a
real stack-based interpreter, and it even has a “random” opcode to make
random glyphs at runtime. I can’t imagine this ever being useful, but
it’s there, and FreeType implements it, so I can only assume some
real-world fonts use it. This bytecode interpreter also once contained
a stack overflow vulnerability, which is what JailbreakMe.com v2.0 used
to jailbreak the iPhone, with the OTF file being loaded by Apple’s
custom PDF viewer.
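Here is a small Python sketch of the kind of interpreter involved. It covers only a sliver of the Type 2 format: the integer operand encodings (one byte, two bytes, and the 16-bit form introduced by byte 28), rmoveto, rlineto, endchar, and the escaped random operator. It assumes no leading width argument, no hints, subroutines or curves, and `run_charstring` and `CharstringError` are hypothetical names. The depth check against the spec’s 48-entry argument-stack limit is the kind of check whose absence turns a font file into an exploit.

```python
from __future__ import annotations

import random

MAX_ARGS = 48  # Type 2 argument-stack limit from the spec's implementation limits


class CharstringError(Exception):
    pass


def run_charstring(code: bytes) -> list[tuple[str, float, float]]:
    """Interpret a tiny subset of Type 2 charstrings and return the path."""
    stack: list[float] = []
    path: list[tuple[str, float, float]] = []
    x = y = 0
    i = 0

    def next_byte() -> int:
        nonlocal i
        if i >= len(code):
            raise CharstringError("truncated charstring")
        i += 1
        return code[i - 1]

    def push(value: float) -> None:
        # Refuse to grow past the limit instead of writing off the end.
        if len(stack) >= MAX_ARGS:
            raise CharstringError("argument stack overflow")
        stack.append(value)

    while True:
        b0 = next_byte()
        if 32 <= b0 <= 246:                   # small integer in one byte
            push(b0 - 139)
        elif 247 <= b0 <= 250:                # positive integer, two bytes
            push((b0 - 247) * 256 + next_byte() + 108)
        elif 251 <= b0 <= 254:                # negative integer, two bytes
            push(-(b0 - 251) * 256 - next_byte() - 108)
        elif b0 == 28:                        # 16-bit signed integer follows
            hi, lo = next_byte(), next_byte()
            push(int.from_bytes(bytes([hi, lo]), "big", signed=True))
        elif b0 == 21:                        # rmoveto dx dy
            if len(stack) != 2:
                raise CharstringError("rmoveto expects exactly 2 operands")
            x, y = x + stack[0], y + stack[1]
            path.append(("moveto", x, y))
            stack.clear()
        elif b0 == 5:                         # rlineto {dx dy}+
            if not stack or len(stack) % 2:
                raise CharstringError("rlineto expects operand pairs")
            for dx, dy in zip(stack[::2], stack[1::2]):
                x, y = x + dx, y + dy
                path.append(("lineto", x, y))
            stack.clear()
        elif b0 == 12:                        # two-byte escape operator
            if next_byte() != 23:
                raise CharstringError("only the escaped random operator is supported")
            push(1.0 - random.random())       # random: a value in (0, 1]
        elif b0 == 14:                        # endchar
            return path
        else:
            raise CharstringError(f"operator {b0} not supported in this sketch")


print(run_charstring(bytes([149, 159, 21, 169, 134, 5, 14])))
# [('moveto', 10, 20), ('lineto', 40, 15)]
```

Every operand push and every operator’s operand count is checked before the stack is touched; a real interpreter has to do the same across dozens more operators, subroutine calls and hint masks.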

This glyph language is a stripped-down version of PostScript. Actual
PostScript involves a full Turing-complete register/stack-based hybrid
virtual machine inspired by Forth. The drawbacks of that system
(looping forever, having to interpret the entire script to draw a
specific page because of complex state) were the major motivations for
the PDF format: while based on PostScript, PDF has little shared
document state and allows no arbitrary flow control operations. In
this model, someone (even an automated program) can easily verify that
a graphic is encapsulated, does not do different things depending on
input, and terminates at some point.
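The verifiability argument can be sketched in a few lines of Python. The idea: if a drawing language has no operators that branch, loop or execute procedures, a single left-to-right pass both validates and runs a stream, so its running time is bounded by its length. `m`, `l` and `h` are real PDF path operators (moveto, lineto, closepath) and the blocklist holds real PostScript control operators, but `check_straight_line` and `run_path_ops` are hypothetical and handle nothing else; a real content stream has many more operators and operand types.

```python
from __future__ import annotations

# Real PostScript control-flow operators; PDF content streams have no equivalents.
FLOW_CONTROL = {"if", "ifelse", "for", "repeat", "loop", "forall", "exit", "exec"}


def check_straight_line(tokens: list[str]) -> None:
    """Reject anything that could branch, loop, or define a procedure."""
    for tok in tokens:
        if tok in FLOW_CONTROL or "{" in tok or "}" in tok:
            raise ValueError(f"flow control is not allowed: {tok!r}")


def run_path_ops(stream: str) -> list[tuple[str, float, float]]:
    """Execute m/l/h in a single pass: at most one step per token."""
    tokens = stream.split()
    check_straight_line(tokens)
    operands: list[float] = []
    path: list[tuple[str, float, float]] = []
    start = (0.0, 0.0)
    for tok in tokens:
        if tok == "m":                    # x y m: begin a new subpath
            x, y = operands
            start = (x, y)
            path.append(("moveto", x, y))
        elif tok == "l":                  # x y l: straight line to (x, y)
            x, y = operands
            path.append(("lineto", x, y))
        elif tok == "h":                  # h: close the current subpath
            path.append(("closepath", *start))
        else:
            operands.append(float(tok))   # anything else must be a number
            continue
        operands = []
    return path


print(run_path_ops("10 20 m 40 15 l h"))
# [('moveto', 10.0, 20.0), ('lineto', 40.0, 15.0), ('closepath', 10.0, 20.0)]
```

The check and the execution are both linear scans, which is the whole point: nothing in the stream can make the interpreter revisit a token.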

And, of course, since fonts are complicated and OpenType is
complicated, OpenType also includes all of TrueType, which has a
bytecode-based hinting model to ensure that your fonts look correct at
all resolutions. I won’t ramble on about it, but FreeType implements
this interpreter too. I don’t know of anything interesting happening
with it, though there seems to have been a CVE for it at one time.
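For comparison, the core of a TrueType instruction interpreter is another stack machine. This hypothetical Python sketch runs only a handful of real instructions with their standard opcodes (PUSHB and NPUSHB, DUP, POP, SWAP, ADD, SUB) and ignores everything that makes hinting hard: the graphics state, points, zones, rounding, and the 26.6 fixed-point multiply and divide. Addition and subtraction of 26.6 values behave like plain integer arithmetic, so those two are exact here. Every pop is checked; missing checks of this kind are a classic source of interpreter CVEs.

```python
from __future__ import annotations


class HintingError(Exception):
    pass


def run_tt(code: bytes) -> list[int]:
    """Run a few TrueType stack instructions and return the final stack."""
    stack: list[int] = []
    i = 0

    def take(n: int) -> bytes:
        nonlocal i
        if i + n > len(code):
            raise HintingError("instruction stream truncated")
        i += n
        return code[i - n:i]

    def pop() -> int:
        if not stack:
            raise HintingError("stack underflow")
        return stack.pop()

    while i < len(code):
        op = take(1)[0]
        if 0xB0 <= op <= 0xB7:            # PUSHB[n]: push the next n+1 bytes
            stack.extend(take(op - 0xB0 + 1))
        elif op == 0x40:                  # NPUSHB: a count byte, then that many bytes
            stack.extend(take(take(1)[0]))
        elif op == 0x20:                  # DUP
            top = pop()
            stack.extend([top, top])
        elif op == 0x21:                  # POP
            pop()
        elif op == 0x23:                  # SWAP the top two entries
            e2, e1 = pop(), pop()
            stack.extend([e2, e1])
        elif op == 0x60:                  # ADD: 26.6 values add like integers
            n2, n1 = pop(), pop()
            stack.append(n1 + n2)
        elif op == 0x61:                  # SUB: pushes n1 - n2 (n2 was on top)
            n2, n1 = pop(), pop()
            stack.append(n1 - n2)
        else:
            raise HintingError(f"opcode {op:#04x} not supported in this sketch")
    return stack


print(run_tt(bytes([0xB1, 5, 3, 0x61, 0x20, 0x60])))  # prints [4]
```

The example pushes 5 and 3, subtracts to get 2, duplicates it and adds, leaving 4.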

To get this article to display on screen, it’s very likely that
thousands of these tiny little microprograms ran, once for each glyph
shape in each font.

Packet filtering

Further on, if you capture network packets with tcpdump or libpcap (or
one of its users, like Wireshark), they are filtered through the
Berkeley Packet Filter, a custom register-based bytecode. At one time
the performance cost of this was too large for people debugging
network issues, so a simple JIT compiler was added to the kernel,
behind an experimental sysctl flag.
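Classic BPF is small enough to sketch. Below is a hypothetical Python interpreter for four real classic-BPF opcodes (their values combine the `BPF_LD`, `BPF_JMP` and `BPF_RET` class bits with the size and mode bits from the classic BPF headers), running a program similar to what `tcpdump -d ip` prints: load the EtherType at offset 12 and accept the packet if it is 0x0800 (IPv4). The accept length in the first `ret` depends on the snapshot length, so 262144 is an assumed value. Jump offsets are unsigned and relative to the next instruction, so control only moves forward; that is why a classic BPF program always terminates, and part of why it can only do simple operations.

```python
from __future__ import annotations

# Classic BPF opcodes, built from the class | size | mode bits of the BPF headers.
LD_H_ABS = 0x28   # BPF_LD | BPF_H | BPF_ABS:  A = 16-bit big-endian load at [k]
LD_B_ABS = 0x30   # BPF_LD | BPF_B | BPF_ABS:  A = byte at [k]
JEQ_K = 0x15      # BPF_JMP | BPF_JEQ | BPF_K: if A == k skip jt, else skip jf
RET_K = 0x06      # BPF_RET | BPF_K:           accept k bytes (0 means drop)


def run_bpf(program: list[tuple[int, int, int, int]], pkt: bytes) -> int:
    """Interpret (code, jt, jf, k) tuples, mirroring struct sock_filter."""
    a = 0                                   # the accumulator register
    pc = 0
    while pc < len(program):
        code, jt, jf, k = program[pc]
        pc += 1
        if code == LD_H_ABS:
            if k + 2 > len(pkt):
                return 0                    # out-of-bounds load: drop the packet
            a = int.from_bytes(pkt[k:k + 2], "big")
        elif code == LD_B_ABS:
            if k >= len(pkt):
                return 0
            a = pkt[k]
        elif code == JEQ_K:
            # jt/jf are unsigned offsets from the next instruction, so jumps
            # only go forward and every program terminates.
            pc += jt if a == k else jf
        elif code == RET_K:
            return k
        else:
            raise ValueError(f"opcode {code:#04x} not supported in this sketch")
    return 0                                # fell off the end: treat as drop


# Roughly what `tcpdump -d ip` prints; 262144 is an assumed snapshot length.
IP_FILTER = [
    (LD_H_ABS, 0, 0, 12),       # ldh [12]          (EtherType)
    (JEQ_K, 0, 1, 0x0800),      # jeq #0x800 jt 2 jf 3
    (RET_K, 0, 0, 262144),      # ret #262144       (accept)
    (RET_K, 0, 0, 0),           # ret #0            (drop)
]

ipv4_frame = bytes(12) + b"\x08\x00" + bytes(20)
print(run_bpf(IP_FILTER, ipv4_frame))  # prints 262144
```

A JIT does the same job by translating each of these instructions into native code once, instead of dispatching on the opcode for every packet.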

As a piece of historical interest, an earlier version of the BPF code
was among the code SCO claimed was infringing in its lawsuits (page
15), but it was actually 4.3BSD code that had been copied into the
Linux kernel. The original BSD code was eventually replaced with the
current architecture, known as the Linux Socket Filter, in Linux 2.2
(which I can’t easily link to, as there’s no public repository of the
Linux kernel code with pre-git history, as far as I know).

What about it?

Bytecode is alluring as a general and flexible solution to problems,
but it’s not without its complexities and faults: it has major
security implications (an entire iPhone jailbreak from incorrect stack
overflow checking!) and insane implementation requirements (so insane
that, among the OSes whose code we can check, there is only one major
ACPI implementation in use).

The four examples also bring out something interesting: the wildly
different approaches that can be taken to a bytecode language. ACPI is
what I can only imagine is scope creep on an originally declarative
table specification, which grew into the mess it is today. The Type 2
charstring and TrueType hinting languages are basic stack-based
interpreters, showing their PostScript heritage. And BPF is a
register-based interpreter for a relatively odd language that can
really only do simple operations.
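The stack-versus-register difference is easiest to see by compiling one expression both ways. This is a generic Python illustration, not taken from any of the specifications above, and the instruction spellings (`push a`, `r0 = a + b`) are made up. A stack machine encodes `(a + b) * c` as five instructions with implicit operands; a register machine needs two, each naming its operands explicitly.

```python
from __future__ import annotations

# An expression is a variable name or a tuple (operator, left, right).
EXPR = ("*", ("+", "a", "b"), "c")


def to_stack_code(expr) -> list[str]:
    """Post-order walk: push operands; each operator pops two and pushes one."""
    if isinstance(expr, str):
        return [f"push {expr}"]
    op, left, right = expr
    return to_stack_code(left) + to_stack_code(right) + [op]


def to_register_code(expr) -> list[str]:
    """Three-address form: each operator writes a fresh register r0, r1, ..."""
    code: list[str] = []

    def emit(e) -> str:
        if isinstance(e, str):
            return e                      # variables are used directly as operands
        op, left, right = e
        a, b = emit(left), emit(right)
        dst = f"r{len(code)}"             # one new register per instruction
        code.append(f"{dst} = {a} {op} {b}")
        return dst

    emit(expr)
    return code


print(to_stack_code(EXPR))     # ['push a', 'push b', '+', 'push c', '*']
print(to_register_code(EXPR))  # ['r0 = a + b', 'r1 = r0 * c']
```

Stack code is compact per instruction and trivial to generate, while register code needs fewer instructions but must encode operand names in each one.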

Note, though, that every one of the implementations above has had
security issues, with numerous CVEs for each, because bytecode
interpreters are hard to get right. So, to other hackers: do you know
of any other low-level, esoteric custom bytecode specifications like
these? And to spec writers: did you really need that flexibility?

Alexey Zakhlestin