[spam?] Note: Automated Legal Reverse Engineering

Sun Nov 7 05:11:55 PST 2021

Public Advisory

In case you're not aware, seq2seq transformer models have, for some
time now, been advanced enough to be trained to automatically convert
binary code (including firmware) into source code, optionally clear
and commented.  I don't know whether anybody has published work doing
that; it's probably out there somewhere, but it's pretty clear it
wouldn't be hard to do.

The advantage of the transformer model approach is legal.  As I
understand it, in the USA it's legally risky to reverse engineer code
and then produce a product based on the reversed code, unless there is
a second layer in the analysis process (often called clean-room
design), where a spec is produced from the reversing work, and then
new code is written from the spec alone.

This just means training more models to translate into and out of
interface specs.  For a transformer model, a spec is just a simple
code language that leaves a lot of detail out.
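
To make "a spec is just a simple code language that leaves a lot of
detail out" concrete, here is a toy sketch.  It is not a model: it uses
Python's standard ast module to reduce source code to a signature-only
spec.  The function name to_interface_spec and the sample REVERSED
source are hypothetical, made up for illustration, with Python source
standing in for real decompiler output.

```python
import ast


def to_interface_spec(source: str) -> str:
    """Reduce source code to a signature-only spec, dropping all bodies."""
    tree = ast.parse(source)
    lines = []
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            # Keep only the name, the argument names and the docstring.
            args = ", ".join(a.arg for a in node.args.args)
            doc = ast.get_docstring(node) or "undocumented"
            lines.append(f"{node.name}({args}): {doc}")
    return "\n".join(lines)


# Hypothetical stand-in for the output of the reversing stage.
REVERSED = '''
def set_led(port, on):
    """Turn the LED on a port on or off."""
    reg = 0x40 + port
    write_reg(reg, 1 if on else 0)

def read_temp(sensor):
    """Return the temperature in degrees C."""
    return read_reg(0x80 + sensor) * 0.5
'''

if __name__ == "__main__":
    print(to_interface_spec(REVERSED))
```

The register addresses and scaling factor never reach the spec, so
whatever writes the new code works only from the interface.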

This training could be automated to not require much data.
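
One guess at how that automation could look (an assumption, not
something tested): generate source mechanically, compile it, and keep
each (binary, source) pair as a training example.  The sketch below
uses Python's built-in compile() and the resulting bytecode as a
stand-in for a real compiler and real machine code; generate_sources
and make_training_pairs are hypothetical names.

```python
def generate_sources(count):
    """Yield small, mechanically generated source snippets (toy data)."""
    ops = ["+", "-", "*"]
    for i in range(count):
        op = ops[i % len(ops)]
        yield f"def f{i}(a, b):\n    return a {op} b + {i}\n"


def make_training_pairs(count):
    """Compile each snippet and pair the raw bytecode with its source."""
    pairs = []
    for source in generate_sources(count):
        code = compile(source, "<generated>", "exec")
        # co_code holds the raw bytecode of the compiled module.
        pairs.append((code.co_code, source))
    return pairs


if __name__ == "__main__":
    for binary, source in make_training_pairs(3):
        print(len(binary), "bytes of bytecode for:", source.splitlines()[0])
```

Since the pairs come from the compiler itself, nothing has to be
collected or labelled by hand.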

This is one of the many things where effort spent on developing and
normalising it would free up an incredible number of hours for
community software developers, who would no longer have to keep
reinventing the wheel to maintain compatibility with hardware.  It
would also mean libre-licensed drivers could possibly be made for
every binary blob.

It'll get a little harder as corps start preparing for this by
encrypting their firmware and such.  I suppose that would mean the
models would need a channel to the running behavior of the firmware to
reverse it, and then that will get harder to access, and so on.  So if
pursuing this, maybe implement it gently and respectfully somehow, so
as not to stir up fear for people's profit lines.