[crazy][hobby][spam] Automated Reverse Engineering

Undiscussed Horrific Abuse, One Victim & Survivor of Many gmkarl at gmail.com
Tue Feb 1 08:24:03 PST 2022


- a large pretrained model that already has significant understanding
of English logic and knowledge could be finetuned on raw bytes by
training Perceiver-like cross-attention embedding/tokenization
encoders and decoders to match the behavior of its original tokenizer
and embeddings while accepting byte streams (a rough sketch follows
the next item).
-    the Perceiver masked-LM model constructs its cross-attention
layer roughly as:

         PerceiverLayer(config, is_cross_attention=True,
                        qk_channels=qk_channels, v_channels=v_channels,
                        num_heads=num_heads, q_dim=q_dim, kv_dim=kv_dim,
                        widening_factor=config.widening_factor,
                        use_query_residual=config.use_query_residual)

     and calls it roughly as:

         cross_attention(trained_embeddings, attention_mask=None,
                         head_mask=None, inputs=inputs,
                         inputs_mask=inputs_mask)
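
A rough sketch of what such a byte-ingesting cross-attention encoder
could look like (PyTorch; the module, the dimensions, and the frozen
base-model target are all illustrative assumptions, not the Perceiver
implementation itself):

    import torch
    import torch.nn as nn

    class ByteCrossAttentionEncoder(nn.Module):
        """Map a raw byte stream to a fixed set of latent embeddings
        with one cross-attention block, Perceiver-style."""
        def __init__(self, d_model=768, num_latents=128, num_heads=8):
            super().__init__()
            self.byte_embed = nn.Embedding(256, d_model)  # one row per byte value
            self.latents = nn.Parameter(torch.randn(num_latents, d_model) * 0.02)
            self.cross_attn = nn.MultiheadAttention(d_model, num_heads,
                                                    batch_first=True)
            self.proj = nn.Linear(d_model, d_model)

        def forward(self, byte_ids):                  # byte_ids: (batch, n_bytes)
            kv = self.byte_embed(byte_ids)            # keys/values come from the bytes
            q = self.latents.unsqueeze(0).expand(byte_ids.size(0), -1, -1)
            attended, _ = self.cross_attn(q, kv, kv)  # latents attend over the bytes
            return self.proj(attended)                # (batch, num_latents, d_model)

    # training idea: make the byte encoder imitate the frozen base model's
    # original embedding layer on the same text, so the base model can later
    # accept byte streams instead of token ids.
    encoder = ByteCrossAttentionEncoder()
    byte_ids = torch.tensor([list("example text".encode("utf-8"))])
    byte_states = encoder(byte_ids)
    # target = base_model.get_input_embeddings()(token_ids)    # hypothetical frozen target
    # loss = torch.nn.functional.mse_loss(byte_states, target) # after aligning shapes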

- I'm curious what memory and computation bounds the mainstream
trained models place on input size. Could we feed an entire binary
in? Could we feed an entire tarball in?
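
Back-of-the-envelope, the constraint is the attention score matrix:
ordinary self-attention materializes O(n^2) scores, while Perceiver-style
cross-attention only needs O(n * m) for m latents. Illustrative numbers
only (fp16, one score matrix at a time, ignoring heads, layers, and
fused-kernel savings):

    def self_attention_bytes(seq_len, bytes_per_elem=2):
        return seq_len * seq_len * bytes_per_elem         # O(n^2) score matrix

    def cross_attention_bytes(seq_len, num_latents=256, bytes_per_elem=2):
        return seq_len * num_latents * bytes_per_elem     # O(n * m) score matrix

    one_mib_binary = 1 << 20                              # feed 1 MiB of raw bytes
    print(self_attention_bytes(one_mib_binary) / 2**30)   # ~2048 GiB: hopeless
    print(cross_attention_bytes(one_mib_binary) / 2**30)  # ~0.5 GiB: plausible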

- I'm curious what the state of the large-input models is, like
BigBird. Are they helpful here?
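
For comparison, a minimal sketch of loading one of the long-input
checkpoints through transformers (the checkpoint name and the
4096-token limit are what I believe holds for that family; treat them
as assumptions):

    from transformers import AutoTokenizer, AutoModelForMaskedLM

    tok = AutoTokenizer.from_pretrained("google/bigbird-roberta-base")
    model = AutoModelForMaskedLM.from_pretrained("google/bigbird-roberta-base")

    text = open("some_long_file.txt").read()     # hypothetical long document
    enc = tok(text, truncation=True, max_length=4096, return_tensors="pt")
    out = model(**enc)                           # block-sparse attention keeps this tractable
    print(out.logits.shape)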

- I'd also like to run the current model on Colab by finding a
workable trick to avoid the compilation crash: either compiling in
smaller chunks, or using a different framework, possibly without
compilation at all.
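
Without knowing which compiler is crashing, one hedged reading of
"possibly without compilation" is plain eager-mode PyTorch inference,
with no jit/XLA/tracing step at all; the checkpoint below is just a
small placeholder, not the model in question:

    import torch
    from transformers import AutoTokenizer, AutoModelForMaskedLM

    name = "distilroberta-base"                  # placeholder checkpoint sized for Colab
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForMaskedLM.from_pretrained(name).eval()

    with torch.no_grad():                        # eager execution, nothing to compile
        enc = tok("hello <mask>", return_tensors="pt")
        logits = model(**enc).logits
    print(logits.shape)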

