On Thu, Jul 1, 2021, 4:16 AM Karl <gmkarl@gmail.com> wrote:
...
I proposed to the lsl project (used for neuroscience research) that they encrypt and authenticate their biosignal streams.  I wasn't sure what system to suggest and proposed hypercore because it offers some small proof of creation after the fact.  They were expecting TLS of course, which I worry about because it doesn't say anything about archival integrity after decryption.  Hypercore wasn't really a good suggestion, though, because it is written in nodejs and lsl is in c++ :-/

Seems go and rust are the future.  I looked up go.sum: dependencies, although retrieved from github over the network (a scary way to build an ecosystem), are hashed via sha256 in a way that can be upgraded (reliable, trustworthy).  Inspiring.  There are multiple facilities in the go dependency system for pulling from offline mirrors instead of github, but they aren't that easy to find.  I haven't checked whether the commit id of dependencies is used in the hash, or the worktree checkout, or what.
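
For reference, go.sum entries look like this (same example lines as the readme quoted below); the "h1:" prefix names the hash algorithm, which is what leaves room for changing it later:

  github.com/google/trillian v1.3.11 h1:pPzJPkK06mvXId1LHEAJxIegGgHzzp/FUnycPYfoCMI=
  github.com/google/trillian v1.3.11/go.mod h1:0tPraVHrSDkA3BO6vKX67zgLXs6SsOAbHEivX+9mPgw=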

...
Rust stores its crates.io package index in a single git repository with history.  Each package's source bundle (the .crate file) is hashed with sha256, although it does not look like the format provides for easily upgrading that algorithm.
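
From memory (field names not double-checked), each line of that index is a json record for one published version, and the "cksum" field carries the hex sha256 of the .crate file, roughly:

  {"name":"serde","vers":"1.0.130","deps":[...],"cksum":"<hex sha256 of the .crate file>","features":{},"yanked":false}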
..
go also uses a module mirror and checksum database: https://proxy.golang.org/ .  An interesting technology is mentioned, called "certificate transparency" / a "transparent log": the documentation says the server's integrity does not need to be trusted.  It sounds really interesting.  Automatic use of the checksum database, which appears to be spread under subfolders of https://sum.golang.org/, is only enabled starting with go 1.13.
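
You can see which proxy and checksum database the go tool defaults to with something like this (output from memory; GOPRIVATE / GONOSUMDB can be set to skip the database for private modules):

  go env GOPROXY GOSUMDB
  # https://proxy.golang.org,direct
  # sum.golang.org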

The mitm-contents of https://sum.golang.org/latest for me right now are roughly this:

go.sum database tree
5846179
ynvWHhPdVJ+uzW3tYDxuPyccZN0KmsJKmy/x6aSglq4=

sum.golang.org Az3grhYllN53hh2b10cHJvRkyLB/pGehUuEZj5QeNKNHlkqhFwt2zXNgZcK3XuUisNaWOG/GD992XmPCyfPR/4n7cQ0=

I don't immediately see a way to mirror the checksum log, which is saddening, but the go ecosystem is pretty big so it's highly likely somebody has written code to do that.

Certificate Transparency is a great google project providing for a degree of public auditing of CA activity.  It uses an append-only merkle tree.  The tooling appears pretty complicated and mostly driven by google's go implementations.  It's not a lightweight small tool like pgp or git.  Go's sumdb uses Trillian, which is a generalisation of the technology behind certificate transparency.  An important question is whether there are alternative implementations of trillian.

The mitm-commit-tip of https://github.com/google/trillian-examples for me is 267fb50f0b5571b879ac75fd52a113af1b31c6a0.  In the sumdbaudit/ folder is software in go for cloning, auditing, and running a go sumdb mirror.

# Auditor / Cloner for SumDB

This directory contains tools for verifiably creating a local copy of the
[Go SumDB](https://blog.golang.org/module-mirror-launch) in a SQLite
database.
 * `cli/clone` is a one-shot tool to clone the Log at its current size
 * `cli/mirror` is a service which continually clones the Log
 * `cli/witness` is an HTTP service that uses a local clone of the Log to provide
   checkpoint validation for other clients. This is a very lightweight way of
   providing a Gossip mechanism to detect split views.

## Background
This is a quick summary of https://blog.golang.org/module-mirror-launch but is not
intended to replace that introduction.
If you have no context on Go SumDB, read that intro first :-)

Go SumDB is a Verifiable Log based on Trillian, which contains entries of the form:
```
github.com/google/trillian v1.3.11 h1:pPzJPkK06mvXId1LHEAJxIegGgHzzp/FUnycPYfoCMI=
github.com/google/trillian v1.3.11/go.mod h1:0tPraVHrSDkA3BO6vKX67zgLXs6SsOAbHEivX+9mPgw=
```
Every module & version used in the Go ecosystem will have such an entry in this log,
and the values are hashes which commit to the state of the repository and its `go.mod`
file at that particular version.
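
To see these hashes for a module you depend on, the go tool can print them; in the JSON output of the command below, the `Sum` and `GoModSum` fields should correspond to the two lines above (run inside any Go module, go 1.13 or later assumed):
```bash
# Print download metadata, including the recorded h1: hashes, for one module version
go mod download -json github.com/google/trillian@v1.3.11
```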

Clients can be assured that they have downloaded the same version of a module as
everybody else provided all of the following are true:
 * The hash of what they have downloaded matches an entry in the SumDB Log
 * There is only one entry in the Log for the `module@version`
 * Entries in the Log are immutable / the Log is append-only
 * Everyone else sees the same Log
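
The first of these checks happens automatically when the go tool downloads a module (it consults `go.sum` and, from go 1.13, the checksum database). To recheck modules already in the local cache against the hashes recorded when they were downloaded, there is a built-in command:
```bash
# Re-hash cached module downloads and compare against their recorded hashes
go mod verify
```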

## Features
This auditor provides an example of how Log data can be verifiably cloned, and
demonstrates how this can be used as a basis to verify its
[Claims](https://github.com/google/trillian/blob/master/docs/claimantmodel/).
The Claims checked by this clone & audit tool are:
 * SumDB Checkpoints/STHs properly commit to all of the data in the Log
 * Committed entries are never modified; the Log is append-only
 * Each `module@version` appears at most once

In addition to verifying the above Claims, the tool populates a SQLite database
with the following tables:
 * `leaves`: raw entries from the Log
 * `tiles`: tiled subtrees of the Merkle Tree
 * `checkpoints`: a history of Log Checkpoints (aka STHs) that have been seen
 * `leafMetadata`: parsed data from the `leaves` table
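
The exact column layout of these tables can be inspected directly, assuming the database file produced by the `clone` command below:
```bash
# Dump the table definitions from the cloned database
sqlite3 ~/sum.db '.schema'
```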

This tool does **not** check any of the following:
 * Everyone else sees the same Log: this requires some kind of Gossip protocol for
   clients and verifiers to share Checkpoints
 * That the hashes in the log represent the current state of the repository
   (the repository could have changed its git tags such that the hashes no longer
   match, but this is not verified)
 * That any `module@version` is "safe" (i.e. no checking for CVEs, etc)

## Running `clone`

The following command will download all entries and store them in the database
file provided:
```bash
go run github.com/google/trillian-examples/sumdbaudit/cli/clone -sqlite_file ~/sum.db -alsologtostderr -v=2
```
This will take some time to complete on the first run. Latency and bandwidth
between the auditor and SumDB will be a large factor, but for illustrative
purposes this completes in around 4 minutes on a workstation with a good wired
connection, and in around 10 minutes on a Raspberry Pi connected over WiFi.
Your mileage may vary. At the time of this commit, SumDB contained a little over
1.5M entries which results in a SQLite file of around 650MB.

## Setting up a `mirror` service
These instructions show how to set up a mirror service to run on a Raspberry Pi
running a recent version of Raspbian.

> :frog: this would be more useful with a client/server database instead of sqlite!

Setup:
```bash
# Build the mirror and install it where it can be executed
go build ./sumdbaudit/cli/mirror
sudo mv mirror /usr/local/bin/sumdbmirror

# Create a user to run the service that has no login
sudo useradd -M sumdb
sudo usermod -L -s /bin/false sumdb

# Create a directory to store the sqlite database
sudo mkdir /var/cache/sumdb
sudo chown sumdb:sumdb /var/cache/sumdb
```
Define the service by creating the file `/etc/systemd/system/sumdbmirror.service` with contents:
```
[Unit]
Description=Go SumDB Mirror
After=network.target
[Service]
Type=simple
User=sumdb
ExecStart=/usr/local/bin/sumdbmirror -sqlite_file /var/cache/sumdb/mirror.db -alsologtostderr -v=1

[Install]
WantedBy=multi-user.target
```

Start the service and check its progress:
```bash
sudo systemctl daemon-reload
sudo systemctl start sumdbmirror

# Follow the latest log messages
journalctl -u sumdbmirror -f
```

When the mirror service is sleeping, you will be able to query the local database at
`/var/cache/sumdb/mirror.db` using the example queries in the next section.
At the time of writing this setup uses almost 600MB of storage for the database.

If you want to have the `leafMetadata` table populated then you can add an extra argument
to the service definition.
In the `ExecStart` line above, add `-unpack` and then restart the `sumdbmirror` service
(`sudo systemctl daemon-reload && sudo systemctl restart sumdbmirror`).
When it next updates tiles this table will be populated.
This will use more CPU and around 60% more disk.
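
With that flag added, the `ExecStart` line would read (same flags as before, plus `-unpack`):
```
ExecStart=/usr/local/bin/sumdbmirror -sqlite_file /var/cache/sumdb/mirror.db -unpack -alsologtostderr -v=1
```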

## Setting up a `witness` service
This requires a local clone of the SumDB Log to be available. For this to be of any
real value, it should be running against a database which is regularly being updated
by the `mirror` service described above.

> :warning: The witness is missing features (outlined below) in order to be used in an
> untrusted environment. This witness implementation is useful only in a trusted domain
> where the correct operation of the witness is implicit. This precludes being run as
> a general service on the Web, but is still useful within a household or organization.

A client which successfully checks its checkpoints with a witness can ensure that if
there is a "split view" of the SumDB Log, then it is on the same side of the split as
the witness. If this witness is also verifying the claims of the log, then the client
is safe in relying on the data within (provided it trusts the verifier!).

The service can be started with the command (assuming `~/sum.db` is the database):
```bash
go run ./sumdbaudit/cli/witness -listen :8080 -sqlite_file ~/sum.db -v=1 -alsologtostderr
```

This can be set up as a Linux service in much the same way as the `mirror` above.

Once running, the server will be available for GET requests at the listen address
given as a commandline parameter.

Some example requests that can be made:
```bash
# Simply get the latest golden checkpoint
curl -i http://localhost:8080/golden

# Validate that the witness is consistent with the Checkpoint your go build tools are using
curl -i http://localhost:8080/checkConsistency/`base64 -i ~/go/pkg/sumdb/sum.golang.org/latest`

# Validate that the witness is consistent with the latest Checkpoint from the real Log
curl -i http://localhost:8080/checkConsistency/`curl https://sum.golang.org/latest | base64`
```

### Using Docker

The witness can be started along with the mirror using `docker-compose`.
The following command will mirror the log and provide a witness on port `8080` when the initial
sync completes:
```bash
docker-compose -f sumdbaudit/docker/docker-compose.yml up -d
```

If using a Raspberry Pi, the above command will fail because no suitable MariaDB image can be
installed. Instead, use this command to install an image that works:
```bash
docker-compose -f sumdbaudit/docker/docker-compose.yml -f sumdbaudit/docker/docker-compose.rpi.yml up -d
```

### Using a client/server database

The instructions above are for setting this up using sqlite with its storage on the local filesystem.
To set this up using MariaDB, the database can be provisioned by logging into the instance as the root user and running the following:

```sql
CREATE DATABASE sumdb;
CREATE USER 'sumdb'@localhost IDENTIFIED BY 'letmein';
GRANT ALL PRIVILEGES ON sumdb.* TO 'sumdb'@localhost;
FLUSH PRIVILEGES;
```

Once set up, replace the `sqlite_file` flag above with `mysql_uri` and a connection string like `'sumdb:letmein@tcp(127.0.0.1:3306)/sumdb?parseTime=true'`.
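
For example, a manual run of the mirror against that database might then look like this (the `mysql_uri` flag name is taken from the sentence above; everything else mirrors the earlier sqlite invocation):
```bash
go run ./sumdbaudit/cli/mirror -mysql_uri 'sumdb:letmein@tcp(127.0.0.1:3306)/sumdb?parseTime=true' -alsologtostderr -v=1
```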

## Querying the database
The number of leaves downloaded can be queried:
```bash
sqlite3 ~/sum.db 'SELECT COUNT(*) FROM leaves;'
```

And the tile hashes at different levels inspected:
```bash
sqlite3 ~/sum.db 'SELECT level, COUNT(*) FROM tiles GROUP BY level;'
```

The modules with the most versions:
```bash
sqlite3 ~/sum.db 'SELECT module, COUNT(*) cnt FROM leafMetadata GROUP BY module ORDER BY cnt DESC LIMIT 10;'
```

## Missing Features
* This only downloads complete tiles, which means that at any point there could
  be up to 2^height leaves missing from the database.
  These stragglers should be stored if the root hash checks out.
* Witness should return detailed responses
  * In the event of an inconsistency, both Checkpoint notes should be serialized and returned
  * Consistency should return a proof that the tree is consistent with the witness's Golden Checkpoint