On Thu, Jul 1, 2021, 4:16 AM Karl <gmkarl@gmail.com> wrote: ...
I proposed to the lsl project (used for neuroscience research) that they
encrypt and authenticate their biosignal streams. I wasn't sure what system to suggest, and suggested hypercore because it offers some small proof of creation after the fact. They were expecting TLS of course, which worries me because it doesn't say anything about archival integrity after decryption. Hypercore wasn't really a good suggestion because it is written in nodejs and lsl is in c++ :-/
Seems go and rust are the future. I looked up go.sum: dependencies, although retrieved from github over the network (a scary way to make an ecosystem), are hashed via sha256 in a way that lets the algorithm be upgraded (reliable, trustworthy). Inspiring. The go dependency system has multiple facilities for pulling from offline mirrors instead of github, but they aren't that easy to find. Haven't checked whether the hash covers the commit id of a dependency, the checked-out worktree, or something else.
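For what it's worth, the "h1:" hashes go records in go.sum are taken over the files of the module's source tree (a hash of per-file sha256 lines), not over a git commit id. A minimal sketch of computing the same hash yourself with the `golang.org/x/mod/sumdb/dirhash` package; the directory and module@version strings below are placeholders:

```go
// Minimal sketch: compute the "h1:" hash that go records in go.sum for a
// module directory. Assumes golang.org/x/mod is available; the path and
// module@version below are placeholders, not real values.
package main

import (
	"fmt"
	"log"

	"golang.org/x/mod/sumdb/dirhash"
)

func main() {
	// Hash1 hashes a sorted list of per-file SHA-256 lines, so the result
	// depends on the extracted file tree, not on any git commit id.
	h, err := dirhash.HashDir("/path/to/module/src", "example.com/mod@v0.1.0", dirhash.Hash1)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(h) // prints something like "h1:<base64>"
}
```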
...
Rust stores its crates.io package index in a single git repository with
history. Each package's source bundle is hashed with sha256, although it does not look like the format provides for easily upgrading that algorithm.
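A minimal sketch of the check the index enables: hash a downloaded .crate bundle with sha256 and compare it against the checksum recorded in the index entry for that version (the file name and expected digest here are placeholders, and "cksum" is my reading of the index field name):

```go
// Minimal sketch: verify a downloaded .crate file against the sha256 the
// crates.io index records for it. File name and expected digest are
// placeholders.
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"log"
	"os"
)

func main() {
	// Placeholder: copy this from the "cksum" field of the index entry.
	const expected = "0000000000000000000000000000000000000000000000000000000000000000"

	data, err := os.ReadFile("example-1.0.0.crate")
	if err != nil {
		log.Fatal(err)
	}
	sum := sha256.Sum256(data)
	if hex.EncodeToString(sum[:]) != expected {
		log.Fatal("checksum mismatch")
	}
	fmt.Println("crate matches index checksum")
}
```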
..
go also uses a module mirror and checksum database: https://proxy.golang.org/ . An interesting technology is mentioned, called "certificate transparency" or a "transparent log": the server's integrity does not have to be trusted. It sounds really interesting. Automatic use of the checksum database, which is served under subpaths of https://sum.golang.org/, is only enabled starting with go 1.13.
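You can also talk to the checksum database directly. A minimal sketch, assuming the protocol's /lookup/<module>@<version> endpoint (the module picked here is just an example); it returns the go.sum lines for that version together with a signed tree head:

```go
// Minimal sketch: query sum.golang.org for the record of one module version.
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	resp, err := http.Get("https://sum.golang.org/lookup/github.com/google/trillian@v1.3.11")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(string(body))
}
```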
The mitm-contents of https://sum.golang.org/latest for me right now are roughly this:
go.sum database tree 5846179 ynvWHhPdVJ+uzW3tYDxuPyccZN0KmsJKmy/x6aSglq4=
— sum.golang.org Az3grhYllN53hh2b10cHJvRkyLB/pGehUuEZj5QeNKNHlkqhFwt2zXNgZcK3XuUisNaWOG/GD992XmPCyfPR/4n7cQ0=
I don't immediately see a way to mirror the checksum log, which is saddening, but the go ecosystem is pretty big so it's highly likely somebody has written code to do that.
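A minimal sketch of checking the signature on that checkpoint with `golang.org/x/mod/sumdb/note`, using the widely published verifier key for sum.golang.org (copy the key from the go documentation if this one has drifted). This only proves the note was signed by that key; it says nothing about whether everyone else sees the same log:

```go
// Minimal sketch: fetch /latest and verify its note signature against the
// published sum.golang.org verifier key.
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"

	"golang.org/x/mod/sumdb/note"
)

func main() {
	const vkey = "sum.golang.org+033de0ae+Ac4zctda0e5eza+HJyk9SxEdh+s3Ux18htTTAD8OuAn8"

	resp, err := http.Get("https://sum.golang.org/latest")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	msg, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}

	verifier, err := note.NewVerifier(vkey)
	if err != nil {
		log.Fatal(err)
	}
	n, err := note.Open(msg, note.VerifierList(verifier))
	if err != nil {
		log.Fatal(err) // bad signature or unknown signer
	}
	fmt.Printf("verified checkpoint:\n%s", n.Text)
}
```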
Certificate Transparency is a great google project providing a degree of public auditing of CA activity. It uses an append-only merkle tree. The tooling appears pretty complicated and mostly driven by google's go implementations. It's not a lightweight small tool like pgp or git. Go's sumdb uses Trillian, which is a generalisation of the technology behind certificate transparency. An important question is whether there are alternative implementations of trillian.

The mitm-commit-tip of https://github.com/google/trillian-examples for me is 267fb50f0b5571b879ac75fd52a113af1b31c6a0 . In the sumdbaudit/ folder is software in go for producing, auditing, and running a go sumdb mirror. Its README:

# Auditor / Cloner for SumDB

This directory contains tools for verifiably creating a local copy of the [Go SumDB](https://blog.golang.org/module-mirror-launch) into a local SQLite database.

* `cli/clone` is a one-shot tool to clone the Log at its current size
* `cli/mirror` is a service which continually clones the Log
* `cli/witness` is an HTTP service that uses a local clone of the Log to provide checkpoint validation for other clients. This is a very lightweight way of providing some Gossip solution to detect split views.

## Background

This is a quick summary of https://blog.golang.org/module-mirror-launch but is not intended to replace this introduction. If you have no context on Go SumDB, read that intro first :-)

Go SumDB is a Verifiable Log based on Trillian, which contains entries of the form:

```
github.com/google/trillian v1.3.11 h1:pPzJPkK06mvXId1LHEAJxIegGgHzzp/FUnycPYfoCMI=
github.com/google/trillian v1.3.11/go.mod h1:0tPraVHrSDkA3BO6vKX67zgLXs6SsOAbHEivX+9mPgw=
```

Every module & version used in the Go ecosystem will have such an entry in this log, and the values are hashes which commit to the state of the repository and its `go.mod` file at that particular version.

Clients can be assured that they have downloaded the same version of a module as everybody else provided all of the following are true:

* The hash of what they have downloaded matches an entry in the SumDB Log
* There is only one entry in the Log for the `module@version`
* Entries in the Log are immutable / the Log is append-only
* Everyone else sees the same Log

## Features

This auditor provides an example for how Log data can be verifiably cloned, and demonstrates how this can be used as a basis to verify its [Claims](https://github.com/google/trillian/blob/master/docs/claimantmodel/).

The Claims checked by this clone & audit tool are:

* SumDB Checkpoints/STHs properly commit to all of the data in the Log
* Committed entries are never modified; the Log is append-only
* Each `module@version` appears at most once

In addition to verifying the above Claims, the tool populates a SQLite database with the following tables:

* `leaves`: raw entries from the Log
* `tiles`: tiled subtrees of the Merkle Tree
* `checkpoints`: a history of Log Checkpoints (aka STHs) that have been seen
* `leafMetadata`: parsed data from the `leaves` table

This tool does **not** check any of the following:

* Everyone else sees the same Log: this requires some kind of Gossip protocol for clients and verifiers to share Checkpoints
* That the hashes in the log represent the current state of the repository (the repository could have changed its git tags such that the hashes no longer match, but this is not verified)
* That any `module@version` is "safe" (i.e. no checking for CVEs, etc)

## Running `clone`

The following command will download all entries and store them in the database file provided:

```bash
go run github.com/google/trillian-examples/sumdbaudit/cli/clone -sqlite_file ~/sum.db -alsologtostderr -v=2
```

This will take some time to complete on the first run. Latency and bandwidth between the auditor and SumDB will be a large factor, but for illustrative purposes this completes in around 4 minutes on a workstation with a good wired connection, and in around 10 minutes on a Raspberry Pi connected over WiFi. Your mileage may vary. At the time of this commit, SumDB contained a little over 1.5M entries which results in a SQLite file of around 650MB.

## Setting up a `mirror` service

These instructions show how to set up a mirror service to run on a Raspberry Pi running a recent version of Raspbian.
:frog: this would be more useful with a client/server database instead of sqlite!
Setup:

```bash
# Build the mirror and install it where it can be executed
go build ./sumdbaudit/cli/mirror
sudo mv mirror /usr/local/bin/sumdbmirror

# Create a user to run the service that has no login
sudo useradd -M sumdb
sudo usermod -L -s /bin/false sumdb

# Create a directory to store the sqlite database
sudo mkdir /var/cache/sumdb
sudo chown sumdb.sumdb /var/cache/sumdb
```

Define the service by creating the file `/etc/systemd/system/sumdbmirror.service` with contents:

```
[Unit]
Description=Go SumDB Mirror
After=network.target

[Service]
Type=simple
User=sumdb
ExecStart=/usr/local/bin/sumdbmirror -sqlite_file /var/cache/sumdb/mirror.db -alsologtostderr -v=1

[Install]
WantedBy=multi-user.target
```

Start the service and check its progress:

```bash
sudo systemctl daemon-reload
sudo systemctl start sumdbmirror

# Follow the latest log messages
journalctl -u sumdbmirror -f
```

When the mirror service is sleeping, you will be able to query the local database at `/var/cache/sumdb/mirror.db` using the example queries in the next section. At the time of writing this setup uses almost 600MB of storage for the database.

If you want to have the `leafMetadata` table populated then you can add an extra argument to the service definition. In the `ExecStart` line above, add `-unpack` and then restart the `sumdbmirror` service (`sudo systemctl daemon-reload && sudo systemctl restart sumdbmirror`). When it next updates tiles this table will be populated. This will use more CPU and around 60% more disk.

## Setting up a `witness` service

This requires a local clone of the SumDB Log to be available. For this to be of any real value, it should be running against a database which is regularly being updated by the `mirror` service described above.

:warning: The witness is missing features (outlined below) in order to be used in an untrusted environment. This witness implementation is useful only in a trusted domain where the correct operation of the witness is implicit. This precludes being run as a general service on the Web, but is still useful within a household or organization.
A client which successfully checks its checkpoints with a witness can ensure that if there is a "split view" of the SumDB Log, then it is on the same side of the split as the witness. If this witness is also verifying the claims of the log, then the client is safe in relying on the data within (providing it trusts the verifier!).

The service can be started with the command (assuming `~/sum.db` is the database):

```bash
go run ./sumdbaudit/cli/witness -listen :8080 -sqlite_file ~/sum.db -v=1 -alsologtostderr
```

This can be set up as a Linux service in much the same way as the `mirror` above.

Once running, the server will be available for GET requests at the listen address given as a commandline parameter. Some example requests that can be made:

```bash
# Simply get the latest golden checkpoint
curl -i http://localhost:8080/golden

# Validate that the witness is consistent with the Checkpoint your go build tools are using
curl -i http://localhost:8080/checkConsistency/`base64 -i ~/go/pkg/sumdb/sum.golang.org/latest`

# Validate that the witness is consistent with the latest Checkpoint from the real Log
curl -i http://localhost:8080/checkConsistency/`curl https://sum.golang.org/latest | base64`
```

### Using Docker

The witness can be started along with the mirror using `docker-compose`. The following command will mirror the log and provide a witness on port `8080` when the initial sync completes:

```bash
docker-compose -f sumdbaudit/docker/docker-compose.yml up -d
```

If using a Raspberry Pi, the above command will fail because no suitable MariaDB image can be installed. Instead, use this command to install an image that works:

```bash
docker-compose -f sumdbaudit/docker/docker-compose.yml -f sumdbaudit/docker/docker-compose.rpi.yml up -d
```

### Using a client/server database

The instructions above are for setting this up using sqlite with its storage on the local filesystem. To set this up using MariaDB, the database can be provisioned by logging into the instance as root user and running the following:

```bash
CREATE DATABASE sumdb;
CREATE USER 'sumdb'@localhost IDENTIFIED BY 'letmein';
GRANT ALL PRIVILEGES ON sumdb.* TO 'sumdb'@localhost;
FLUSH PRIVILEGES;
```

Once set up, change the `sqlite_file` flag above for `mysql_uri` with a connection string like `'sumdb:letmein@tcp(127.0.0.1:3306)/sumdb?parseTime=true'`.

## Querying the database

The number of leaves downloaded can be queried:

```bash
sqlite3 ~/sum.db 'SELECT COUNT(*) FROM leaves;'
```

And the tile hashes at different levels inspected:

```bash
sqlite3 ~/sum.db 'SELECT level, COUNT(*) FROM tiles GROUP BY level;'
```

The modules with the most versions:

```bash
sqlite3 ~/sum.db 'SELECT module, COUNT(*) cnt FROM leafMetadata GROUP BY module ORDER BY cnt DESC LIMIT 10;'
```

## Missing Features

* This only downloads complete tiles, which means that at any point there could be up to 2^height leaves missing from the database. These stragglers should be stored if the root hash checks out.
* Witness should return detailed responses
  * In the event of an inconsistency, both Checkpoint notes should be serialized and returned
  * Consistency should return a proof that the tree is consistent with the witness's Golden Checkpoint
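A minimal sketch of the gossip idea the README motivates, assuming the witness above is running on localhost:8080: fetch its golden checkpoint and the live checkpoint from sum.golang.org and print both for comparison. A real client would ask for a consistency proof (the `/checkConsistency/` endpoint above) rather than eyeballing the two notes:

```go
// Minimal sketch: compare a local witness's golden checkpoint with the live
// sum.golang.org checkpoint to spot an obvious split view.
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

// fetch returns the body of a GET request or aborts on error.
func fetch(url string) string {
	resp, err := http.Get(url)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	return string(body)
}

func main() {
	fmt.Println("witness golden checkpoint:")
	fmt.Println(fetch("http://localhost:8080/golden"))
	fmt.Println("sum.golang.org latest checkpoint:")
	fmt.Println(fetch("https://sum.golang.org/latest"))
}
```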