0756
Regarding backing up, it will be slow to upload data from a home network to filecoin. I'm thinking of setting up some way of securing it with a hash before it uploads. I wonder if there are existing tools to make a merkle tree out of a binary blob.
I think I will make my own, or use libgame's.
I'm thinking of streaming data from a disk, hashing each 32 GiB chunk as it comes in. This could be done with a script.
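Maybe something like this as a sketch, assuming GNU coreutils (dd, split, sha256sum) and a made-up source device /dev/sdX and output prefix; split cuts the stream into 32 GiB pieces and each piece gets written out and hashed as it streams past:
# stream the disk, cut into 32 GiB chunks, record a tagged hash line per chunk
dd if=/dev/sdX bs=4M status=progress \
  | split -b 32G -d --filter='tee "$FILE" | sha256sum --tag | sed "s|(-)|($FILE)|" >> datasource.sha256' \
      - datasource-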
We'd have a format for the hash output. It could either be a single updating document, or a tree.
This would be a hashed but unsigned document. But maybe I could sign it anyway so as to quickly verify it with familiar tools.
Alternatively, the various *sum tools have a verify mode. They use a detached checksum document.
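e.g. with sha256sum (the chunk name here is a placeholder):
sha256sum datasource-00 > datasource-00.sha256   # detached checksum document
sha256sum --check datasource-00.sha256           # verify mode, prints "datasource-00: OK"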
The goal is to reduce the time window during which viruses can mutate data.
I may simply be paranoid and confused. In fact, I am usually these two things.
0801
Thinking of a chunk of data coming in. Hash the data and give it a name.
Datasource-offset-size-date
That seems reasonable. Then we can make verification files of those files, I suppose ...
0803 ...
Datasource-index-size-date
Seems clearer and more useful.
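As a sketch of building such a name inside the chunking script (all the values here are placeholders; date -I gives the YYYY-MM-DD part):
source=sdX; index=3; size=34359738368
name=$(printf '%s-%03d-%s-%s' "$source" "$index" "$size" "$(date -I)")
# e.g. sdX-003-34359738368-YYYY-MM-DD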
This is similar to git-annex ...
0804 ...
0806
Here's a quick way to make a secure document hash:
for sum in /usr/bin/*sum; do "$sum" --tag "$document"; done 2>/dev/null
Now, considering filecoin, it would be nice if we also had the deal information.
The fields of interest include:
- numeric deal id
- payload cid
- piece cid?
- miner address?
- sender address?
- deal cid? seems most relevant, but hard to find or use with the interface
- block hash?
Deal information can be found with `lotus client get-deal`
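So the simplest thing might be to stash whatever get-deal prints next to the chunk it covers (the variable and the filename here are placeholders):
lotus client get-deal "$dealcid" > datasource-00.deal.txt   # keep the deal info alongside the hashes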
0810
0811
It seems reasonable to provide the DealID and Proposal Label from client get-deal.
So. Say we get a new block of data or something.
If the data is in small blocks, we would want to add this information to some hash file. Say the DealID.
I guess the deal CID would make more sense in a hash file =S Maybe I can ease myself by using some xxxx-looking label. The ProposalCid seems to be the same as the deal CID, but I'm unsure.
I could also put a deal id into the filename, but then the old hashes don't immediately verify.
Thinking a little of a document containing nested documents. It could hash itself, append further data, and then append a hash of the data above ... then we could add deal ids farther down and still have the same canonical previous material.
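A sketch of that with sha256sum (the file name and deal id are placeholders): each addition first appends a hash of everything already in the file, so the earlier material stays canonical:
h=$(sha256sum --tag datasource.notes)               # hash of everything written so far
printf '%s\n' "$h" >> datasource.notes              # append that hash below what it covers
printf 'DealID: %s\n' "$dealid" >> datasource.notes # later additions go under the hash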
0817 .
0818
I looked into using gpg --clearsign, but I don't like how the encoding of the pgp data makes it hard to debug corruption. The hash is not readily visible.
I'm considering ...
0820, inhibition
Considering normal tools for breaking a file based on boundaries. Nested text files.
I want it to be easy to break out the right parts of the files to verify things, when I am confused. I'd like to be able to verify only some of the files and hashes, when I am on a system with limited tools. This means being able to extract the document's own verification document, to verify it with.
Now thinking it is overcomplicated.
Basic goal: easily check that the data I uploaded is the same as what I read from the drive, at the time I read it. So one hash, and then a way to check that hash. We download data from the network, maybe years from now, and check the hash.
I guess it's an okay solution.
Considering using a directory of files.
Then as more documents are added, a hash document can extend in length, or a tarball can be used.
Considering hashing only certain lines of files. Some way to generate subfiles from longer files. Generate all the subfiles, and you can check everything.
We could have a filename part that indicates it is lines from another file. Or greps, even.
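A sketch of the line-range version (names and range are made up): the filename says which lines it was cut from, so anyone can regenerate the subfile and check it:
# cut lines 10-20 back out of the longer file, then hash just that part
sed -n '10,20p' datasource.notes > datasource.notes.lines-10-20
sha256sum datasource.notes.lines-10-20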
0826
While I'm making this confused writing, my system is generating 64GiB of random data to check whether filecoin automatically breaks pieces larger than its maximum piece size. I should have used zeros, but I expect it's mostly limited by write speed. It's hit 38GiB so I'll test it.
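(For reference, a made-up version of that generation command; 65536 x 1 MiB = 64 GiB:)
dd if=/dev/urandom of=testpiece.bin bs=1M count=65536 status=progress
# zeros would have been if=/dev/zero instead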
0827
0828
I told lotus to import the data while it was being generated, hoping it would import truncated data rather than fail. It's silently spinning the CPU.
Regarding a quick hash document, let's see.
Import data in chunks to a folder. Hash data into a new file. Hash all hashes into a new file.
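Roughly, with placeholder chunk names:
sha256sum datasource-* > hashes-0.sha256       # hash the data chunks into a new file
sha256sum hashes-0.sha256 > hashes-1.sha256    # hash all the hashes into another new file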
Maybe that works. Using new files, the system is more clear. How does verification work?
We roughly verify files in reverse order of their creation, and we verify only verifiable files.
So it's possible a system could be checked with something like
check $(ls *.checkable | tac) | grep -v missing
where check runs through /usr/bin/*sum or such
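As a sketch of such a check, assuming the *.checkable files are ordinary sha256sum output and the names sort in creation order (--ignore-missing needs a newer coreutils and plays the role of the grep -v missing above):
check() {
    for list in "$@"; do
        echo "== $list"
        # verify each listed checksum file, skipping entries whose data files are absent
        sha256sum --check --ignore-missing "$list"
    done
}
check $(ls *.checkable | tac)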
0832
muscle convulsions associated with plans and decisions
0833