Information Content Estimation
Hey all,

I'm mulling over a federated datastore for zero-knowledge web applications, using hashcash as a "commitment" price for otherwise gratis data storage. All very straightforward, but: zero knowledge is as much for host protection as client protection. Hosts don't WANT plaintext.

Short of stupidly CPU-intensive stuff like letter counting, UTF-8 decoding, etc., how might a server verify it's receiving encrypted data? I was thinking of a function that estimates apparent entropy and rejects anything that doesn't look random enough to be encrypted. What such functions are available, fast, and widely implemented?
In case anyone else was wondering, I answered my own question, although I have yet to learn whether this is efficient for large files.

The answer so far appears to be libmagic, Python bindings for which are available as "filemagic". It can then be used like so:

    -> import ssl, magic
    -> M = magic.Magic()
    -> M.id_buffer(ssl.RAND_bytes(50))
    :: 'data'
    -> M.id_buffer("This is text, plain and simple")
    :: 'ASCII text, with no line terminators'
    -> M.id_buffer("This is text, plain and simple\nand it has more than one line")
    :: 'ASCII text'
    -> M.id_buffer(rand)
    :: 'data'
    -> M.close()

(For anyone who's read the docs: yes, you're supposed to use a context manager with filemagic, and no, it doesn't work on Py3.3 as near as I can see.)

So I'll run with this if nobody has better ideas: the user submits data, and if a check with filemagic/libmagic returns "data", it's considered encrypted (because encrypted data should be indistinguishable from random data); otherwise it's rejected.

On Sat, 19 Oct 2013 20:41:01 +0100 "Cathal Garvey (Phone)" <cathalgarvey@cathalgarvey.me> wrote:
> Hey all, Am mulling over a federated datastore for zero-knowledge web applications, using hashcash as a "commitment" price for otherwise gratis data storage. All very straightforward, but: Zero knowledge is as much for host protection as client protection. Hosts don't WANT plaintext.
>
> Short of stupidly CPU-intensive stuff like letter counting, UTF8 decoding, etc, how might a server verify it's receiving encrypted data? I was thinking a function that estimates apparent entropy and rejects anything that doesn't look random enough to be encrypted, what such functions are available, fast, widely implemented?
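
For comparison with the libmagic check, the "apparent entropy" idea from the original question could be sketched roughly as below. This is only an illustration, not something tested in the thread: the function names, the 7.5 bits-per-byte threshold, the 4096-byte minimum, and the 64 KiB sample size are all assumptions picked for the example. Two caveats apply. Well-compressed plaintext also scores close to 8 bits per byte, so a pass means "looks random", not "is encrypted". And a short buffer can never score high (50 bytes cannot exceed log2(50), about 5.6 bits), so the sketch rejects small inputs outright.

```python
import math
import os
from collections import Counter


def byte_entropy(buf: bytes) -> float:
    """Shannon entropy of buf in bits per byte, between 0.0 and 8.0."""
    if not buf:
        return 0.0
    n = len(buf)
    counts = Counter(buf)  # frequency of each byte value present in buf
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def looks_encrypted(buf: bytes,
                    threshold: float = 7.5,     # assumed cutoff, bits per byte
                    min_size: int = 4096,       # assumed minimum judgeable size
                    sample_size: int = 65536) -> bool:
    """Heuristic only: True if a leading sample of buf looks random."""
    if len(buf) < min_size:
        return False  # too short for the estimate to mean anything
    # Only the first sample_size bytes are examined, which bounds the cost
    # for large uploads but ignores whatever comes after the sample.
    return byte_entropy(buf[:sample_size]) >= threshold


if __name__ == "__main__":
    print(looks_encrypted(os.urandom(65536)))
    print(looks_encrypted(b"This is text, plain and simple\n" * 1000))
```

Sampling only the head of the buffer keeps the check cheap for large files, at the price that a client could prepend random bytes to plaintext; sampling several offsets would be one way to make that harder.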