IQNets: OFFSystem - cursory review and conclusion
The OFFSystem (https://en.wikipedia.org/wiki/OFFSystem) and similar schemes (such as the "free speech" XOR concept it was related to and/or based on) are kinda cool at first glance. They use RAID-style disk principles to mix blocks (or files), which successfully messes up the concept of copyright on a block:

Choose a block size, say 128KiB. Take an original "copyrighted" block A of a file, which we want to store in our file/cache/block store. XOR A with some other random block (say, a block B containing completely random data). This produces a new block C, which is a mathematical version/combination/"encoding" of block A.

We add this new XOR block C to the store and can discard the original block A, because XORing blocks B and C reproduces the original block A. Of course, we must also store the XOR block map somewhere, so we remember which blocks to XOR.

The new block C is therefore quite arguably an encoding of the original copyrighted block A. Yet block C also looks like completely random data, and statistically it is, except when XORed with one very special (in relation to block C), yet otherwise completely random, block B.

Next we wish to add another block D, copyrighted by some other party entirely, and we happen to randomly choose block C as our "random block for XORing". This produces block E, which we add to the block store, and again we discard the original, this time block D.

Original copyrighted, now discarded blocks: A, D
Block store: B (pure random data), C and E ('encodings' of A and D)

Who owns the copyrights on the various blocks?:
Block A is regenerated using B XOR C.
Block D is regenerated using E XOR C, where C was chosen completely randomly.
Block C is an encoding of A.
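Here is a minimal Python sketch of the A/B/C/D/E scenario above. It assumes 128 KiB blocks and uses `os.urandom` for the random block B. The names `xor_blocks`, `store`, `block_map` and `rebuild` are hypothetical, and this is not OFFSystem's actual storage code. It only illustrates that the kept blocks plus the XOR block map are enough to regenerate the discarded originals.

```python
import os

BLOCK_SIZE = 128 * 1024  # 128 KiB, the example block size from the text


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together (treating each block as one big integer)."""
    length = len(blocks[0])
    acc = 0
    for block in blocks:
        if len(block) != length:
            raise ValueError("all blocks must have the same length")
        acc ^= int.from_bytes(block, "big")
    return acc.to_bytes(length, "big")


# Hypothetical stand-ins for two "copyrighted" blocks, zero-padded to BLOCK_SIZE.
A = b"copyrighted work #1".ljust(BLOCK_SIZE, b"\0")
D = b"copyrighted work #2".ljust(BLOCK_SIZE, b"\0")

B = os.urandom(BLOCK_SIZE)  # pure random data
C = xor_blocks(A, B)        # stored "encoding" of A; A is then discarded
E = xor_blocks(D, C)        # C picked as D's XOR partner; D is then discarded

# Only B, C and E are kept, plus the XOR block map saying what to XOR.
store = {"B": B, "C": C, "E": E}
block_map = {"A": ["B", "C"], "D": ["E", "C"]}


def rebuild(name: str) -> bytes:
    """Regenerate a discarded block by XORing the stored blocks listed in the map."""
    return xor_blocks(*(store[key] for key in block_map[name]))


assert rebuild("A") == A
assert rebuild("D") == D
```

The map is essential in this sketch. Without `block_map`, the store is just three blocks of random-looking bytes.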
Block E is an encoding of C, so E is also, quite evidently, an encoding of A, and in fact E { XOR C XOR B } gives A. So the copyright holder of A can quite legitimately, on the assumption that he has copyright over all encodings of A, claim copyright over block E. But this would probably not satisfy the copyright holder of block D, who can also demonstrate with mathematical certainty that block E is an encoding of block D.

---------------------------

In the "free speech" version (HTML paper courtesy grarpamp), this principle was used with a distributed "random data" 128KiB block store. It was sort of P2P, although it wasn't automated, and since this was 2000, the link protocol was FTP. The intention was to be able to distribute messages over the internet and expose those messages "at some time in the future" via "just happened to notice these blocks ABC XOR to produce message XYZ".

---------------------------

The utility appears to be confounding copyright holders.

Utility for "free speech"? Not so much, since:
- "speech", such as chat and email, demands various "reasonable maximum" latencies
- for email, a maximum latency of a few minutes is desirable; often much longer can be tolerated, but still only up to "hours"
- for chat/text/forum, a typed message needs to appear within seconds, ideally milliseconds

What about storage requirements? Some folks and systems are designed to record the entire history of a chat room, and email lists tend to work this way by default. But there are many use cases, such as throwaway chat rooms and flippant discussions about the weather, for which most folks would never bother with, or want, long-term storage.
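The deferred "free speech" reveal can be sketched as follows. This assumes a shared store of random-looking blocks and a message that fits in one block. The message is zero-padded, so it must not itself end in zero bytes. The function names are hypothetical, and this is not the original paper's protocol, only an illustration of "these blocks XOR to produce message XYZ".

```python
import os

BLOCK_SIZE = 128 * 1024


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together (treating each block as one big integer)."""
    length = len(blocks[0])
    acc = 0
    for block in blocks:
        if len(block) != length:
            raise ValueError("all blocks must have the same length")
        acc ^= int.from_bytes(block, "big")
    return acc.to_bytes(length, "big")


def publish_hidden(message: bytes, store: list[bytes], partner_ids: list[int]) -> list[int]:
    """XOR the message with existing store blocks, append the result to the store,
    and return the secret recipe (store indices) for revealing it later."""
    if len(message) > BLOCK_SIZE:
        raise ValueError("message must fit in one block")
    padded = message.ljust(BLOCK_SIZE, b"\0")
    store.append(xor_blocks(padded, *(store[i] for i in partner_ids)))
    return list(partner_ids) + [len(store) - 1]


def reveal(store: list[bytes], recipe: list[int]) -> bytes:
    """XOR the blocks named in the recipe and strip the zero padding."""
    return xor_blocks(*(store[i] for i in recipe)).rstrip(b"\0")


# Hypothetical shared store of random-looking blocks (e.g. mirrored over FTP).
public_store = [os.urandom(BLOCK_SIZE) for _ in range(4)]

recipe = publish_hidden(b"message XYZ", public_store, [0, 2])
# ...some time in the future, someone "just happens to notice" blocks 0, 2 and 4...
assert reveal(public_store, recipe) == b"message XYZ"
```

The newly published block looks as random as its partners. The message only exists once the recipe is disclosed, which is also why the latency for this kind of "speech" is unbounded.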
XOR/RAID-type block stores, where new blocks are created and added to the store based on randomly chosen prior blocks, have an apparent requirement (at first look, at least) that all older blocks be kept and never deleted. Some other block, in your store or in the block store of some peer, may depend on your old block.

Secondly, unless compression is applied before storage, compressible data is never compressed. This is solvable in end-user software, of course: just compress all new blocks before storing them, and remember to record the compression algorithm used.

Thirdly, deduplication is not part of the above systems, and would not appear to add much utility anyway... most material is new material, and some git-style content mapping/addressing solution is what's really wanted in an upper layer anyway.

So we have a significant, unavoidable problem built into this type of system:
- an ever-expanding block store, where older blocks can never be deleted, since they may be XOR encodings of newer blocks, in your own or others' (in the case of a distributed store) block stores

---------------------------

Storing interesting content?:

Some content is illegal in one jurisdiction whilst being legal in another. So what about the case where you happen to be in a jurisdiction where such content is illegal?:
Someone with access to your block store can, with enough computing power, run permutation XORs up to large counts, looking for combinations of your blocks that produce such content.
This can also be described as "pot luck": do you want to entrust your defence against local laws to pot luck?
Of course, blocks can be hidden with encryption, but then we've already got veracrypt, so what does the extra XOR/RAID layer add? Nothing AFAICT.
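The "permutation XORs" search mentioned above can be sketched as a brute-force loop. This relies on an assumption the text does not state: that the searcher holds SHA-256 digests of known items. It simply XORs every combination of 2 up to `max_blocks` stored blocks and looks for a matching digest. All names and data here are hypothetical.

```python
import hashlib
import itertools
import os

BLOCK_SIZE = 4 * 1024  # small blocks keep the demo fast; the text uses 128 KiB


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together (treating each block as one big integer)."""
    length = len(blocks[0])
    acc = 0
    for block in blocks:
        if len(block) != length:
            raise ValueError("all blocks must have the same length")
        acc ^= int.from_bytes(block, "big")
    return acc.to_bytes(length, "big")


def search_xor_combinations(store, known_digests, max_blocks=3):
    """Try every combination of 2..max_blocks stored blocks; report any whose XOR
    has a SHA-256 digest present in known_digests (digest -> label)."""
    hits = []
    for k in range(2, max_blocks + 1):
        # XOR is commutative, so combinations (not permutations) are enough.
        for combo in itertools.combinations(range(len(store)), k):
            candidate = xor_blocks(*(store[i] for i in combo))
            label = known_digests.get(hashlib.sha256(candidate).hexdigest())
            if label is not None:
                hits.append((combo, label))
    return hits


# Hypothetical store: block 3 is an XOR encoding of A with block 1.
A = b"content illegal where you live".ljust(BLOCK_SIZE, b"\0")
B = os.urandom(BLOCK_SIZE)
store = [os.urandom(BLOCK_SIZE), B, os.urandom(BLOCK_SIZE), xor_blocks(A, B)]
known = {hashlib.sha256(A).hexdigest(): "item A"}

print(search_xor_combinations(store, known))  # finds ((1, 3), 'item A')
```

For a store of n blocks, the number of trials is the sum of binomial(n, k) for k from 2 to `max_blocks`. That grows quickly, which is why the text says "with enough computing power". It is still just a loop, though, not a barrier.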
And suppose you're storing copyrighted content in your block store that you simply don't have the lawful right to access/view/share. Your block XOR maps must still be stored somewhere, along with an application that can read them and present your illicit, copyright-infringing library of content for local viewing. That app could put a password on it, but now we're back to password management, crypto password salting, key blocks for multiple keys, etc. In other words, we're back to veracrypt/luks, and again, why bother with OFFSystem?

---------------------------

Conclusion:
XOR/RAID-stripe-type block mixing appears potentially great for confounding copyright holders in an actual court case.
This could be a good way for a library to store its digital media: "no Judge, if we remove those copyrighted blocks, we are removing the encodings of many other digital works we store; can't do, won't do".
"Potentially contentious" media would consistently be followed by a bunch of other media, all intermingled. This is another time-factor problem: in OFFSystem, earlier blocks accumulate more dependencies from later blocks.
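The library argument in the conclusion, together with the point that the XOR block maps must be kept somewhere, can be illustrated with a small sketch. The works, block IDs and function names below are hypothetical. A "recipe" here is simply the XOR block map listing which stored blocks rebuild each work. Given the maps, the sketch computes which works can no longer be rebuilt if a set of blocks is deleted.

```python
from collections import defaultdict

# Hypothetical XOR block maps for a small library: each work is rebuilt by
# XORing the listed stored blocks (block IDs are made up for illustration).
recipes = {
    "work_1": ["b1", "b2"],
    "work_2": ["b2", "b3"],
    "work_3": ["b3", "b4"],
    "work_4": ["b2", "b4", "b5"],
}


def works_using(recipes: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert the maps: for each stored block, the set of works that need it."""
    index: dict[str, set[str]] = defaultdict(set)
    for work, blocks in recipes.items():
        for block in blocks:
            index[block].add(work)
    return index


def collateral_damage(recipes: dict[str, list[str]], blocks_to_remove: list[str]) -> set[str]:
    """Works that can no longer be rebuilt if the given blocks are deleted."""
    index = works_using(recipes)
    broken: set[str] = set()
    for block in blocks_to_remove:
        broken |= index.get(block, set())
    return broken


# Deleting the blocks of work_1 also breaks work_2 and work_4, which share block b2.
print(sorted(collateral_damage(recipes, recipes["work_1"])))
```

In this toy library, a takedown of one work's blocks takes other works with it. The recipes themselves are also the one part of the store that plainly says which work is which.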
On Sat, Oct 26, 2019 at 09:42:44AM +1100, Zenaan Harkness wrote:
> Original copyrighted, now discarded blocks: A D
> Block store: B (pure random data), C and E ('encodings' of A and D)
> Who owns the copyrights on the various blocks?:
> Block A is regenerated using B XOR C.
> Block D is regenerated using E XOR C, where C was chosen completely randomly.
> Block C is an encoding of A.
> Block E is an encoding of C, so E is also, quite evidently, an encoding of A, and in fact E { XOR C XOR B } gives A, so copyright
The above function is incorrect on second reading, so let's correct it:

E XOR C = D, and of course E XOR D = C (remember D was discarded from our block store), so:
E XOR D XOR B = A
or, to use only blocks available in our block store:
E XOR (E XOR C) XOR B = A

Hopefully this is now correct :)

And note: without any other information (namely the block list for XORing to recreate A or D), and because of the nature of block B, block E by itself is statistically completely random. It is indistinguishable from pure random data, and so is block C. Only someone who knows that C XOR B produces block A can even say "ahah! block C is therefore, obviously and mathematically, an encoding of copyrighted block A". But this very same mathematics of encoding, as we see above, demonstrates mathematically and with absolute certainty that block E is likewise an encoding of block A.

One can postulate that this insight from circa 2000 is why statute law, at least since then, equates "copyright infringement" with downloading rather than with storage of the copyrighted material. That said, I think storage is also covered, at least in the Australian jurisdiction.

Notwithstanding this legislative shift to "downloading", one can readily use the same maths to get around this problem. For example, use a shared public-domain base block set of, say, 1GiB, 10GiB and 100GiB (depending on the size of files you wish to torrent). Then only ever distribute encodings of the base file set which, for those in the know, just happen to also match Disney's latest Princess Snowflake syndrome programming material.
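Here is a quick check of the corrected identities, using random stand-ins for all blocks. It is a hypothetical sketch; `xor_blocks` is a helper defined here, not part of any library. It also confirms two further points. First, the original formula E XOR C XOR B yields D XOR B rather than A. Second, in E XOR (E XOR C) XOR B, the two E terms cancel, leaving C XOR B.

```python
import os

BLOCK_SIZE = 128 * 1024


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together (treating each block as one big integer)."""
    length = len(blocks[0])
    acc = 0
    for block in blocks:
        if len(block) != length:
            raise ValueError("all blocks must have the same length")
        acc ^= int.from_bytes(block, "big")
    return acc.to_bytes(length, "big")


# Random stand-ins: the algebra does not depend on the blocks' content.
A = os.urandom(BLOCK_SIZE)
D = os.urandom(BLOCK_SIZE)
B = os.urandom(BLOCK_SIZE)
C = xor_blocks(A, B)   # encoding of A
E = xor_blocks(D, C)   # encoding of D, using C as the "random" partner

# Corrected identities from the reply.
assert xor_blocks(E, C) == D
assert xor_blocks(E, D) == C
assert xor_blocks(E, D, B) == A
assert xor_blocks(E, xor_blocks(E, C), B) == A  # E XOR E cancels, leaving C XOR B

# The original formula E XOR C XOR B actually gives D XOR B, not A.
assert xor_blocks(E, C, B) == xor_blocks(D, B)
```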
As long as you have the base file set, which gets reused and is the only block set that gets reused (for torrents up to that size), the base file/block set is the only thing you cannot delete from your block store. That makes individual libraries (as opposed to large shared community libraries) viable, since end users with private libraries here and there will need to purge older content.

"So lads, I happen to have this one little secret high quality encoding of the Wikipedia base media collection y'all might be interested in - I needed a totally random new name for it since there are already so many encodings of it getting around (I know, I know, we're utter sluts for Wikipedia yo!), so for a name, I just happened to choose "Wikimedia 100GiB encoding, Game of Thrones Super Mega HQ DVD Bluray WebRip, so sad 2019 miscegenation edition". Get it while it's hot! Oh, and by the way, I had an encoding problem, so only the first 27.12 GiBs have been encoded, sorry about that =)"

The question is: is there a bush lawyer or actual lawyer who can say whether this would actually be useful when defending against a MAFIAA court case?
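The shared base-set idea can be sketched like this. The sketch assumes that the base set is a single byte string every participant already holds, and that an "encoding" is simply the payload XORed with the base set's prefix. The function names are hypothetical, and `os.urandom` stands in for the public-domain base set.

```python
import os


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    return (int.from_bytes(a, "big") ^ int.from_bytes(b, "big")).to_bytes(len(a), "big")


def encode_against_base(payload: bytes, base: bytes) -> bytes:
    """Produce an 'encoding' of the base set: payload XOR the base set's prefix."""
    if len(payload) > len(base):
        raise ValueError("payload is larger than the base set; pick a bigger base set")
    return xor_bytes(payload, base[: len(payload)])


def decode_with_base(encoding: bytes, base: bytes) -> bytes:
    """Anyone holding the same base set recovers the payload by XORing again."""
    return xor_bytes(encoding, base[: len(encoding)])


# Hypothetical stand-in for the shared base set (1 MiB here instead of 1-100 GiB).
BASE_SET = os.urandom(1024 * 1024)

payload = b"some video file" * 1000                # hypothetical payload, 15,000 bytes
encoding = encode_against_base(payload, BASE_SET)  # the only thing distributed
assert decode_with_base(encoding, BASE_SET) == payload
```

Only a prefix of the base set is used, which matches the "only the first 27.12 GiBs" joke. One caveat about the stand-in: a real public-domain media collection is not uniformly random. Encodings made against it would therefore not look like pure random data in the way blocks C and E do.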