I forgot to take notes during, so this is ex post facto. Other participants, please feel free to reply and add your recollections! In attendance: Zooko, CodesInChaos, Andrew, Brian, Ali who popped in briefly because the hangout was posted as a public chat on G+, David-Sarah The topic was mostly dynamic Merkle Trees and LDMFs again. (For those following along at home, Dynamic Merkle Trees are this idea of using something like a Red-Black Tree instead of a static Binary Tree as the structure for your secure-hash-function based "authenticated data structure". Andrew Miller claims that this is a unifying abstraction that describes both git and Bitcoin, as well as any future Tahoe-LAFS Large Distributed Mutable File. See previous week's notes 9.) I said that I had been looking at using Tahoe-LAFS as the backend for other systems, to give them decentralization, fault-tolerance, integrity-checking, access control, and encryption. Those systems that I've looked at a little bit include Ward Cunningham's Smallest Federated Wiki, Dan Whaley's Hypothesis, and Jeff Garzik's gleam-in-the-eye of a fault-tolerant form to make the Bitcoin forums DoS-resistant. In each of these cases as well as in others, I have the feeling that programmers want some sort of search or query language, and when they find that LAFS doesn't natively offer that, they give up on using it. Brian replied that what you have to do is maintain your own index. He called it "voluntary search", which is a turn of phrase that I like. MK_FG's recent attempt to make a Skydrive plugin for the Cloud backend is another example where voluntary search is needed. Andrew asked for a specific use case for LDMF. How and why anyone would use LDMFs even if we had them? After all, there's no way to use the features of an LDMF, such as efficiently inserting bytes into the middle of a file, through the standard POSIX file API. I groped around for a while trying to answer that. There are a few half-formed ideas about things that LDMFs *might* turn out to be useful for, such as "filesystem in a file", which is basically what the authors of virtual machines are doing to store their virtual machine images in POSIX filesystems. Other half-formed ideas are to use LDMFs as the backend storage for git. But I finally got a hold of a use case that seems clear enough: scalable directories. Directories ought to be able to hold arbitrarily many entries, support efficiently adding or removing children, ought to be able to maintain a sort order on the children, and support range queries on the children. That *is* something that you can express through the POSIX filesystem API, it seems clear that it could be useful, and very importantly for the purposes of Andrew's research, it is easy to evaluate how well your solution satisfies it. Such directories might make a good building block for search. I remember thinking when I unsuccessfully pitched LAFS as a possible backend for Singly/LockerProject that if only directories had been scalable, sorted, and range-queriable then LAFS might have sufficed for their needs. Along the way I pointed people at the performance.rst file: https://tahoe-lafs.org/trac/tahoe-lafs/browser/git/docs/performance.rst Only to see that it appears to be describing the performance of old SDMFs, not of MDMFs! For example it says: """ Downloading B bytes of an A-byte mutable file cpu: ~A network: A memory footprint: A notes: As currently implemented, mutable files must be downloaded in their entirety before any part of them can be read. We are exploring fixes for this; see ticket #393 for more information. """ Which is very stale information. We've closed #393 and made it so that you only need to download about B bytes to read B bytes of an A-byte mutable file. I thought we had updated performance.rst to reflect that. What gives? Do I misremember or did we somehow regress performance.rst? There are a few tickets about updating performance.rst to reflect the existence of MDMF: #1497, #1772. Andrew mentioned that since last week he implemented a new Red-Black Merkle Tree in C++, and it is 400X faster than the first Python-based prototype. I questioned why the language would make that much difference and suggested he try PyPy. Regards, Zooko https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1497# update docs/performance.rst to explain the performance of MDMFs https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1772# update docs to include MDMF 9 https://tahoe-lafs.org/pipermail/tahoe-dev/2012-October/007750.html _______________________________________________ tahoe-dev mailing list tahoe-dev@tahoe-lafs.org https://tahoe-lafs.org/cgi-bin/mailman/listinfo/tahoe-dev ----- End forwarded message ----- -- Eugen* Leitl <a href="http://leitl.org">leitl</a> http://leitl.org ______________________________________________________________ ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE