Thanks for your comments on this folks ... lots of food for thought.
coderman@gmail.com said:
the longer discussion is how to make decentralized
search useful. "Google style" search has a terrific performance
advantage over decentralized designs by brute force. however,
take advantage of massive endpoint / peer processing and
resources combined with implicit observational metrics for
reputation and recommendation, inside a well integrated
framework for resource discovery in usable software, and you
have something more robust and more effective than "Google
style" could ever provide.
Yes, it does seem like the speed advantage of centralized search
will be a barrier to adoption of decentralized search. This is
analogous to the difficulty of getting people to adopt systems like Tor
because it is slow. But I think that as more people become aware of
the extent of state/corporate surveillance, they will become more
inclined to accept solutions that are slower in exchange for not
having their search habits monitored, and also being able to receive
uncensored search results. As long as decentralized search is (a)
usable/simple and (b) provides quality results, I feel like speed is
somewhat of a secondary concern. The key question to me is: "How do
we build a search engine that is simple enough for Grandma to use,
and that produces quality results without massive centralized
indexing servers?"
Standalone P2P search applications (e.g. Yacy) don't really make
sense from a usability perspective. It's unrealistic to expect
hundreds of millions of users to download a standalone Java app, and
configure a P2P search node. What would make more sense, and would
lead to much more rapid/widespread adoption, is to use protocols
like WebSockets
/ WebRTC
to facilitate P2P
connectivity in the web browser, so that everything can be
done via a simple browser plugin that can be installed by anyone
with a few clicks, and would then just allow people to use the browser
search bar as usual. This browser integration would also have the
bonus of simplifying the choice of what to index -- it could just
default to indexing bookmarked and frequently-visited pages, and
then be optionally customized by more advanced users to create
custom indexes (i.e. all of the complexity of setting up indexing
could be hidden from the user, unless they choose to look for it).
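To make the "sensible default" idea concrete, here's a rough sketch of what the plugin's default indexing policy might look like. Everything here is hypothetical -- in a real extension the entries would come from the browser's bookmarks/history APIs, and the function name and visit threshold are just made up for illustration:

```javascript
// Hypothetical sketch of the plugin's default indexing policy:
// index a page if it is bookmarked, or visited often enough.
// History entries are mocked as plain objects here.
function selectPagesToIndex(historyEntries, { minVisits = 5 } = {}) {
  return historyEntries
    .filter((e) => e.bookmarked || e.visitCount >= minVisits)
    .map((e) => e.url);
}

// Example: the bookmarked page and the frequently-visited page
// qualify; the rarely-visited one does not.
const pages = selectPagesToIndex([
  { url: "https://example.org/a", bookmarked: true,  visitCount: 1 },
  { url: "https://example.org/b", bookmarked: false, visitCount: 9 },
  { url: "https://example.org/c", bookmarked: false, visitCount: 2 },
]);
console.log(pages); // ["https://example.org/a", "https://example.org/b"]
```

Advanced users could then override the policy (custom URL patterns, extra feeds, etc.) without the defaults ever bothering anyone else.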
To help bootstrap the WebRTC nodes into the P2P network, and to deal
with some of the instability inherent in P2P networks (i.e. by
creating stable "super-peer"
indexing nodes), I like cathalgarvey's suggestion of utilizing
something like a Wordpress plugin that would use the same
index/search standard as the WebRTC clients, but could additionally
bootstrap the web-based clients. As cathalgarvey said:
A standard rather than a codebase. But there's a huge
advantage to this line of thought, if you'll bear with me. A
two-digit fraction of the web right now is powered by
Wordpress.org, who explicitly advocate open/free culture. If you
can convince them to include a social search/index standard of
this type, which is virtually free in terms of computer
resources, then you'd have it deployed across the web in days as
the next update rolled out. Indeed, even if Wordpress seemed
reluctant, a wordpress plugin could probably be written quickly
enough to enable such a thing and make it available for casual
use. Suddenly, a bunch of PHP-powered sites around the web start
committing small bits and pieces of resources to a social search
engine based on human-curated attestations of trust that flow
through a web, helping to confine spammers to the fringes and to
users with stupid taste.
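The nice thing about a standard rather than a codebase is that the wire format can be dead simple -- simple enough that a PHP plugin and a browser client can both speak it. As a rough sketch (all field names invented here; a real standard would have to pin them down), the shared query/response messages might be nothing more than JSON objects:

```javascript
// Hypothetical sketch of a shared query/response format that both a
// WordPress super-peer plugin and a browser-based WebRTC client
// could speak. Field names are invented for illustration only.
const query = {
  type: "search-query",
  id: "q-0001",                       // client-chosen id for matching replies
  terms: ["decentralized", "search"],
  ttl: 4,                             // hop limit as the query spreads
};

const response = {
  type: "search-response",
  id: "q-0001",                       // echoes the query id
  results: [
    { url: "https://example.org/post", title: "Example post", score: 0.8 },
  ],
};

// Any peer, PHP or browser, just exchanges JSON over its transport
// (an HTTP endpoint for the super-peer, a data channel for WebRTC).
const wire = JSON.stringify(query);
const parsed = JSON.parse(wire);
console.log(parsed.type, parsed.ttl);
```

Because it's plain JSON, the super-peer side could be a few dozen lines of PHP, which is exactly what makes the "deployed across the web in days" scenario plausible.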
What would also be interesting is if this standard enabled some kind
of "pingback" mechanism whereby WebRTC nodes could be associated
with specific super-peer nodes (e.g. maybe people who have
bookmarked the super-peer site in their browser, or subscribe to its
feed), so that in addition to broad/random queries that target the
entire P2P network, clients could also create more targeted custom
searches that say something like "start the search with the nodes
that are clustered around these super-peers". This would create an
enormous diversity of search possibilities -- hundreds of thousands
(millions) of different "search engines", each of which would return
different results for the same query, depending on where you start
your search ... This diversity is another reason I find P2P search
interesting, in addition to the benefits re: censorship, traffic
shaping, and surveillance.
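A targeted search of this kind might look like an ordinary query with one extra field naming the preferred entry points. Again, this is just a sketch -- the "startFrom" field and the peer identifiers are assumptions, not part of any existing protocol:

```javascript
// Hypothetical sketch: a targeted query that asks the network to
// start from specific super-peers rather than flooding at random.
// The "startFrom" field and peer addresses are invented examples.
function makeTargetedQuery(terms, superPeers) {
  return {
    type: "search-query",
    terms,
    startFrom: superPeers, // addresses of preferred entry points
    ttl: 3,
  };
}

const q = makeTargetedQuery(
  ["webrtc", "p2p"],
  ["superpeer.blog-a.example", "superpeer.blog-b.example"]
);
console.log(q.startFrom.length); // 2
```

Two users issuing the same terms but different startFrom lists would effectively be using two different "search engines" -- which is the diversity argument in a nutshell.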
I've been looking around for some kind of WebRTC P2P search engine
and haven't found anything yet ... maybe I've found a programming
project for this summer :)
-- Jesse Taylor