Replacing corporate search engines with anonymous/decentralized search
Recently there has been a lot of focus on the importance of developing more secure alternatives to email, instant messaging, browsing, etc. ... but I've seen very little focus on the need for development of alternatives to corporate search engines.

Corporate/state control of the Internet involves a three-pronged strategy: mass surveillance, censorship/criminalization of undesirable ideas, and traffic shaping (i.e. directing people away from things you don't want them to see, and towards things you do). Corporate search engines are implicated in all three, i.e. they:

1) Monitor what we are searching for
2) Censor websites by removing them from search engine indexes
3) Shape traffic via non-transparent algorithms that can sort search results in a way that grants prominence to certain types of sites (corporate media, etc.), in order to suit the interests of multinational corporations and governments.

... so obviously, developing alternatives to corporate search is every bit as crucial for protecting privacy and free speech as encrypting our emails/chats and anonymizing our browsing ...

But I've seen very little information about practical/simple options that are available for anonymous and decentralized Internet search software. I've only been able to find a few examples like YaCy, but they all seem overly complex and unusable by the vast majority of users. What are the major barriers to creating simple tools (e.g. a plugin for Firefox) that would enable users to perform anonymous, p2p web search (even if it's much slower than centralized search) and break away from using corporate search? Which current efforts to create decentralized search seem most promising to you from a privacy/security standpoint?

-- Jesse Taylor
Right now, I'd even settle for a competitive, interesting marketplace of corporate search engines.

On Sun, Dec 29, 2013 at 3:17 AM, Jesse R. Taylor <jessetaylor84@riseup.net> wrote:
Recently there has been a lot of focus on the importance of developing more secure alternatives to email, instant messaging, browsing, etc. ... but I've seen very little focus on the need for development of alternatives to corporate search engines.
Corporate/state control of the Internet involves a three-pronged strategy: mass surveillance, censorship/criminalization of undesirable ideas, and traffic shaping (i.e. directing people away from things you don't want them to see, and towards things you do). Corporate search engines are implicated in all three, i.e. they:
1) Monitor what we are searching for 2) Censor websites by removing them from search engine indexes 3) Shape traffic via non-transparent algorithms that can sort search results in a way that grants prominence to certain types of sites (corporate media, etc.), in order to suit the interests of multinational corporations and governments.
... so obviously, developing alternatives to corporate search is every bit as crucial for protecting privacy and free speech as encrypting our emails/chats, and anonymizing our browsing ...
But I've seen very little information about practical/simple options that are available for anonymous and decentralized Internet search software. I've only been able to find a few examples like YaCy, but they all seem overly complex and unusable by the vast majority of users. What are the major barriers to creating simple tools (e.g. a plugin for Firefox) that would enable users to perform anonymous, p2p web search (even if it's much slower than centralized search) and break away from using corporate search? Which current efforts to create decentralized search seem most promising to you from a privacy/security standpoint?
-- Jesse Taylor <http://www.interference.cc>
-- konklone.com | @konklone <https://twitter.com/konklone>
On Sun, Dec 29, 2013 at 12:17 AM, Jesse R. Taylor <jessetaylor84@riseup.net> wrote:
Recently there has been a lot of focus on the importance of developing more secure alternatives to email, instant messaging, browsing, etc. ... but I've seen very little focus on the need for development of alternatives to corporate search engines.
decentralized search (not just not-corporate search) persists as one of the great practical challenges in peer to peer networking.

i have more to say later, but one effort from back in early 2000 is alpine: https://peertech.org/alpine

inside the 2004 snapshot there are also docs and an implementation of feedbackfs, which is used to gather implicit feedback on recommendation / discovery of file based resources. alpine is explicitly a highly connected, flatter-than-not network topology, to improve robustness in the face of failure and active attacks, and to avoid limitations inherent in many connection oriented operating system facilities/sockets.

i am not quite an impartial party ;) but other approaches which are not a feasible replacement include:
- the old skewl (mostly) flooding broadcasts like gnutella
- fragile, hard to defend constructs like DHTs as keyword indexes
- aggressive caching with local search (110% useful, but not sufficient alone)
- distributed (but better somehow) search engines on darknets, etc. these are more about search privacy or deep search than decentralized search.
But I've seen very little information about practical/simple options that are available for anonymous and decentralized Internet search software. ... What are the major barriers to creating simple tools [...] [... for] anonymous, p2p web search (even if it's much slower than centralized search) and break away from using corporate search? Which current efforts to create decentralized search seem most promising to you from a privacy/security standpoint?
the longer discussion is how to make decentralized search useful. "Google style" search has a terrific performance advantage over decentralized designs by brute force. however, take advantage of massive endpoint / peer processing and resources, combined with implicit observational metrics for reputation and recommendation, inside a well integrated framework for resource discovery in usable software, and you have something more robust and more effective than "Google style" could ever provide. this is quite the trick, however!

despite an inter-operable component model interface, dynamic runtime module support to extend discovery and wire protocol extensions, and other intentional efforts at encouraging adoption and integration, alpine failed to bootstrap. (i did many things wrong, but those things i did at least make a conscious effort to do right. did i mention this is a hard problem? :)

this project has been excavated from archives, and will receive maintenance upgrades[0] at minimum, with significant improvement a possible option, depending.

best regards,

[0] maintenance work for testable alpine builds
- fix/improve g++ usage.
- add IPv6 support. (specifically ORCHID addrs for darknet search)
- update feedbackfs to latest fusefs bindings
- update inotify bindings in feedbackfs
- multiple-socket support, multi-addr discovery
On 12/30/2013 11:21 PM, coderman wrote:
i have more to say later, but one effort from back in early 2000 is alpine:
What about YaCy? http://yacy.de/en/index.html
but other approaches which are not a feasible replacement include: - the old skewl (mostly)flooding broadcasts like gnutella - fragile, hard to defend constructs like DHTs as keyword indexes - aggressive caching with local search (110% useful, but not sufficient alone) - distributed (but better somehow) search engines on darknets, etc.
What aspects would constitute feasible replacements?

--
The Doctor [412/724/301/703] [ZS]
PGP: 0x807B17C1 / 7960 1CDC 85C9 0B63 8D9F DD89 3BD8 FF2B 807B 17C1
WWW: https://drwho.virtadpt.net/
"You knew the job was dangerous when you took it!" --Super Chicken
On Sun, Dec 29, 2013 at 12:17:07AM -0800, Jesse R. Taylor wrote:
but I've seen very little focus on the need for development of alternatives to corporate search engines.
[disregarding the corporate focus] i can warmly recommend https://searx.0x2a.tk

--
pgp: https://www.ctrlc.hu/~stef/stef.gpg
pgp fp: FD52 DABD 5224 7F9C 63C6 3C12 FC97 D29F CA05 57EF
otr fp: https://www.ctrlc.hu/~stef/otr.txt
I'd like to ask people to wonder what search engines really do for us. Where is the catalog? Where is the cultivated list of good resources? Do search engines provide the same level of guidance to their users that a written overview can?

Why don't we create a distributed website catalog? It's harder, as anti-spam is the core feature. But competing with Google seems rather foolhardy at the moment.

Maybe the word catalog isn't right; catalogs are too static and not discovery-targeted at all. Maybe a Yahoo! Answers type of tagging/cataloging would work rather well.

Anyway: think about it, guys! I'm sure there's a better way than "this keyword is also in this page which links to other good pages"!
On Tue, Dec 31, 2013 at 6:37 AM, Lodewijk andré de la porte <l@odewijk.nl> wrote:
I'd like to ask people to wonder what Search Engines really do for us. Where is the catalog? Where is the cultivated list of good resources?
Do search engines provide the same level of guidance to their users that a written overview can?
what you want more than traditional search is resource discovery, which includes recommendation and per-peer-perspective reputation. this is an area where centralized search is either incapable or too untrustworthy compared to fully decentralized options. done centrally, that central trusted party would be privy to all your inter-peer interactions. in decentralized fashion this exposes only limited information to each peer. (central services usually pay the cost of the infrastructure to analyze all-to-all interactions by selling your private information to third parties, or delegating to those who do...)
Why don't we create a distributed website catalog? It's harder, as anti-spam is the core feature. But competing with Google seems rather foolhardy at the moment.
public web is a small slice of all that is of interest. just put an internet archive.org copy on a hidden Tahoe-LAFS and everyone gets a copy of the public web for local querying. (better yet, make a PIR LAFS ;) ... this would need a little coding *grin*
Some ideas I had regarding searching:

- The web used to be some kind of its own index, back in the old days, where you could get from one webpage to another using links. If you think about it, links are not much different than a DHT: every website has links to some sites that are similar to that site, and maybe to some sites that are a bit different. In order to find something specific you work your way through the links just like you would do in a DHT. The introduction of great search engines eliminated the need to put these kinds of links in websites. So that's one solution: links. Kind of primitive, but I think it used to work.

Maybe I could put it in a less primitive way: assume that you search for X. An example of a new search method could work like this: you are given 5 different things, and of those you pick the one closest to X. Then again you are given 5 things, and you pick again the one closest to X. And so on. Maybe after a few iterations you get what you want. Just a strange idea, though maybe it could be made practical somehow. I like it because it contains no analysis of words or phrases.

- Crowdsourcing the creation of the index. I think it was mentioned in some of the messages on this thread. I believe that even the best algorithms and analysis methods are not good enough to index websites the right way. On top of that, all that SEO (search engine optimization) that is so popular these days makes a lot of websites full of fluff show up in the top search results, which I think is really a shame. I suggest to those of you who haven't, to check out Freenet, just to see what it's like to have some real content out there. It doesn't have much content, though the little it has is worth seeing. Regarding crowdsourcing the creation of the index, I suggest doing it using some kind of incentives for people to do it, and at the same time some trust mechanisms to make sure that the index crowdsourcing is not abused.
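The "pick the closest of five" idea above is essentially greedy routing over a link graph, much like a DHT lookup. A minimal sketch, where the function names, the toy link structure, and the distance metric are all invented for illustration:

```python
def greedy_search(start, target, neighbors, distance, max_hops=32):
    """Walk a link graph toward `target`: at each step, follow the
    offered neighbor closest to the target; stop at a local minimum."""
    current = start
    for _ in range(max_hops):
        options = neighbors(current)  # the "5 things" offered each round
        if not options:
            break
        best = min(options, key=lambda n: distance(n, target))
        if distance(best, target) >= distance(current, target):
            break  # no offered link gets us any closer
        current = best
    return current

# Toy demo: integer "pages" where each page links to a few nearby ids.
def toy_neighbors(n):
    return [n - 2, n - 1, n + 1, n + 2, n + 5]

def toy_distance(a, b):
    return abs(a - b)

print(greedy_search(0, 40, toy_neighbors, toy_distance))  # reaches 40
```

As in a DHT, this only works if links are "closer-biased": each page must offer at least one link that reduces the distance to the target, which is exactly the structure the old hand-made link indexes provided informally.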
I don't write here about the technological tools to do this, though I do believe we are getting close to making this kind of thing possible. Though, I still don't fully understand how to classify the websites: maybe keywords, or some kind of similarity network/metric between websites. Anyway, these were my 50 cents.

On Wed, Jan 1, 2014 at 5:00 AM, coderman <coderman@gmail.com> wrote:
On Tue, Dec 31, 2013 at 6:37 AM, Lodewijk andré de la porte <l@odewijk.nl> wrote:
I'd like to ask people to wonder what Search Engines really do for us. Where is the catalog? Where is the cultivated list of good resources?
Do search engines provide the same level of guidance to their users that a written overview can?
what you want more than traditional search is resource discovery, which includes recommendation and per-peer-perspective reputation. this is an area where centralized search is either incapable or too untrustworthy compared to fully decentralized options.
done centrally, that central trusted party would be privy to all your inter-peer interactions. in decentralized fashion this exposes only limited information to each peer. (central services usually pay the cost of the infrastructure to analyze all-to-all interactions by selling your private information to third parties, or delegating to those who do...)
Why don't we create a distributed website catalog? It's harder, as anti-spam is the core feature. But competing with Google seems rather foolhardy at the moment.
public web is a small slice of all that is of interest. just put an internet archive.org copy on a hidden Tahoe-LAFS and everyone gets a copy of the public web for local querying. (better yet, make a PIR LAFS ;) ... this would need a little coding *grin*
Hi there,

did anyone come across YaCy[0]? It's crawling software that creates a shared index within a defined realm. I can say that it's OSS (GPL), but I had no time to inspect the code. I'm wondering: what's this list's opinion on it?

Cheers, frk

[0] http://yacy.net
On 2014-01-02 02:22, realcr wrote:
Some ideas I had regarding searching:
- The web used to be some kind of its own index, back in the old days, where you could get from one webpage to another using links. If you think about it, links are not much different than a DHT: every website has links to some sites that are similar to that site, and maybe to some sites that are a bit different. In order to find something specific you work your way through the links just like you would do in a DHT. The introduction of great search engines eliminated the need to put these kinds of links in websites. So that's one solution: links. Kind of primitive, but I think it used to work.
Search humans, instead of search engines. Human authority instead of AI. As a matter of fact, it still does work.
On Wed, Jan 1, 2014 at 3:46 PM, James A. Donald <jamesd@echeque.com> wrote:
As a matter of fact, it still does work.
It works far less, though, since most people expect others to rely on search engines, so they don't bother to link anymore. Here's a thought: browser extension that stores your "personal" web index, and gives you a typeahead menu when you write about concepts in your index, prompting you to convert phrases to links. Like the way Facebook always wants to convert the names of people and pages to tags. Even if it were just primed with Wikipedia, that would drastically reduce the amount of Google searching people need to do when reading stuff you write.
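The core of that extension is just a phrase-to-URL map with prefix lookup driving the typeahead menu. A toy sketch; the class name and sample entries are made up, and the second URL is hypothetical:

```python
from collections import defaultdict

class PersonalIndex:
    """Tiny 'personal web index': phrases you care about mapped to URLs,
    with prefix lookup to drive a typeahead suggestion menu."""
    def __init__(self):
        self.entries = {}                  # phrase -> url
        self.by_prefix = defaultdict(set)  # prefix -> matching phrases

    def add(self, phrase, url):
        phrase = phrase.lower()
        self.entries[phrase] = url
        for i in range(1, len(phrase) + 1):
            self.by_prefix[phrase[:i]].add(phrase)

    def suggest(self, typed):
        """Candidates for converting the typed phrase into a link."""
        matches = self.by_prefix.get(typed.lower().strip(), set())
        return sorted((p, self.entries[p]) for p in matches)

idx = PersonalIndex()
idx.add("Onion routing", "https://en.wikipedia.org/wiki/Onion_routing")
idx.add("Onion service", "https://example.org/onion-service")  # hypothetical URL
print(idx.suggest("onion"))  # both phrases match the prefix
```

Priming it with Wikipedia titles, as suggested above, would just be a bulk series of `add()` calls; the prefix table trades memory for instant lookup while the user types.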
On Thursday, 2 January 2014 13:04:17, Sean Lynch wrote:
On Wed, Jan 1, 2014 at 3:46 PM, James A. Donald <jamesd@echeque.com> wrote:
As a matter of fact, it still does work.
It works far less, though, since most people expect others to rely on search engines, so they don't bother to link anymore.
Here's a thought: browser extension that stores your "personal" web index, and gives you a typeahead menu when you write about concepts in your index, prompting you to convert phrases to links. Like the way Facebook always wants to convert the names of people and pages to tags. Even if it were just primed with Wikipedia, that would drastically reduce the amount of Google searching people need to do when reading stuff you write.
In Firefox it's called "The Awesome Bar", and it sifts through your history and bookmarks (I bookmark a lot, and tag these pretty exactly, which helps immensely). The downside, of course, is that it works only for links that I have already visited.

So here's the idea: sharing bookmark tags and links with each other, via some extension for example, and making "The Awesome Bar" (damn, I hate that name) sift through bookmarks/tags of people in your "network" (what that means would have to be defined, but as Mozilla Sync can already store bookmarks, the data can already be on a server, just use it).

-- Pozdr rysiek
On Sun, Jan 05, 2014 at 07:39:40PM +0100, rysiek wrote:
So here's the idea: sharing bookmark tags and links with each other, via some extension for example, and making "The Awesome Bar" (damn, I hate that name) sift through bookmarks/tags of people in your "network" (what that means would have to be defined, but as Mozilla Sync can already store bookmarks, the data can already be on a server, just use it).
omnom[1] should be able to serve as the server-side, you still need to develop some kind of client-side extension though.

[1] omnom https://gitorious.org/tagr/omnom/source/419b512734021b71c01500514b5ae87d0b7f...
On Sunday, 5 January 2014 19:52:12, stef wrote:
On Sun, Jan 05, 2014 at 07:39:40PM +0100, rysiek wrote:
So here's the idea: sharing bookmark tags and links with each other, via some extension for example, and making "The Awesome Bar" (damn, I hate that name) sift through bookmarks/tags of people in your "network" (what that means would have to be defined, but as Mozilla Sync can already store bookmarks, the data can already be on a server, just use it).
omnom[1] should be able to serve as the server-side, you still need to develop some kind of client-side extension though.
[1] omnom https://gitorious.org/tagr/omnom/source/419b512734021b71c01500514b5ae87d0b7f3ab7:features.txt
Humm, I think I have already come across omnom some time ago. Well, I'll have to look into it. A short question: what would you say are the most important advantages of omnom over Mozilla Sync?

-- Pozdr rysiek
On Sun, Jan 05, 2014 at 08:22:51PM +0100, rysiek wrote:
Humm, I think I have already came across omnom some time ago. Well, I'll have to look into it. A short question: what would you say are the most important advantages of omnom over Mozilla Sync?
i don't know mozilla sync. one huge advantage of omnom is that it snapshots the pages you bookmark as they are rendered in your firefox. i guess that could also be useful for searching bookmarked pages... also there are hooks in the code that connect multiple omnom instances and other services for the federation of tags, but that is quite dead code.
On Sunday, 5 January 2014 23:12:28, stef wrote:
On Sun, Jan 05, 2014 at 08:22:51PM +0100, rysiek wrote:
Humm, I think I have already came across omnom some time ago. Well, I'll have to look into it. A short question: what would you say are the most important advantages of omnom over Mozilla Sync?
i don't know mozilla sync. one huge advantage of omnom is that it snapshots the pages you bookmark as they are rendered in your firefox. i guess that could also be useful for searching bookmarked pages... also there are hooks in the code that connect multiple omnom instances and other services for the federation of tags, but that is quite dead code.
I'm sold! Mozilla Sync has neither federation nor snapshotting. Me gusta mucho.

-- Pozdr rysiek
On Sun, Jan 5, 2014 at 10:39 AM, rysiek <rysiek@hackerspace.pl> wrote:
Dnia czwartek, 2 stycznia 2014 13:04:17 Sean Lynch pisze:
On Wed, Jan 1, 2014 at 3:46 PM, James A. Donald <jamesd@echeque.com> wrote:
As a matter of fact, it still does work.
It works far less, though, since most people expect others to rely on search engines, so they don't bother to link anymore.
Here's a thought: browser extension that stores your "personal" web index, and gives you a typeahead menu when you write about concepts in your index, prompting you to convert phrases to links. Like the way Facebook always wants to convert the names of people and pages to tags. Even if it were just primed with Wikipedia, that would drastically reduce the amount of Google searching people need to do when reading stuff you write.
In Firefox it's called "The Awesome Bar", and it sifts through your history and bookmarks (I bookmark a lot, and tag these pretty exactly, which helps immensely).
I'm talking about anytime you type into text boxes. The goal of this proposal was to return to the hypertextual nature of the web in order to reduce our dependence on centralized indexes. However, I find your proposal to improve the utility of the AwesomeBar interesting.
The downside, of course, is that it works only for links that I have already visited.
So here's the idea: sharing bookmark tags and links with each other, via some extension for example, and making "The Awesome Bar" (damn, I hate that name) sift through bookmarks/tags of people in your "network" (what that means would have to be defined, but as Mozilla Sync can already store bookmarks, the data can already be on a server, just use it).
An even simpler proposal: assuming the AwesomeBar doesn't already include live bookmarks in its autocomplete functionality, add it. Then anyone can simply publish their bookmarks via RSS and anyone else can import them. Then someone can just add functionality to create live bookmarks that pull signed and possibly encrypted (with Ed25519/Curve25519 of course) RSS feeds from a DHT.
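The verification side of those signed feeds can be sketched in a few lines. HMAC-SHA256 stands in here for the Ed25519 signatures proposed above (a real deployment would use public-key signatures so anyone can verify a feed without the secret); the feed layout, field names, and key are invented for the example:

```python
import hashlib
import hmac
import json

def sign_feed(entries, key):
    """Serialize bookmark entries canonically and attach a signature."""
    body = json.dumps(entries, sort_keys=True).encode()
    sig = hmac.new(key, body, hashlib.sha256).hexdigest()
    return {"entries": entries, "sig": sig}

def verify_feed(feed, key):
    """Recompute the signature; reject tampered feeds fetched from a DHT."""
    body = json.dumps(feed["entries"], sort_keys=True).encode()
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, feed["sig"])

key = b"demo-key"  # with Ed25519 this would be a private/public key pair
feed = sign_feed([{"title": "alpine", "url": "https://peertech.org/alpine"}], key)
print(verify_feed(feed, key))   # True for an untampered feed
feed["entries"].append({"title": "spam", "url": "http://spam.example"})
print(verify_feed(feed, key))   # False once entries are modified
```

The point of signing is exactly the DHT scenario above: the feed can be replicated by untrusted peers, and subscribers still detect any injected or altered bookmarks.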
On Friday, 17 January 2014 10:55:33, Sean Lynch wrote:
In Firefox it's called "The Awesome Bar", and it sifts through your history and bookmarks (I bookmark a lot, and tag these pretty exactly, which helps immensely).
I'm talking about anytime you type into text boxes.
Which "text boxes"? Any form on Teh Intertubes? The AwesomeBar or SearchBar?
The goal of this proposal was to return to the hypertextual nature of the web in order to reduce our dependence on centralized indexes. However, I find your proposal to improve the utility of the AwesomeBar interesting.
It's easy (it just requires a habit of decent tagging), and effective -- when I remember a piece of information I found important from a website I visited, it's usually in my bookmarks, tagged properly. This means that 95% of the time, as far as information I have already seen is concerned, the AwesomeBar reaching down to my bookmarks is enough to get what I need; no need to go to Google here.
The downside, of course, is that it works only for links that I have already visited.
So here's the idea: sharing bookmark tags and links with each other, via some extension for example, and making "The Awesome Bar" (damn, I hate that name) sift through bookmarks/tags of people in your "network" (what that means would have to be defined, but as Mozilla Sync can already store bookmarks, the data can already be on a server, just use it).
An even simpler proposal: assuming the AwesomeBar doesn't already include live bookmarks in its autocomplete functionality, add it. Then anyone can simply publish their bookmarks via RSS and anyone else can import them. Then someone can just add functionality to create live bookmarks that pull signed and possibly encrypted (with Ed25519/Curve25519 of course) RSS feeds from a DHT.
Not that easy, as everybody would need to publish their bookmark RSS/Atom channels in a way that differs from how it is usually done on the 'Net. Usually, only the first 10-30 headlines/items are in the RSS/Atom channel; the older ones, it is assumed, are already cached in users' RSS/Atom readers. Firefox does not cache live bookmarks, so each time you only get the current 10-30 items; all older ones are "lost". This makes sense with regard to the intended use of this functionality in Firefox (and other browsers), but unfortunately makes it harder to implement interesting bookmark sharing the way you described.

-- Pozdr rysiek
I've been considering this. We all still link, and link richly and often. It's just that our links no longer take the form of a dedicated "link index" as they did in the nineties, and are now more often either inline or colon-ised; either "I just found a [great resource on whatever]" or "I just found a great resource on whatever: [link]". The former is more common on blogs, the latter on Twitter.

So, to take your idea: an index based on your web browsing habits is, I feel, not so useful, because we all often follow links to stuff that's not generally interesting to us, often by anonymous links (2 girls one kitteh). So the index would be populated with lots of spurious stuff that'd be forward-indexed and made into nonsense. You type "girls" and kitteh-scat-porn comes up. :)

However, there's another type of index that at least some of us engage in that's more likely to be relevant: RSS/Atom feeds, and microstatus feeds. The former is interest-based, the latter social-based. By scanning the stuff we explicitly subscribe to and indexing ahead, we not only get a curated source by which to infer "trusted" metadata (i.e. if we follow a medical blog and it links to viagra, we know it's not a spammer but a trusted person recommending a link), but the links themselves are likely to be related to our interests.

By then publishing our indexes along with our blogs or microstatus feeds (borrow an XML callback system from the fairly Byzantine status.net standard and publish it in your post headers/about me block), we also get "trusted" access to a web of our friends, follows and favourite blogs by which to form a social search engine.

This has several advantages. For one thing, the social/subscription web can be used to infer relative trust. If you follow a person who recommends a link and several people you know also follow and trust that person's links, then that link may be given more relative "trust" than an outlier that only you follow.
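That relative-trust rule can be sketched as a simple score over the follow graph. The function, the weights, and the toy graph are all invented for illustration:

```python
def trust_score(recommender, my_follows, follows_of):
    """Trust in a recommender: 1.0 if I follow them directly, plus 0.5
    for each of my follows who also follows them (social endorsement)."""
    direct = 1.0 if recommender in my_follows else 0.0
    endorsements = sum(1 for f in my_follows if recommender in follows_of(f))
    return direct + 0.5 * endorsements

# Toy graph: I follow alice and bob; both of them follow carol,
# while only alice follows dave (the "outlier" case).
graph = {"alice": {"carol", "dave"}, "bob": {"carol"}, "carol": set(), "dave": set()}
me = {"alice", "bob"}
print(trust_score("carol", me, graph.get))  # endorsed by both follows
print(trust_score("dave", me, graph.get))   # endorsed by only one follow
```

Links recommended by carol would then outrank links recommended by dave, which is exactly the "more relative trust than an outlier" behaviour described above; spammers with no inbound follows from your network score zero.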
So, that seems pretty pie in the sky, right? A standard rather than a codebase. But there's a huge advantage to this line of thought, if you'll bear with me.

A double-digit percentage of the web right now is powered by Wordpress.org, who explicitly advocate open/free culture. If you can convince them to include a social search/index standard of this type, which is virtually free in terms of computer resources, then you'd have it deployed across the web in days as the next update rolled out. Indeed, even if Wordpress seemed reluctant, a wordpress plugin could probably be written quickly enough to enable such a thing and make it available for casual use.

Suddenly, a bunch of PHP-powered sites around the web start committing small bits and pieces of resources to a social search engine based on human-curated attestations of trust that flow through a web, helping to confine spammers to the fringes and to users with stupid taste.

Also worthy of consideration is Jekyll, though that's a static site generator, so index compilation would be more costly per publication (you'd have to recompile with each blogpost) and there's no scope for an active callback where readers can suggest index additions back.

Thoughts welcome, I don't even code PHP so it's all speculation here. :)

On 02/01/14 21:04, Sean Lynch wrote:
On Wed, Jan 1, 2014 at 3:46 PM, James A. Donald <jamesd@echeque.com> wrote:
As a matter of fact, it still does work.
It works far less, though, since most people expect others to rely on search engines, so they don't bother to link anymore.
Here's a thought: browser extension that stores your "personal" web index, and gives you a typeahead menu when you write about concepts in your index, prompting you to convert phrases to links. Like the way Facebook always wants to convert the names of people and pages to tags. Even if it were just primed with Wikipedia, that would drastically reduce the amount of Google searching people need to do when reading stuff you write.
On Tue, Dec 31, 2013 at 6:37 AM, Lodewijk andré de la porte <l@odewijk.nl>wrote:
I'd like to ask people to wonder what Search Engines really do for us. Where is the catalog? Where is the cultivated list of good resources?
Well, in Google's case, the list is curated by those doing the linking, but Google is trading richness of metadata for coverage.
Do search engines provide the same level of guidance to their users that a written overview can?
No, but they cover far more of the Web than a manually curated index ever could. They can answer questions like "what was that article I read last week on this topic?" and "what other pictures exist of this person?" Nobody's going to write summaries of every single news article and blog post.
Why don't we create a distributed website catalog? It's harder, as anti-spam is the core feature. But competing with Google seems rather foolhardy at the moment.
I think this is a good idea. Spam can be handled by just signing all the pages and having signed white and blacklists to create a web of trust/distrust. Proof-of-work could be used when creating new signing identities in order to make the blacklists useful.
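The proof-of-work idea for new signing identities can be sketched directly: require that a hash over the public key and a nonce fall below a threshold, so minting identities in bulk (to escape a blacklist) is expensive while checking one is cheap. The difficulty, encoding, and names below are arbitrary choices for the example:

```python
import hashlib
from itertools import count

def mint_identity(pubkey: bytes, difficulty_bits: int = 16) -> int:
    """Find a nonce so sha256(pubkey || nonce) has `difficulty_bits`
    leading zero bits -- making mass identity creation expensive."""
    limit = 1 << (256 - difficulty_bits)
    for nonce in count():
        digest = hashlib.sha256(pubkey + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < limit:
            return nonce

def check_identity(pubkey: bytes, nonce: int, difficulty_bits: int = 16) -> bool:
    """Cheap, single-hash verification of the work bound to this key."""
    digest = hashlib.sha256(pubkey + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))

nonce = mint_identity(b"example-public-key")   # ~2**16 hashes on average
print(check_identity(b"example-public-key", nonce))  # True
```

Because the work is bound to the key, a blacklisted identity cannot be reused, and each fresh identity costs the attacker the full minting effort again.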
Maybe the word catalog isn't right, catalogs are too static and not discovery targeted at all.
I imagine something as simple as StumbleUpon, just "I like/dislike this", perhaps with tags. One could add a signed inverted index as well to facilitate searching by phrase.
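The inverted-index part reduces to something like the following (unsigned here for brevity, and using term-AND lookup rather than true phrase matching; the document set is made up):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of liked URLs whose text mentions it."""
    index = defaultdict(set)
    for url, text in docs.items():
        for term in text.lower().split():
            index[term].add(url)
    return index

def phrase_search(index, query):
    """URLs containing every term of the query (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return hits

docs = {
    "http://a.example": "decentralized search engine design",
    "http://b.example": "search engine optimization fluff",
}
index = build_inverted_index(docs)
print(sorted(phrase_search(index, "search engine")))        # both URLs
print(sorted(phrase_search(index, "decentralized search"))) # only a.example
```

Signing would then apply to the serialized index, as with the feeds discussed earlier, so peers can merge each other's like/dislike indexes without trusting the transport.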
Maybe a Yahoo! answers type of tagging/cataloging would work rather well.
Anyway: think about it guys! I'm sure there's a better way than "this keyword is also in this page which links to other good pages"!
Been thinking about it for a while ;-)
On 01/01/2014 10:11 AM, Sean Lynch wrote:
I imagine something as simple as StumbleUpon, just "I like/dislike this", perhaps with tags. One could add a signed inverted index as well to facilitate searching by phrase.
It sounds like you're describing one of the open source clones of Delicious, with a distributed database on the back end rather than a relational database. Something like Scuttle (http://sourceforge.net/projects/scuttle/), Selficious (https://github.com/initpy/selficious) (but that uses AppEngine), or Scrumptious (https://github.com/jpmens/scrumptious) (which uses CouchDB, which can be used to build distributed databases, though some extra code would have to be written to really make that happen). Users set up instances, then add and tag URLs. I don't know of any off the top of my head that would allow for rating URLs, though.

--
The Doctor [412/724/301/703] [ZS]
"You're breathing him."
On Tuesday, 31 December 2013 13:42:59, stef wrote:
On Sun, Dec 29, 2013 at 12:17:07AM -0800, Jesse R. Taylor wrote:
but I've seen very little focus on the need for development of alternatives to corporate search engines.
[disregarding the corporate focus] i can warmly recommend https://searx.0x2a.tk
Hummm, what is this, who runs this, is it distributed or centralised (as far as control is concerned)? -- Pozdr rysiek
On Sunday, 5 January 2014 19:29:23, rysiek wrote:
On Tuesday, 31 December 2013 13:42:59, stef wrote:
On Sun, Dec 29, 2013 at 12:17:07AM -0800, Jesse R. Taylor wrote:
but I've seen very little focus on the need for development of alternatives to corporate search engines.
[disregarding the corporate focus] i can warmly recommend https://searx.0x2a.tk
Hummm, what is this, who runs this, is it distributed or centralised (as far as control is concerned)?
Disregard, I clicked the "about" link. -_-' -- Pozdr rysiek
participants (12)
- Cathal Garvey
- coderman
- Eric Mill
- freek2023@yahoo.de
- James A. Donald
- Jesse R. Taylor
- Lodewijk andré de la porte
- realcr
- rysiek
- Sean Lynch
- stef
- The Doctor