possible solution to cyber S/N
I've written many times on the problem of signal to noise here in cyberspace and have tried to think through some elegant solutions to this very vexing problem. Moments ago I came up with an intense brainstorm fully worthy of sharing with the list <g>.

It's clear that the web functions very much like a directed graph in many ways. Sites and the traffic between them form one graph. Another graph involves thinking of each page or "article" on the web as a single node, with hyperlinks as the edges between nodes. What we have today is a directed graph in which one can follow the forward direction of hyperlinks very readily; it is much more difficult to follow the reverse direction, i.e. to find "all the sites that reference this page." Of course this is part of the beauty of the web: anyone can link to anyone without having to register or anything like that. The "referrer log" feature on a server is one mechanism that does let a server get some idea of who links to its pages (a log-parsing sketch appears below).

The basic idea I have is that references are an excellent way of discriminating signal from noise, and they are used routinely in the scientific arena. A paper that is pivotal and influential is referred to ad infinitum; obscure papers are forgotten and never referred to in subsequent literature. Taking this idea to the cyberspace arena, the application is immediately obvious: pages that are linked to by many other pages are valuable; those that are not, are not. (Another closely related idea is how much hit traffic a web site gets. Wouldn't it be a *tremendous* improvement in the current search engines if they returned pages ranked according to how many hits they get per unit time?)

There are some problems with all the above, however. Currently only the site that actually houses the pages can keep track of hits, and referrers are not well tracked at all. In a robust system, cheating, such as reporting more hits than one is actually getting, would not be possible.

Anyway, I tend to think that future S/N problems in cyberspace are increasingly going to be solved in some particular ways that are just now being tried out:

1. Rating agencies. Agencies that both reject noise and find "cool stuff" (like Point Communications etc.).

2. Hit statistics. How many people are hitting various pages? If we could get an Arbitron-like system that works the way newsgroup readerships are now reported (each site compiles statistics and sends them to centralized databases), we could have search engines that rank results according to hit statistics. I'm not saying it would be perfect, and there are all kinds of obvious nitpicky things that people here will harp on, but I still insist it would be better than no statistics at all. Eventually a system like this, involving voluntary submission of hit counts to centralized servers (or just some way for search engines to get hit statistics from servers about the pages they serve), may evolve into a more robust system that makes cheating impossible (such as falsely reporting a high hit count to attract people to a site).

3. Linking statistics. Again, I think this is an extremely powerful way of separating signal from noise: how many other sites link to the page in question? If it has few inbound links it isn't as interesting; if a lot of other people point to it, it's far more interesting. (Two small sketches of these signals follow.)
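To make the ranking idea concrete, here is a minimal sketch in Python of scoring by linking statistics, with hit traffic as a tiebreaker. Every page name, link pair, and hit count below is hypothetical; a real engine would gather the link pairs from a crawl and the hit counts from the kind of voluntary reporting described in point 2.

    from collections import defaultdict

    # Hypothetical link data: (source_page, target_page) pairs, i.e. the
    # directed graph of hyperlinks described above.
    links = [
        ("a.example/index.html", "b.example/paper.html"),
        ("c.example/blog.html",  "b.example/paper.html"),
        ("e.example/links.html", "b.example/paper.html"),
        ("b.example/paper.html", "d.example/obscure.html"),
    ]

    # Hypothetical Arbitron-style hit statistics (hits per day).
    hits_per_day = {
        "b.example/paper.html": 500,
        "d.example/obscure.html": 3,
    }

    # Build the reverse graph: for each page, the set of pages linking to it.
    # This is exactly the direction the web itself makes hard to follow.
    inbound = defaultdict(set)
    for src, dst in links:
        inbound[dst].add(src)

    def score(page):
        # In-degree is the primary signal; hit traffic breaks ties.
        return (len(inbound[page]), hits_per_day.get(page, 0))

    for page in sorted(inbound, key=score, reverse=True):
        print(page, "-", len(inbound[page]), "inbound links,",
              hits_per_day.get(page, 0), "hits/day")

The `inbound` table is the reverse graph the web doesn't hand you for free, which is why the statistics-gathering infrastructure above matters at all.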
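The referrer logs mentioned above are one way a single server can already recover its own slice of that reverse graph. A sketch, assuming the widely used "combined" access-log format, whose quoted Referer field names the linking page; the log lines are fabricated examples of that format:

    import re
    from collections import Counter

    # Matches the request path, status, size, and quoted Referer field
    # of a combined-format access log line.
    LOG = re.compile(
        r'"(?:GET|POST|HEAD) (?P<path>\S+)[^"]*" \d{3} \S+ "(?P<referer>[^"]*)"')

    # Fabricated example log lines in combined format.
    log_lines = [
        '1.2.3.4 - - [10/Oct/1996:13:55:36 -0700] "GET /paper.html HTTP/1.0" '
        '200 2326 "http://c.example/blog.html" "Mozilla/3.0"',
        '5.6.7.8 - - [10/Oct/1996:14:01:12 -0700] "GET /paper.html HTTP/1.0" '
        '200 2326 "http://e.example/links.html" "Mozilla/2.0"',
        '9.8.7.6 - - [10/Oct/1996:14:02:40 -0700] "GET /paper.html HTTP/1.0" '
        '200 2326 "-" "Lynx/2.6"',
    ]

    # Count (local page, referring page) pairs; "-" means no referrer sent.
    reverse_links = Counter()
    for line in log_lines:
        m = LOG.search(line)
        if m and m.group("referer") not in ("", "-"):
            reverse_links[(m.group("path"), m.group("referer"))] += 1

    for (page, referer), n in reverse_links.most_common():
        print(f"{page} reached {n}x via {referer}")

This only sees links that people actually follow, of course, and a dishonest server could fabricate such records, which is the cheating problem noted above.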
I encourage system designers to keep some of these ideas in mind as they work on the current generation of software tools such as search engines. In particular, there is a tremendous amount of innovation going on right now among search engine designers, and adopting some of the above ideas might be a very powerful way for an engine to distinguish itself from the "competition" by returning more usable results to the end user. Again, I suspect we will be seeing increasingly ingenious and efficacious ways of dealing with what today is a horrible S/N problem. Perhaps today will be remembered as the dark age of cyberspace because of all the muck we routinely wade through <g>
Vladimir Z. Nuri wrote:
> Taking this idea to the cyberspace arena, the application is immediately obvious: pages that are linked to by many other pages are valuable; those that are not, are not.
The above is wrong from two standpoints. Locally: if my taste is sufficiently different from that of other people, why should I follow their preferences? Globally: Vladimir's recommendations lead to a vicious circle of "tyranny" of popular sites, because the more people view a site, the more will follow, and the site's "attractiveness" rating will go even higher. It is similar to a recommendation to buy stocks while they are rising: if enough people follow it, crashes like that of 1929 become imminent. - Igor.
> Globally: Vladimir's recommendations lead to a vicious circle of "tyranny" of popular sites, because the more people view a site, the more will follow, and the site's "attractiveness" rating will go even higher.
"Nobody goes there anymore. It's too crowded." - Yogi Berra ;)
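Igor's vicious circle is easy to reproduce in a toy simulation. A sketch, assuming each new visitor simply picks a site with probability proportional to its current hit count, i.e. trusts the hit rankings blindly:

    import random

    random.seed(1)  # reproducible toy run

    hits = [1] * 10            # ten sites start out equally popular
    for _ in range(10000):     # each visitor consults the hit rankings...
        # ...and picks a site with probability proportional to its hits.
        site = random.choices(range(10), weights=hits)[0]
        hits[site] += 1

    # Early random leads get locked in: the final shares are highly
    # unequal, and which sites "win" has nothing to do with quality.
    print(sorted(hits, reverse=True))

Under this assumption the rankings amplify whatever random lead emerges first, which is precisely the feedback loop being objected to.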
Mr. Nuri wrote:
> ...in the scientific arena. A paper that is pivotal and influential is referred to ad infinitum; obscure papers are forgotten and never referred to in subsequent literature. Taking this idea to the cyberspace arena, the application is immediately obvious: pages that are linked to by many other pages are valuable; those that are not, are not.
I'd bet that there are more links to playboy.com or "xxx.sex.com" than to thomas.loc.gov. This suffers from the fatal flaw of democracy: the assumption that one million people are smarter than ten. Who was it that said "No one ever went broke underestimating the intelligence of the average American"?

Petro, Christopher C.
petro@suba.com <preferred for any non-list stuff>
snow@smoke.suba.com