Friends, Romans, Countrymen: Lend me your URLs

Abhinay Nagpal, Computer Science Dept., Stanford University
Sudheendra Hangal, Computer Science Dept., Stanford University
Rifat Reza Joyee, Computer Science Dept., Stanford University
Monica S. Lam, Computer Science Dept., Stanford University

Human curation is useful for obtaining high-quality information. In fact, in the early days of the web, people found information using directories of useful pages. Over time, we’ve moved to search engines that use algorithms to infer relevant and authoritative pages on the web. However, the commercial importance of search engines has meant that they face the problem of web spam. Bringing back elements of human curation may be one way to solve this problem. And who better than your friends to curate information for you! Such social curation also offers implicit personalization since people often share common interests and affiliations with their friends.

We exploit the fact that your social chatter already contains a list of sites that are useful to you. For example, people email or tweet about links they find interesting and would like to share with friends. In our research, we created customized search indexes for users that bias web search results towards domains present in their social chatter. We found that this approach is effective at combating web spam and delivering high-quality search results. It also addresses another problem with current search engines: user privacy. To deliver personalized results, search engines need to build detailed profiles of users, including the user’s social graph, location, etc. In contrast, our approach enables personalized search without revealing much detail to the search engine. Moreover, this form of personalization can be better, since only the user has access to all of their chatter; it is not limited by commercial arrangements between search engines and channels of social chatter.

Figure: High-level workflow of Slant

We have developed a system called Slant that extracts URLs from email archives and Twitter feeds, and uses the domains therein to create a personalized Google Custom Search engine for each user. This engine restricts search results to the domains mentioned in social chatter; in essence, these domains are treated as a whitelist. We evaluated the results from various personalized custom search engines and found that, even though the personalized indexes used only a few thousand domains, their results, as rated by users, matched or exceeded the results from personalized Google search.
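As a rough illustration of the idea, the sketch below collects domains from a handful of messages and then keeps only the search results that fall on those domains. It is not Slant's actual implementation and does not call Google Custom Search: the function names `extract_domains` and `filter_results`, the simplified URL pattern, and the choice to accept subdomains of a whitelisted domain are all assumptions made for this example.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Simplified URL matcher (an assumption for this sketch): stops at whitespace,
# quotes, angle brackets, parentheses, and square brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>\"'()\[\]]+", re.IGNORECASE)


def extract_domains(messages):
    """Count how often each domain appears in URLs found in the messages."""
    counts = Counter()
    for text in messages:
        for url in URL_PATTERN.findall(text):
            # Drop sentence punctuation that trails the URL in running text.
            url = url.rstrip(".,;:!?")
            host = urlparse(url).hostname
            if not host:
                continue
            host = host.lower()
            if host.startswith("www."):
                host = host[4:]
            counts[host] += 1
    return counts


def filter_results(result_urls, whitelist):
    """Keep results whose host is a whitelisted domain or a subdomain of one."""
    kept = []
    for url in result_urls:
        host = (urlparse(url).hostname or "").lower()
        if any(host == d or host.endswith("." + d) for d in whitelist):
            kept.append(url)
    return kept
```

Calling `extract_domains` on a list of email bodies or tweets yields a domain-to-count mapping; its keys can be passed as the `whitelist` argument of `filter_results`. The counts could also be used to keep only the most frequently mentioned domains, although how Slant selects its few thousand domains is not described here.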

Specifically, in our study, we asked users to compare results from different search engines:

  • Google’s personalized results
  • Results from a custom search engine that had domains from the TopTweets account
  • Results from a custom search engine that had domains from the user’s Twitter account
  • Results from a custom search engine that had domains from the user’s email account
  • Results from a search engine queried with the names of the user’s friends appended to the original query

The results are shown below, and indicate that both the email-based and Twitter-based indexes frequently match or exceed the ratings of personalized Google search.

We further categorized queries along Broder’s taxonomy as transactional, navigational, or informational, and obtained insights about which search indexes do better for each category. Please see our paper for details.

A secondary benefit of Slant is that it lets users consume information implicitly: the recommendations embedded in their social feeds are piped into a search engine. This means that users can follow more people on Twitter, or subscribe to more mailing lists, without having to read all the content manually.

For more, see our full CSCW 2012 paper, “Friends, Romans, Countrymen: Lend me your URLs. Using Social Chatter to Personalize Web Search.” Interested readers can try out the Slant research prototype here.
