Hey Twitter crowd … What else is there?

Journalists and news editors use Twitter to contextualize and enrich their articles by examining the public response, from comments and opinions to pointers to related news. This is possible because some users in Twitter devote a substantial amount of time and effort to news curation: carefully selecting and filtering news stories highly relevant to specific audiences.

We developed an automatic method that groups together all the users who tweet a particular news item, and later detects new contents posted by them that are related to the original news item.

We call each such group a transient news crowd. The beauty of this approach, in addition to being fully automatic, is that there is no need to pre-define topics and the crowd becomes available immediately, allowing journalists to cover news beats incorporating the shifts of interest of their audiences.

Transient news crowds
Figure 1. Detection of follow-up stories related to a published article using the crowd of users that tweeted the article.

Transient news crowds

We define the crowd of a news article as the set of users that tweeted the article within the first 6 hours after it is published. We followed users on each crowd during one week, recording every public tweet they posted during this period. We used Twitter data around news stories published by two prominent international news portals: BBC News and Al Jazeera English.

What did we find?

  • After they tweet a news article, people’s subsequent tweets are correlated to that article during a brief period of time.
  • The correlation is weak but significant, in terms of reflecting the similarity between the articles that originate a crowd.
  • While the majority of crowds simply disperse over time, parts of some news crowds come together again around new newsworthy events.

Crowd summarisation

We illustrate the outcome of our automatic method with the article Central African rebels advance on capital, posted on Al Jazeera on 28 December, 2012.

transient news crowds - example
Figure 2. Word clouds generated for the crowd on the article “Central African rebels advance on capital”, by considering the terms appearing in stories filtered by our system (top) and on the top stories by frequency
(bottom).

Without using our method (in the figure, bottom), we obtain frequently-posted articles which are weakly related or not related at all to the original news article. Using our method (in the figure, top), we observe several follow-up articles to the original one. Four days after the news article was published, several members of the crowd tweeted an article about the fact that the rebels were considering a coalition offer. Seven days after the news article was published, crowd members posted that rebels had stopped advancing towards Bangui, the capital of the Central African Republic.

You can find more details in our papers:

  • Janette Lehmann, Carlos Castillo, Mounia Lalmas and Ethan Zuckerman: Transient News Crowds in Social Media. Seventh International AAAI Conference on Weblogs and Social Media, 2013, Massachusetts.
  • Janette Lehmann, Carlos Castillo, Mounia Lalmas and Ethan Zuckerman: Finding News Curators in Twitter. WWW Workshop on Social News On the Web (SNOW), Rio de Janeiro, Brazil.


Janette Lehmann, Universitat Pompeu Fabra
Carlos Castillo, Qatar Computing Research Institute
Mounia Lalmas, Yahoo! Labs Barcelona
Ethan Zuckerman, MIT Center for Civic Media

About the author

Janette Lehmann

PhD student in Universitat Pompeu Fabra and working for Yahoo! Research with interests in Social Computing and User Engagement.

View all posts

1 Comment

  • I think it would be interesting to continue following the new crowd of users on the newer content to obtain a hierarchy of news articles that are related. Then further analysis on them can help to understand how news evolve, how people’s attention on different types of content evolve over time. Very interesting work, see you at ICWSM.