Personal Informatics to Encourage Diversity in News Reading

Today, people can choose from among more news sources than ever, some of which cater to particular ideological niches or highlight items popular in one’s social network. Scholars and pundits have expressed concern that this technology could reinforce people’s tendency to read predominantly agreeable news, through both their own choices and the choices made by system designers. While this might increase short-term engagement and help people feel validated, it may not further either individual or societal goals. Reading broadly can further learning and out-of-the-box thinking. Individuals who are aware of other viewpoints can better communicate and empathize with those who disagree.

Despite theories that predict a preference for reading agreeable political news, many people appear to agree with the norm of reading diverse viewpoints, and at least some people actively prefer it. Colleagues and I were curious whether a personal informatics tool could help people identify when their behavior is inconsistent with this norm and help them take corrective action.

The Balancer extension gives readers feedback on the political lean of their online newsreading.

To test this, we built Balancer, an extension for the Chrome web browser. Balancer enables users to see patterns in their behavior and also reminds them of the norm of balance, in the form of a character on a tightrope. When the user’s newsreading is balanced, the character is happy; when it is not, the character is in peril of falling.
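As a rough sketch of the kind of bookkeeping such a tool might do (this is illustrative only, not Balancer’s actual implementation), one could tally visits per news domain, map each known domain to a lean score, and summarize reading balance as a visit-weighted average. The domains and scores below are hypothetical.

```python
# Illustrative sketch only; Balancer's real site classifications and scoring
# are described in the paper and differ in detail. Domains and scores are hypothetical.
from collections import Counter

SITE_LEAN = {                      # hypothetical lean scores: -1 liberal, +1 conservative
    "examplelefty.com": -0.8,
    "examplerighty.com": 0.8,
    "exampleneutral.com": 0.0,
}

def balance_score(visits):
    """Visit-weighted average lean of classifiable sites; 0.0 means balanced reading."""
    scored = [(SITE_LEAN[d], n) for d, n in visits.items() if d in SITE_LEAN]
    total = sum(n for _, n in scored)
    return sum(lean * n for lean, n in scored) / total if total else 0.0

print(balance_score(Counter({"examplelefty.com": 12, "exampleneutral.com": 3})))  # -0.64
```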

In a one-month, open-enrollment controlled field experiment, this extension encouraged participants with unbalanced reading habits to make small but measurable changes in the balance of their newsreading. Compared to a control group, users receiving feedback from Balancer made 1-2 more weekly visits to a website with predominantly opposing views or 5-10 more weekly visits to a site with more neutral views.

We are working to improve its capabilities, and others are also making progress in this space. There are now many browser extensions and other tools that give people feedback about the news they read and the sources they follow. ManyAngles recommends articles that cover different aspects of the topic about which a user is currently reading. Slimformation reveals topical diversity in one’s online news reading. Scoopinion gives users feedback on their top authors, sources, and genres. Follow Bias shows people the gender (im)balance of their Twitter network. This is an exciting time for tools that help readers reflect on the news they read!

For more, see our full paper, Encouraging Reading of Diverse Political Viewpoints with a Browser Widget.

Sean A. Munson, Human Centered Design & Engineering, DUB Group, University of Washington
Stephanie Y. Lee, Sociology, University of Washington
Paul Resnick, School of Information, University of Michigan

That’s What Friends Are For: Inferring Location in Online Social Media Platforms Based on Social Relationships

People post millions of updates to social media sites like Facebook and Twitter every day. When it comes to understanding what groups of people are experiencing, knowing the area where these messages originate can make a huge difference: are “I feel sick” posts pointing to a breaking epidemic or just run-of-the-mill flu? Do the posts promoting a political candidate reveal widespread support or just a loud minority from their home town?

However, one of the big challenges in doing geographic analyses is estimating where people are. For example, only 0.7% of all Twitter messages come with some kind of GPS data. Our work starts from this data and then uses an old social principle: people are often friends with others who live nearby. If we know the locations of only a small number of people, we can look at a person’s social network and try to infer their location based on where their friends are.
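To make the intuition concrete, here is a minimal sketch in Python. It is not the algorithm from the paper: each unlocated user simply takes a representative point among their friends’ known or inferred locations, and the process repeats for a few rounds.

```python
# Minimal sketch only; the paper's inference method is more refined than this.
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))

def infer_locations(friends, known, rounds=3):
    """friends: user -> set of friends; known: user -> (lat, lon) for the GPS-tagged minority."""
    locations = dict(known)
    for _ in range(rounds):
        updates = {}
        for user, nbrs in friends.items():
            if user in known:
                continue  # keep ground-truth locations fixed
            pts = [locations[f] for f in nbrs if f in locations]
            if pts:
                # cheap stand-in for a geometric median: the friend location with
                # the smallest total distance to the other friends' locations
                updates[user] = min(pts, key=lambda p: sum(haversine_km(p, q) for q in pts))
        locations.update(updates)
    return locations
```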

Here, we took 20 million friendships on Twitter where we know the location of both people and plotted the relative geographic offset of each person’s closest friend. The giant spike at the right highlights that most people have a very close friend!

We looked at a Twitter social network of 47.7 million people, where two people are connected if each has talked to the other at least once. We found that we could estimate a location for most people in the network (95%) and that our estimates were often very close to where people actually were, with over half within 10 km (6 mi). Moreover, our method enables geo-tagging of over 77% of all Twitter messages.

The error estimates for our method at inferring users’ locations. If you pick a distance on the x-axis and look at a line, the y-axis shows the probability that a location estimate is off by less than that distance.

In the paper, we examined many other hypotheses and found:

  • The method works regardless of the person’s country of origin
  • Locations can be accurately inferred across a variety of social network sizes – even for people with only one friend
  • Locations can even be inferred using data from other social networks like Foursquare, provided you can find individuals who have identities in both
  • Only a small amount of location data is needed

For more, see our full paper, That’s What Friends Are For: Inferring Location in Online Social Media Platforms Based on Social Relationships.
David Jurgens, Sapienza University of Rome

Properties, Prediction, and Prevalence of Useful User-generated Comments for Descriptive Annotation of Social Media Objects

User-generated comments in online social media have recently been gaining increasing attention as a viable source of general-purpose descriptive annotations for social media objects like online shared photos or videos. However, the quality of user-generated comments varies from very useful to entirely useless; comments can even be abusive or off-topic.

The most common methods for estimating the usefulness of user-generated comments simply allow all users to vote on (and possibly moderate) the contributions of others, thus avoiding an explicit definition of “useful”.

We investigate usefulness from the user’s perspective, defining a comment as USEFUL  if it provides descriptive information about the media object beyond the usually very short title accompanying it.

With this definition in hand, we asked:

  • What are the PROPERTIES of useful comments?
  • How can we PREDICT useful comments?
  • How can we estimate the PREVALENCE of useful comments?

Using text-based, semantic, topical, and author features, we characterized crowd-sourced labeled comments on two classes of media objects (comments on Flickr photos and YouTube videos) and trained prediction models. Furthermore, we adapted an existing Bayesian prevalence model that uses the learned prediction models to estimate the prevalence of useful comments across different platforms and topics.
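As a loose illustration of this kind of pipeline (not the paper’s actual feature set or its Bayesian prevalence model), one could extract a few simple comment features, train a classifier, and then correct the raw predicted positive rate using error rates measured on held-out labeled data:

```python
# Illustrative sketch: toy features, a standard classifier, and a simple
# (non-Bayesian) prevalence correction. The paper's features and prevalence
# model are considerably richer; the affect lexicon below is an assumption.
import re
import numpy as np
from sklearn.linear_model import LogisticRegression

AFFECT = {"love", "hate", "wow", "awesome", "ugh"}  # toy affect lexicon

def comment_features(text):
    tokens = re.findall(r"\w+", text.lower())
    return [
        len(tokens),                                  # comment length
        sum(t in {"i", "me", "my"} for t in tokens),  # self-references
        sum(t in AFFECT for t in tokens),             # affective words
        text.count("http"),                           # links / references
    ]

def train(comments, labels):
    X = np.array([comment_features(c) for c in comments])
    return LogisticRegression(max_iter=1000).fit(X, labels)

def adjusted_prevalence(clf, comments, tpr, fpr):
    """Classify-and-count, corrected by the classifier's true/false positive
    rates (estimated on held-out labeled comments)."""
    observed = clf.predict(np.array([comment_features(c) for c in comments])).mean()
    return (observed - fpr) / (tpr - fpr)
```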

We found that:

  • Properties of USEFUL comments vary slightly according to the platform’s commenting culture and the topic of the media object. Comments that contain more references, more named entities, fewer self-references, and less affective language are more likely to be inferred as USEFUL. Moreover, users express more emotion and may use more offensive language when writing comments about topics related to people and events.
  • Prediction performance is better when the classifier is trained on comments about a single topic (type-specific), whereas performance is worse when the topic is ignored (type-neutral). Thus, for more accurate prediction, a model should take the topic of the media object into account.
  • The prevalence of USEFUL comments is influenced by:
    • The recency of the topic of the media object being commented on: the nearer the time period of a topic is to the present, the lower the usefulness prevalence.

    • The degree of polarization of topics among commenters.

    • The topic of the media object being commented and the platform’s commenting culture

Want to learn more? See our full paper, Properties, Prediction, and Prevalence of Useful User-generated Comments for Descriptive Annotation of Social Media Objects.


Elaheh Momeni, University of Vienna
Claire Cardie, Cornell University
Myle Ott, Cornell University

Understanding the How and Why of Online Content Curation

According to Rohit Bhargava, content curation describes the act of finding, grouping, organizing or sharing the best and most relevant content on a specific issue.

On Pinterest (Fig. 1), arguably the most popular picture curation website, users can collect and categorize images from other websites by pinning them onto so-called “pinboards”. Users can also repin or like images imported by other users. Last.fm has supported Pinterest-like curation actions for over 7 years: it allows users to tag (similar to pinning onto a pinboard) or love (similar to liking) music they have listened to.

Fig. 1: Curation activities in Pinterest

In this study, we take a first look at the how and why of content curation. Based on Pinterest and Last.fm, our analyses employ both quantitative and qualitative approaches. The quantitative study is based on a dataset that includes one month of curation activities and social graphs on both websites. Our qualitative user study is based on responses from nearly 300 users.

Our main results are as follows:

Why people curate

1. Curation highlights new kinds of content

Curation-based ranking is quite different from traditional popularity and search ranking (Fig. 2). For example, websites that get a lot of repins or likes in Pinterest are not highly ranked by Alexa traffic ranking or Google PageRank. This is consistent with Clay Shirky’s theory that “curation comes up when search stops working”.

Fig. 2: Correlation coefficients between curation-based ranking and traditional popularity ranking
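A comparison like the one in Fig. 2 boils down to a rank correlation between the two orderings. The numbers below are made up purely to show the mechanics, not values from the paper:

```python
# Sketch: rank correlation between a curation-based ranking and a traditional one.
from scipy.stats import spearmanr

repin_rank = [1, 2, 3, 4, 5]   # hypothetical rank of five websites by Pinterest repins
alexa_rank = [4, 1, 5, 2, 3]   # hypothetical rank of the same websites by Alexa traffic

rho, p = spearmanr(repin_rank, alexa_rank)
print(f"Spearman rho = {rho:.2f} (p = {p:.2f})")  # a low rho means the rankings disagree
```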

2. Different views on social connectivity

Another aspect of Clay Shirky’s theory is that the job of curation is to synchronise a community. Our research provides some evidence for this: most curation actions focus on a few items. However, our user studies show that most users curate for personal reasons rather than for social good. For example, a popular view is that:

I find the social aspect more useful and interesting with people I know, rather than developing new interactions based on music taste.

How people curate

To understand the different content curation actions on different websites, we group them into two classes:

  • Structured curation: categorizing items along with other similar items (e.g., tag on Last.fm or repin on Pinterest).
  • Unstructured curation: highlighting or collecting items without categorizing (e.g., like on Pinterest or love on Last.fm).

Based on this classification, several observations can be made:

  1. Some users prefer structured curation, others unstructured.
  2. For most items, unstructured curation accumulates faster than structured curation.
  3. However, popular items see more structured actions than unstructured ones (Fig. 3): even the top liked items have more repins than likes on Pinterest, and similarly the top loved items on Last.fm have more tags than loves.
  4. Most curation happens on first listen (on Last.fm).
Fig. 3: Top items get more structured curation than unstructured curation. The magenta line indicates R = 0.5, where unstructured curation equals structured curation; R < 0.5 means an item has received more structured than unstructured curation. Observe that even the top loved or liked tracks have R < 0.5.
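For concreteness, a per-item ratio along these lines can be computed as below. We assume here that R is the unstructured share of an item’s curation actions; the paper’s exact definition may differ in detail.

```python
# Sketch of a structured/unstructured curation ratio for one item.
def unstructured_ratio(actions):
    """actions: list of curation actions for one item, e.g. 'repin', 'tag', 'like', 'love'."""
    structured = sum(a in {"repin", "tag"} for a in actions)
    unstructured = sum(a in {"like", "love"} for a in actions)
    total = structured + unstructured
    return unstructured / total if total else 0.0

print(unstructured_ratio(["repin", "repin", "like", "repin"]))  # 0.25 -> mostly structured
```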

What adds social value

Our analysis of this question shows that users who are consistent in curating items and have a diversity of interests get more followers (Fig. 4). This agrees with Rohit Bhargava’s theory that the most important part of a content curator’s job is to continually identify new content for their audience. Interestingly, Fig. 4a also shows that very short intervals between curation actions on Pinterest can cost followers. We conjecture that such behavior may be perceived as spam.

Fig. 4: Consistent updates and diversity of interests attract more followers

For more, see our full paper, Sharing the Loves: Understanding the How and Why of Online Content Curation.

Changtao Zhong, King’s College London
Sunil Shah, Last.fm
Karthik Sundaravadivelan, King’s College London
Nishanth Sastry, King’s College London

What Might Yelp’s “Fake Review Filter” be Doing?

Yelp’s message for users submitting fake reviews:
Image source: http://officialblog.yelp.com/2013/05/how-yelp-protects-consumers-from-fake-reviews.html 

Fake Reviews and Yelp’s take on them

Fake reviews of products and businesses have become a major problem in recent years. As the largest review site, Yelp faces the task of filtering fake/suspicious reviews at commercial scale. However, Yelp’s filtering algorithm is a trade secret.

Our work aims to understand what Yelp’s filter might be looking for, by exploring the linguistic and behavioral features of reviews and reviewers.

1.    Detection based on linguistic features:

Prior research in [Ott et al., 2011; Feng et al., 2012] showed that classification using linguistic features (i.e., n-grams) can detect crowd-sourced fake reviews (using Amazon Mechanical Turk) with 90% accuracy.
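A generic version of such an n-gram classifier looks roughly like the sketch below (this is a standard setup, not the exact models used by Ott et al. or in this paper):

```python
# Generic n-gram review classifier sketch. Labels: 1 = fake, 0 = genuine.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def train_ngram_classifier(review_texts, labels):
    model = make_pipeline(
        CountVectorizer(ngram_range=(1, 2), lowercase=True),  # unigrams + bigrams
        LinearSVC(),
    )
    return model.fit(review_texts, labels)

# Usage (hypothetical variables):
# clf = train_ngram_classifier(train_texts, train_labels)
# accuracy = (clf.predict(test_texts) == test_labels).mean()
```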

Applying the same approach to Yelp’s real-life fake review dataset (using filtered reviews as fake and unfiltered reviews as non-fake), however, yields only 68% detection accuracy. We analyzed fake and real reviews to understand the reason for this difference in accuracy, finding that:

  • Turkers probably did not do a good job at faking!
  • Yelp spammers are smart, but overdid the faking!

2.  Detection based on behavioral features

Prior work in [Jindal and Liu, 2008; Mukherjee et al., 2012] showed that abnormal behavioral features of reviewers and their reviews are effective at detecting fake reviews: abnormal behavioral features yielded 83% accuracy on the Yelp fake review dataset.

Below we show the discriminative strength of several abnormal behaviors (MNR: Maximum number of reviews per day, PR: Ratio of positive reviews, RL: Review length, RD: Rating deviation, MCS: Maximum content similarity).

Cumulative distribution functions (CDFs) of the behavioral features listed above.
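For illustration, these features could be computed per reviewer roughly as follows. The field names, the positive-review threshold, and the use of SequenceMatcher as a stand-in for content similarity are all assumptions; the paper’s exact definitions differ in detail.

```python
# Rough sketch of the behavioral features listed above for a single reviewer.
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations

def behavioral_features(reviews):
    """reviews: list of dicts like
    {'date': '2013-05-01', 'stars': 5, 'text': '...', 'business_avg': 3.8}."""
    per_day = Counter(r["date"] for r in reviews)
    mnr = max(per_day.values())                                        # MNR: max reviews/day
    pr = sum(r["stars"] >= 4 for r in reviews) / len(reviews)          # PR: positive ratio
    rl = sum(len(r["text"].split()) for r in reviews) / len(reviews)   # RL: avg review length
    rd = sum(abs(r["stars"] - r["business_avg"]) for r in reviews) / len(reviews)  # RD
    mcs = max(                                                         # MCS: max content similarity
        (SequenceMatcher(None, a["text"], b["text"]).ratio()
         for a, b in combinations(reviews, 2)),
        default=0.0,
    )
    return {"MNR": mnr, "PR": pr, "RL": rl, "RD": rd, "MCS": mcs}
```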

Summary of Main Results

Yelp, arguably, does at least a reasonable job at filtering out fake reviews, based on four pieces of evidence:

  1. Classification under a balanced class distribution gives an accuracy of 67.8%, which is significantly higher than the 50% of random guessing, showing a linguistic difference between filtered and unfiltered reviews.
  2. Using abnormal behavioral features yields even higher accuracy, and a genuine reviewer is unlikely to exhibit these behaviors.
  3. Yelp has been doing industrial-scale filtering since 2005. It is unlikely that its algorithm is not effective.
  4. We are aware of cases where people who wrote fake reviews were caught by Yelp’s filter. Although this evidence is not conclusive, it is strong enough to give us confidence that Yelp is doing at least a reasonable job at filtering.

How does Yelp filter? From our results, we can speculate that Yelp might be using a behavior-based approach to filtering.

Amazon Mechanical Turk (AMT) crowd-sourced fake reviews may not be representative of commercial fake reviews, as Turkers may not have the genuine interest in writing fake reviews that commercial fake reviewers do.

For more, see our full paper, What Yelp Fake Review Filter Might Be Doing?

Arjun Mukherjee, University of Illinois at Chicago
Vivek Venkataraman, University of Illinois at Chicago
Bing Liu, University of Illinois at Chicago
Natalie Glance, Google

Emoticon Style: Interpreting Differences in Emoticons Across Cultures

Facial expressions can sometimes tell more about the minds of others than words. According to Mehrabian, body language and other nonverbal cues are essential in the communication of feelings and attitudes, as an estimated 93% of such communication is nonverbal.


In text-based communication, however, these cues are not present, and their absence can result in misunderstanding and confusion. Therefore, people started to express their facial expressions pictorially through groupings of symbols, letters, and punctuation, popularly referred to as emoticons. We focused on the use of these nonverbal cues in online content and asked: how do people use emoticons in online social media across cultural boundaries?

Utilizing a near-complete Twitter dataset from 2006 to 2009, which contains information about 54 million users and all of their public posts, we investigated the semantic, cultural, and social aspects of emoticon usage on Twitter.
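Extracting emoticons at this scale comes down to pattern matching over tweet text. The regular expressions below are simplified illustrations of the two style families, not the actual patterns used in the study.

```python
# Simplified sketch of counting horizontal vs. vertical emoticons in tweets.
import re
from collections import Counter

HORIZONTAL = re.compile(r"[:;=8][-o^']?[)(DPp/\\|]")      # e.g. :)  ;-)  :P
VERTICAL = re.compile(r"[\^TtOo;>][_\-.o]+[\^TtOo;<]")    # e.g. ^_^  T_T  o_O

def emoticon_styles(tweets):
    counts = Counter()
    for text in tweets:
        counts["horizontal"] += len(HORIZONTAL.findall(text))
        counts["vertical"] += len(VERTICAL.findall(text))
    return counts

print(emoticon_styles(["good night ^_^", "haha :) :P", "so sad T_T"]))
```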

We found that:

  • There are two kinds of emoticon styles: vertical and horizontal.
  • We identified a wide range of variations, involving more than 14K facial emoticons in tweets. For example, the basic smiley “:)” had several variations (e.g., adding a nose or eyebrows), which slightly changed its meaning. Although emoticons are generally used in positive, lighthearted contexts, the most popular ones co-occurred with both positive (e.g., haha, smile) and negative (e.g., kill, freak) affect words.
    Word clouds of the top 50 co-occurring affect words for the six most popular emoticons
  • Emoticons are expressed differently across cultural boundaries defined by geography and language. While Easterners employ a vertical style like “^_^”, Westerners employ a horizontal style like “:-)”. An important factor determining emoticon style is language rather than geography.
    Emoticon rates for horizontal and vertical styles in each country
  • Emoticons diffuse through the Twitter friendship network. Twitter users may influence their friends to adopt particular styles of emoticons, especially less popular emoticons like “:P”, “^^”, and “T_T”. This diffusion occurs almost entirely between people from similar cultural backgrounds.

What is your favorite emoticon? What kinds of words do you use with emoticons?

As socio-cultural norms, emoticons not only express specific emotions; they may also signal your identity and cultural background.

For more detail, see our full paper, Emoticon Style: Interpreting Differences in Emoticons Across Cultures.

Jaram Park, Graduate School of Culture Technology, KAIST
Vladimir Barash, Morningside Analytics
Clay Fink, Johns Hopkins University Applied Physics Laboratory
Meeyoung Cha, Graduate School of Culture Technology, KAIST

A Measure of Polarization on Social Media Networks based on Community Boundaries

Many societal issues lend themselves to polarization: same-sex marriage, abortion, and gun control, for instance, are all topics that induce antagonism and opposition among people. Given a social network and its division into communities, we ask the question, how can we identify if there is polarization (i.e. segregation) among the sub-groups? This answer is important not only from a sociological standpoint, but also because polarization can be a key piece of information for tasks such as opinion analysis, as conflicting groups may carry biased opinions.

While this is certainly a salient question, we demonstrate that it has not been properly addressed by prior research for a simple reason: most prior research examines networks emerging from topics that are already known to induce polarization (especially politics). To really understand the structural characteristics of polarized networks, we need to compare them against non-polarized networks. This is one of the key contributions of our work.

Modularity is a metric widely employed to measure separation in a social network; it roughly characterizes the strength of a division of a network into groups. We find that it is not a good metric for measuring polarization because it cannot indicate whether the division is a result of homophily within groups, antagonism between groups, or both.
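For reference, modularity for a given two-way split is easy to compute with an off-the-shelf library; the toy graph below (two cliques joined by a single edge) is only for illustration:

```python
# Modularity of a two-community split, computed with networkx.
import networkx as nx
from networkx.algorithms.community import modularity

G = nx.barbell_graph(5, 0)                     # two 5-cliques joined by a single edge
left, right = set(range(5)), set(range(5, 10))
Q = modularity(G, [left, right])
print(f"modularity = {Q:.2f}")                 # high Q, yet it says nothing about antagonism
```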

A polarized network of political discussion on Twitter (right) is less modular than a Facebook friendship network divided into two communities (left: college and high school friends). But how much modularity implies polarization?

In the two networks above, the division into two groups achieves a modularity score of 0.24 for the Twitter network and 0.42 for the Facebook network. The “more” divided group, the friendship network, however, has no polarization at all! These are two groups that share different interests without truly opposing one another. Modularity is therefore not effective at differentiating polarization from its absence!

To tackle this issue, we propose a new measure of polarization that, unlike modularity, focuses on the existence (or absence) of antagonism between groups. We compare nodes’ propensity to connect to users in the other (potentially opposing) group with their propensity to connect to members of their own group – we thus measure antagonism by checking whether members of one group avoid connecting with members of the other.
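The sketch below conveys that intuition by comparing cross-group and within-group connection propensities and averaging the difference; it is an illustration only, not the boundary-based metric defined in the paper.

```python
# Illustrative antagonism score: positive when nodes avoid the other group.
import networkx as nx

def antagonism_score(G, group_a, group_b):
    """group_a, group_b: disjoint sets of nodes in G."""
    scores = []
    for own, other in ((group_a, group_b), (group_b, group_a)):
        for node in own:
            nbrs = set(G.neighbors(node))
            within, across = len(nbrs & own), len(nbrs & other)
            if within + across:
                scores.append((within - across) / (within + across))
    return sum(scores) / len(scores) if scores else 0.0
```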

We show the usefulness of our new metric through an analysis of the social network of retweets that emerged during the gun control debate following the shootings in Newtown, CT, in December 2012.

Want to learn more? See our full paper: A Measure of Polarization on Social Media Networks based on Community Boundaries

Pedro Calais Guerra, UFMG, Brazil
Wagner Meira Jr., UFMG, Brazil
Claire Cardie, Cornell University
Robert Kleinberg, Cornell University

On the Interplay between Social and Topical Structure

Your friends and your topics of interest are intuitively related – people form friendships through mutual interests, and at the same time people discover new interests through friends. We are interested in exploring the ways in which social and topical structures can predict each other. We ask two basic questions:

  1. How well can a person’s topical interests predict who her friends are?
  2. How well can the social connections among the people interested in a topic predict the future popularity of that topic?

In order to answer these questions, we study 5 million Twitter users. We examine their hashtag usage to identify topical interests and their follower and @-message connections to identify two different kinds of social relationships.

To predict whether two users have a social relationship based on their hashtags, we use logistic regression models trained on a wide range of distance measures of topical similarity. Interestingly, one of the most predictive measures is also one of the simplest to compute: the size (number of users) of the smallest hashtag shared by the two users.
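As a sketch of how this feature can be computed (the function and data layout here are illustrative, not the study’s actual code), one can look up the hashtags shared by a pair of users and take the one used by the fewest people overall; features like this can then be fed to a logistic regression such as sklearn’s LogisticRegression.

```python
def smallest_common_hashtag_size(tags_u, tags_v, hashtag_sizes):
    """tags_u, tags_v: sets of hashtags used by each user;
    hashtag_sizes: hashtag -> number of distinct users who have used it."""
    shared = tags_u & tags_v
    if not shared:
        return float("inf")  # no shared hashtag: weakest possible signal
    return min(hashtag_sizes[t] for t in shared)

print(smallest_common_hashtag_size(
    {"#nba", "#dub"}, {"#dub", "#tea"},
    {"#nba": 90000, "#dub": 42, "#tea": 7000},
))  # 42: the pair shares a small, niche hashtag
```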

Our full model has an accuracy of 77% when predicting follower relationships and 86% when predicting @-message relationships. We also find that predicting strong ties is much easier than predicting weak ties: our model achieves an accuracy of up to 98% on the strongest pairs, which exchanged more than 20 @-messages.

Linkage probability as a function of smallest common hashtag. (a) The probability of a given user following another user as a function of the size, and (b) the probability of a given user @-messaging another user as a function of size. Both figures are shown on a log-log scale.

We also predict the future popularity of a hashtag from the social relationships of its early adopters. In particular, we predict whether a hashtag will double in size, studying only the social connections among its early adopters. Our intuition is that when the early adopters of a hashtag are very well connected, the hashtag is exhibiting high virality as it spreads quickly through the network and is destined to become popular; notable examples are #tcot (top conservatives on Twitter) and #tlot (top libertarians on Twitter). On the other hand, if the early adopters are nearly all disconnected from one another, the hashtag is likely related to a popular topic exogenous to Twitter and likely to become popular on Twitter as well; #michaeljackson is an example. We find evidence that hashtags whose early adopters are either well connected or largely disconnected are indeed more likely to become popular than those in between.

Probability that hashtags will exceed K adopters given the number of edges in the graph induced by the 1000 initial adopters, using a sliding window. From top to bottom, K = 1500, 1750, 2000, 2500, 3000, 3500, 4000.

The full model, which includes features such as the number of edges, connected components, and number of singletons in the set of early adopters, achieves an accuracy of 67% in predicting whether a hashtag will double its size.
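A sketch of such early-adopter graph features, computed with networkx, is below; the exact feature definitions used in the paper may differ.

```python
# Graph features over a hashtag's early adopters.
import networkx as nx

def early_adopter_features(G, adopters):
    """G: social graph among users; adopters: the first N users of a hashtag."""
    sub = nx.Graph(G.subgraph(adopters))          # treat ties as undirected for components
    components = list(nx.connected_components(sub))
    return {
        "edges": sub.number_of_edges(),
        "connected_components": len(components),
        "singletons": sum(len(c) == 1 for c in components),
    }
```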

For more, see our full paper, On the Interplay between Social and Topical Structure.
Daniel M. Romero, Northwestern University
Chenhao Tan, Cornell University
Johan Ugander, Cornell University

Warning: People you know may be hazardous to your cognitive health

The friendship paradox states that your friends have more friends than you do, on average. This statistical curiosity leads to systematic biases in perception and self-assessment. In our ICWSM 2013 paper, “Friendship Paradox Redux: Your Friends are More Interesting Than You,” we reveal that, not surprisingly, this paradox also exists in the follower graph of Twitter, in a variety of incarnations. Not only are the people you follow more popular (have more followers) than you, but they are also better connected (follow more users) than you. At the same time, your followers are also more popular and better connected than you are, on average.
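The follower-graph version of the paradox is easy to check on any directed graph. The sketch below uses a synthetic scale-free graph rather than Twitter data, and simply asks how often a user is less popular than the average account they follow.

```python
# Checking the friendship paradox on a directed graph with networkx.
import networkx as nx

def friends_more_popular_fraction(G):
    """G: directed graph with edge u -> v meaning 'u follows v'."""
    hits = total = 0
    for u in G:
        followees = list(G.successors(u))
        if not followees:
            continue
        mean_friend_popularity = sum(G.in_degree(v) for v in followees) / len(followees)
        hits += G.in_degree(u) < mean_friend_popularity
        total += 1
    return hits / total if total else 0.0

print(friends_more_popular_fraction(nx.scale_free_graph(1000)))  # typically well above 0.5
```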

In addition, we discovered two new behavioral paradoxes on Twitter. First, the people you follow receive more viral content than you do, on average (the virality paradox). Second, they are more active than you, meaning they tweet more often, on average, than you do (the activity paradox).

These paradoxes have surprising implications for active users who rely on Twitter to keep up with friends and spread information to their followers.

Your friends see more valuable content than you do: Due to their better connectivity, people you follow tend to receive more valuable information, or at least information that ends up spreading farther, than you do.

Distribution of average popularity of information seen by overloaded and non-overloaded Twitter users. Popularity of information is defined as the number of people who have tweeted about it. Overloaded users tend to see only information that becomes popular.

Information overload: The more people you follow, the more information you receive. Due to the activity paradox, however, the volume of new information grows ever faster as you follow more people. Because your ability to digest new information is limited, you risk becoming overloaded with content.

The last to know: Overloaded people tend to see only popular information that has been tweeted by many of the people they follow. They also risk missing updates from friends.

In order to absorb the content in their Twitter feeds, users will have to be more selective about whom they follow, and systematically refuse new “who to follow” suggestions Twitter makes. Conversely, to make themselves heard above the noise, users will either have to drown out the competition, exacerbating the problem, or find direct paths to users with the fewest friends—those who are most likely to see information in their feed and absorb it.

For more, see our full paper, Friendship Paradox Redux: Your Friends are More Interesting Than You.

Nathan O. Hodas, Information Sciences Institute, USC
Farshad Kooti, Information Sciences Institute, USC
Kristina Lerman, Information Sciences Institute, USC

 

#Bigbirds Never Die!

As some of you may remember, during the 2012 Presidential Election debates, there was a great deal of discussion over social media when Mitt Romney mentioned he would cut funding for the popular PBS children’s television show “Sesame Street”. The emotional response to this proposal and the imperiled future of Big Bird and friends gave rise to the hashtag #bigbird, among others.

Big Bird related hashtags emerged during the 1st presidential debate.

For Twitter users, hashtags serve as ubiquitous and flexible annotations. They allow users to track ongoing conversations, signal membership in a community, or communicate non-verbal cues like irony. Hashtags often reflect eccentric topics and their emergence is happenstance. These idiosyncrasies typically limit researchers’ ability to systematically compare features of their emergence and evolution.

However, the candidates’ unscripted statements during the U.S. presidential debates provide a unique opportunity to understand the social dynamics of novel hashtag adoption following exogenous events. Combined with fine-grained data about large-scale, real-time user behavior, these exogenous shocks create a set of natural experiments for systematically analyzing the features that contribute to the growth and sustainability of hashtags.

Two distinct classes: “winners” that emerge more quickly and are sustained for longer periods of time than other “also-ran” hashtags.

FINDINGS. We examine hashtag growth and persistence in the context of the most popular novel hashtags during the 2012 U.S. presidential debates. Our analysis reveals that the trajectories of hashtag use fall into two distinct classes: “winners” that emerge more quickly and are sustained for longer periods of time than other “also-ran” hashtags. Statistical analyses of hashtag growth and persistence reveal several factors behind their relative success (a sketch of this kind of regression follows the list):

  • Retweets and audience size contribute to faster hashtag adoption.
  • Replies and diversity support hashtag persistence.
  • Unexpectedly, however, our findings indicate that the number of retweets inhibits the persistence of hashtags, suggesting complex interactions that lead to limits or tipping points.
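Purely as an illustration of this kind of analysis (not the models, variables, or data from the paper), one might regress a persistence measure on these factors; every value in the toy table below is hypothetical.

```python
# Toy regression relating hashtag persistence to adoption factors.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "persistence_hours": [2, 5, 1, 12, 8, 3],
    "retweets": [40, 10, 55, 5, 12, 30],
    "replies": [2, 9, 1, 20, 15, 4],
    "audience_size": [900, 1200, 800, 5000, 3100, 1000],
    "user_diversity": [0.2, 0.6, 0.1, 0.8, 0.7, 0.3],
})

model = smf.ols("persistence_hours ~ retweets + replies + audience_size + user_diversity",
                data=df).fit()
print(model.params)  # signs of the coefficients hint at each factor's direction
```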

For more, see our full paper, #Bigbirds Never Die: Understanding Social Dynamics of Emergent Hashtags.
Yu-Ru Lin, Northeastern University
Drew Margolin, Northeastern University
Andrea Baronchelli, Northeastern University
David Lazer, Northeastern University