Early adopters of Twitter and Google+: Validation of a theoretical model of early adopter personality and social network site influence

The widespread adoption of social media is transforming the consumer-brand relationship. Social media allows consumers to connect with other users and to create, consume and control access to content (Hoffman and Novak, 2012). Research suggests that social media increases brand relationship depth and loyalty, and generates incremental purchase behaviour (Laroche et al., 2012; Kim and Ko, 2012; Pooja et al., 2012). It is therefore not surprising that commentators suggest marketers should target social media users who are more likely to exert an influence on their network in order to facilitate brand recommendations (Iyengar, Han, & Gupta, 2009). But who are these influentials? Goldenberg et al. (2009) suggest that only two types of influentials affect information diffusion: innovators and followers.


Our study looks at early users (in Goldenberg et al.’s terminology, innovators) of two social networking sites, Twitter and Google+, and at the effects of personality and mode of information sharing on social influence scoring. Specifically, we ask:

1. How does (i) extraversion, (ii) openness and (iii) conscientiousness influence:

  • Information sharing behaviour
  • Rumour sharing behaviour

2. How does (i) information sharing behaviour and (ii) rumour sharing behaviour impact social network site influence scores?

Early Twitter users were identified through a public list and through the joining date listed on users’ public profiles. As the study occurred during the Google+ closed field test, all Google+ users were deemed early users. Two discrete survey instruments were designed, one for Twitter and one for Google+, to provide for different SNS validation checks. To assess the personality traits of respondents, we measured extraversion, openness and conscientiousness with the scale of Gosling et al. (2003), while the information and rumour sharing scales were taken from Marett and Joshi (2009). The SNS score was the dependent variable in our model and was measured using two commercial SNS influence score providers, PeerIndex and Klout.

Our study hypothesized that Extraversion and Openness were two personality traits that should positively influence both Information and Rumor sharing behavior (H1 and H2), while Conscientiousness would have opposite effects on Information (+) and Rumor (-) sharing behavior (H3 and H4). We also hypothesized that both Information and Rumor sharing behavior should positively influence social network influence scoring. A structural equation model, estimated using AMOS, was used to test these hypotheses.

Results of Structural Equation Model – Standardised Regression Weights and Summary Findings

 

The model suggests:

  • Early users of social network sites who are more extroverted, more open or more conscientious are more likely to share information.
  • Information sharing and rumor sharing should be treated as two distinct constructs in the discussion of social network influence.
  • All three traits were negatively related to rumor sharing. Only the effects of extroversion and conscientiousness were significant.
  • Both information sharing and rumor sharing impacted positively and significantly on social network site influence scores.

While previous literature has suggested that it is difficult to identify market mavens (Goldsmith et al., 2006), early users of social media can be identified easily and conveniently. This may give firms the opportunity to target potential innovators and early adopters much more efficiently and thus accelerate the diffusion of marketing messages. Our study suggests that filtering these adopters by messaging behaviour may also help, with greater emphasis and resources placed on those social network users who share information rather than rumor. While identifying these potential influencers would seem to be more efficient than identifying mavens, further research is required to understand the most effective way and time to engage with them. Finally, it would seem that social network influence scores provide useful signals for identifying social media users likely to share information. Social media users characterised by a combination of high influence scores and a propensity for information sharing are powerful assets for firms, particularly if they have relatively large social networks. Engaging with these influencers represents a relatively low-cost mechanism for indirectly reaching target markets through word of mouth on social networks.

The research was conducted by Dr Theo Lynn (DCU Business School), Dr Laurent Muzellec (UCD), Dr Barbara Caemerrer (ESSCA), Prof. Darach Turley (DCU Business School) and Bettina Wuerdinger (DCU Business School).

A More Paradoxical Paradox

Have you ever checked your Facebook and Instagram and felt that your friends have more interesting lives? You’re not alone! In fact, that’s one of the consequences of the Friendship Paradox, which states that, on average, your friends have more friends than you do. Recently, researchers demonstrated that network paradoxes hold not only for popularity but for other traits as well, such as activity and virality of content received.

A variety of paradoxes exist in online social networks such as Twitter and Facebook: Your friends, on average, have more friends, are more active, and post more popular/interesting content compared to you. Image source: https://flic.kr/p/5QXd9M

We recently showed that the standard version of the paradox, which uses the mean of friends’ values of the trait, arises trivially from the properties of statistical sampling from a heavy-tailed distribution. Social traits, such as popularity or activity (e.g., number of posts made), often have a “heavy tail”, where extremely large values, e.g., very popular people, appear much more frequently than a normal distribution would predict. When sampling randomly from such a distribution, the mean of the sample (i.e., the mean of friends’ values) tends to grow with sample size, resulting in a paradox. In contrast, the median of the sample does not behave this way and is a more robust measure of the paradox.
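The sampling effect can be illustrated with a small simulation. This is only a sketch under stated assumptions: the Pareto distribution with shape 1.0, the number of trials and the function name `friend_sample_stats` are illustrative choices, not taken from the paper. It draws samples of increasing size (standing in for the number of friends) and reports the typical sample mean and sample median.

```python
import random
import statistics


def friend_sample_stats(sample_size, trials=200, alpha=1.0, seed=0):
    """Return the typical sample mean and sample median of a heavy-tailed trait.

    Each trial draws `sample_size` values from a Pareto(alpha) distribution
    (an assumed stand-in for a heavy-tailed trait such as popularity).
    "Typical" means the median over all trials, which is itself robust
    to a few extreme trials.
    """
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(trials):
        sample = [rng.paretovariate(alpha) for _ in range(sample_size)]
        means.append(statistics.mean(sample))
        medians.append(statistics.median(sample))
    return statistics.median(means), statistics.median(medians)


for n in (10, 100, 1000):
    typical_mean, typical_median = friend_sample_stats(n)
    print(n, round(typical_mean, 2), round(typical_median, 2))
```

In runs of this sketch the typical sample mean keeps increasing with the sample size, while the sample median stays close to the median of the underlying distribution.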

Surprisingly, paradoxes persist when the median is used: i.e., most of your friends (and followers) have more friends (followers) than you do, and also post and receive more viral and diverse content than you do. In other words, the paradox holds not only for the mean, where a single very popular (or active) friend could skew the average, but also for most friends.

Why do strong paradoxes exist in networks? Since they are not a consequence of sampling, they must have a behavioral origin. We hypothesize that they arise from correlations between an individual’s traits and popularity, or between the traits of connected people (homophily). To test this hypothesis, we performed a shuffle test: we kept the network topology fixed, but permuted traits between nodes in the network. This keeps the distribution of the traits intact, but destroys correlations between people. As expected, we still observe a paradox for the mean in the shuffled network, but not the strong paradox that uses the median.
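The shuffle test can be sketched in a few lines. The code below is an illustration rather than the authors' implementation: the function names, the toy star network and the choice of degree as the trait are all assumptions made for the example. It measures the fraction of nodes whose own trait is below the median (or mean) of their friends' traits, then repeats the measurement after permuting traits across nodes.

```python
import random
import statistics


def paradox_fraction(adjacency, trait, aggregate=statistics.median):
    """Fraction of nodes whose trait is below the aggregate of their friends' traits."""
    in_paradox = 0
    counted = 0
    for node, friends in adjacency.items():
        if not friends:
            continue  # nodes without friends cannot be compared
        counted += 1
        if trait[node] < aggregate([trait[f] for f in friends]):
            in_paradox += 1
    return in_paradox / counted if counted else 0.0


def shuffle_traits(trait, rng):
    """Permute trait values across nodes; the network itself is left untouched."""
    nodes = list(trait)
    values = [trait[n] for n in nodes]
    rng.shuffle(values)
    return dict(zip(nodes, values))


# Toy network (hypothetical): one hub connected to four leaves.
adjacency = {
    "hub": ["a", "b", "c", "d"],
    "a": ["hub"], "b": ["hub"], "c": ["hub"], "d": ["hub"],
}
degree = {node: len(friends) for node, friends in adjacency.items()}

print("observed, median:", paradox_fraction(adjacency, degree))
print("observed, mean:  ", paradox_fraction(adjacency, degree, statistics.mean))

shuffled = shuffle_traits(degree, random.Random(0))
print("shuffled, median:", paradox_fraction(adjacency, shuffled))
```

On a real network one would repeat the shuffle many times and compare the observed fraction with the distribution of shuffled fractions.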

In short, the main findings of our work are:

  • We found “strong” paradoxes where most of your friends have more friends than you do, etc.
  • We showed that the paradoxes have a behavioral origin and are not simply the result of statistical properties of sampling from the network.
  • The origin of the paradoxes is in the correlations between traits of nodes and their degree or homophily.

For details, please see our paper “Network Weirdness: Exploring the Origins of Network Paradoxes” http://arxiv.org/abs/1403.7242

Farshad Kooti, University of Southern California
Nathan O. Hodas, USC Information Sciences Institute
Kristina Lerman, USC Information Sciences Institute

Discussion Graphs: Better social media analysis through tools

Capturing and exploring the context of social media discussions is critical to understanding the relationships and information we extract from them. Knowing where information comes from helps us interpret it more correctly and knowing how our extracted results change as we condition on context provides insights into the underlying phenomena and can suggest further lines of investigation and action. For instance, the Livehoods project demonstrated how we can calculate a social distance between locations and use this to map out neighborhood boundaries. Extending this analysis by including contextual factors, such as gender and temporal information, we find that these boundaries can shift, sometimes significantly so. Consider the two maps below of neighborhood boundaries extracted from social data gathered on weekdays (Fig 1) and weekends (Fig 2), demonstrating clearly distinct mobility patterns, such as the appearance of a distinct “5th Ave shopping district” on weekends.

Figure 1. Neighborhood boundaries inferred from weekday behaviors


Figure 2. Neighborhood boundaries inferred from weekend behaviors

The social sciences have long taken the conditioning on demographic factors such as gender and socioeconomic status seriously. In fact, it’s commonly understood that incorrect conditioning can in some cases completely reverse empirical results (see for example, Simpson’s Paradox). So, why aren’t social media analyses more commonly conditioned on such demographic variables? We believe that much of the blame lies on the practical difficulties of implementing the necessary feature extractors, aggregators and analysis components. Moreover, without appropriate computational abstractions, much of this work must be adapted or re-created for each new data set and research question.

To address this challenge, we are presenting a paper next week at ICWSM 2014, and releasing software that dramatically simplifies the implementation of co-occurrence analyses, a surprisingly common class of social media analyses. At the core of our software are discussion graphs, a data model for representing and computing upon relationships extracted from social media.

Discussion graphs capture both the structural features of relationships inferred from social media, as well as the context from which they are derived, in just a couple steps. First, feature extraction turns raw social media data into an initial discussion graph, where each node is an extracted feature-value, and hyper-edges connect all nodes that co-occurred together in a tweet. Secondly, we project this graph to include only the relationships among target nodes, aggregating the remaining features as context annotating each relationship.

Figure 3. Discussion graph framework
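To make the two steps concrete, here is a minimal sketch of the data model in plain Python. It is not DGT and does not use DGT's script language; the function names, the record fields and the extractors are hypothetical and exist only for this illustration. Each record becomes a hyper-edge over (feature, value) nodes, and the projection keeps pairwise relationships among the target feature while counting the remaining nodes as context.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_hyperedges(records, extractors):
    """Step 1: turn each raw record into a hyper-edge of (feature, value) nodes."""
    hyperedges = []
    for record in records:
        nodes = set()
        for feature, extract in extractors.items():
            for value in extract(record):
                nodes.add((feature, value))
        hyperedges.append(frozenset(nodes))
    return hyperedges


def project(hyperedges, target_feature):
    """Step 2: keep relationships among target nodes, aggregating the rest as context."""
    edges = defaultdict(Counter)
    for hyperedge in hyperedges:
        targets = sorted(n for n in hyperedge if n[0] == target_feature)
        context = [n for n in hyperedge if n[0] != target_feature]
        for a, b in combinations(targets, 2):
            edges[(a[1], b[1])].update(context)
    return edges


# Hypothetical pre-parsed tweets and feature extractors.
records = [
    {"hashtags": ["nyc", "shopping"], "gender": "female", "daytype": "weekend"},
    {"hashtags": ["nyc", "shopping"], "gender": "male", "daytype": "weekday"},
    {"hashtags": ["nyc"], "gender": "female", "daytype": "weekend"},
]
extractors = {
    "hashtag": lambda r: r["hashtags"],
    "gender": lambda r: [r["gender"]],
    "daytype": lambda r: [r["daytype"]],
}

graph = project(build_hyperedges(records, extractors), "hashtag")
for pair, context in graph.items():
    print(pair, dict(context))
```

Conditioning on context then amounts to filtering or re-weighting each relationship by its context counts, for example keeping only co-occurrences observed on weekends.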

We implement a tool (cleverly named Discussion Graph Tool, or DGT) to easily build and manipulate discussion graphs. DGT enables sharing and re-use of feature extractors (for example, to infer gender from names, or mood from language cues), extraction of relationships and information, and conditioning on context. With DGT, extracting relationships from social media, including the social distances that underlie the neighborhood boundary inference shown above, is a simple 4-5 line script. To learn more about discussion graphs and DGT, read our paper Discussion Graphs: Putting Social Media Analysis in Context, at ICWSM 2014, and look for our upcoming tool release at our project site, http://research.microsoft.com/dgt. This work is a collaboration between Emre Kıcıman, Scott Counts, Michael Gamon, Munmun De Choudhury and Bo Thiesson.

That’s What Friends Are For: Inferring Location in Online Social Media Platforms Based on Social Relationships

People post millions of updates to social media sites like Facebook and Twitter every day. When it comes to understanding what groups of people are experiencing, knowing the area where these messages originate can make a huge difference: are “I feel sick” posts pointing to a breaking epidemic or just run-of-the-mill flu? Do the posts promoting a political candidate reveal widespread support or just a loud minority from their home town?

However, one of the big challenges in doing geographic analyses is estimating where people are. For example, only 0.7% of all Twitter messages come with some kind of GPS data. Our work starts from this data and then uses an old social principle: People are often friends with others who live nearby. If we know the locations of only a small number of people, we can look at a person’s social network and try to infer their location based on where their friends are.
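The core idea can be sketched as follows. This is a simplified illustration and not necessarily the method used in the paper: the rule of picking the friend location that minimises the total distance to all other located friends, and the names `haversine_km` and `infer_location`, are assumptions for this example.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def infer_location(user, friends, known_locations):
    """Guess a user's location from the known locations of their friends.

    Returns the friend location with the smallest total distance to all other
    located friends, or None if no friend has a known location. A single
    far-away friend therefore has little influence on the estimate.
    """
    candidates = [known_locations[f] for f in friends.get(user, []) if f in known_locations]
    if not candidates:
        return None
    return min(candidates, key=lambda c: sum(haversine_km(c, other) for other in candidates))


# Hypothetical data: three friends near Rome, one in Tokyo, one with no known location.
friends = {"user": ["f1", "f2", "f3", "f4", "f5"]}
known_locations = {
    "f1": (41.90, 12.50),
    "f2": (41.80, 12.60),
    "f3": (41.95, 12.45),
    "f4": (35.70, 139.70),
}
print(infer_location("user", friends, known_locations))
```

Applying such a rule repeatedly, so that newly inferred locations feed into later estimates, is one way to reach people none of whose friends share GPS data.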

Here, we take 20 million friendships in Twitter where we know the location of both people and plot the relative geographic offset of each person’s closest friend. The giant spike highlights that most people have a very close friend!

We looked at a Twitter social network of 47.7 million people, where two people are connected if they’ve both talked to each other at least once. We found that we could estimate a location for most people in the network (95%) and that our estimates were often very close to where people actually were, with over half within 10km (6mi). Moreover, our method enables geo-tagging over 77% of all Twitter messages.

The error estimates for our method at inferring users’ locations. If you pick a number on the x-axis and look at a line, the y-axis shows the probability that a location estimate is off by less than the distance on the x-axis.

In the paper, we examined many other hypotheses and found:

  • The method works regardless of a person’s country of origin
  • Locations can be accurately inferred across a variety of social network sizes, even for people who have only one friend
  • Locations can even be inferred using data from other social networks like Foursquare, provided you can find individuals who have identities in both
  • Only a small amount of location data is needed

For more, see our full paper, That’s What Friends Are For: Inferring Location in Online Social Media Platforms Based on Social Relationships
David Jurgens, Sapienza University of Rome

Towards Supporting Search over Trending Events with Social Media

Trending events are events that serve as novel or evolving sources of widespread online activity. Such events range from anticipated events to breaking news, and topics vary from politics to sporting events to celebrity gossip. Recently, search engines have started reflecting search activity around trending events back to users (e.g. Bing Popular Now or Google Hot Searches).

Real-time content published via social media can provide valuable information about time-sensitive topics, but the topics being discussed can change quite rapidly over time. In our analysis, we aimed to answer the following questions: For what types of trending events will real-time information be useful, and for how long will it continue to align with the information needs of users searching about these events?

Figure: Information Types
We surveyed 288 users about their experience with trending events over a week in August 2012. Among other things, they reported on the utility of various types of information when making sense of such events; here, we see how important real-time information is to the users surveyed.

In order to identify ways to better support users issuing such queries, we examined hundreds of trending events during the summer of 2012, using three sources of data: (1) qualitative survey data, (2) query logs from Bing, and (3) Twitter updates from the complete Twitter firehose.

Our findings revealed that:

  • Searchers who click Trending Queries links engage less and with different result content than users who search manually for the same topics. This may be due to a preference for real-time information that is perhaps not currently being satisfied.
  • Search query and social media activity follow similar temporal patterns, but social media activity tends to lead by 4.3 hours on average, providing enough time for a search engine to index and process relevant content.
  • User interest becomes more diverse during the peak of activity for a trending event, but a corresponding increase in overlap between content searched and shared highlights opportunities for supporting search with social media content.
Search vs. Social Media Delays - Histogram
Each data point in this histogram corresponds to a single trending event. The value represents the delay between patterns of query activity and social media activity (negative values indicate that social media precedes search). The dotted red line shows the mean h = -4.3 hours.
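One simple way to estimate such a delay for a single event is to shift one hourly activity series against the other and keep the shift with the highest correlation. The sketch below assumes this cross-correlation approach and uses made-up hourly counts; the paper may measure the delay differently. It follows the sign convention of the histogram: a negative delay means social media precedes search.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def estimate_delay(search, social, max_lag):
    """Return the lag (in time steps) at which social[t + lag] best matches search[t]."""
    n = len(search)
    best_lag, best_corr = 0, -2.0
    for lag in range(-max_lag, max_lag + 1):
        pairs = [(search[t], social[t + lag]) for t in range(n) if 0 <= t + lag < n]
        if len(pairs) < 3:
            continue  # too little overlap to say anything
        xs, ys = zip(*pairs)
        corr = pearson(xs, ys)
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag


# Hypothetical hourly counts for one trending event.
social = [0, 0, 1, 5, 9, 5, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
search = [0, 0, 0, 0, 0, 0, 1, 5, 9, 5, 1, 0, 0, 0, 0, 0]
print(estimate_delay(search, social, max_lag=6))  # negative: social media leads
```

Averaging the per-event delays over many events would give a summary figure comparable to the mean shown in the histogram.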

Many current search interfaces leveraging social media content tend to provide a reverse-chronologically ordered list of keyword-matched updates. Our finding that search activity often lags behind social media activity means that there may be time for more complex indexing and ranking computation to present more relevant “near-real-time” content in search results.

For more about our study and implications for supporting search over trending events, see our full paper, Towards Supporting Search over Trending Events with Social Media.
Sanjay Kairam, Stanford University
Meredith Ringel Morris, Microsoft Research
Jaime Teevan, Microsoft Research
Dan Liebling, Microsoft Research
Susan Dumais, Microsoft Research

Perception Differences in Twitter

One’s state of mind will influence his or her perception of the world and people within it.

WHY?


1. Depression is a serious social problem
  • The most commonly diagnosed mental disorder
  • Reported $193 billion annually in lost earnings
  • Much effort has been made towards the early diagnosis and prevention of depression

2. Social media has gained significant attention as an innovative avenue for combating the stigma and for providing potential interventions. So, a number of possibilities for detecting depressed users in social media were tested:
  • by capturing prevalence of Major Depressive Episodes [Moreno et al. 2011]
  • by capturing sentiments differences [Park, Cha, and Cha 2012]
  • by capturing patterns of application usage [Kotikalapudi and Lutzen 2012, Munmun et al. 2013]

Future tools that help depressed people would benefit from methods for identifying users suffering from depression. However, social media systems should not simply stop at detection!

 

OUR STUDY

Addressing the needs of depressed individuals will require understanding how differently they perceive and use social media. So, we conducted in-depth interviews with 7 depressed and 7 non-depressed participants who are active users of Twitter, followed by iterative qualitative analysis (and analysis of some triangulating quantitative data). We identified interesting perceptual and behavioral differences between the two groups of participants:

Participants with depression:

  • perceived Twitter as a tool for social awareness and consoling oneself
  • preferred to read about others’ daily lives with emotional content
  • were sensitive about the implications of social interactions such as ‘mention’
  • were careful in controlling the type of sentiments they received from peers

Participants without depression:

  • perceived Twitter as an information consumption and sharing tool
  • were keen to manage the amount of information they received
  • were more willing to participate in active social interactions
  • used retweeting as a mechanism for curating or archiving tweets that have potential value in the future

For more details and some design implications, see our full paper, Perception Differences between the Depressed and Non-depressed Users in Twitter or contact one of the authors:

Minsu Park, KAIST
David W. McDonald, University of Washington
Meeyoung Cha, KAIST

Memes and Cultural Organisms

In biology, the fundamental building blocks of complex organisms like ourselves are replicating DNA segments called genes. A cultural theory, called memetics, states that there also exist fundamental building blocks in culture, and these blocks are known as memes.

We study memes from Quickmeme.com, a website where users can create, combine and evolve memes, or let them go extinct. Two examples of memes are:

Two examples of similar memes that users created on the website Quickmeme.com

Current studies of memes focus on social networks: they aim to understand how social dynamics affect the spread of memes in human minds.

We, instead, study the direct interactions between memes, seeking their fundamental characteristics, without looking at social networks. We analyzed hundreds of such memes, with tens of thousands of variations created by users. Our results show that:

  • Just like genes, memes compete against one another.
  • Just like genes, memes also show traces of collaboration.
  • Collaboration does not end in simple pairs of memes. Memes clump up and literally form cultural organisms.

What does it mean? It means that it is possible that the complex culture we live in (songs, books, cathedrals and so on) is the result of dynamics that closely resemble the ones among genes in the primeval broth.

We are able to define some characteristics of memes. They can be prone to competition or collaboration. Or they can bring the collaboration to a next level and create a large cluster of collaborating memes: a cultural organism.

Using these characteristics, we are able to predict whether a meme will be successful. A successful meme is one that is preserved in the minds of the website’s users and that they use often.

The decision tree describing the odds of success of memes, given their characteristics.

The picture shows a visualization of this decision tree, from which you can read the probabilities of success given the characteristics of the memes. Memes are successful in 35.47% of the cases, but lower popularity peaks, highly competitive strategies and being in a meme organism raise this probability up to 80.3%.

For more, see our full paper, Competition and Success in the Meme Pool: a Case Study on Quickmeme.com. You can also check further information on my website: www.michelecoscia.com.

Michele Coscia, CID – Harvard University

Hey Twitter crowd … What else is there?

Journalists and news editors use Twitter to contextualize and enrich their articles by examining the public response, from comments and opinions to pointers to related news. This is possible because some users in Twitter devote a substantial amount of time and effort to news curation: carefully selecting and filtering news stories highly relevant to specific audiences.

We developed an automatic method that groups together all the users who tweet a particular news item, and later detects new content posted by them that is related to the original news item.

We call each such group a transient news crowd. The beauty of this approach, in addition to being fully automatic, is that there is no need to pre-define topics and the crowd becomes available immediately, allowing journalists to cover news beats incorporating the shifts of interest of their audiences.

Figure 1. Detection of follow-up stories related to a published article using the crowd of users that tweeted the article.

Transient news crowds

We define the crowd of a news article as the set of users that tweeted the article within the first 6 hours after it was published. We followed the users in each crowd for one week, recording every public tweet they posted during this period. We used Twitter data around news stories published by two prominent international news portals: BBC News and Al Jazeera English.
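The crowd definition translates directly into a filter over timestamped tweets. The sketch below is illustrative only: the `Tweet` record, the function names and the choice to start the one-week observation period at the publication time are assumptions, not details given above.

```python
from datetime import datetime, timedelta
from typing import NamedTuple, Optional


class Tweet(NamedTuple):
    user: str
    posted_at: datetime
    url: Optional[str]  # link shared in the tweet, if any


def news_crowd(tweets, article_url, published_at, window=timedelta(hours=6)):
    """Users who tweeted the article within `window` of its publication."""
    deadline = published_at + window
    return {
        t.user
        for t in tweets
        if t.url == article_url and published_at <= t.posted_at <= deadline
    }


def crowd_activity(tweets, crowd, published_at, period=timedelta(days=7)):
    """All tweets posted by crowd members during the observation period."""
    end = published_at + period
    return [t for t in tweets if t.user in crowd and published_at <= t.posted_at <= end]


# Hypothetical data.
published = datetime(2012, 12, 28, 12, 0)
article = "http://example.org/article"
tweets = [
    Tweet("ana", published + timedelta(hours=1), article),
    Tweet("bo", published + timedelta(hours=7), article),  # too late for the crowd
    Tweet("ana", published + timedelta(days=2), "http://example.org/follow-up"),
    Tweet("ana", published + timedelta(days=8), "http://example.org/later"),
    Tweet("cy", published + timedelta(hours=1), "http://example.org/other"),
]

crowd = news_crowd(tweets, article, published)
print(crowd, len(crowd_activity(tweets, crowd, published)))
```

Detecting follow-up stories would then mean comparing the content of the crowd's later tweets with the original article, which this sketch leaves out.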

What did we find?

  • After they tweet a news article, people’s subsequent tweets are correlated with that article during a brief period of time.
  • The correlation is weak but significant, and it reflects the similarity between the articles that originate a crowd.
  • While the majority of crowds simply disperse over time, parts of some news crowds come together again around new newsworthy events.

Crowd summarisation

We illustrate the outcome of our automatic method with the article Central African rebels advance on capital, posted on Al Jazeera on 28 December, 2012.

Figure 2. Word clouds generated for the crowd on the article “Central African rebels advance on capital”, by considering the terms appearing in stories filtered by our system (top) and on the top stories by frequency (bottom).

Without using our method (in the figure, bottom), we obtain frequently-posted articles which are weakly related or not related at all to the original news article. Using our method (in the figure, top), we observe several follow-up articles to the original one. Four days after the news article was published, several members of the crowd tweeted an article about the fact that the rebels were considering a coalition offer. Seven days after the news article was published, crowd members posted that rebels had stopped advancing towards Bangui, the capital of the Central African Republic.

You can find more details in our papers:

  • Janette Lehmann, Carlos Castillo, Mounia Lalmas and Ethan Zuckerman: Transient News Crowds in Social Media. Seventh International AAAI Conference on Weblogs and Social Media, 2013, Massachusetts.
  • Janette Lehmann, Carlos Castillo, Mounia Lalmas and Ethan Zuckerman: Finding News Curators in Twitter. WWW Workshop on Social News On the Web (SNOW), Rio de Janeiro, Brazil.


Janette Lehmann, Universitat Pompeu Fabra
Carlos Castillo, Qatar Computing Research Institute
Mounia Lalmas, Yahoo! Labs Barcelona
Ethan Zuckerman, MIT Center for Civic Media

What do users really want in an event summarization system?

The wide usage of social media means that users now have to keep up with a large amount of incoming content, motivating the development of several stream monitoring tools, such as Palanteer, Topsy, Tweet Archivist, etc. Such tools could be used to aid in sensemaking about real-life events by detecting and summarizing social media content about these events. Given the large amount of content being shared and the limited attention of users, what information should we provide to users about special events as they are detected in social media?

We analyzed tweets related to four diverse events:

  1. Facebook IPO
  2. Obamacare
  3. Japan Earthquake
  4. BP Oil Spill

The figure below shows the temporal patterns of usage for words related to the Facebook launch price. By exploiting the content similarity between tweets written around the same time, we could discover various aspects (topics) of an event.

Facebook IPO Launch Price
These plots show frequency of usage over time for various words related to the Facebook IPO. We can see similarities and differences in the temporal profiles of the usage of each of these words.

The figure below shows how the volume of content related to various aspects (topics) of an event changes over time, as the event unfolds. Notice that some aspects have a longer lifespan of attention from tweeters, while others peak and die off quickly.

Topics through time
These two figures show how the topics within an event change over time. The figure on the left shows raw volumes, while the figure on the right shows underlying patterns used in our model. Notice how topics spike at different times and with different amounts of concentration over time.

We used our model to generate summaries and hired workers on Amazon Mechanical Turk to provide feedback. Please refer to this link for the summaries we showed to our workers. Which summary do you like best? This is what some of our respondents had to say:

  1. Number 3 has the most facts.
  2. Summary 2 is more straight forward information & not personal appeal pieces like live chats and other stuff with people who are unqualified to speak about the issue.
  3. None. All too partisan
  4. Summary 3 has most news with less personal commentary than the others.
  5. I believe that summary 1 and 2 had a large amount of personal opinion and not fact.
  6. I think summary 3 best summarize Facebook IPO because it shows a broad range of information related to the event.
  7. Summary 3 is more comprehensive and offers better overall summary.

Overall, we received feedback from users that they want summaries that are comprehensive, covering a broad range of information. Furthermore, they want summaries to be objective, factual, and non-partisan. While we believe we have done well in giving users comprehensive, broad-ranging information, we think that future work in summarization will reduce the gap between what researchers are doing and what users really want.

For more, see our full paper, Automatic Summarization of Events from Social Media.
Freddy Chua, Living Analytics Research Centre, Singapore Management University
Sitaram Asur, Social Computing Research Group, Hewlett Packard Research Labs

Is social media diversifying public agendas?

Agenda-setting (telling people ‘what to think about’, in McCombs’ and Shaw’s terms) is a key role played by the mainstream media. As the public continues to consume an increasing amount of its news through alternative channels, such as social media, this raises the interesting question of how agenda-setting in these channels differs from that in traditional sources. We might hypothesize that agendas are set in social media in a more diverse and democratized manner.

To explore whether social media is diversifying the agendas of public discourse, we conducted a study comparing the content discussed over Twitter during the Korean 2012 general election period with that discussed in the mainstream news media.

First, we found that stories circulated in social media tended to concentrate on a smaller number and narrower range of topics.

  • Three or four news stories received more than 70% of the tweets each day
  • The topical categories of these stories were mostly limited to ‘politics and government’ while salient agendas of a mainstream outlet ranged over ‘environment/food’, ‘foreign policy’, ‘welfare’, etc.
Bar Chart - # topics discussed by media
This chart shows the number of identified topics discussed in each medium on each day. The diversity of topics discussed in social media was less than that of traditional outlets.
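The concentration finding can be expressed as the share of a day's tweets that go to the top few stories. The snippet below is a minimal sketch with invented counts; the function name `top_k_share` and the story labels are hypothetical and are not data from the study.

```python
from collections import Counter


def top_k_share(story_counts, k):
    """Share of all tweets that link to the k most-tweeted stories."""
    total = sum(story_counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in Counter(story_counts).most_common(k))
    return top / total


# Hypothetical tweet counts per news story for one day.
day_counts = {"A": 40, "B": 25, "C": 10, "D": 10, "E": 5, "F": 5, "G": 5}
print(top_k_share(day_counts, 3))
```

Computing this share per day and per medium gives one simple way to compare how concentrated attention is on Twitter versus a mainstream outlet.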

We further observed that attention in social media tended to skew to a few famous political figures and national issues. To better understand this skew in discussion, we looked into how content was disseminated, observing that:

  • The imbalance of circulation begins from the start (the stories that ultimately become popular are imported to Twitter much more frequently than others).
  • The circulation further narrows down to a few items due to the strong retweet tendency toward popular topics.
  • People who show this tendency have a strong impact in deciding the salient topics, not only because they have a better sense of potentially popular items but also because they maintain relationships through retweeting others.

On the other hand, we also observed some possibility of diversification through an analysis of individual users.

  • There is a small segment of people who circulate a broad range of topics.
  • These people also share some interests and maintain relationships with the majority.

For more, see our full paper, Agenda Diversity in Social Media Discourse: A Study of the 2012 Korean General Election.

Souneil Park, Univ. of Michigan
Minsam Ko, Jaeung Lee, Junehwa Song, KAIST