How can we better understand and measure the well-being of a community?
Polling organizations like Gallup ask questions such as “How satisfied are you with your life?” By aggregating survey responses over large samples of the US population, they can measure community well-being and relate it to other community data such as demographics or socioeconomic status (average age, income, education, etc.).
However, surveys are relatively expensive, and relating community well-being to things like socioeconomic status only gives us limited insight into what makes a thriving community.
We think Twitter can help! Tweeting is pervasive across the US and, unlike responses to surveys, tweets are not constrained to pre-chosen questions.
In work we’re presenting at the International AAAI Conference on Weblogs and Social Media (ICWSM) next week, we analyzed the language of Twitter and found that tweets reveal a lot about community well-being.
Using a dataset of millions of tweets with geo-location information, we built a model of language that predicts community well-being (as measured by Gallup polls). Our results?
- Twitter language alone was significantly predictive of well-being. This was the first work to show this.
- Socioeconomic information (as expected) was more predictive than language alone.
- However, the combination of Twitter language and socioeconomic information was significantly more predictive than socioeconomic information alone.
Thus, tweets are capturing “something” above and beyond standard demographic and socioeconomic indicators.
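To make the "above and beyond" comparison concrete, here is a minimal sketch of the idea: fit one model on socioeconomic controls alone, fit another on controls plus a language feature, and compare how well each predicts well-being for held-out communities. Everything in it is an assumption for illustration: the data is synthetic, the feature names `income` and `topic_use` are hypothetical, and ordinary least squares stands in for whatever model the paper actually used.

```python
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def fit_least_squares(rows, targets):
    """Ordinary least squares with an intercept, solved via the normal equations."""
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    # Augmented matrix [X^T X | X^T y].
    aug = [
        [sum(x[i] * x[j] for x in design) for j in range(k)]
        + [sum(x[i] * t for x, t in zip(design, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def predict(weights, rows):
    """Apply fitted weights (intercept first) to each feature row."""
    return [weights[0] + sum(w * v for w, v in zip(weights[1:], row)) for row in rows]


# Synthetic communities (hypothetical, not the paper's data): well-being
# depends on a socioeconomic control and, independently, on a language feature.
rng = random.Random(0)
counties = []
for _ in range(200):
    income = rng.gauss(0, 1)
    topic_use = rng.gauss(0, 1)
    wellbeing = income + topic_use + rng.gauss(0, 0.5)
    counties.append({"income": income, "topic_use": topic_use, "wellbeing": wellbeing})

train, test = counties[:100], counties[100:]


def held_out_r(feature_names):
    """Fit on the training communities; return Pearson r on the held-out ones."""
    def rows(data):
        return [[c[name] for name in feature_names] for c in data]

    weights = fit_least_squares(rows(train), [c["wellbeing"] for c in train])
    predictions = predict(weights, rows(test))
    return pearson(predictions, [c["wellbeing"] for c in test])


r_controls = held_out_r(["income"])
r_combined = held_out_r(["income", "topic_use"])
print(f"controls only:       r = {r_controls:.2f}")
print(f"controls + language: r = {r_combined:.2f}")
```

Because the synthetic well-being scores were built to depend on the language feature, the combined model scores higher here by construction. The sketch only shows the shape of the comparison; it says nothing about the size of the effect in real data.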
What are tweets capturing about well-being? We didn’t want to just wind up with a happiness score, so the bulk of our work looked into this question by observing the actual words people use in regions with differing levels of life satisfaction. We used a technique known as Latent Dirichlet Allocation (LDA) to group related words into 2000 “topics”. Then, we determined which topics most distinguished communities with high and low well-being. The results are below.
Here, we see the influence of socioeconomic factors on community well-being, and more. For example, the topic about “money” is about philanthropy (“donate”, “charity”, “support”), and the topic about “business” is about development (“learning”, “skills”, “development”, “education”). We refer to this as greater “behavioral and conceptual resolution”: these results don’t just suggest that money and business influence well-being; they suggest that it is the donation of money and the development of skills for business that affect well-being.
Furthermore, we see some non-socioeconomic topics related to well-being. For example, topics relating to outdoor activities, spiritual meaning, and exercise were predictive of happy communities while topics of disengagement (“bored” and “tired”) distinguished communities with low well-being. These results support previous research and hypotheses on individual-level well-being. We look forward to a future of community research which leverages social media to capture greater behavioral and conceptual resolution than previously feasible.
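The topic-ranking step described above, finding which topics most distinguish communities with high and low well-being, can be sketched roughly as follows. This is only an illustration under stated assumptions: it presumes per-community topic usage has already been computed (for example, by an LDA implementation), the topic names and numbers are made up, and a plain Pearson correlation stands in for whatever statistic the paper actually used.

```python
def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical well-being scores for four communities.
wellbeing = [1.0, 2.0, 3.0, 4.0]

# Hypothetical share of each community's tweets attributed to each topic.
topic_usage = {
    "outdoors": [0.1, 0.2, 0.3, 0.4],
    "weather": [0.2, 0.3, 0.2, 0.3],
    "bored": [0.4, 0.3, 0.2, 0.1],
}

# Rank topics from most positively to most negatively associated with well-being.
ranked = sorted(
    ((topic, pearson(usage, wellbeing)) for topic, usage in topic_usage.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

for topic, r in ranked:
    print(f"{topic:10s} r = {r:+.2f}")
```

Topics at the top of the ranking are used more in communities with high well-being, and topics at the bottom are used more in communities with low well-being. With thousands of topics, a real analysis would also need to correct for multiple comparisons.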
For more, see our full paper, “Characterizing Geographic Variation in Well-Being using Tweets”.
H. Andrew Schwartz*, Johannes C. Eichstaedt*, Margaret L. Kern, Lukasz Dziurzynski, Megha Agrawal, Gregory J. Park, Shrinidhi K. Lakshmikanth, Sneha Jha, Martin E. P. Seligman, and Lyle Ungar, University of Pennsylvania
Richard E. Lucas, Michigan State University
*co-led this work
This work was done as part of the World Well-Being Project (http://wwbp.org/).