“How Old Do You Think I Am?”: A Study of Language and Age in Twitter

When you read someone’s tweets, you usually quickly get an impression of what kind of person is behind them. For example, what kind of person do you think is behind the following tweet?


And what about this one?

Interesting article about usability design on mobile search [LINK]

(Scroll down to the bottom to find out how old these people are.)

In our analysis, we wanted to find out if we can automatically predict the age of Twitter users by looking only at their language use. Being able to do so would give us more insight into how people are using language and allow us to conduct more fine-grained analysis of trends in social media.

While prior research has addressed similar questions, no clear consensus has evolved on the best way to model age using language. We therefore experimented with predicting age using three different methods: age categories (20-, 20-40, 40+), exact age (e.g. 23 years), and life stages (e.g. secondary school student, student, or employee).
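To make the first of these three setups concrete, here is a minimal sketch of how an exact age could be mapped onto the three age categories. The function name and the exact boundaries (under 20, 20 through 39, 40 and over) are our assumptions for illustration; the post only gives the labels. Life stages are not included because they cannot be derived from age alone.

```python
def age_category(age: int) -> str:
    """Map an exact age to one of the three category labels.

    Boundary handling is an assumption: 20 falls in "20-40"
    and 40 falls in "40+".
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    if age < 20:
        return "20-"
    if age < 40:
        return "20-40"
    return "40+"


if __name__ == "__main__":
    for example_age in (13, 30, 55):
        print(example_age, age_category(example_age))
```

Note how coarse the categories are: a 21-year-old and a 39-year-old receive the same label, even though their language use may differ a lot.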

Our Findings:

One of our most striking findings is that it is very difficult to predict the age of older Twitter users. Both humans and our predictive model tended to judge older Twitter users as younger than they were. We found that the variables we studied pertaining to language use show little change after about 30 years of age. The two figures below illustrate some examples of this phenomenon:

2nd-Person Pronouns
Usage of 2nd-Person Pronouns (e.g. ‘you’) remains constant after roughly 30 years of age. (Female = Red, Male = Blue).
Tweet Length
Tweet length similarly flattens after roughly 30 years of age for both genders.

Additional Findings:

  • We recommend modeling age based on life stages or exact ages rather than age categories.
  • When comparing the performance of an automatic system with that of humans, the automatic system was slightly better and much, much faster.
  • Not every X-year-old sounds the same: people can emphasize different aspects of their identity, which makes age prediction harder.
  • Younger people talk more about themselves. Older people use longer words and longer tweets, and also use more links and hashtags.
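The kinds of signals mentioned in this list (tweet length, word length, links, hashtags, pronouns) can be counted with a few lines of code. The sketch below is only an illustration and not the feature set used in the paper: the function name, the small English pronoun lists, the tokenization rule, and the treatment of the `[LINK]` placeholder as a link are all our assumptions.

```python
import re

# Assumed, deliberately small English pronoun lists (illustrative only).
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}
SECOND_PERSON = {"you", "your", "yours", "yourself"}

# Links (real URLs or the "[LINK]" placeholder), hashtags, mentions, words.
TOKEN_RE = re.compile(
    r"https?://\S+|\[LINK\]|#\w+|@\w+|[A-Za-z]+(?:'[A-Za-z]+)?"
)


def tweet_features(text: str) -> dict:
    """Count simple surface features of a single tweet."""
    tokens = TOKEN_RE.findall(text)
    links = [t for t in tokens
             if t == "[LINK]" or t.startswith(("http://", "https://"))]
    hashtags = [t for t in tokens if t.startswith("#")]
    mentions = [t for t in tokens if t.startswith("@")]
    special = set(links) | set(hashtags) | set(mentions)
    words = [t.lower() for t in tokens if t not in special]
    # Strip contractions so that "I'm" is counted as "i".
    stems = [w.split("'")[0] for w in words]
    avg_len = sum(len(w) for w in words) / len(words) if words else 0.0
    return {
        "n_chars": len(text),
        "n_words": len(words),
        "avg_word_len": avg_len,
        "links": len(links),
        "hashtags": len(hashtags),
        "first_person": sum(s in FIRST_PERSON for s in stems),
        "second_person": sum(s in SECOND_PERSON for s in stems),
    }


if __name__ == "__main__":
    print(tweet_features(
        "Interesting article about usability design on mobile search [LINK]"
    ))
```

Counts like these would normally be averaged over many tweets per user before being compared across ages.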

Two months ago we launched an online demo based on this research. Unfortunately it’s only in Dutch and developed for Dutch Twitter users, but you can check it out on TweetGenie. Within a couple of days we already had tens of thousands of Dutch Twitter users trying out the system. We learned a lot from testing our system in the wild, so stay tuned ;).


Screenshot of our demo (fake account of Dutch King)

For more, see our full paper, “How Old Do You Think I Am?”: A Study of Language and Age in Twitter.
Dong Nguyen, University of Twente
Rilana Gravel, Meertens Institute
Dolf Trieschnigg, University of Twente
Theo Meder, Meertens Institute

Examples from the beginning of the post: the first person is a 13-year-old female, the second person is a 30-year-old male.

About the author


Dong Nguyen is a PhD student at the University of Twente. She is interested in analyzing language in social media to learn more about our culture.
