Twitter has always been associated with brevity and immediacy, and it is little wonder that the combination of the two has led to a preconceived notion that linguistic style on Twitter is closer to the “lingo” used on informal mediums such as SMS and IM. However, Twitter's role as a conduit for discussion and news dissemination also raises the possibility that its language is really a length-restricted version of the language found on more formal mediums such as magazines and newspapers.
To address this debate, we collected large corpora of data from Twitter and from other mediums such as email, SMS, and news magazines, and tried to answer the question:
Is the language of Twitter closer to informal media such as SMS and IM, or does it share similarities with more curated media like newspapers and magazines?
In the first part of our analysis, we considered the following common grammatical elements to quantify linguistic styles of a language:
- Word frequency and usage (WF)
- Lexical density (LD)
- Personal pronoun usage (first, second, third person)
- Use of intensifiers (that was so cool!)
- Temporal references (I am going to be there)
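To make these measures concrete, here is a minimal sketch of how a few of them could be computed for a piece of text. It is not the method used in the study: the word lists are tiny and illustrative, the names `style_features` and `FUNCTION_WORDS` are hypothetical, and lexical density is approximated as the share of tokens that are not in a small function-word list rather than through part-of-speech tagging.

```python
import re
from collections import Counter

# Deliberately tiny, illustrative word lists (assumptions, not the study's lists).
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our", "ours"}
SECOND_PERSON = {"you", "your", "yours"}
THIRD_PERSON = {"he", "him", "his", "she", "her", "hers",
                "they", "them", "their", "theirs", "it", "its"}
# Crude: "so" and "too" are not always intensifiers.
INTENSIFIERS = {"so", "very", "really", "totally", "too"}
FUNCTION_WORDS = FIRST_PERSON | SECOND_PERSON | THIRD_PERSON | {
    "the", "a", "an", "and", "or", "but", "of", "to", "in", "on", "at",
    "is", "am", "are", "was", "were", "be", "that", "this",
}

FEATURES = ("lexical_density", "first_person", "second_person",
            "third_person", "intensifiers")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def style_features(text):
    """Return per-token rates for a handful of stylistic features."""
    tokens = tokenize(text)
    n = len(tokens)
    if n == 0:
        return {name: 0.0 for name in FEATURES}
    counts = Counter(tokens)  # word frequencies

    def rate(words):
        return sum(counts[w] for w in words) / n

    content_tokens = sum(1 for t in tokens if t not in FUNCTION_WORDS)
    return {
        "lexical_density": content_tokens / n,  # approximation, see above
        "first_person": rate(FIRST_PERSON),
        "second_person": rate(SECOND_PERSON),
        "third_person": rate(THIRD_PERSON),
        "intensifiers": rate(INTENSIFIERS),
    }
```

Calling `style_features` on each document in a corpus and averaging the results per medium would give one rough profile per medium to compare. Temporal references are left out here because detecting them reliably needs more than a word list.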
In the second part, we devised a novel, flexible factorization framework to understand the cognitive and affective aspects of language as it is used in various media, by analyzing counts of the words used in each with LIWC (Linguistic Inquiry and Word Count). Affect and emotion were analyzed using words related to concepts such as positivity, negativity, anxiety, anger, and sadness, while cognitive aspects were measured by words related to insight, discrepancy, tentativeness, certainty, etc.
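The word-counting step behind this kind of analysis can be sketched as follows. This covers only the category counting, not the factorization framework itself. The lexicon is a toy stand-in (the real LIWC dictionary is a licensed resource with far larger categories), and the names `TOY_LEXICON` and `category_proportions` are hypothetical.

```python
import re
from collections import Counter

# Toy stand-in for a LIWC-style lexicon (assumption: not the real dictionary).
TOY_LEXICON = {
    "positive": {"happy", "love", "great", "cool"},
    "negative": {"sad", "hate", "awful"},
    "anxiety": {"worried", "nervous", "afraid"},
    "certainty": {"always", "never", "definitely"},
    "tentative": {"maybe", "perhaps", "guess"},
}


def category_proportions(text, lexicon=TOY_LEXICON):
    """Return, per category, the fraction of tokens that fall in it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    hits = Counter()
    for token in tokens:
        for category, words in lexicon.items():
            if token in words:
                hits[category] += 1
    return {
        category: (hits[category] / total if total else 0.0)
        for category in lexicon
    }
```

Computing these proportions per document yields one vector of category scores per text, which is the kind of count data a later comparison across media could start from. This sketch uses exact word matches only and does not handle stems or word variants.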
The results of these analyses were at once surprising and affirming: surprising because they overturned much of the conventional wisdom about tweets, and affirming because they showcased the reasons for some of the behavior that the data exhibit.
- The language of Twitter is more conservative, less informal, and much less conversational than that of SMS and IM; however, it shares the brevity and interactivity of SMS and IM.
- Twitter users are developing unique styles that set its language apart from other mediums, for example in the usage of temporal references. The use of temporal references on Twitter is much closer to that of SMS and IM, which reaffirms its real-time nature.
- Twitter has much less variation of affect than traditional media like newspapers, magazines and emails; and it tends to display more positive moods and affect than these other media.
For more, see our full paper, Dude, srsly?: The Surprisingly Formal Nature of Twitter’s Language.