CrowdE: Filtering Tweets for Direct Customer Engagements

Many consumer brands hire customer agents to engage customers on social media services such as Twitter; these agents solicit opinions, respond to questions and requests, and thank or apologize to customers when necessary.

The usual method of filtering relevant customer opinions using simple keywords is often insufficient, because coming up with the right keywords is not easy. For instance, a representative at Delta Airlines filtering for “delta” would also collect posts about “alpha delta phi” and “Nile delta”. A more restrictive query, requiring both “delta” and “airline”, would miss posts such as “I flew to Seattle on Delta.” Furthermore, even among posts that indeed refer to the brand, many are side comments with no brand-relevant opinion, and therefore are usually not worth an agent’s attention.
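To make the limitation concrete, here is a minimal sketch of the naive keyword-filtering baseline described above. The function, the example tweets, and the keyword lists are illustrative assumptions, not part of the CrowdE system itself:

```python
# A minimal sketch of naive keyword filtering: keep tweets that
# contain every keyword (case-insensitive substring match).
def keyword_filter(tweets, keywords):
    return [t for t in tweets if all(k in t.lower() for k in keywords)]

tweets = [
    "I flew to Seattle on Delta.",
    "Rush week at alpha delta phi!",
    "Delta airline lost my bag again.",
]

# "delta" alone matches all three tweets, fraternity post included.
print(keyword_filter(tweets, ["delta"]))
# Requiring both "delta" and "airline" drops the relevant Seattle tweet.
print(keyword_filter(tweets, ["delta", "airline"]))
```

Broadening the query pulls in noise, while narrowing it loses real customer posts; this tension is exactly what motivates a learned filter.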

As a result, agents waste considerable time reviewing irrelevant content.

What’s the solution?

Image of CrowdE Dashboard
The CrowdE Dashboard allows users to filter for brand-relevant tweets and mark them for follow-up actions.

We built CrowdE, an intelligent filtering system that helps brand agents filter tweets. At its core is a reusable filter-creation process: we ask crowd workers to label tweets for a brand, then extract insights from those labels through machine learning. The resulting filtering system has a number of nice properties:

  • It can be customized for any particular consumer brand with minimal cost and design effort.
  • It supports filtering by relevance to the brand and by presence of brand-related opinion.
  • Filtering accuracy is on par with expert-crafted filter rules for the given brand.
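The filter-creation process above can be sketched in code. As the authors note in the comments, the features are simply a bag of words; the toy labeled data, the class names, and the choice of a Naive Bayes classifier here are illustrative assumptions, not the paper's exact model:

```python
# Hypothetical sketch of the CrowdE-style filter-creation step:
# crowd workers label tweets, and a bag-of-words classifier learns
# which words actually matter for the brand.
from collections import Counter
import math

def train(labeled):
    """labeled: list of (tweet, label) pairs -> per-class word counts."""
    counts, priors = {}, Counter()
    for text, label in labeled:
        priors[label] += 1
        counts.setdefault(label, Counter()).update(text.lower().split())
    return counts, priors

def classify(text, counts, priors):
    """Naive Bayes with add-one smoothing over the bag of words."""
    vocab = {w for c in counts.values() for w in c}
    best, best_score = None, float("-inf")
    for label, c in counts.items():
        total = sum(c.values())
        score = math.log(priors[label])
        for w in text.lower().split():
            score += math.log((c[w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

# Toy crowd-labeled examples: relevant to the brand vs. irrelevant.
labeled = [
    ("delta lost my luggage", "relevant"),
    ("my delta flight was delayed", "relevant"),
    ("alpha delta phi rush week", "irrelevant"),
    ("the nile delta is beautiful", "irrelevant"),
]
counts, priors = train(labeled)
print(classify("delta delayed my flight again", counts, priors))  # → relevant
```

A second classifier of the same form, trained on opinion labels instead of relevance labels, would cover the paper's two-stage filtering (relevance, then presence of opinion).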

Using the CrowdE system, agents can filter the live Twitter stream at will, and mark relevant follow-up actions for each tweet. In user studies, both experienced and novice users preferred CrowdE to a traditional keyword-based filter. Users considered CrowdE-based filtering to be more efficient, more complete, less difficult, and less tedious. CrowdE also gave users more confidence in their filtering. Users performed better, as well, correctly marking more follow-up actions in the same amount of time.

For more details, see our ICWSM 2013 paper, CrowdE: Filtering Tweets for Direct Customer Engagements.
Jilin Chen, IBM Almaden Research Center
Allen Cypher, IBM Almaden Research Center
Clemens Drews, IBM Almaden Research Center
Jeffrey Nichols, IBM Almaden Research Center

 

About the author

Jilin Chen

Jilin Chen is a research staff member at IBM Almaden Research Center. He got his PhD in computer science from University of Minnesota in 2011. His thesis research was on recommender systems in social media, where he built and evaluated systems that help people find useful information in social media by inferring interest, influence, and social relationships from "digital traces" that people left online. He also contributed to social media analytics in general - from predicting who will be more vocal in online political movements on Twitter, to analyzing the effect of group diversity on editor productivity in Wikipedia. At IBM he is working on a number of social media related projects, creating tools and models that help online users and communities.


2 Comments

  • This paper addresses three problems. 1) You want to retrieve relevant tweets for a particular brand, which is an Information Retrieval problem. 2) You employ crowd workers to provide labels of the tweets’ relevance to brands. This transforms the IR problem into a machine learning classification problem. 3) But you only want tweets that express an opinion. The third problem makes this an interesting and challenging problem. But not much is mentioned about what features are used for the machine learning portion.

    I wonder how scalable your approach is, since you require crowd workers to provide labels? I am thinking that it could be made scalable via some form of bootstrap or semi-supervised learning approach.

    Considering that we have so many kinds of brands in the world, is the approach generalizable to most kinds of consumer brands?

    Could we improve the performance by adding industry-specific considerations into the system?

    • Thanks for the comment!

      We did state the feature set in the paper – it is simply a bag of words for both relevance and opinion. The tricky part is knowing which words actually matter, and that’s why we used crowd worker input to find that out.

      Not sure what you mean by “scalable” – this isn’t a big data problem. I guess you mean the effort for creating the filters? Surely requiring crowd input for every brand is a burden; however, in our experience, without this data we simply cannot create effective filters for even the two brands we considered.

      As a result, we tried our best to make the filter creation process as painless as possible while maintaining the quality of the resulting filter. I guess that’s our main contribution.

      And of course, it is always possible to improve – by incorporating semi-supervised learning, by adding industry-specific knowledge. We’d love to see some follow-up work.