Enhancing Technical Q&A Forums With CiteHistory

CiteHistory is a browser plugin that helps technical forum participants track and share the research they do when asking and answering questions online.

When I program, I frequent Q&A sites like Stack Overflow or the MSDN Forums. I am not alone – past work has demonstrated that programmers make extensive use of these and similar sites when coding. If, like me, you frequent Stack Overflow or the MSDN Forums, you might agree that the answers tend to be very technical and detailed.

Where on earth do these high-quality answers come from? After all, it seems unlikely that the authors have all these details memorized.

We set out to investigate this question, and what we found motivated us to build CiteHistory, a tool to help forum users share their online research – but I’m getting ahead of myself. First, let’s discuss how answerers research their solutions:

Similar to programmers writing code, we found that Stack Overflow and MSDN Forum participants make extensive use of online resources when answering programming questions.

We know this from a survey of hundreds of MSDN Forum participants, and from a log analysis of the sessions of 120 Stack Exchange users. In more detail:

  • About 50% of all answers involve online research
  • Research sessions last an average of 20 minutes, and consist of visiting an average of 4 relevant URLs

Critically, however, we found:

  • Fewer than 25% of all posts contain links or any other direct evidence of the extensive research conducted

This lack of information provenance is unfortunate because:

  • Such information can help readers assess an answer’s credibility
  • Survey respondents reported that links are often sufficient for answering questions (and posts with such links tend to be highly rated)

To ameliorate this situation, we developed a browser plugin called CiteHistory. CiteHistory automatically keeps track of the research that authors conduct while asking or answering questions on technical forums, and helps authors share this research material with the forum community. CiteHistory also associates research metadata with each post (e.g., time spent researching, number of pages visited) to help highlight the true research effort involved in asking or answering forum questions.

For a demonstration of CiteHistory in action, please watch our introductory video:

CiteHistory Video
We evaluated CiteHistory in a two-week deployment study within a large organization. We found that CiteHistory succeeded in encouraging users to cite reference material, and was praised for its dual role as a personal research logbook.

You can also try out CiteHistory by visiting: http://research.microsoft.com/en-us/um/redmond/projects/citehistory/

For more, see our full paper, Enhancing Technical Q&A Forums With CiteHistory.

Adam Fourney, University of Waterloo
Meredith Ringel Morris, Microsoft Research

About the author

Adam Fourney

Adam Fourney is a PhD student studying under the supervision of Dr. Michael Terry at the University of Waterloo in Ontario, Canada.



  • The implicit design hypothesis behind CiteHistory is that authors _would_ include their reference materials if possible, but that they don’t have the time or don’t see the worth. I might offer another hypothesis: that these answerers want to appear as oracles who know these things off the top of their heads.

    Think of doctors, for example. If the doctor comes back to you and says, “Well, I wasn’t entirely sure what was going on with your symptoms, so I consulted a book from med school, some academic literature, and then asked my friends…and you may have lupus.” For many people, that might not inspire much trust. So instead, the doctor just says, “I think you have lupus.”

    Would be interesting to see if that’s the case with any StackOverflow/MSDN users, and if so, how to design for them.

    • This is a fair criticism, and a real issue. We discuss this challenge in the full paper, and it did impact some of the design decisions we made (our pilot study encountered some major issues in this regard). That being said, I do want to highlight a few points:

      We learned from our initial questionnaire study that readers and question authors really do appreciate finding high-quality links in answers. We also found that answers containing links tend to score better on Stack Overflow. Granted, this latter point illustrates a correlation, not a causal relationship. However, given the highly competitive nature of these Q&A sites, we figured it shouldn’t be too hard to convince authors to cite helpful links – to the benefit of the readers and questioners.

      CiteHistory also streamlines the workflow of authors already in the habit of citing relevant material in their posts. Again, we know that these authors produce answers that score well, so we want to improve their workflow if possible.

      We also found that CiteHistory users really valued the tool as a personal research tracker. CiteHistory allowed users to create extensive bibliographies without making everything public. Had we anticipated this use of our tool, we might have developed the plugin slightly differently.

      Finally, in the paper we argue that our work makes two important contributions: the first is the questionnaire and log studies demonstrating that answerers are not oracles (i.e., they do extensive research); the second is the CiteHistory system itself. In some sense, the first contribution is the more “researchy” one, and the work might have been better served as two distinct papers so as not to dilute it.

  • Good thoughts! I don’t think that you need to back off on your design rationale, or on the claim of the design being a contribution. There’s a real tendency in HCI to focus on “science” contributions, which are easily defensible, in comparison to design or engineering contributions.

    Interestingly, SO almost encourages the “oracle” mindset by allowing users to clean up each other’s language so that the final item looks pristine.

    • Thanks for the awesome feedback.

      Regarding the wiki-like editing of Stack Overflow posts, I think this is a very interesting angle. We only looked at the initial ask / answer posting actions. We did not look at edits by other contributors. However, it would be really interesting to track what kind of links and citations survive multiple edits, and which ones might actually be added. I suspect that this could be investigated by independent researchers — Stack Exchange publicly posts database dumps and interactive database clones of their top properties: http://data.stackexchange.com/
      It’s really fantastic.
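      To make the idea concrete, here is a minimal sketch of how such a link-survival analysis might start against the dump’s PostHistory.xml file. The attribute names and type IDs (2 for the initial body, 5 for an edited body) reflect my reading of the public dump schema and are worth verifying; the sample data below is invented for illustration.

      ```python
      import re
      import xml.etree.ElementTree as ET
      from collections import defaultdict

      URL_RE = re.compile(r'https?://[^\s)"\]<>]+')

      def link_survival(posthistory_xml):
          """Map each post to the set of links in its first and last body revisions.

          Assumes the Stack Exchange data dump's PostHistory schema, where
          PostHistoryTypeId 2 is the initial body and 5 is an edited body.
          Returns {post_id: (links_in_first_revision, links_in_last_revision)}.
          """
          revisions = defaultdict(list)  # post_id -> [(history_id, link_set)]
          root = ET.fromstring(posthistory_xml)
          for row in root.iter('row'):
              if row.get('PostHistoryTypeId') not in ('2', '5'):
                  continue
              # Strip trailing sentence punctuation that the regex may capture.
              links = {u.rstrip('.,;:') for u in URL_RE.findall(row.get('Text', ''))}
              revisions[row.get('PostId')].append((int(row.get('Id')), links))
          result = {}
          for post_id, revs in revisions.items():
              revs.sort()  # history Ids increase over time
              result[post_id] = (revs[0][1], revs[-1][1])
          return result

      # Invented sample: one post whose edit adds a second link.
      sample = """<posthistory>
        <row Id="1" PostHistoryTypeId="2" PostId="10" Text="See http://example.com/a for details." />
        <row Id="2" PostHistoryTypeId="5" PostId="10" Text="See http://example.com/a and http://example.com/b." />
      </posthistory>"""

      first, last = link_survival(sample)["10"]
      print(sorted(last - first))  # links added by later edits
      ```

      Comparing the first and last revision sets per post would show which citations survive community editing and which get added after the fact, exactly the question raised above.
      
      
      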