Call for Participation: GroupSight 2017

The Second Workshop on Human Computation for Image and Video Analysis (GroupSight) will be held on October 24, 2017 at AAAI HCOMP 2017 in Québec City, Canada. The workshop promises an exciting mix of people and papers at the intersection of HCI, crowdsourcing, and computer vision.

The aim of this workshop is to promote greater interaction among the diverse researchers and practitioners who examine how to mix human and computer efforts to convert visual data into discoveries and innovations that benefit society at large. It will foster in-depth discussion of technical and application issues around engaging humans with computers to optimize cost/quality trade-offs. It will also serve as an introduction for researchers and students curious about this important, emerging field at the intersection of crowdsourced human computation and image/video analysis.

Topics of Interest

Crowdsourcing image and video annotations (e.g., labeling methods, quality control)
Humans in the loop for visual tasks (e.g., recognition, segmentation, tracking, counting)
Richer modalities of communication between humans and visual information (e.g., language, 3D pose, attributes)
Semi-automated computer vision algorithms
Active visual learning
Studies of crowdsourced image/video analysis in the wild

Submission Details

Submissions are requested in the following two categories: Original Work (not published elsewhere) and Demo (describing new systems, architectures, interaction techniques, etc.). Papers should be submitted as 4-page extended abstracts (including references) using the provided author kit. Demos should also include a URL to a video (max 6 min). Multiple submissions are not allowed. Reviewing will be double-blind.
Previously published work from a recent conference or journal can be considered, but the authors should submit an unrevised copy of their published work; reviewing for such submissions will be single-blind. Email submissions to

Important Dates

August 23, 2017 (extended from August 14): Deadline for paper submission (5:59 pm EDT)
August 25, 2017: Notification of decision
October 24, 2017: Workshop (full-day)


CrowdCamp Report: Finding Word Similarity with a Human Touch

Semantic similarity and semantic relatedness are features of natural language that contribute to the challenge machines face when analyzing text. Although measuring semantic relatedness remains a complex challenge, only a few ground-truth datasets exist. We argue that the available corpora used to evaluate the performance of natural language tools do not capture all elements of the phenomenon. We present a set of simple interventions that illustrate that 1) framing effects influence similarity perception, 2) the distribution of similarity ratings across multiple users is important, and 3) semantic relatedness is asymmetric.

A number of metrics in the literature attempt to model and evaluate semantic similarity in natural languages. Semantic similarity has applications in areas such as semantic search and text mining. It has long been considered a more specific concept than semantic relatedness: because relatedness also includes relations such as antonymy and meronymy, it is the more general of the two.

Different approaches have been attempted to measure semantic relatedness and similarity. Some methods use structured taxonomies such as WordNet; alternative approaches define relatedness between words using search engines (e.g., based on Google counts) or Wikipedia. All of these methods are evaluated based on their correlation with human ratings, yet only a few benchmark datasets exist, one of the most widely used being the WS-353 dataset [1]. Because the corpus is very small and the sample size per pair is low, it is debatable whether all relevant phenomena are in fact present in the provided dataset.

In this study, we aim to understand how human raters perceive word-based semantic relatedness. We argue that even a simple question about word-based semantic similarity involves effects that existing test sets do not capture. Our hypotheses are as follows:

(H1) Framing effects influence the similarity ratings given by human assessors.
(H2) The distribution of similarity ratings does not follow a normal distribution.
(H3) Semantic relatedness is not symmetric. The relatedness between two words (e.g., tiger and cat) yields different similarity ratings when the word order is reversed.

To verify our hypotheses, we randomly selected 102 word pairs from the WS-353 dataset and collected similarity ratings on them through Amazon Mechanical Turk (MTurk). We collected 5 datasets for these 102 pairs. Each collection used a different task design and was separated into two batches of 51 word pairs each. Each batch received ratings from 50 unique contributors, so that each word pair received 50 ratings in each condition.
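The sampling and batching step can be sketched as follows. This is a minimal illustration rather than the procedure actually used: the function name, the fixed seed, and the placeholder pair list standing in for WS-353 are all assumptions.

```python
import random


def split_into_batches(pairs, n_selected=102, n_batches=2, seed=0):
    """Randomly select word pairs and split them into equal-sized batches.

    The seed and function name are hypothetical; the source does not say
    how the random selection was carried out.
    """
    rng = random.Random(seed)
    selected = rng.sample(pairs, n_selected)  # sampling without replacement
    size = n_selected // n_batches
    return [selected[i * size:(i + 1) * size] for i in range(n_batches)]


# Placeholder pairs standing in for the WS-353 word pairs (not real data).
all_pairs = [(f"word{i}a", f"word{i}b") for i in range(353)]
batches = split_into_batches(all_pairs)
```

Each of the two resulting batches holds 51 distinct pairs, matching the batch size described above.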

The following figure shows how the questions were posed to the crowd workers. Each question was framed in four different ways. The first two are “How is X similar to Y?” (sim) and “How is Y similar to X?” (inverted-sim). We then repeated both, asking for the difference between the two words instead (dissim and inverted-dissim, respectively). Since the scale is reversed in dissim and inverted-dissim, the dissimilarity ratings were converted into similarity ratings for comparison.

The different ways of framing each question.
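Converting a dissimilarity rating into a similarity rating amounts to mirroring it on the rating scale. The sketch below assumes a 0 to 10 scale, as in the original WS-353 ratings; the scale used in the MTurk tasks is not stated here, so the bounds are parameters and the function name is hypothetical.

```python
def dissim_to_sim(rating, scale_min=0.0, scale_max=10.0):
    """Mirror a dissimilarity rating so that it reads as a similarity rating.

    The default 0-10 bounds are an assumption, not taken from the study.
    """
    if not scale_min <= rating <= scale_max:
        raise ValueError("rating lies outside the assumed scale")
    return scale_min + scale_max - rating
```

Under this assumption, a dissimilarity rating of 2 on a 0 to 10 scale becomes a similarity rating of 8.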

We compared the distributions of similarity ratings in the original WS-353 dataset and in our dataset in order to confirm the framing effect. For each pair in our dataset, we calculated the mean of its 50 ratings and compared it with the original similarity rating in the WS-353 dataset. We restricted the WS-353 data to exactly the same 102 word pairs to ensure consistency between the two settings. The distributions were found to be significantly different (p < 0.001, paired t-test).
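For reference, the statistic behind a paired t-test can be computed from the per-pair differences. The following is a minimal standard-library sketch, not the analysis code used for the study; it returns only the t statistic and the degrees of freedom, and a p-value would additionally need the t distribution (for example from a statistics package).

```python
import math
from statistics import mean, stdev


def paired_t_statistic(x, y):
    """Return (t, degrees of freedom) for a paired t-test on x and y.

    x and y are equal-length sequences, e.g. per-pair mean crowd ratings
    and the corresponding original ratings for the same word pairs.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 items")
    diffs = [a - b for a, b in zip(x, y)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1
```

With 102 word pairs, the test would have 101 degrees of freedom.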

Our preliminary results show that similarity ratings for some word pairs in the WS-353 dataset do not follow a normal distribution. Some of the distributions have multiple peaks, which suggests that raters perceive the similarity of those pairs differently. A possible explanation is that the lower peak can be attributed to individuals who are aware of the factual differences between a “sun” or “star” and an actual planet orbiting a “star”, while the others are not.
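One crude way to flag such multi-peaked rating distributions is to count local maxima in the histogram of integer ratings for a pair. The sketch below is only illustrative: it assumes integer ratings, the `min_count` threshold is an arbitrary choice, and it is not the method used to produce the results above.

```python
from collections import Counter


def count_peaks(ratings, min_count=2):
    """Count local maxima in the histogram of integer ratings.

    A bin counts as a peak if it is strictly higher than its neighbours
    (after merging runs of equal counts) and holds at least min_count ratings.
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    counts = Counter(ratings)
    hist = [counts.get(v, 0) for v in range(min(ratings), max(ratings) + 1)]
    padded = [0] + hist + [0]
    # Merge plateaus so that a flat top is counted only once.
    merged = [padded[0]]
    for c in padded[1:]:
        if c != merged[-1]:
            merged.append(c)
    return sum(
        1
        for i in range(1, len(merged) - 1)
        if merged[i] >= min_count
        and merged[i] > merged[i - 1]
        and merged[i] > merged[i + 1]
    )
```

A pair whose ratings cluster around both a low and a high value would yield a count of 2, whereas a single-humped distribution yields 1.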

To verify the third hypothesis, we compared the similarity ratings of sim (dissim) with those of inverted-sim (inverted-dissim). Scatter plots of the similarity ratings in the two word orders, for both the similarity question and the dissimilarity question, show that the mean ratings differ between orders, indicating that semantic relatedness is asymmetric. The asymmetry appears consistently across both types of question (i.e., similarity and dissimilarity). For example, the results show a remarkable difference between the similarity of “baby” to “mother” and the similarity of “mother” to “baby”, indicating that the asymmetric relationship between mother and baby is reflected in the subjective similarity ratings.
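The order comparison can be expressed as a per-pair gap between the mean ratings in the two word orders. This is a minimal sketch with a hypothetical function name and made-up placeholder ratings; the numbers are not results from the study.

```python
from statistics import mean


def order_asymmetry(forward, inverted):
    """Return, per word pair, mean(forward ratings) - mean(inverted ratings).

    Both arguments map a (word_x, word_y) pair to its list of ratings;
    `inverted` holds ratings collected with the word order swapped.
    """
    return {
        pair: mean(forward[pair]) - mean(inverted[pair])
        for pair in forward
        if pair in inverted
    }


# Made-up placeholder ratings for illustration only.
forward = {("word_x", "word_y"): [8, 9, 7]}
inverted = {("word_x", "word_y"): [5, 6, 4]}
gaps = order_asymmetry(forward, inverted)
```

A gap close to zero for every pair would be consistent with symmetric relatedness; consistently non-zero gaps point to asymmetry.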

To measure inter-rater reliability, we computed Krippendorff’s alpha for both the original dataset and the one we obtained in the current analysis. Krippendorff’s alpha is a statistical measure of the agreement achieved when coding a set of units of analysis in terms of the values of a variable.
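As an illustration of how the coefficient is defined, the sketch below computes Krippendorff’s alpha as 1 minus the ratio of observed to expected disagreement. It assumes the interval metric (squared differences between ratings), which is an assumption since the metric used for the analysis is not stated here, and it uses a simple quadratic-time loop rather than an optimized implementation.

```python
def krippendorff_alpha_interval(units):
    """Krippendorff's alpha with the interval (squared-difference) metric.

    `units` is a list of rating lists, one list per unit (e.g. word pair).
    Units with fewer than two ratings cannot be paired and are skipped.
    """
    units = [u for u in units if len(u) >= 2]
    values = [v for u in units for v in u]
    n = len(values)
    if n < 2:
        raise ValueError("need at least one unit with two or more ratings")

    # Observed disagreement: ordered pairs of ratings within each unit.
    d_o = 0.0
    for u in units:
        within = sum(
            (a - b) ** 2
            for i, a in enumerate(u)
            for j, b in enumerate(u)
            if i != j
        )
        d_o += within / (len(u) - 1)
    d_o /= n

    # Expected disagreement: ordered pairs across all ratings.
    d_e = sum(
        (a - b) ** 2
        for i, a in enumerate(values)
        for j, b in enumerate(values)
        if i != j
    ) / (n * (n - 1))
    if d_e == 0:
        raise ValueError("all ratings are identical; alpha is undefined")
    return 1.0 - d_o / d_e
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement.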



[1] L. Finkelstein, E. Gabrilovich, Y. Matias, E. Rivlin, Z. Solan, G. Wolfman, and E. Ruppin. Placing search in context: The concept revisited. ACM Transactions on Information Systems, 20(1), 2002.


For more, see our full paper, Possible Confounds in Word-based Semantic Similarity Test Data, published in CSCW 2017.

Malay Bhattacharyya
Department of Information Technology
Indian Institute of Engineering Science and Technology,

Yoshihiko Suhara
MIT Media Lab, Recruit Institute of Technology

Md Mustafizur Rahman
Information Retrieval & Crowdsourcing Lab
University of Texas at Austin

Markus Krause
ICSI, UC Berkeley