CrowdCamp Report: Gathering Causality Labels

Correlation does not imply causation. This phrase gets thrown around by scientists, statisticians, and laypeople all the time. It means that you shouldn’t use data about two things to infer that one thing causes the other, at least not without making a lot of limiting assumptions. But it is difficult to imagine ignoring causal inference when it seems to be such a key ingredient of intelligent decision-making. Machine learning approaches exist for using data to estimate causal structure, but we think it’s interesting that humans seem to judge causality without even looking at data. So, the goal of our CrowdCamp project was to gather some such judgements from real people.

To start, we made a list of variable names for which we hypothesized humans might have opinions about causal relationships without ever (or at least not recently) having looked at the related data. Some of these variables include:

Real Daily Wages, Oil Prices, Internet Traffic, Residential Gas Usage, Power Consumption, Precipitation, Water Usage, Traffic Fatalities, Passenger Miles Flown in Aircraft, Auto Registration, Bus Ridership, Copper Prices, Wheat Harvest, Private Housing Units Started, Power Plant Expenditures, Price of Chicken, Sales of Shampoo, Beer Shipments, Percent of Men with Full Beards,
Pigs Slaughtered, Cases of Measles, Thickness of Ozone Layer, etc.

Using Amazon Mechanical Turk (AMT), we presented workers with sets of ten randomly chosen pairs of variables and asked them to choose the most fitting causal relationship between variable A and variable B from these four choices:

  • A causes B
  • B causes A
  • Other variable Z causes A and B
  • No causal relationship

Workers were advised: “It’s possible that A and B may be related in several of the above ways. If you feel this is the case, choose the one that you believe is the strongest relationship.”
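The task construction described above can be sketched in a few lines. This is a hypothetical reconstruction, not the study's actual code; the variable list is a small subset, and names like `make_hit` and `pairs_per_hit` are our own.

```python
import itertools
import random

# Hypothetical subset of the 42 variable names used in the study.
VARIABLES = [
    "Real Daily Wages", "Oil Prices", "Internet Traffic",
    "Residential Gas Usage", "Precipitation", "Bus Ridership",
]

# The four answer options shown to workers for each pair.
CHOICES = [
    "A causes B",
    "B causes A",
    "Other variable Z causes A and B",
    "No causal relationship",
]

def make_hit(pairs_per_hit=10, seed=None):
    """Sample a set of distinct variable pairs for one worker's task."""
    rng = random.Random(seed)
    all_pairs = list(itertools.combinations(VARIABLES, 2))
    pairs = rng.sample(all_pairs, min(pairs_per_hit, len(all_pairs)))
    return [{"A": a, "B": b, "choices": CHOICES} for a, b in pairs]

hit = make_hit(pairs_per_hit=5, seed=0)
```

Each worker would then see `pairs_per_hit` such questions and pick one choice per pair.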

Example variable pair presented to crowd

We collected 10 judgements from each of 50 workers, for a total of 500 judgements on pairs drawn from 42 variables. When workers chose the option of a third variable causing both presented variables, we asked them to name that third variable (though we did not require it). Of the 500 judgements, 74 were A->B, 85 were B->A, 34 were Z->A&B, and 307 were no causal relationship. The most common one-directional causality judgements were:

1. Church Attendance -> Internet Traffic
2. Alcohol Demand -> Public Drunkenness
3. Federal Reserve Interest Rate -> Price of Chicken
4. Bus Ridership -> Oil Prices
5. Alcohol Demand -> Number of Forest Fires
6. Public Drunkenness -> Armed Robberies
7. Power Consumption -> Birth Rate
8. Church Attendance -> Armed Robberies
9. Bus Ridership -> Birth Rate
10. Price of Chicken -> Total Rainfall
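A ranking like the one above can be produced by normalising each judgement into a directed pair and counting. The records below are illustrative stand-ins, not the actual collected data, and `tally_directional` is a name we made up for this sketch.

```python
from collections import Counter

# Hypothetical judgement records: (variable A, variable B, worker's choice).
judgements = [
    ("Alcohol Demand", "Public Drunkenness", "A causes B"),
    ("Oil Prices", "Bus Ridership", "B causes A"),
    ("Alcohol Demand", "Public Drunkenness", "A causes B"),
    ("Wheat Harvest", "Beer Shipments", "No causal relationship"),
]

def tally_directional(records):
    """Count directed edges, flipping 'B causes A' into a directed pair."""
    counts = Counter()
    for a, b, choice in records:
        if choice == "A causes B":
            counts[(a, b)] += 1
        elif choice == "B causes A":
            counts[(b, a)] += 1
    return counts

top = tally_directional(judgements).most_common()
# top[0] is the most frequently asserted directed relationship
```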

Many of these are not surprising: of course interest rates affect prices, and alcohol consumption affects drunkenness. Others, not so much. Why would chicken prices affect rainfall? We also realize that we asked only about the strength of the causal relationship, not its sign, so we have no way of knowing whether workers believe going to church causes an increase or a decrease in armed robberies.

We also collected some interesting answers for the optional third variable Z causing both A and B. Most of the time it was some big general factor like population, economic conditions, geographical area, or fuel prices. There were some creative ones too:

A: Deaths from Homicides
B: Beer Shipments
Z: Thieves trying to intercept and steal beer shipments

Now that we have collected all these judgements, what do we do with them? As machine learning applications go, we see three options:

  1. Use as training/testing labels for causal inference techniques.
  2. See how well they serve for building informative priors to regularize regression problems.
  3. Use them to guide structure learning in probabilistic graphical models.
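As one illustration of option 2, the crowd's vote counts for a pair could be turned into a smoothed prior probability that A causes B. This is a sketch under our own assumptions (Laplace smoothing, uniform pseudo-counts); the function name and the use of the study-wide totals as example inputs are ours, not part of the original work.

```python
def edge_prior(n_ab, n_ba, n_confound, n_none, smoothing=1.0):
    """Laplace-smoothed prior probability that A directly causes B,
    estimated from crowd vote counts over the four answer options."""
    total = n_ab + n_ba + n_confound + n_none + 4 * smoothing
    return (n_ab + smoothing) / total

# Using the aggregate counts reported above as example inputs:
p = edge_prior(74, 85, 34, 307)  # ~0.149
```

Per-pair priors computed this way could weight edge candidates in structure learning, or scale regularization penalties in a regression.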

In conclusion, it was interesting to see how workers on AMT perceived causal relationships between economic, demographic, and miscellaneous variables by looking only at the names of the variables rather than at actual data. We think it would be useful to take such qualitative, common-sense preconceptions into account when designing automatic models of inference.

Alex Braylan, University of Texas at Austin
Kanika Kalra, Tata Research
Tyler McDonnell, University of Texas at Austin
