# Moral psychology on Amazon Mechanical Turk

There’s a lot of exciting work in moral psychology right now. I’ve been telling various poor fools who listen to me to read something from Jonathan Haidt or Joshua Greene, but of course there’s a sea of articles and books of varying quality and intended audience. But just last week Steven Pinker wrote a great NYT magazine article, “The Moral Instinct,” which summarizes current research and tries to spell out a few implications. I recommend it highly, if only for presenting so many awesome examples. (Yes, this blog has poked fun at Pinker before. But in any case, he is a brilliant expository writer. The Language Instinct is still one of my favorite popular science books.)

For a while now I’ve been thinking that recruiting subjects online could lend itself to collecting some really interesting behavioral science data. A few months ago I tried doing this with Amazon Mechanical Turk, a horribly misnamed web service that actually lets you create web-based tasks and pay online workers to do them. Its canonical commercial applications include tedious tasks like search quality evaluation or image labeling, where you really need human data to perform well. You put up, say, several thousand images you want classified as “porn” or “not-porn”, say you’ll pay workers $0.01 to label ten images, then sit back and watch the data roll in.

So AMT advertises itself as a data annotation or machine learning substitute system, but I think its main innovation is finding out that there are lots and lots of people with free time willing to do online work for very, very low amounts of money. You can run any task you want, including surveys, and people happily respond for mere pennies. (Far below minimum wage, I might add; their motivation seems to be more like casual gaming.)

To that end, I tried running one of the standard moral psych survey questions to see what would happen, the so-called “trolley problem”:

A runaway trolley is hurtling down a track towards five people who have been tied down in its path. If nothing happens, they will be killed. Fortunately, you have a switch which would divert the trolley to a different track. Unfortunately, the other track has one person tied down to it. Should you flip the switch?

It’s supposed to be a classic dilemma of consequentialist vs. deontological moral reasoning. Is it acceptable to sacrifice for the greater good? Is it permissible to take an action that will cause a preventable death? And so on. I think it’s neat just because when I pose it to people, different folks really do disagree, give different answers, and are willing to argue about it.
There are some interesting recent fMRI findings (due to Greene I think?) that people who refuse to flip the switch seem to be engaged in a more emotional response, whereas those who do flip it seem to be using deliberative reasoning systems. (Some, like Greene and Pinker, seem to go further and argue this is a substantive normative reason to favor flipping the switch; whether or not you feel like getting sucked into that debate, there’s clearly something interesting happening here.)

So I ran this on AMT; the participants (they call themselves “turkers”) had to answer yes or no. Turns out 77% say they’d flip the switch. I also ran two variant scenarios of the same logical dilemma, to sacrifice one person to save five:

A trolley is hurtling down a track towards five people. You are on a bridge under which it will pass, and you can stop it by dropping a heavy weight in front of it. As it happens, there is a very fat man next to you – your only way to stop the trolley is to push him over the bridge and onto the track, killing him to save five. Should you proceed?

and

A brilliant transplant surgeon has five patients, each in need of a different organ, each of whom will die without that organ. Unfortunately, there are no organs available to perform any of these five transplant operations. A healthy young traveler, just passing through the city the doctor works in, comes in for a routine checkup. In the course of doing the checkup, the doctor discovers that his organs are compatible with all five of his dying patients. Suppose further that if the young man were to disappear, no-one would suspect the doctor. Should the doctor sacrifice the man to save his other patients?

These two, of course, feel a lot harder to say “Yes” to, but if you were willing to say “Yes” to the original question, it is hard to justify why you would say “No” here. The participants’ responses followed what you would expect: fewer said “Yes” to these scenarios.
Here are the Yes/No responses to each of the questions (100 responses for each):

| Question | Yes | No |
| --- | --- | --- |
| surgeon | 2 | 98 |
| fat man | 30 | 70 |
| switch, save 5 | 77 | 23 |
| switch, save 10 | 82 | 18 |
| switch, save 15 | 83 | 17 |
| switch, save 20 | 83 | 17 |

Only two people thought it was acceptable to sacrifice for organs, and fewer than half as many would push the fat man as would flip the switch. I also ran variants of the switch version with more and more people on the tracks; the Yes response creeps upwards but never reaches 100%. The differences among the first three questions are statistically significant (unpaired t-tests, all p<.001; this seems like the wrong test, can anyone correct me?).

What’s amazing is how fast responses happen. I started getting responses just minutes after posting the question. I actually posted each of the six questions as a separate, standalone task, but many of the turkers who did one found the rest in the task pool and did them too. (So what was supposed to be a between-subjects design fell into something else, oops!) The whole thing cost $6 and was done in a matter of hours.

It’s very encouraging: AMT allows you to iterate very quickly and try out different designs and such. It’s a bit of a pain to use, though; Amazon has certainly done a poor job of exploiting its full potential. (They have a form builder which was good enough to quickly write up these tasks, but to do anything moderately sophisticated, even just getting your data back out, you have to write programs against their somewhat mediocre API; you have to know how to use an XML parser, etc. Hm.)
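On the question of which test to use for the first three questions: one common choice for comparing two Yes/No rates is a two-proportion z-test. The sketch below is not the analysis that was run for this post; it is a minimal illustration on the Yes counts from the table, and the function and variable names are made up for the example. It also assumes the responses to different questions are independent, which does not quite hold here since many turkers answered several of the questions.

```python
import math

# Yes counts out of 100 responses each, copied from the table in the post.
YES = {"surgeon": 2, "fat man": 30, "switch, save 5": 77}
N = 100


def two_proportion_z_test(yes_a, n_a, yes_b, n_b):
    """Return (z, two-sided p) for H0: both questions share one Yes rate.

    Assumes the two groups of responses are independent samples.
    """
    p_a, p_b = yes_a / n_a, yes_b / n_b
    pooled = (yes_a + yes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value


pairs = [
    ("surgeon", "fat man"),
    ("fat man", "switch, save 5"),
    ("surgeon", "switch, save 5"),
]
results = {
    (a, b): two_proportion_z_test(YES[a], N, YES[b], N) for a, b in pairs
}

for (a, b), (z, p) in results.items():
    print(f"{a} vs {b}: z = {z:.2f}, p = {p:.2g}")
```

Under these assumptions all three comparisons come out with p below .001, in line with the figure quoted above.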

I also tried an explicitly within-subject version, where each participant answered the three basic versions. I was interested in consistency — presumably very few people would sacrifice for organs but refuse to divert the trolley. For 141 participants, here are the frequencies of the different answer triples:

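| % with this response triple | flip switch? | push fat man? | sacrifice traveler for organs? |
| --- | --- | --- | --- |
| 42.6 | Y | N | N |
| 29.8 | Y | Y | N |
| 20.6 | N | N | N |
| 5.0 | Y | Y | Y |
| 0.7 | Y | N | Y |
| 0.7 | N | Y | Y |
| 0.7 | N | Y | N |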

I personally find the most common responses coherent with my own gut reactions — from left to right, I feel less and less good about sacrificing in each case. Perhaps all people feel the same gut reactions, and use different ad hoc reasons to draw the line in different places?
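The consistency question can be checked directly from the response triples. The sketch below is only an illustration: the head counts are an assumption, reconstructed by rounding each reported percentage against the 141 participants, and "monotone" is a label introduced here for triples where willingness to sacrifice never increases from left to right (switch, fat man, organs).

```python
# Percent of participants giving each (switch, fat man, organs) triple,
# copied from the table in the post.
N_PARTICIPANTS = 141
TRIPLE_PERCENT = {
    ("Y", "N", "N"): 42.6,
    ("Y", "Y", "N"): 29.8,
    ("N", "N", "N"): 20.6,
    ("Y", "Y", "Y"): 5.0,
    ("Y", "N", "Y"): 0.7,
    ("N", "Y", "Y"): 0.7,
    ("N", "Y", "N"): 0.7,
}

# Assumption: head counts are recovered by rounding percent * N / 100.
counts = {
    triple: round(pct * N_PARTICIPANTS / 100)
    for triple, pct in TRIPLE_PERCENT.items()
}


def is_monotone(triple):
    """True if a "Y" never follows an "N" when reading left to right."""
    return "".join(triple) in {"NNN", "YNN", "YYN", "YYY"}


def yes_count(position):
    """Number of participants answering Yes to the question at `position`."""
    return sum(c for triple, c in counts.items() if triple[position] == "Y")


total = sum(counts.values())
monotone = sum(c for triple, c in counts.items() if is_monotone(triple))

print(f"reconstructed participants: {total}")
print(f"monotone triples: {monotone} of {total}")
for name, pos in [("flip switch", 0), ("push fat man", 1), ("organs", 2)]:
    print(f"{name}: {yes_count(pos)} Yes ({100 * yes_count(pos) / total:.1f}%)")
```

With the counts reconstructed this way, 138 of the 141 participants gave one of the four monotone triples, and only one person would sacrifice the traveler for organs while refusing to divert the trolley.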

I’m sorry that this post started with neat moral psychology then degenerated into methodology, but hey, it’s fun. I’ve seen only two instances of any sort of research paper being written using AMT, both by computer scientists: here’s a nice blog post on an information retrieval experiment (it’s a great blog, btw), and someone also mentioned to me this one on data processing accuracy. Anyone know of any others? It’s clearly an interesting approach.


### 21 Responses to Moral psychology on Amazon Mechanical Turk

1. J. Alden Page says:
2. Ed H. Chi says:

We used AMT to do psychology experiments around summer of 2007, and the results are published in an ACM CHI conference article here:

http://www-users.cs.umn.edu/~echi/papers/2008-CHI2008/2008-02-mech-turk-online-experiments-chi1049-kittur.pdf

Aniket Kittur, Ed H. Chi, Bongwon Suh. Crowdsourcing User Studies With Mechanical Turk. In Proceedings of the ACM Conference on Human-factors in Computing Systems (CHI2008). (to appear). ACM Press, 2008. Florence, Italy.

3. Brendan says:

Ed, thanks for the link to your paper! Everyone seems to be getting on the AMT wave :) We have a conference paper in review and if it’s accepted (or if it’s not, I suppose) I’ll post about it here…

Do you have the HITs or their templates saved anywhere? I’m curious to see the difference between the two different ones you ran, since you said you found big differences in the quality of Turker responses between them. I read through your CHI 2007 paper (“conflict and cooperation”) but couldn’t figure out exactly what the task was..

From reading your blog, looks like you folks have already discovered Panos Ipeirotis’s blog, and then perhaps ours (blog.doloreslabs.com). If you hear of or do any more cool AMT work, I’d be eager to know.

4. Pingback: A moral dilemna : LikeItHateIt

5. Pingback: links for 2008-10-13 « Amy G. Dala

6. Ryan M says:

Interesting experiment – thanks for sharing data. My question deals with worker motivation. Did you require workers to explain their positions? I.e., was the rationale field optional? If not, what extrinsic motivation would any worker have to read the problem? Wasn’t the quickest path to cashout a random click (of either Position A or Position B)?

7. Ryan: Nope, no rationales at all. There was no extrinsic motivation at all — you’re right, they could have done a random click and been fully rewarded.

I find the fact that this survey worked at all to be encouraging support for the use of MTurk for social science research.

8. asfd says:

people who need organ transplants are likely going to be much sicker and have far shorter lives of lower quality than average people. moreover, in many cases people need organ transplants only due to risks they themselves assumed, such as alcoholics needing liver transplants.

Therefore, in the train example, the quality, length, and fault of each life saved or sacrificed are comparable. In the transplant example, however, they are not comparable: the healthy patient has no fault or other indications of sub-normal quality or length of life.

Therefore, the two examples are not comparable in that sacrificing the train victim does not implicate sacrificing the healthy patient.

9. appaliemo says:

Very useful

10. Anon says:

Unpaired t-tests sound fine to me. If you were doing a lot of them, you would want to adjust the critical threshold value. (With alpha at 5% and 20 t-tests, you would expect about one to test as significant when it is not.)

You could use ANOVA, which is designed for multiple t-tests and handles adjustments to critical values for you. It can also find interaction effects but you have no need for that here.

But for only three tests, I think t-tests are fine.

11. I don’t know what to take from the results of this particular experiment, but it makes me wonder what other experiments might be possible using MTurk.
Norman

12. John says:


