The 'Prisoner's Dilemma' Tests Women In And Out Of Jail

By Tania Lombrozo

Published July 29, 2013 at 10:02 AM EDT

I just learned something interesting about women in prison, and it wasn't by watching Orange is the New Black.

For the first time, researchers have investigated how actual prisoners (in this case, female prisoners) respond to the "prisoner's dilemma," a famous conundrum used to model and study cooperation and its limits.

In the dilemma, two prisoners must decide whether to rat each other out or keep mum. Whatever the other does, each prisoner is individually better off snitching than keeping quiet, yet they'll both serve less jail time if they jointly keep quiet than if they both snitch. Hence the dilemma. For a 3-minute introduction or refresher, see the Scientific American Instant Egghead video on the prisoner's dilemma.
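
To make that payoff structure concrete, here is a minimal Python sketch. The jail terms are hypothetical numbers chosen only to fit the description above (the study itself is not quoted here with any specific sentences), and the names `YEARS` and `best_reply` are illustrative.

```python
# Hypothetical jail terms in years (lower is better); these numbers are assumptions.
# Keys are (my_choice, partner_choice); values are (my_years, partner_years).
YEARS = {
    ("quiet", "quiet"): (1, 1),
    ("quiet", "snitch"): (5, 0),
    ("snitch", "quiet"): (0, 5),
    ("snitch", "snitch"): (3, 3),
}


def best_reply(partner_choice):
    """Return the choice that minimizes my own jail time for a fixed partner choice."""
    return min(("quiet", "snitch"), key=lambda mine: YEARS[(mine, partner_choice)][0])


# Snitching is the best reply no matter what the partner does...
for partner in ("quiet", "snitch"):
    print(f"If my partner chooses {partner}, my best reply is {best_reply(partner)}")

# ...yet both prisoners serve less time if they jointly keep quiet.
print("Both snitch:", YEARS[("snitch", "snitch")][0], "years each")
print("Both quiet: ", YEARS[("quiet", "quiet")][0], "years each")
```

Any set of numbers with the same ordering would produce the same tension between the individual incentive and the joint outcome.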

In a paper just published in the Journal of Economic Behavior and Organization, Menusch Khadjavi and Andreas Lange take the prisoner's dilemma to jail, comparing the behavior of female inmates with that of female university students.

Do prisoners show less cooperation than college students? Or do prisoners cooperate more often, perhaps the result of operating under different social norms, with reciprocity expected or snitching heavily punished?

The researchers considered two versions of the dilemma. In the classic simultaneous version, each person has to decide whether to defect (= rat out her partner) or cooperate (= stay mum) without knowing what her partner has chosen to do. In an alternative sequential version, one partner decides first (Round 1), without knowing how the other will decide, but the second makes a decision after knowing how her partner has decided (Round 2), more like "Dilbert's Dilemma."

Here's how the Los Angeles Times summarized the differences in cooperation found between the female prisoners and the female students:

In the simultaneous game, a greater share of the prisoners solved their dilemma through cooperation: 55% chose A [to cooperate with each other], compared with 37% of students.

So it sounds like the prisoners cooperated more often than the students, at rates that would translate into collective cooperation — where both partners in a pair cooperate — in about 30 percent of cases for the prisoners, but only 13 percent for the students.
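
The arithmetic behind those collective-cooperation figures can be sketched as follows. It assumes the two partners in a pair choose independently, each cooperating at her group's reported rate; that independence assumption and the function name `joint_cooperation` are mine, not the paper's.

```python
def joint_cooperation(rate):
    """Chance that both partners cooperate, assuming independent choices at the same rate."""
    return rate * rate


# Individual cooperation rates as reported for the simultaneous game.
for group, rate in (("prisoners", 0.55), ("students", 0.37)):
    print(f"{group}: {joint_cooperation(rate):.1%} of pairs would see both partners cooperate")
```

Because the input rates are themselves rounded, the results land close to, but not exactly on, the percentages quoted above.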

Here's how the LA Times reported the findings from the sequential version:

Prisoners tended to play Round 1 roughly the same way they had played the simultaneous game ... . But the proportion of students who picked A [cooperation] leaped to 63% in the first round of the sequential game, from just 37% in the simultaneous game.

So it appears that while the prisoners were more cooperative than the students in simultaneous dilemmas, the students were more cooperative in sequential versions. (Interestingly, both prisoners and students acted the same in "Round 2" of sequential dilemmas, with about 60 percent cooperating if their partner cooperated first, and all but one participant defecting if their partner defected first.)

These findings are pretty intriguing, and prompt assorted speculation. Unfortunately, though, a close look at the original paper reveals that the differences between prisoners and students are pretty tenuous.

Within psychology, results are typically considered statistically significant when they're unlikely to have been generated by chance, a topic I've written about before and which Andrew Gelman succinctly explains in a recent article at Slate.com:

The standard in research practice is to report a result as "statistically significant" if its p-value is less than 0.05; that is, if there is less than a 1-in-20 chance that the observed pattern in the data would have occurred if there were really nothing going on in the population.

In the current paper, the key differences between the prisoners and the students did not meet this standard: the results were just barely "significant" at a threshold of .10, not .05. Even if there's no true difference in cooperation rates between prisoners and college students, you'd still expect to find a difference between the two groups that's significant at the .10 level in about 1 in 10 cases.
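
A small simulation can illustrate that 1-in-10 figure. This is a sketch under stated assumptions, not the paper's analysis: it gives both groups the same hypothetical cooperation rate (0.45), uses group sizes of 90 and 92, and applies a pooled two-proportion z-test, which may not be the test the authors used. The function names are illustrative.

```python
import math
import random


def two_sided_p(successes_a, n_a, successes_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test (normal approximation)."""
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (successes_a / n_a - successes_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(alpha, true_rate=0.45, n_a=90, n_b=92, trials=2000, seed=0):
    """Share of simulated studies flagged 'significant' when the groups truly do not differ."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Both groups are drawn with the same true cooperation rate (assumed value).
        a = sum(rng.random() < true_rate for _ in range(n_a))
        b = sum(rng.random() < true_rate for _ in range(n_b))
        if two_sided_p(a, n_a, b, n_b) < alpha:
            hits += 1
    return hits / trials


print("Flagged at the .10 threshold:", false_positive_rate(0.10))
print("Flagged at the .05 threshold:", false_positive_rate(0.05))
```

With no real difference built in, the share of simulated studies that clear the .10 bar should hover near one in ten, and the share clearing the .05 bar near one in twenty.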

In fact, looking more closely at the data, there was no difference in the rate at which prisoners and students cooperated when averaged across responses to the simultaneous dilemmas and Round 1 of the sequential dilemmas — where the only difference is whether the person anticipates that her partner will know her choice before choosing herself.

The authors of the original research don't hide the fact that their primary group differences don't meet the .05 criterion for significance, though they do go on to draw conclusions from the marginal results. In some ways this is reasonable — their sample size wasn't huge (90 prisoners, 92 students), which limited their statistical power; the prison population isn't easy to access for this kind of research, so they couldn't have readily boosted their sample size. Weak evidence is still better than no evidence.

Reports in the popular press, however, have not been very cautious. You wouldn't know from the LA Times report, the Business Insider story, or the Smithsonian.com blog post, for example, that the reported group differences are so shaky. One (otherwise lovely) report states that the prisoners "betray one another far less than college students do" (emphasis added), when in fact — by standard conventions in psychology — all we should conclude is that we lack strong evidence for a difference between the two groups.

My intention here isn't to call out science journalism, which sometimes receives a bad rap and is often truly excellent. Instead, I want to highlight a theme that's been discussed on this blog before: how much of the scientific process, and the uncertainty inherent to science, can and should be communicated along with scientific findings.

Had the findings in this new research been significant at the .05 level (or the .001 level ... ), we'd still face some uncertainty about whether or not the results reflected real differences in the two groups that they were meant to describe — prisoners versus non-prisoners. That's just how science works.

But weak evidence isn't the same as strong evidence. We don't have a good vocabulary, or established norms in talking about science in an accessible way, for communicating varying shades of uncertainty and their myriad sources.

Perhaps individual scientists and journalists face a "publisher's dilemma": scientific findings are more likely to be published, receive press, and garner page views the stronger and flashier they appear. Yet, as a community, we're all better off recognizing, tolerating and communicating gradations in the nature of evidence.

Can we find a new way to cooperate?

You can keep up with more of what Tania Lombrozo is thinking on Twitter: @TaniaLombrozo