# Solving the Prisoners’ Dilemma

The prisoners’ dilemma is a game-theoretic classic. Suppose you've got two suspects, interrogated in separate rooms, each deciding: rat out their partner, or no? If both rat, they both go to jail on a minor charge; if both stay mum, neither does; and if one rats and the other doesn't, the rat gets off free and the other goes to jail for a longer time (as punishment for staying quiet). No matter what the other player does, it’s better to rat: if the other rats, you're going to jail anyway and would prefer to avoid the punishment for not talking; if the other stays mum, best for you to rat him out and go free.

The prisoners’ dilemma is usually phrased as a case where *institutions* are necessary to enforce *cooperation*. The challenge is to think up appropriate institutions: structures of behavior that enforce both cooperation and their own self-propagation. Lots of such institutions are known: game theory is a very successful field. For example, if the two suspects are likely to be in many such situations, they are more likely to cooperate under threat of having their partner defect next time. Or, if the suspects won't work with the same partner again but their reputations are shared, cooperation can again reign.

I'd like to discuss an (I think) new institution that enforces cooperation.

## Solving prisoners’ dilemma with shame

Formally, the prisoners’ dilemma is defined by the following payoff matrix:

| | D | C |
|---|---|---|
| D | 1, 1 | 6, 0 |
| C | 0, 6 | 5, 5 |

You read this by choosing either Defect or Cooperate for the “row player” and the same choice for the “column player”, and then reading off the two outcomes in the pair: the first for the row player and the second for the column player. Higher numbers are better. The logic of the prisoners’ dilemma is that the D row has higher first values than the corresponding cells in the C row, so the row player always prefers to defect. Likewise for the column player: the D column has higher second values than the C column. Thus the only possible result is a play of D, D.
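The dominance argument can be checked mechanically. Here is a minimal sketch, assuming the payoff matrix above; the names `ORIGINAL`, `strictly_dominates`, and `pure_equilibria` are made up for illustration.

```python
ACTIONS = ("D", "C")

# payoffs[(row_action, col_action)] = (row_payoff, col_payoff)
ORIGINAL = {
    ("D", "D"): (1, 1), ("D", "C"): (6, 0),
    ("C", "D"): (0, 6), ("C", "C"): (5, 5),
}

def strictly_dominates(payoffs, a, b):
    """True if row action a pays the row player more than b against every column action."""
    return all(payoffs[(a, c)][0] > payoffs[(b, c)][0] for c in ACTIONS)

def pure_equilibria(payoffs):
    """All action pairs where neither player gains by deviating alone."""
    return [
        (r, c) for r in ACTIONS for c in ACTIONS
        if all(payoffs[(r, c)][0] >= payoffs[(alt, c)][0] for alt in ACTIONS)
        and all(payoffs[(r, c)][1] >= payoffs[(r, alt)][1] for alt in ACTIONS)
    ]

print(strictly_dominates(ORIGINAL, "D", "C"))  # True
print(pure_equilibria(ORIGINAL))               # [('D', 'D')]
```

Since the game is symmetric, the same check with the roles swapped holds for the column player.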

But where do these outcomes come from? I've heard it phrased as jail years or fine amounts, but there's no way to really know what someone's utility function is. My new solution to the prisoners’ dilemma rests on a new ability of agents that I hypothesize: the ability to change their utility function. For example, suppose the row player in the prisoners’ dilemma doesn't like always getting stuck in the defect/defect outcome, and eventually comes to see his own willingness to betray his partner as the cause. The row player could acquire a distaste for defecting when his partner cooperates, lowering the `6, 0` outcome to, say, `4, 0`. Perhaps the change can be ascribed to a new conscience, or shame, but it really need not have any emotional explanation. His value just changes.

The column player also acquires a distaste for betraying a cooperating partner. Then the new outcome matrix is this one,

| | D | C |
|---|---|---|
| D | 1, 1 | 4, 0 |
| C | 0, 4 | 5, 5 |

which supports multiple equilibria: one where both players cooperate, one where both defect, and one where both players mix both with probability ½.
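The three equilibria can be recovered with a short computation. This is only a sketch: the helper names are hypothetical, and the mixed equilibrium is found from the usual indifference condition (the column player defects with the probability `q` that makes the row player indifferent between D and C), relying on the game's symmetry to reuse the same `q` for the row player.

```python
from fractions import Fraction

ACTIONS = ("D", "C")

# payoffs[(row_action, col_action)] = (row_payoff, col_payoff), after both players acquire shame
SHAMED = {
    ("D", "D"): (1, 1), ("D", "C"): (4, 0),
    ("C", "D"): (0, 4), ("C", "C"): (5, 5),
}

def pure_equilibria(payoffs):
    """All action pairs where neither player gains by deviating alone."""
    return [
        (r, c) for r in ACTIONS for c in ACTIONS
        if all(payoffs[(r, c)][0] >= payoffs[(alt, c)][0] for alt in ACTIONS)
        and all(payoffs[(r, c)][1] >= payoffs[(r, alt)][1] for alt in ACTIONS)
    ]

def defect_prob_making_row_indifferent(payoffs):
    """Solve q*a + (1-q)*b == q*c + (1-q)*d for q, the column player's defect probability."""
    a = payoffs[("D", "D")][0]
    b = payoffs[("D", "C")][0]
    c = payoffs[("C", "D")][0]
    d = payoffs[("C", "C")][0]
    return Fraction(d - b, a - b - c + d)

def expected_row_payoff(payoffs, p_row_defects, q_col_defects):
    """Row player's expected payoff when each player defects with the given probability."""
    p = {"D": p_row_defects, "C": 1 - p_row_defects}
    q = {"D": q_col_defects, "C": 1 - q_col_defects}
    return sum(p[r] * q[c] * payoffs[(r, c)][0] for r in ACTIONS for c in ACTIONS)

q = defect_prob_making_row_indifferent(SHAMED)
print(pure_equilibria(SHAMED))             # [('D', 'D'), ('C', 'C')]
print(q)                                   # 1/2
print(expected_row_payoff(SHAMED, q, q))   # 5/2
```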

## Change of values mechanism

Above I described a just-so story for how changing values can fix the prisoners’ dilemma. To make it a proper part of game theory, I need to discuss strategic objectives for both players.

I assume that the prisoners’ dilemma game is preceded by a simultaneous value-choosing game. That is, the suspects in the prisoners’ dilemma first each choose their utility functions for the prisoners’ dilemma portion of the game; then they play a prisoners’ dilemma.

In the first phase, each player is trying to optimize their original utility function; in the second phase, they're trying to optimize their chosen utility function. Finally, for now I assume the change of utility function is *credible*: in the second phase, both players know what utility function each has. As usual, assume it's common knowledge that everyone is rational.

## Worked prisoners’ dilemma

Let's suppose in the first phase agents choose between just two utility functions: one where the C/D entries have 6 utility for the defector, and one where they have 4 utility. Each agent thus has four strategies and mixes of them: `4C`, `4D`, `6C`, and `6D`.

However, because we change utility function mid-stream, we can't just write this down in a table and compute equilibria. But we can reason by backwards induction.

- If the players choose, in the first phase, 6/6, then the outcome is 1,1.
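- If the players choose, in the first phase, 4/4, then the outcome is 1,1 or 5,5 or, in the mixed equilibrium, 2.5,2.5 (measured in the chosen utilities; 3,3 in the original ones).
- If the players choose, in the first phase, 4/6 (or, symmetrically, 6/4), then the outcome is 1,1.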

Choosing 4/4 in the first phase is an equilibrium for the first phase, and `4C/4C` is an equilibrium for the game as a whole. Thus the suspects can solve the prisoners’ dilemma and choose to cooperate by each **rationally choosing to have different values**.
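The backwards induction can be sketched in code. Two things here are assumptions rather than part of the argument above: all names are hypothetical, and when the second phase has several pure equilibria (the 4/4 case), the sketch simply assumes the players coordinate on the one with the highest total chosen payoff. Second-phase play is decided by the chosen utilities, but first-phase choices are scored with the original utilities.

```python
ACTIONS = ("D", "C")
VALUES = (4, 6)  # the two utility functions on offer: the defector's payoff against a cooperator

def game(temptation_row, temptation_col):
    """Second-phase payoffs given each player's chosen payoff for betraying a cooperator."""
    return {
        ("D", "D"): (1, 1), ("D", "C"): (temptation_row, 0),
        ("C", "D"): (0, temptation_col), ("C", "C"): (5, 5),
    }

ORIGINAL = game(6, 6)

def pure_equilibria(payoffs):
    """All action pairs where neither player gains by deviating alone."""
    return [
        (r, c) for r in ACTIONS for c in ACTIONS
        if all(payoffs[(r, c)][0] >= payoffs[(alt, c)][0] for alt in ACTIONS)
        and all(payoffs[(r, c)][1] >= payoffs[(r, alt)][1] for alt in ACTIONS)
    ]

def continuation(value_row, value_col):
    """Second-phase play and its ORIGINAL-utility payoffs.

    Assumption: among pure equilibria, players pick the one with the highest total chosen payoff.
    """
    chosen = game(value_row, value_col)
    play = max(pure_equilibria(chosen), key=lambda cell: sum(chosen[cell]))
    return play, ORIGINAL[play]

def first_phase_equilibria():
    """Value pairs where neither player gains, in original utility, by choosing the other value."""
    found = []
    for vr in VALUES:
        for vc in VALUES:
            u = continuation(vr, vc)[1]
            row_ok = all(u[0] >= continuation(alt, vc)[1][0] for alt in VALUES)
            col_ok = all(u[1] >= continuation(vr, alt)[1][1] for alt in VALUES)
            if row_ok and col_ok:
                found.append((vr, vc))
    return found

for vr in VALUES:
    for vc in VALUES:
        print((vr, vc), continuation(vr, vc))
print(first_phase_equilibria())  # [(4, 4), (6, 6)]
```

Under this selection rule, 4/4 followed by mutual cooperation survives as an equilibrium. The sketch also reports 6/6, which remains an equilibrium because a lone switch to 4 still ends in 1,1.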

## Open questions

Your choice of values can be a strategic consideration which can improve your ability to achieve your original values.

I'm intrigued by this mechanism for achieving cooperation; here are a few questions I wish I could answer about it:

- Can you change your values in a way that helps you by hurting others?

  I suspect not, since for the other player, not changing their values is always an option, and against those values you can't do better than keeping your original values as well. This seems like a theorem for an afternoon.

- How credible must the change of values be?

  Must the change be credible? Can the players just internally both change their values to `4`, then for the second phase both say, “Hey, somehow I feel like we should both cooperate. Any objections?” This question is hard to investigate because dealing with unknown outcomes for your opponent in game theory requires either Bayesian reasoning or complex modal logic.

- How widely does this mechanism work?

  Can we avoid all prisoners’ dilemma variants? Or only some? I don't know the literature on reducing games.

- Can this mechanism work in multiple stages?

  Is it ever useful to change your values in multiple steps? This probably depends on the answer to the credibility question.

If you've got any thoughts on the above, please let me know.