# A simple model of community enforcement

Here is the model behind my previous blog post on (anonymous) community enforcement. I would call it a simplified symmetric (single-population) version of the model in the paper by Michihiro Kandori entitled “Social norms and community enforcement”, Review of Economic Studies 59.1 (1992): 63-80. The point of this blog post is to demonstrate that what I claimed in the previous post can be made logically coherent: there is a reasonable and simple artificial world in which we obtain cooperative behavior, sustained by the fear of triggering a tipping point, as a subgame perfect Nash equilibrium, that is, a self-enforcing situation that remains self-enforcing even when the tipping point has been triggered and there is no way back.

There are $n$ people involved. These are the people interested in going up the mountain, the people on the train, or the users of the communal kitchen. Time is discrete, with periods $0, 1, 2, \ldots$ running to infinity. At every point in time $t$ one person is drawn uniformly at random to undertake the activity (go up the mountain, use the bathroom, or the kitchen), so that each person has a $\frac{1}{n}$ probability of being drawn. The drawn person observes only the state $x \in \{0,1,2,...\}$ of the resource (the amount of rubbish on the mountain, or the state of uncleanliness of the bathroom or kitchen). Then this person (after using the resource) decides whether to clean up after themselves ($C$ for cooperate, or clean up) or not ($D$ for defect, to use the language of the well-known prisoners’ dilemma game).

The instantaneous utility that the drawn person receives when choosing $C$ shall be given by $\frac{1}{\lambda^x}$, where $x$ is the state of the resource – let us assume here that it is simply equal to the number of people who were drawn before this person and chose action $D$, that is, did not clean up after themselves. When this person chooses $D$ they get the same utility, plus a small but positive term $d$ for not having to clean up. Let us assume that $\lambda > 1$, so the payoff is lower the worse, that is the higher, the state of the resource $x$ is. As a function of $x$, the function $\frac{1}{\lambda^x}$ starts at $1$ for $x=0$ and then decays exponentially to the limit value of $0$ as $x$ tends to infinity. When a person is not drawn to use the resource at some point in time, this person receives an instantaneous utility of $0$. Every person discounts the future exponentially with a discount factor $\delta \in [0,1)$. This means that they evaluate streams of utils $u_t$ by the net present value $(1-\delta) \sum_{t=0}^{\infty} u_t \delta^t$, where the $(1-\delta)$ term is a convenient normalization.
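To fix ideas, the stage payoffs and the discounted evaluation just described can be sketched in a few lines of Python. This is purely my own illustration; the parameter values `LAM`, `D_GAIN`, and `DELTA` are placeholders, not part of the model.

```python
# A minimal sketch of the stage payoffs and the discounted evaluation
# described above. The parameter values are purely illustrative.

LAM = 2.0      # lambda > 1: how fast the resource's value decays with x
D_GAIN = 0.05  # d: small positive gain from not cleaning up
DELTA = 0.9    # delta: discount factor

def stage_utility(x, action):
    """Instantaneous utility of the drawn person at state x."""
    base = 1.0 / LAM**x
    return base + (D_GAIN if action == "D" else 0.0)

def npv(stream, delta=DELTA):
    """Normalized net present value (1 - delta) * sum_t u_t * delta^t."""
    return (1.0 - delta) * sum(u * delta**t for t, u in enumerate(stream))

assert stage_utility(0, "C") == 1.0           # clean resource, cooperate
assert stage_utility(1, "D") == 0.5 + D_GAIN  # dirty resource, defect
```

Note that, thanks to the $(1-\delta)$ normalization, a constant stream of utility $1$ has a net present value of (approximately, after truncation) $1$.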

For a well-defined game-theoretic model, we need to identify players, their information, their strategies, and their payoffs. We have players, what they know, and their payoffs. We have not quite yet defined their possible strategies, but we have specified their actions. To complete the model, we thus only have to define players’ possible strategies. These are all possible functions from the set of possible values of $x$, that is the set $\{0,1,2,...\}$, to the set of actions $\{C,D\}$. In principle, we should allow a bit more, as our players should probably remember what the state was at previous times when they were using the resource and also what they themselves did at those points in time, but this does not add or change anything of interest in our present analysis.

My claim then was that, at least for certain parameter ranges (for $\lambda$, $\delta$, $n$, and $d$), the following strategy is a subgame perfect Nash equilibrium: Play $C$ if $x=0$ and play $D$ otherwise. This kind of strategy is often referred to, in the repeated-game literature, as a “grim trigger” strategy. In order to see this, we need to check two things. First, suppose everyone uses this strategy, so that on the play path everyone cooperates (keeping the resource clean); is it then best for a drawn player to cooperate as well? Second, suppose the “trigger” has been pulled by someone playing $D$, that is, by someone not cleaning up after themselves; is it then best for a drawn person to also play $D$ (to also not clean up after themselves)?

So, suppose first that everyone uses this strategy. Then a randomly drawn player at some point in time, that we can, without loss of generality, call time $0$, finds the following payoff consequences for their two possible choices and for all time periods from then on: $\begin{array}{ccccc} \mbox{time} & 0 & 1 & 2 & ... \\ C & 1 & \frac{1}{n} 1 & \frac{1}{n} 1 & ... \\ D & 1 + d & \frac{1}{n} \left(\frac{1}{\lambda} + d\right) & \frac{1}{n} \left(\frac{1}{\lambda^2} + d\right) & ... \end{array}$

The net present value for choosing $C$ at this point in time is then $(1-\delta) \left( 1 + \frac{1}{n} \sum_{t=1}^{\infty} \delta^t \right)$.
For choosing $D$ at this point in time it is $(1-\delta) \left( 1 + d + \frac{1}{n} \sum_{t=1}^{\infty} \left(\frac{1}{\lambda^t} + d\right) \delta^t \right)$.
The grim trigger strategy is a Nash equilibrium if and only if such a player weakly prefers $C$ over $D$,
that is if and only if $(1-\delta) \left( 1 + \frac{1}{n} \sum_{t=1}^{\infty} \delta^t \right) \ge (1-\delta) \left( 1 + d + \frac{1}{n} \sum_{t=1}^{\infty} \left(\frac{1}{\lambda^t} + d\right) \delta^t \right)$.
According to my calculations, this is equivalent to $\delta (\lambda-\delta) (1-d) - \delta (1-\delta) \ge d n (\lambda - \delta) (1-\delta)$.
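For readers who prefer to check such derivations numerically, here is a small Python sketch (with purely illustrative parameter values) that evaluates the two net present values using the geometric-series closed forms $\sum_{t \ge 1} \delta^t = \frac{\delta}{1-\delta}$ and $\sum_{t \ge 1} \left(\frac{\delta}{\lambda}\right)^t = \frac{\delta}{\lambda-\delta}$:

```python
# Numerical check of the on-path incentive condition, using the
# geometric sums  sum_{t>=1} delta^t = delta/(1-delta)  and
# sum_{t>=1} (delta/lam)^t = delta/(lam-delta).
# Parameter values are purely illustrative.

def npv_on_path(lam, delta, n, d):
    """Returns (NPV of C, NPV of D) for the drawn player at x = 0
    when everyone else follows the grim trigger strategy."""
    npv_c = (1 - delta) * (1 + (1 / n) * delta / (1 - delta))
    npv_d = (1 - delta) * (1 + d
                           + (1 / n) * (delta / (lam - delta)
                                        + d * delta / (1 - delta)))
    return npv_c, npv_d

# A patient player with a small d prefers to cooperate on path...
c, dd = npv_on_path(lam=2.0, delta=0.99, n=10, d=0.05)
assert c >= dd

# ...while a very impatient player would rather deviate.
c2, d2 = npv_on_path(lam=2.0, delta=0.1, n=10, d=0.05)
assert c2 < d2
```

This matches the discussion below: for $\delta$ close to $1$ and $d$ small the incentive condition holds, while very impatient players would grab the short-run gain $d$.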

It is not straightforward to derive nice bounds for $\delta$ (as a function of the other parameters) so that this inequality is satisfied. But we can at least say that, if people are sufficiently patient, that is for $\delta$ close to $1$, the inequality is satisfied provided $d < 1$ as well, which I assumed anyway – I stated that I wanted $d$ positive and small.

For the second part of the argument, suppose that the “trigger” has been pulled and that everyone is playing $D$. Suppose the drawn person at some given point in time faces a state of uncleanliness $x>0$. We can again reset the clock to zero without loss of generality. Note that cleaning up after yourself keeps the state at $x$, while not cleaning up raises it to $x+1$; in either case everyone drawn from the next period on plays $D$, so the state grows by one in every subsequent period. The consequences of the two possible actions for this person are then: $\begin{array}{ccccc} \mbox{time} & 0 & 1 & 2 & ... \\ C & \frac{1}{\lambda^x} & \frac{1}{n} \left(\frac{1}{\lambda^x} + d\right) & \frac{1}{n} \left(\frac{1}{\lambda^{x+1}} + d \right) & ... \\ D & \frac{1}{\lambda^x} + d & \frac{1}{n} \left(\frac{1}{\lambda^{x+1}} + d\right) & \frac{1}{n} \left(\frac{1}{\lambda^{x+2}} + d \right) & ... \end{array}$

The net present value for choosing $C$ at this point in time is then $(1-\delta) \left( \frac{1}{\lambda^x} + \frac{1}{n} \sum_{t=1}^{\infty} \left(\frac{1}{\lambda^{x+t-1}} + d\right) \delta^t \right)$.
For choosing $D$ at this point in time it is $(1-\delta) \left( \frac{1}{\lambda^x} + d + \frac{1}{n} \sum_{t=1}^{\infty} \left(\frac{1}{\lambda^{x+t}} + d\right) \delta^t \right)$.

It is in the best interest of this person to choose $D$ rather than $C$ if and only if the latter is greater than or equal to the former, and this is the case, according to my calculations, if and only if $\delta \le \frac{d n \lambda}{\lambda^{-x} (\lambda -1) + d n}.$
The right-hand side is lowest for $x=1$ (among all integer values $x > 0$), in which case the condition reads $\delta \le \frac{d n \lambda}{\lambda^{-1} (\lambda -1) + d n}.$
Rearranging, one can see that the right hand side of this inequality is greater than or equal to one, meaning that there is in fact no restriction on $\delta$, if and only if $\lambda \ge \frac{1}{d n}.$
Recall that $\lambda > 1$. This last inequality is then satisfied whenever $d n \ge 1$, so for instance whenever $n$ is sufficiently large for the given positive $d$; note that a very small $d$ requires a correspondingly larger $n$.
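As a sanity check on this second derivation, one can compare the threshold on $\delta$ with a direct evaluation of the two net present values. Again this is only an illustrative sketch with made-up parameter values, using the same geometric sums as before:

```python
# Sanity check of the off-path condition: the closed-form threshold
# delta <= d*n*lam / (lam^{-x} (lam - 1) + d*n)  should agree with a
# direct comparison of the two NPVs. Parameter values are illustrative.

def npv_off_path(lam, delta, n, d, x):
    """NPVs of C and D for the drawn player at state x > 0 when
    everyone else (and this player, from the next period on) plays D."""
    geo = delta / (1 - delta)      # sum_{t>=1} delta^t
    geo_l = delta / (lam - delta)  # sum_{t>=1} (delta/lam)^t
    base = lam ** -x
    npv_c = (1 - delta) * (base + (1 / n) * (base * lam * geo_l + d * geo))
    npv_d = (1 - delta) * (base + d + (1 / n) * (base * geo_l + d * geo))
    return npv_c, npv_d

def threshold(lam, n, d, x):
    """delta <= threshold(...)  <=>  D is weakly preferred at state x > 0."""
    return d * n * lam / (lam ** -x * (lam - 1) + d * n)

# The closed-form threshold agrees with the direct NPV comparison:
lam, n, d, x = 2.0, 10, 0.02, 1
for delta in (0.3, 0.5, 0.7, 0.9):
    c, dd = npv_off_path(lam, delta, n, d, x)
    assert (dd >= c) == (delta <= threshold(lam, n, d, x))
```

With these particular values the threshold is $0.4/0.7 \approx 0.57$: impatient players keep defecting, while very patient players would rather clean up even off path, in line with the restriction on $\delta$ derived above.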

All this together proves that, in the model given here, the strategy of cleaning up after yourself provided the resource is clean before you used it, and not cleaning up if the resource is not clean before you used it, is a subgame perfect Nash equilibrium: It is self-enforcing and the implicit threat of a tipping point in behavior is also self-enforcing. The model is very specific and many other versions would work just as well to make the same point.
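Finally, the tipping-point dynamics themselves are easy to simulate. The following Python sketch (my own illustration, not part of the formal analysis) traces the state of the resource when everyone follows the grim trigger strategy, with and without a single forced deviation:

```python
# Trace the state of the resource under the grim trigger strategy.
# Since every player uses the same strategy, the identity of the
# randomly drawn person does not matter for the state path.

def simulate(horizon, strategy, deviate_at=None):
    """Return the sequence of states x_t over `horizon` periods.
    strategy(x) -> "C" or "D"; deviate_at forces a single D."""
    x, states = 0, []
    for t in range(horizon):
        action = "D" if t == deviate_at else strategy(x)
        if action == "D":
            x += 1  # not cleaning up raises the state by one
        states.append(x)
    return states

def grim(x):
    """Clean up only if the resource is clean: C at x = 0, else D."""
    return "C" if x == 0 else "D"

clean = simulate(50, grim)
tipped = simulate(50, grim, deviate_at=5)
assert all(x == 0 for x in clean)  # on path the resource stays clean
assert tipped[-1] == 45            # after one deviation, +1 every period
```

On the equilibrium path the resource stays clean forever; after a single deviation the state deteriorates by one unit every period, which is exactly the tipping point described above.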
