One of the main theoretical tools in modeling economics, political science and some fields of biology are game theoretical tools. You could fill many courses with game theoretical models, and we will not try to give a full introduction here.
Instead, we want to show how we can use matrices to define a game and then give some simple examples that will build some intuition for how to set up a simple game theoretic model and solve a model numerically.
A "Normal-form" game has three components:
Matching pennies is a simultaneous game (players move at the same time, without knowledge of what the other does). They each have a coin which they place on the table either heads or tails.
This is a game in the sense of the pay-off for one player is dependent on the action of the other.
But clearly it is not a strategic game. There is no need to think deeply about your strategy in response to the other player's strategy.
We start by defining this game as a matrix of pay-offs
matching_pennies_bimatrix = [[(1, -1), (-1, 1)],
[(-1, 1), (1, -1)]]
This is a matrix of tuples (pairs) representing pay-offs from actions of the two players:
The pay-offs (as shown in the matrix) are that if:
We can convert this bi-matrix format to a normal form game using a package from Quantecon (which you may have to install first)
import numpy as np
import quantecon.game_theory as gt
Intel MKL WARNING: Support of Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled only processors has been deprecated. Intel oneAPI Math Kernel Library 2025.0 will require Intel(R) Advanced Vector Extensions (Intel(R) AVX) instructions. Intel MKL WARNING: Support of Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled only processors has been deprecated. Intel oneAPI Math Kernel Library 2025.0 will require Intel(R) Advanced Vector Extensions (Intel(R) AVX) instructions.
g_MP = gt.NormalFormGame(matching_pennies_bimatrix)
We could then see each player with their individual pay-off matrix
g_MP.players
(Player([[ 1, -1], [-1, 1]]), Player([[-1, 1], [ 1, -1]]))
If we wanted to get player 1's pay-off array we could, for example write
g_MP.players[0].payoff_array
array([[ 1, -1], [-1, 1]])
In game theory, what we are often looking for is a strategy that is optimal for a player given that the other player is assumed to play their optimal solution: This is called a Nash Equilibrium.
We can solve our nash equilibrium by first getting the best response of each player to the actions of the other player
We can get the best response of player 0 to action 1 of player 1 by writing:
g_MP.players[0].best_response(1)
1
So if player 1 plays 1, then the best response of player 0 is 1 (You can confirm this by considering the matrix above).
Likewise the best response of player 1 if player 0 plays 1 is 0.
g_MP.players[1].best_response(1)
0
But of course, we don't get to observe the opponents choice before we move. So instead, let's assume that the opponent is randomly choosing an action - with a 50\% chance of choosing either action 0 or 1:
g_MP.players[0].best_response([0.5, 0.5], tie_breaking=False)
array([0, 1])
Here we get back an array - then both actions 0 or 1 are equally good responses.
If we want to do simulations, randomly choosing one of the best responses, we could change the tie_breaking parameter (run the below several times)
for i in range(10):
print(g_MP.players[0].best_response([0.5, 0.5], tie_breaking='random'))
1 1 1 0 1 0 0 1 1 1
Let us consider a simple game with some strategy: the well known prisoner's dilemma game
Let's write out our game in binary form:
Player 2 | Silent | Snitch | |
---|---|---|---|
Player 1 | Silent | -1,-1 | -3,0 |
Snitch | 0,-3 | -2,-2 |
The story behind this game is that you have two criminal accomplices who are arrested for a crime, then put into separate interogation rooms where they can not communicate with each other.
Both criminals are offered a deal to confess and in turn snitch on their accomplice, in return they can go free, but their accomplice get's sentenced to 3 years.
If they both stay silent, they get charged with the lesser crime and serve 1 year in jail each.
If both snitch, then they will serve 2 years in jail each.
What is the Nash equilibria?
Analysing the game, and seeing what the best response is for each player given what the other player does, we find that the Nash Equilibria in this game is actually (snitch, snitch)! Thus the rational choice here for each player ends up with a result that is suboptimal.
In other words, if the criminals could coordinate and they could convince each other that they would do what they say they will do, the optimal solution would be for both to stay silent. But when this coordination breaks down, we end up with a supoptimal (but rational) solution where both end up snitching.
Here is how you manually go about finding this solution:
Player 2 | Silent | Snitch | |
---|---|---|---|
Player 1 | Silent | -1,-1 | -3,0 |
Snitch | 0,-3 | -2,-2 |
Above, the best-response pay-off for player 1 are in bold. If player 2 plays silent, then player 1 should snitch. And if player 2 plays snitch, then player 1 should also snitch:
Now we change perspectives to player 2:
Player 2 | Silent | Snitch | |
---|---|---|---|
Player 1 | Silent | -1,-1 | -3,0 |
Snitch | 0,-3 | -2,-2 |
Above, we can see the best responses of player 2. If player 1 is silent, then player 2's best response is to snitch, and if player 1 plays snitch, then player 2's best response is to play snitch.
So, which of the squares ends up being a solution where each of the player's actions is a best response to the other player's response? (Snitch, Snitch). This is the Nash Equilibrium.
The Prisoners dilemma game can be used in a lot of situations where you have coordination problems. A few notable applications include
Climate negotiations: When many countries must agree on cuts in order to reduce costs of a public bad (climate change), individual countries have an incentive to cheat, which can lead to a sub-optimal equilibrium.
Modelling cartell agreements: For example, in the OPEC oil cartell, the members agree on certain production quotas in order to avoid prices dropping, but individual countries have an incentive to cheat and produce more than their quota (which they have known to do regularly). This can lead to an equilibrium that is sub-optimal for all countries.
Now let's see how to solve the problem with the help of the Quantecon solver. This time we will enter the game a bit differently:
g_PD = gt.NormalFormGame((2, 2)) # There are 2 players, each of whom has 2 actions
g_PD[0, 0] = -1, -1
g_PD[0, 1] = -3, 0
g_PD[1, 0] = 0, -3
g_PD[1, 1] = -2, -2
We can print out the game as a matrix:
print(g_PD)
2-player NormalFormGame with payoff profile array: [[[-1., -1.], [-3., 0.]], [[ 0., -3.], [-2., -2.]]]
We can get the best responses
print("player-0", g_PD.players[0].best_response(0),g_PD.players[0].best_response(1))
print("player-1", g_PD.players[1].best_response(0),g_PD.players[1].best_response(1))
player-0 1 1 player-1 1 1
We can find the Nash Equilibria using the computer by using the brute force algorithm in quantEcon, meaning it goes through all possible solutions and checks each one to see if it is a nash equilibrium
NEs = gt.pure_nash_brute(g_PD)
print(NEs)
[(1, 1)]
Can you formulate a scenario that you could model as a 2-player prisoner dilemma game, but where there are 3+ actions that each player can take, thus leading to a 6+ pay-off matrix?
In introductory courses in economics, we are often introduced to models of market places that are either perfect competition or one-producer monopolies. Reality often lies between.
Many markets can be defined as having imperfect competition, where a handful of firms compete against each other, but not to the extent that they compete away all the profits. Instead, the producers strategically choose either their price or quantity of production based on the production of their competitors.
If their strategic choice is based on quantity, we call this form of competition cournot competition
Let's say that we have the following cournot game:
We can solve this this analytically by setting up a sequential maximization problem. First consider the maximization from producer 1's perspective
$$\max_{q_1} \pi^{1} = 100q_1-q_1 -q_1^2 - q_2q_1$$$$\frac{\partial \pi^1}{\partial q_1} = 100-1-2q_1-q_2 = 0$$$$q_2 = 100-1-2q_1$$Then maximize from producer two's perspective
$$\frac{\partial \pi^2}{\partial q_2} = 100-1-2q_2-q_1 = 0$$$$q_1 = 100-1-2q_2$$When both of these first order conditions hold, that means both players are maximizing their profits subject to the other player maximizing their profit (nash equilibrium). Thus finding the solution of these two equations with two unkowns will also be our nash equilibrium.
Substituting in one equation in the other:
$$q_2 = 99 - 2(99-2*q_2)$$$$q_2^* = 33$$And plugging this back into either one of the FOC's, we get:
$$q_1^* = 33$$Now consider an equivalent game, but this time with three producers. Formulate the profit function and the first order conditions. Is it possible to formulate the solution to these equations as a matrix algebra problem?
To solve this purely numerically, we have to discretize the quantities.
Here we will define our potential quantities as being between 0 and 100 in 1 increments
a, c = 100, 1
q_grid = np.arange(0,100,1)
q_grid
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])
Now we need to create a pay-off array based on the information above. In order to do such a calculation we will do what is called a cartesian product of arrays, where you create an array of every combination of two or more arrays.
Below, we create a matrix of every possible combination of production from producer 1 and producer 2
from quantecon import cartesian
prod_combis = cartesian([q_grid, q_grid])
prod_combis
array([[ 0, 0], [ 0, 1], [ 0, 2], ..., [99, 97], [99, 98], [99, 99]])
Then we sum across the columns to get all possible total production:
tot_prods = prod_combis.sum(axis=1)
tot_prods
array([ 0, 1, 2, ..., 196, 197, 198])
We then reshape into a 100x100 matrix
tot_prods = tot_prods.reshape([100,100])
tot_prods
array([[ 0, 1, 2, ..., 97, 98, 99], [ 1, 2, 3, ..., 98, 99, 100], [ 2, 3, 4, ..., 99, 100, 101], ..., [ 97, 98, 99, ..., 194, 195, 196], [ 98, 99, 100, ..., 195, 196, 197], [ 99, 100, 101, ..., 196, 197, 198]])
We can now read the matrix as, for example, position (row 1, column 2) representing that when player 1 produces at position 1 (production of 1) and player 2 produces at position 2 (production of 2), we get a total production of 3:
tot_prods[1,2]
3
We can then take this matrix of production possibilities and create pay_offs
tot_prods
array([[ 0, 1, 2, ..., 97, 98, 99], [ 1, 2, 3, ..., 98, 99, 100], [ 2, 3, 4, ..., 99, 100, 101], ..., [ 97, 98, 99, ..., 194, 195, 196], [ 98, 99, 100, ..., 195, 196, 197], [ 99, 100, 101, ..., 196, 197, 198]])
recalling our profit function for producers 1 and 2:
$$\pi_1(Q) = (a-c-q_1 - q_2)q_1$$$$\pi_2(Q) = (a-c-q_1 - q_2)q_2$$#first the (a-c-q_1 - q_2)
payoff_array_p1 = a-c-tot_prods
Then multiply by the original array of potential quantities ($q_1$ or $q_2$)
payoff_array = payoff_array_p1*q_grid.reshape([100,1])
We then have a payoff_array which gives the profit for a player given the actions of the two players (player 1, rows; player 2 columns).
payoff_array
array([[ 0, 0, 0, ..., 0, 0, 0], [ 98, 97, 96, ..., 1, 0, -1], [ 194, 192, 190, ..., 0, -2, -4], ..., [ 194, 97, 0, ..., -9215, -9312, -9409], [ 98, 0, -98, ..., -9408, -9506, -9604], [ 0, -99, -198, ..., -9603, -9702, -9801]])
We then define each player by their payoff_array. Since it is a symmetric game (same marginal costs, same profit function), then both have the same payoff array
player1 = gt.Player(payoff_array)
player2 = gt.Player(payoff_array)
We then create our normal form game by entering the player payoff matrices
cournGame = gt.NormalFormGame([player1, player2])
We can use the the brute force algorithm to go through all possible combinations and find those that are nash equilibrium
courn_NEs = gt.pure_nash_brute(cournGame)
print(courn_NEs)
[(32, 34), (33, 33), (34, 32)]
What we get in return are tuples representing positions on the payoff array. Here the positions correspond to the discretisation we did, but that will not always be the case.
In general to get the quantity produced we can use our original discretisation array:
q_grid[33]
33
We know from solving this problem analytically, that each producer producing 33 is indeed the nash equilibrium. But here we also get two more equilibriums: (32, 34), (34, 32), where did these come from?
This shows one of the problems with discretisation. If the producer really was limited to producing in 1 unit increments, then these would represent nash equilibria. You can get an idea of why this happens by checking the payoffs of the tuples (33,34) and (34,33).
But if this is an artificial constraint that just comes from the way we discretized, then this can be misleading. Like we said, our analytic solution (allowing real number for production) shows only one nash equilibria (33, 33).
A good understanding of our model helps us to eliminate artificial nash equilibria. This was a symmetric game: all the profit functions for the players look identical. Thus nash equilibria should also have quantities for all the players that are identical. We can then get rid of (32,34) and (34,32) as solutions
We are not limited to modelling two producers. Oyama wrote the below function to create a cournot game with any number of producers, N. the code below is very compact, but is essentially the same as the steps we went through above, though generalized to N players.
def cournot(a, c, N, q_grid):
"""
Create a `NormalFormGame` instance for the symmetric N-player Cournot game
with linear inverse demand a - Q and constant marginal cost c.
Parameters
----------
a : scalar
Intercept of the demand curve
c : scalar
Common constant marginal cost
N : scalar(int)
Number of firms
q_grid : array_like(scalar)
Array containing the set of possible quantities
Returns
-------
NormalFormGame
NormalFormGame instance representing the Cournot game
"""
q_grid = np.asarray(q_grid)
payoff_array = \
cartesian([q_grid]*N).sum(axis=-1).reshape([len(q_grid)]*N) * (-1) + \
(a - c)
payoff_array *= q_grid.reshape([len(q_grid)] + [1]*(N-1))
payoff_array += 0 # To get rid of the minus sign of -0
player = gt.Player(payoff_array)
return gt.NormalFormGame([player for i in range(N)])
We can give an example with 3 players, while otherwise maintaining the same parameters as above.
a, c = 100, 1
q_grid = np.arange(0,100,1)
N=3 #number of players
g_Cou = cournot(a, c, N, q_grid)
We can now print out the payoff array for player 0: this will now be in the form of a 3-dimensional array
print(g_Cou.players[0])
Player in a 3-player normal form game with payoff array: [[[ 0, 0, 0, ..., 0, 0, 0], [ 0, 0, 0, ..., 0, 0, 0], [ 0, 0, 0, ..., 0, 0, 0], ..., [ 0, 0, 0, ..., 0, 0, 0], [ 0, 0, 0, ..., 0, 0, 0], [ 0, 0, 0, ..., 0, 0, 0]], [[ 98, 97, 96, ..., 1, 0, -1], [ 97, 96, 95, ..., 0, -1, -2], [ 96, 95, 94, ..., -1, -2, -3], ..., [ 1, 0, -1, ..., -96, -97, -98], [ 0, -1, -2, ..., -97, -98, -99], [ -1, -2, -3, ..., -98, -99, -100]], [[ 194, 192, 190, ..., 0, -2, -4], [ 192, 190, 188, ..., -2, -4, -6], [ 190, 188, 186, ..., -4, -6, -8], ..., [ 0, -2, -4, ..., -194, -196, -198], [ -2, -4, -6, ..., -196, -198, -200], [ -4, -6, -8, ..., -198, -200, -202]], ..., [[ 194, 97, 0, ..., -9215, -9312, -9409], [ 97, 0, -97, ..., -9312, -9409, -9506], [ 0, -97, -194, ..., -9409, -9506, -9603], ..., [ -9215, -9312, -9409, ..., -18624, -18721, -18818], [ -9312, -9409, -9506, ..., -18721, -18818, -18915], [ -9409, -9506, -9603, ..., -18818, -18915, -19012]], [[ 98, 0, -98, ..., -9408, -9506, -9604], [ 0, -98, -196, ..., -9506, -9604, -9702], [ -98, -196, -294, ..., -9604, -9702, -9800], ..., [ -9408, -9506, -9604, ..., -18914, -19012, -19110], [ -9506, -9604, -9702, ..., -19012, -19110, -19208], [ -9604, -9702, -9800, ..., -19110, -19208, -19306]], [[ 0, -99, -198, ..., -9603, -9702, -9801], [ -99, -198, -297, ..., -9702, -9801, -9900], [ -198, -297, -396, ..., -9801, -9900, -9999], ..., [ -9603, -9702, -9801, ..., -19206, -19305, -19404], [ -9702, -9801, -9900, ..., -19305, -19404, -19503], [ -9801, -9900, -9999, ..., -19404, -19503, -19602]]]
Again, we can use the brute force nash equilibrium
courn_NEs = gt.pure_nash_brute(g_Cou)
print(courn_NEs)
q_grid[66]
[(24, 24, 26), (24, 25, 25), (24, 26, 24), (25, 24, 25), (25, 25, 24), (25, 25, 25), (26, 24, 24)]
66
You probably experienced that it took some time to find the solution when we had 3 players. The solution algorithm has to look through all the possible solutions.
In the notebook by Oyama, he writes a function for a sequential games and shows how to solve a sequential cournot game. Take the case of 2 players and the parameters we used above (grid with 100 values, a=100, c=1) and work, step-by-step to find a solution sequentially. Try to explain each step.
For our cournot game, consider the situation with two producers, but where the marginal costs are different. Player 1 has marginal costs of 1 and player 2 has marginal costs of 1.1. Can you compute a nash equilibria in this case (assuming all other parameters are the same? Can you experiment with using a finer grid?
Oyama, Daisuke "Tools for Game Theory in QuantEcon.py", https://nbviewer.org/github/QuantEcon/QuantEcon.notebooks/blob/master/game_theory_py.ipynb
Wikipedia, "Prisoner's dilemma", https://en.wikipedia.org/wiki/Prisoner%27s_dilemma
-