This lab will closely follow the QuantEcon lecture that uses difference equations and matrix algebra in a dynamic economic model. In this way, this lab brings together several of the methods we have learned throughout the semester.
The model we will be simulating is based on a model by the economist Paul Samuelson where GDP is modelled dynamically by what we could call an autoregressive process:
$$y_t = \alpha_0 + \alpha_1 y_{t-1} + \alpha_2 y_{t-2}$$We could also say that this is a second order difference equation. But as the lecture points out, we could also model this as a set of simultaneous equations, one equation for each period t:
$$\mathbf{Ay=b}$$where
$$\mathbf{y} = \begin{bmatrix}y_1 \\y_2, \\y_3\\\dots\\ y_T \end{bmatrix} $$To make this a bit more clear, let us start writing out some of the time periods. We have two initial values that are just set to constants:
$$y_{-1} = IV_1$$$$y_{0} = IV_0$$Then we can calculate out what the value should be in period 1:
$$ y_1 = \alpha_0 + \alpha_1 y_0 + \alpha_2 y_{-1}$$And now in period 2:
$$ y_2 = \alpha_0 + \alpha_1 y_1 + \alpha_2 y_{0}$$Which we rewrite by collecting all the terms where t>0 on the left hand side
$$ -\alpha_1 y_1 + y_2 = \alpha_0 + \alpha_2 y_{0}$$In period 3 we get:
$$ y_3 = \alpha_0 + \alpha_1 y_2 + \alpha_2 y_{1}$$Which we rewrite as:
$$-\alpha_2 y_{1} + -\alpha_1 y_2 + y_3 = \alpha_0 $$period 4 we get: $$ y_4 = \alpha_0 + \alpha_1 y_3 + \alpha_2 y_{2}$$
Which becomes:
$$-\alpha_2 y_{2} + -\alpha_1 y_3 + y_4 = \alpha_0 $$And we continue. We then take these series of T equations, and everything on the right-hand side become the rows in the b-vector. On the left-hand side we can again split the terms into a vector of y's and a coefficient matrix, A. You can see QuantEcon lecture for the full matrix representation:
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
from matplotlib import cm
plt.rcParams["figure.figsize"] = (11, 5)
We can generate an example
T = 80
# parameters
a0 = 10.0
a1 = 1.53
a2 = -.9
#initial values
y_1 = 28. # y_{-1}
y0 = 24.
Below we create our A matrix, starting with an identity matrix of size T.
In the below loop of time periods we first ask if the previous time period (t-1) is greater than or equal to zero.
If it is, then we set the value of the matrix A at the position (t,t-1) and the previous time period to be equal to -a1.
We then ask if the time period two periods ago (t-2) is greater than or equal to zero. If it is, then we set the value of the matrix A at position (t,t-2)
A = np.identity(T) # The T x T identity matrix
for t in range(T):
if t-1 >= 0:
A[t, t-1] = -a1
if t-2 >= 0:
A[t, t-2] = -a2
b = np.full(T, a0) #fills an array of size T with zeros
#sets initial two values of b equal to the special values
b[0] = a0 + a1 * y0 + a2 * y_1
b[1] = a0 + a2 * y0
We can look at the A and b matrices and compare them with the presentation in the lecture
A
array([[ 1. , 0. , 0. , ..., 0. , 0. , 0. ], [-1.53, 1. , 0. , ..., 0. , 0. , 0. ], [ 0.9 , -1.53, 1. , ..., 0. , 0. , 0. ], ..., [ 0. , 0. , 0. , ..., 1. , 0. , 0. ], [ 0. , 0. , 0. , ..., -1.53, 1. , 0. ], [ 0. , 0. , 0. , ..., 0.9 , -1.53, 1. ]])
b
array([ 21.52, -11.6 , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. ])
We are now ready to try to find our time path for our GDP, given our initial conditions. That is, we want to find a solution for our vector $\mathbf{y}$.
Recall our system of equations can be written
$$Ay=b$$Thus a solution to our equation would be
$$y=A^{-1}b$$A_inv = np.linalg.inv(A)
y = A_inv @ b
y
array([21.52 , 21.3256 , 23.260168 , 26.39501704, 29.45022487, 31.30332872, 31.38889055, 29.8520067 , 27.42356875, 25.09125416, 23.70840699, 23.69173395, 24.91078665, 26.79094302, 28.57043484, 29.60091658, 29.57601102, 28.61047193, 27.15561214, 25.79866184, 25.03190168, 25.08001392, 25.84370979, 26.96886344, 28.00302226, 28.57264696, 28.51342981, 27.91016535, 27.04046616, 26.2527644 , 25.83031 , 25.89288633, 26.36883709, 27.04072305, 27.64035289, 27.95308917, 27.89190883, 27.51684026, 26.99804765, 26.54185667, 26.31079782, 26.36784966, 26.66309194, 27.06346598, 27.4103202 , 27.58067052, 27.52913772, 27.29697724, 26.98815123, 26.72459187, 26.59928945, 26.64478017, 26.82715316, 27.06524218, 27.26538269, 27.35731756, 27.31785144, 27.1747269 , 26.99126586, 26.83938256, 26.77211604, 26.80589324, 26.91811222, 27.05940778, 27.17459291, 27.22366015, 27.19506641, 27.10715747, 26.99839116, 26.91109676, 26.87542599, 26.89941469, 26.96822108, 27.05190503, 27.11801573, 27.14384954, 27.12387563, 27.07006514, 27.00571159, 26.95568011])
Thus one way of solving this is to find the inverse of A. But it is generally not good practice to try to calculate inverses, especially as A becomes large. Rather we use the built-in solver.
y_sol = np.linalg.solve(A, b)
Numpy has a function to compare matrices/arrays and see if they are at a practical level the same, allowing for differences coming from small errors that happen due to the way the computer makes calculations (floating point errors):
np.allclose(y, y_sol)
True
Now we can see our solution
plt.plot(np.arange(T)+1, y)
plt.xlabel('t')
plt.ylabel('y')
plt.show()
We can see that the series goes towards a steady-state. Can we find this value analytically? Let us say that we call the steady-state $y^*$. Then by definition of steady-state, $y_{t-2} = y_{t-1} = y_{t} = y^*$. So our difference equation becomes:
$$y_t = a_0 + a_1 y_{t-1} + a_2 y_{t-2}$$$$y^* = a_0 + a_1 y^* + a_2 y^*$$Collecting the $y^*$ terms on one side
$$y^* - a_1 y^* - a_2 y^* = a_0$$$$y^*(1-a_1 - a_2) = a_0$$$$y^* = \frac{a_0}{1-a_1-a_2}$$y_star = a0/(1-a1-a2)
y_star
27.027027027027028
If we want to start thinking about statistical time series models, then we also need to introduce a form of randomness, so that our difference equation looks like:
$$y_t = a_0 + a_1 y_{t-1} + a_2 y_{t-2} + u_t$$where $u_t$ are random numbers from a normal distribution, which we write
$$u_t \sim N(0, \sigma^2_u)$$We can then define a vector of values from that distribution:
$$u = \begin{bmatrix}u_1 \\ u_2 \\u_3 \\ \dots \\u_T \end{bmatrix}$$We can then write our system of equations as
$$Ay = b + u$$And we could write our solution as:
$$y = A^{-1}(b+u)$$(though again, for large matrices, directly calculating the inverse is usually inefficient)
We'll let our random variable, $u$ have mean 0 (since it is an error term) and standard deviation of 2:
sigma_u = 2
u = np.random.normal(0, sigma_u, T)
y = np.linalg.solve(A, (b+u))
plt.plot(np.arange(T)+1, y)
plt.xlabel('t')
plt.ylabel('y')
plt.show()
N = 100
for i in range(N):
col = cm.Greens(np.random.rand()) # Choose a random color from greens
#https://matplotlib.org/stable/tutorials/colors/colormaps.html
u = np.random.normal(0, sigma_u, size=T)
y = np.linalg.solve(A,(b + u))
plt.plot(np.arange(T)+1, y, lw=0.5, color=col)
plt.xlabel('t')
plt.ylabel('y')
plt.show()
In this assignment we can start by downloading data on BNP growth from the US from the Federal Reserve FRED database:
import pandas as pd
bnpGrowth = pd.read_csv("https://fred.stlouisfed.org/graph/fredgraph.csv?bgcolor=%23e1e9f0&chart_type=line&drp=0&fo=open%20sans&graph_bgcolor=%23ffffff&height=450&mode=fred&recession_bars=on&txtcolor=%23444444&ts=12&tts=12&width=1138&nt=0&thu=0&trc=0&show_legend=yes&show_axis_titles=yes&show_tooltip=yes&id=A191RP1Q027SBEA&scale=left&cosd=1947-04-01&coed=2023-01-01&line_color=%234572a7&link_values=false&line_style=solid&mark_type=none&mw=3&lw=2&ost=-99999&oet=99999&mma=0&fml=a&fq=Quarterly&fam=avg&fgst=lin&fgsnd=2020-02-01&line_index=1&transformation=lin&vintage_date=2023-05-21&revision_date=2023-05-21&nd=1947-04-01")
bnpGrowth
DATE | A191RP1Q027SBEA | |
---|---|---|
0 | 1947-04-01 | 4.7 |
1 | 1947-07-01 | 6.0 |
2 | 1947-10-01 | 17.3 |
3 | 1948-01-01 | 9.6 |
4 | 1948-04-01 | 10.7 |
... | ... | ... |
299 | 2022-01-01 | 6.6 |
300 | 2022-04-01 | 8.5 |
301 | 2022-07-01 | 7.7 |
302 | 2022-10-01 | 6.6 |
303 | 2023-01-01 | 5.1 |
304 rows × 2 columns
bnpGrowth.columns = ['date', 'BNP_growth']
We can take a look at the series
bnpGrowth.set_index("date").plot()
<Axes: xlabel='date'>
a.) With this GDP data, estimate a model with 4 lags (an AR(4) model) using OLS (see lab on OLS).
b.) Take the results from OLS and use the above methodology to simulate a 4th order difference equation (use the estimated standard deviation of the regression as the parameter for uncertainty.)
c.) Plot 100 simulations together with the actual GDP data. Would you say that the model represents the real data fairly well (mean values, variability?). Why?
You can start by creating lagged variables as below:
bnpGrowth["BNP_growth_l1"] = bnpGrowth["BNP_growth"].shift(1)
bnpGrowth["BNP_growth_l2"] = bnpGrowth["BNP_growth"].shift(2)
bnpGrowth["BNP_growth_l3"] = bnpGrowth["BNP_growth"].shift(3)
bnpGrowth["BNP_growth_l4"] = bnpGrowth["BNP_growth"].shift(4)
bnpGrowth["const"] = 1
y = np.array(bnpGrowth.loc[:,'BNP_growth'])
X = np.array(bnpGrowth.loc[:,['const', 'BNP_growth_l1', 'BNP_growth_l2', 'BNP_growth_l3', 'BNP_growth_l4']])
X
array([[ 1. , nan, nan, nan, nan], [ 1. , 4.7, nan, nan, nan], [ 1. , 6. , 4.7, nan, nan], ..., [ 1. , 8.5, 6.6, 14.3, 9. ], [ 1. , 7.7, 8.5, 6.6, 14.3], [ 1. , 6.6, 7.7, 8.5, 6.6]])
T = bnpGrowth.shape[0]
T
304