Solution sketch: Determinants and inverses¶

4.) Leontief model¶

In matrix terms, we can write the Leontief model as $\mathbf{x = Ax + b}$.

This is equivalent to $\mathbf{(I_n - A)x = b}$.
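
Provided that $\mathbf{I_n - A}$ is invertible, the solution is $\mathbf{x = (I_n - A)^{-1}b}$, which is what we compute below.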

In [ ]:
import numpy as np
from scipy import linalg
In [ ]:
A = np.array([[.1, .2, .1],
              [.3, .2, .2],
              [.2, .2, .1]])   # input-output coefficient matrix
b = np.array([85, 95, 20])     # final demand
I = np.eye(3)

I_minus_A = I - A
I_minus_A_inv = linalg.inv(I_minus_A)

# total output x = (I - A)^{-1} b
np.dot(I_minus_A_inv, b)
Out[ ]:
array([150., 200., 100.])
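
As a quick sanity check (not part of the original solution), we could also solve the system without forming the inverse explicitly and verify that the result satisfies $\mathbf{x = Ax + b}$:

In [ ]:
# solve (I - A) x = b directly; numerically preferable to computing the inverse
x = linalg.solve(I_minus_A, b)
print(x)                          # should again give [150., 200., 100.]
print(np.allclose(A @ x + b, x))  # the Leontief balance holds: expect True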

5.) Explaining Multicollinearity¶

In [ ]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# create a random number generator
rng = np.random.default_rng()
In [ ]:
x1 = rng.normal(2, 1, 100)
x2 = rng.normal(3, 1.5, 100)
y = 5 + .8*x1 + .3*x2 + rng.normal(0, 1, 100)   # true model: y depends only on x1 and x2
x3 = .6*x1 - .5*x2                               # x3 is an exact linear combination of x1 and x2

X = np.array([np.ones(100), x1, x2, x3])         # design matrix with a column of ones for the constant
X = X.T

a.) Using statsmodels, run an OLS regression of $y$ on the variables in the matrix $\mathbf{X}$. Are there problems with the results?

b.) As we will see later in another lab, in order to calculate an OLS regression, we need to calculate the term $(\mathbf{X^TX})^{-1}$. Is the $\mathbf{X^TX}$ matrix built from the $\mathbf{X}$ we created invertible?

c.) Go back and add a random component to the simulated x3 data. Redo a.) and b.). What has happened?

In [ ]:
import statsmodels.api as sm

mod1 = sm.OLS(y, X).fit()
mod1.summary()
Out[ ]:
OLS Regression Results
Dep. Variable: y R-squared: 0.444
Model: OLS Adj. R-squared: 0.433
Method: Least Squares F-statistic: 38.74
Date: Mon, 17 Apr 2023 Prob (F-statistic): 4.30e-13
Time: 15:34:16 Log-Likelihood: -135.09
No. Observations: 100 AIC: 276.2
Df Residuals: 97 BIC: 284.0
Df Model: 2
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
const 5.1621 0.272 18.948 0.000 4.621 5.703
x1 0.6120 0.076 8.045 0.000 0.461 0.763
x2 0.3877 0.054 7.134 0.000 0.280 0.496
x3 0.1733 0.040 4.369 0.000 0.095 0.252
Omnibus: 1.577 Durbin-Watson: 1.786
Prob(Omnibus): 0.455 Jarque-Bera (JB): 1.037
Skew: -0.123 Prob(JB): 0.595
Kurtosis: 3.434 Cond. No. 1.34e+16


Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 9.03e-30. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.

We should notice that, compared to the true model (which we created ourselves), the inference here is misleading: x3 shows up as a precisely estimated, significant predictor, even though it is just a linear combination of x1 and x2 and not actually an exogenous factor.
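
One way to see the problem directly (a quick check, not part of the original solution) is to look at the rank and condition number of the design matrix: because x3 is an exact linear combination of x1 and x2, $\mathbf{X}$ has 4 columns but only 3 linearly independent ones.

In [ ]:
# X is 100 x 4 but has rank 3, since x3 = .6*x1 - .5*x2 exactly
print(np.linalg.matrix_rank(X))   # expect 3
print(np.linalg.cond(X))          # extremely large (effectively infinite) condition number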

Let's go on and calculate the matrix $\mathbf{X^TX}$:

In [ ]:
XTX = np.dot(X.T,X)

We'll check whether its determinant is nonzero:

In [ ]:
np.linalg.det(XTX)

The determinant is essentially 0 (up to floating-point error), which means the matrix is not invertible. We can try to invert it anyway and see what happens.

In [ ]:
np.linalg.inv(np.dot(X.T,X))

When we try to invert a matrix that in reality is not invertible, we get hugely inflated, numerically meaningless values. This matrix is used in estimating the coefficients and their standard errors, which is why multicollinearity can make the resulting inference unreliable.
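
To make the link to inference concrete (a reminder of the formula, not part of the original solution): under the classical OLS assumptions, the estimated covariance of the coefficients is $\widehat{\mathrm{Var}}(\hat{\beta}) = \hat{\sigma}^2(\mathbf{X^TX})^{-1}$, so the reported standard errors are the square roots of the diagonal of exactly the matrix we just tried to invert.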

Now let's get rid of the perfect multicollinearity by adding some random noise to x3.

In [ ]:
x1 = rng.normal(2, 1, 100)
x2 = rng.normal(3, 1.5, 100)
y = 5 + .8*x1 + .3*x2 + rng.normal(0, 1, 100)
x3 = .6*x1 - .5*x2 + rng.normal(0, 1, 100)   # x3 now has its own random component

X = np.array([np.ones(100), x1, x2, x3])
X = X.T
In [ ]:
XTX = np.dot(X.T,X)
np.linalg.det(XTX)
Out[ ]:
193068139.2239339

We now have a non-zero determinant, and we can invert the matrix:

In [ ]:
np.linalg.inv(np.dot(X.T,X))
Out[ ]:
array([[ 0.09878605, -0.02027596, -0.01595757, -0.00260832],
       [-0.02027596,  0.01397327, -0.00338193, -0.00527707],
       [-0.01595757, -0.00338193,  0.00819161,  0.00535255],
       [-0.00260832, -0.00527707,  0.00535255,  0.00927288]])
In [ ]:
mod2 = sm.OLS(y, X).fit()
mod2.summary()
Out[ ]:
OLS Regression Results
Dep. Variable: y R-squared: 0.504
Model: OLS Adj. R-squared: 0.489
Method: Least Squares F-statistic: 32.54
Date: Mon, 17 Apr 2023 Prob (F-statistic): 1.34e-14
Time: 15:36:35 Log-Likelihood: -122.99
No. Observations: 100 AIC: 254.0
Df Residuals: 96 BIC: 264.4
Df Model: 3
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
const 5.1831 0.266 19.521 0.000 4.656 5.710
x1 0.7797 0.100 7.808 0.000 0.581 0.978
x2 0.2502 0.076 3.273 0.001 0.098 0.402
x3 -0.0639 0.081 -0.786 0.434 -0.225 0.098
Omnibus: 0.271 Durbin-Watson: 1.942
Prob(Omnibus): 0.873 Jarque-Bera (JB): 0.046
Skew: -0.030 Prob(JB): 0.977
Kurtosis: 3.086 Cond. No. 13.0


Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
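
As a final cross-check (a sketch, not part of the original lab), we can compute the coefficients and standard errors by hand from $(\mathbf{X^TX})^{-1}$ and compare them with what statsmodels reports for mod2:

In [ ]:
# manual OLS for the well-conditioned X: beta = (X'X)^{-1} X'y,  Var(beta) = sigma^2 (X'X)^{-1}
XTX_inv = np.linalg.inv(X.T @ X)
beta_hat = XTX_inv @ X.T @ y
resid = y - X @ beta_hat
sigma2 = resid @ resid / (X.shape[0] - X.shape[1])   # unbiased estimate of the error variance
se = np.sqrt(sigma2 * np.diag(XTX_inv))

print(beta_hat)                    # should match the "coef" column above
print(se)                          # should match the "std err" column above
print(np.allclose(se, mod2.bse))   # mod2.bse holds statsmodels' standard errors; expect True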
In [ ]: