Now we go back to our original motivation of solving a system of linear equations. Previously, we discussed our "ad hoc" method of substituting one equation into another in order to eliminate a variable. In this first of two videos, we will show how we can systematize this method using matrix notation, in what is called Gaussian elimination. The goal is a method that gives us a straightforward, step-by-step process for solving a system of equations of any size. Importantly, we can also program a computer to carry out these steps.
Now that we have seen how a solution to a system of equations can be represented by an upper triangular matrix, we show how we can use row operations on an augmented coefficient matrix in order to obtain such a matrix.
Here is the Khan Academy video for using Gaussian elimination to solve a system of equations.
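To make the step-by-step nature of the method concrete, here is a minimal sketch of my own (not the textbook's pseudocode, and with no pivoting or handling of zero pivots) of how forward elimination and back substitution could be coded with numpy:

import numpy as np

def gaussian_elimination(A, b):
    # minimal sketch: assumes the system has a unique solution and no pivot is zero
    A = A.astype(float)
    b = b.astype(float)
    n = len(b)
    # form the augmented coefficient matrix [A | b]
    M = np.column_stack([A, b])
    # forward elimination: use row operations to create zeros below each pivot
    for i in range(n):
        for j in range(i + 1, n):
            factor = M[j, i] / M[i, i]
            M[j, i:] -= factor * M[i, i:]
    # back substitution on the resulting upper triangular system
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (M[i, -1] - M[i, i + 1:n] @ x[i + 1:n]) / M[i, i]
    return x

# an example 3x3 system (the same one solved with np.linalg.solve further down)
A = np.array([[4, 3, -5], [-2, -4, 5], [8, 8, 0]])
b = np.array([2, 5, -3])
print(gaussian_elimination(A, b))   # approximately [ 2.208 -2.583 -0.183]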
Solve the following system by Gaussian elimination
$$x_1 + 2x_2 + x_3 = 4$$
$$x_1 - x_2 + x_3 = 5$$
$$2x_1 + 3x_2 - x_3 = 1$$
(EMEA Exercise 12.8.1)
Operationally, vectors are just matrices where one of the dimensions is 1. But they hold a special place in linear algebra, partly because of the clear geometric interpretation we can give them. In the rest of this lab, we spend time discussing this interpretation. We start with a simple 2-dimensional space:
On a 2-dimensional plot, draw the vector $\mathbf{a} = \begin{bmatrix} 5\\-1 \end{bmatrix}$
Vector operations like addition, subtraction and scalar multiplication can also be given straightforward geometric interpretations:
Here is the Khan Academy lesson on vectors in linear algebra.
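Before the exercise below, here is a minimal numpy sketch (using two arbitrary example vectors of my own, not the ones in the exercise) of what these operations look like numerically:

import numpy as np

u = np.array([2, 1])
v = np.array([1, 3])

print(u + v)    # vector addition is entry-by-entry: [3 4]
print(u - v)    # subtraction: [1 -2]
print(3 * u)    # scalar multiplication stretches the vector: [6 3]

Geometrically, u + v is the diagonal of the parallelogram spanned by u and v, and 3*u points in the same direction as u but is three times as long.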
Let $\mathbf{a} = [5,-1]$ and $\mathbf{b} = [-2,4]$. Compute $\mathbf{a+b}$ and $\mathbf{-\frac{1}{2}a}$, then illustrate geometrically with vectors starting at the origin.
(EMEA Exercise 12.9.1)
In two dimensions, we define our vector by a width (x-axis) and height (y-axis). From this information, we can easily calculate the length of the vector using the Pythagorean theorem. We can extend this idea to vectors in higher dimensions. The length of a vector is called the norm.
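As a quick worked example (with an arbitrary vector of my own, just to fix ideas):
$$\Vert \mathbf{a} \Vert = \sqrt{a_1^2 + a_2^2}, \qquad \text{e.g. } \Vert [3,4] \Vert = \sqrt{3^2 + 4^2} = \sqrt{25} = 5$$
In $n$ dimensions this extends to $\Vert \mathbf{a} \Vert = \sqrt{a_1^2 + a_2^2 + \cdots + a_n^2}$.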
Let $\mathbf{a} = [1,2,2]$, $\mathbf{b}=[0,0,-3]$ and $\mathbf{c}=[-2,4,-3]$. Compute $||\mathbf{a}||$, $||\mathbf{b}||$ and $||\mathbf{c}||$. Verify that $\mathbf{a}$ and $\mathbf{b}$ satisfy the Cauchy-Schwarz inequality (see EMEA p. 484).
Mathematically, orthogonality is a simple concept. In two dimensions, it just means that two vectors are at a 90-degree angle to each other. But the concept of orthogonality has major implications for how we use and even interpret mathematical models that make use of linear algebra. For example, in econometrics we use orthogonality to help us decide whether a variable can be interpreted as causal.
Check which of the following three pairs of vectors are orthogonal:
a.) $[1,2]$ and $[-2,1]$
b.) $[1,-1,1]$ and $[-1,1,-1]$
c.) $[a,-b,1]$ and $[b,a,0]$
(EMEA Exercise 12.9.7)
To someone who does not study mathematics, a vector and a line can appear to be the same. They are not. A vector has a direction. A line extends infinitely in both directions. But vectors and lines are related. In fact, we can use two vectors to define any line:
Find the equation for the line:
a.) that passes through the points $(3,-2,2)$ and $(10,2,1)$
b.) that passes through the point $(1,3,2)$ and has the same direction as the vector $[0,-1,1]$
(EMEA Exercise 12.10.1)
Hint: If you have a point $\mathbf{p}$ and a vector $\mathbf{a}$, then the line $\mathbf{x}$ that passes through the point $\mathbf{p}$ in the direction of $\mathbf{a}$ can be written $\mathbf{x} = \mathbf{p} + t\mathbf{a}$, where $t$ is a scalar. So your answer in (b) will be three equations (for $x_1$, $x_2$, and $x_3$) as functions of the scalar $t$.
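To see the parametric form in action, here is a minimal sketch with a made-up point and direction vector (deliberately not the ones from the exercise):

import numpy as np

# hypothetical example: line through p = (2, 0, 1) with direction a = [1, -1, 2]
p = np.array([2, 0, 1])
a = np.array([1, -1, 2])

# points on the line x = p + t*a for a few values of the scalar t
for t in [-1, 0, 1, 2]:
    print(t, p + t * a)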
Visualising vectors and lines in two dimensions is relatively straightforward. But even just going up to three dimensions makes the intuition quite a bit harder. Here we move from defining a 1-dimensional line to a 2-dimensional plane (imagine a piece of paper floating in the air that extends infinitely from each edge).
The visualisation and intuition become tough here, so we try both pen-and-paper and my super sophisticated 3D simulator.
If you still are having a hard time with the intuition, don't worry, that's normal.
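If you would like to experiment on your own machine as well, matplotlib can draw a rough 3D surface. This is just a sketch of my own (not the simulator used in class), plotting the arbitrary plane $x_1 + x_2 + x_3 = 1$, rewritten as $x_3 = 1 - x_1 - x_2$:

import numpy as np
import matplotlib.pyplot as plt

# evaluate the plane x3 = 1 - x1 - x2 on a grid of (x1, x2) values
xx, yy = np.meshgrid(np.linspace(-2, 2, 20), np.linspace(-2, 2, 20))
zz = 1 - xx - yy

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(projection='3d')
ax.plot_surface(xx, yy, zz, alpha=0.5)
ax.set_xlabel('$x_1$')
ax.set_ylabel('$x_2$')
ax.set_zlabel('$x_3$')
plt.show()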
import numpy as np
from numpy.linalg import norm
We start by defining a row vector, then transposing to make it a column vector:
vector_row = np.array([[2,2]])
myVector = vector_row.T
myVector
array([[2], [2]])
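As a quick sanity check (nothing more), we can confirm that the transpose changed the shape from a 1x2 row to a 2x1 column:

print(vector_row.shape, myVector.shape)   # (1, 2) (2, 1)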
To calculate the length (or norm) as we defined above, we can use the norm function from numpy:
norm(myVector, 2)
2.8284271247461903
The "2" paramater refers to the fact that we are calculating the norm by taking the square (2) root of the sum of squared entries in the vector.
If we instead want to calculate the norm as the sum of the absolute values of the entries (the 1-norm), we would calculate
norm(myVector,1)
4.0
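We can double-check both results against these definitions by hand (a minimal sketch reusing myVector from above):

print(np.sqrt(np.sum(myVector**2)))   # 2-norm: square root of the sum of squares -> 2.828...
print(np.sum(np.abs(myVector)))       # 1-norm: sum of absolute values -> 4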
The following formula relates the angle $\theta$ between two vectors $v$ and $w$ to their dot product: $$v \cdot w = \Vert v \Vert \Vert w \Vert \cos(\theta)$$
Creating two vectors
v = np.array([1,2]).T #try to be consistent in representing vectors as column vectors (note: .T has no effect on a 1-D array)
w = np.array([3, 1]).T
Recalling that the inverse of cosine is arccos ($x=\cos(\theta) \Rightarrow \arccos(x) = \theta$), we can compute the angle:
theta = np.arccos(np.dot(v, w.T)/(norm(v,2)*norm(w,2)))
theta
0.7853981633974484
Python returns angles in radians, not degrees: $\pi \approx 3.1415...$ corresponds to $180^{\circ}$, and $\pi/2$ to $90^{\circ}$.
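If you find degrees easier to read, numpy's np.degrees function converts for you:

np.degrees(theta)   # approximately 45 degrees (theta above is pi/4 radians)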
In our lecture we said that a test for orthogonal vectors is whether their inner ("dot") product is equal to zero. Recall that the definition of orthogonal vectors is that they are at a 90-degree angle to each other.
Let's check this.
First, let's design two vectors with inner product =0
v = np.array([1,0,0])
w = np.array([0,2,1])
np.dot(v,w.T)
0
Now let's calculate the angle:
theta = np.arccos(np.dot(v, w.T)/(norm(v,2)*norm(w,2)))
theta
1.5707963267948966
which is $\pi/2$, i.e. $90^{\circ}$:
np.pi/2
1.5707963267948966
Remember all the work required to solve even a 3x3 system of equations by hand? Well, this will either make you happy or frustrated over all the wasted time:
import numpy as np
A = np.array([[4, 3, -5],
[-2, -4, 5],
[8, 8, 0]])
y = np.array([2, 5, -3])
x = np.linalg.solve(A, y)
print(x)
[ 2.20833333 -2.58333333 -0.18333333]
Behind the scenes, the algorithm being used is not actually the Gaussian elimination we did by hand (though that could easily be programmed in). Instead, it uses something called LU decomposition, which is more efficient (quicker). We won't cover this algorithm in this course, but you can read more about it here.
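If you are curious, here is a hedged sketch of the LU route using scipy's lu_factor and lu_solve functions (the exact routine numpy calls internally may differ in detail):

import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[4, 3, -5],
              [-2, -4, 5],
              [8, 8, 0]])
y = np.array([2, 5, -3])

# factor A once into LU form (with pivoting)...
lu, piv = lu_factor(A)
# ...then solve the triangular systems; reusing the factorization is what
# makes this cheap when solving for many different right-hand sides
x = lu_solve((lu, piv), y)
print(x)   # same solution as np.linalg.solve above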
Solve each of the following equation systems by Gaussian elimination
a.)
$$x_1 + 4x_2 = 1$$
$$2x_1 + 2x_2 = 8$$
b.)
$$2x_1 + 2x_2 - x_3 = 2$$
$$x_1 - 3x_2 + x_3 = 0$$
$$3x_1 + 4x_2 - x_3 = 1$$
c.)
$$x_1 + 3x_2 + 4x_3 = 0$$
$$5x_1 + x_2 + x_3 = 0$$
(EMEA Review exercise 12.7)
Let $\mathbf{a} = [-1,5,3]$, $\mathbf{b} = [1,1,-3]$ and $\mathbf{c} = [-1,2,8]$. Compute $||\mathbf{a}||$, $||\mathbf{b}||$ and $||\mathbf{c}||$. Then verify that the Cauchy-Schwarz inequality holds for $\mathbf{a}$ and $\mathbf{b}$.
(EMEA Review exercise 12.9)
%matplotlib inline
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (11, 5) #set default figure size
import numpy as np
from matplotlib import cm
from scipy.interpolate import interp2d
from scipy.linalg import inv, solve, det, eig
fig, ax = plt.subplots(figsize=(10, 8))
# Set the axes through the origin
for spine in ['left', 'bottom']:
ax.spines[spine].set_position('zero')
for spine in ['right', 'top']:
ax.spines[spine].set_color('none')
ax.set(xlim=(-5, 5), ylim=(-5, 5))
ax.grid()
vecs = ((2, 4), (-3, 3), (-4, -3.5))
for v in vecs:
ax.annotate('', xy=v, xytext=(0, 0),
arrowprops=dict(facecolor='blue',
shrink=0,
alpha=0.7,
width=0.5))
ax.text(1.1 * v[0], 1.1 * v[1], str(v))
plt.show()
See if you can draw a similar figure for the vectors $\mathbf{a} = [3,2]$ and $\mathbf{b} = [1,-2]$, as well as the vector $\mathbf{c}$ that is formed by $\mathbf{c=a+b}$. First, alter the code above. Then see if you can draw a similar figure, but instead of the annotate command, use the plt.plot() command.
Below is a system of equations
$$x_1 + 2x_3 + 3x_3 = 80$$
$$x_1 + 2x_2 + 2x_4 = 100$$
$$x_1 + 2x_3 + 2x_4 = 75$$
$$2x_1 + x_2 + x_3 + 3x_4 = 100$$
Put this system in matrix form and solve numerically.
In an OLS regression, the predicted values (or "fitted" values) from a regression model will be orthogonal to the residuals of the model.
We simulate data as below for a model $y=\alpha + \beta_1 x + \beta_2 z + e$, where $e\sim N(0,1)$:
import numpy as np
#initiate a random number generator
rng = np.random.default_rng()
#generate 10,000 random numbers from a normal distribution with mean 3 and standard deviation 2
x = rng.normal(3, 2, 10000)
zeta = .5
z = zeta*x + rng.normal(0, 1, 10000)
e = rng.normal(0, 1, 10000)
beta1 = .3
beta2 = .2
alpha = 2
y = alpha + beta1*x + beta2*z + e
Now we can use statsmodels to estimate an incorrectly specified model, with only x included in the OLS regression:
#import sm
import statsmodels.api as sm
mod1 = sm.OLS(y, sm.add_constant(x)).fit()
#the "add_constant()" command simply says we also want to estimate an intercept term.
#we save the residuals and fitted values for later use
resids1 = mod1.resid
fitted1 = mod1.fittedvalues
mod1.summary()
| Dep. Variable: | y | R-squared: | 0.385 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared: | 0.384 |
| Method: | Least Squares | F-statistic: | 6246. |
| Date: | Fri, 14 Apr 2023 | Prob (F-statistic): | 0.00 |
| Time: | 15:23:59 | Log-Likelihood: | -14337. |
| No. Observations: | 10000 | AIC: | 2.868e+04 |
| Df Residuals: | 9998 | BIC: | 2.869e+04 |
| Df Model: | 1 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 1.9937 | 0.018 | 108.904 | 0.000 | 1.958 | 2.030 |
| x1 | 0.4000 | 0.005 | 79.031 | 0.000 | 0.390 | 0.410 |

| Omnibus: | 1.695 | Durbin-Watson: | 2.005 |
|---|---|---|---|
| Prob(Omnibus): | 0.429 | Jarque-Bera (JB): | 1.694 |
| Skew: | -0.013 | Prob(JB): | 0.429 |
| Kurtosis: | 2.942 | Cond. No. | 6.88 |
Here we see a result we know from multivariate statistics: if a variable z that is correlated with both the x variable and the y variable is left out of the model, we get a biased estimate of the coefficient on the x variable. Here we get an estimate of about 0.4, while the true value is 0.3.
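The size of the bias matches the standard omitted-variable-bias formula, as a quick back-of-the-envelope check using the simulation parameters above (since $z = \zeta x + u$ with $u$ independent of $x$, we have $\text{Cov}(x,z)/\text{Var}(x) = \zeta = 0.5$):
$$\text{plim}\ \hat{\beta}_1 = \beta_1 + \beta_2 \frac{\text{Cov}(x,z)}{\text{Var}(x)} = 0.3 + 0.2 \cdot 0.5 = 0.4$$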
Estimate the correctly specified model with both x and z as explanatory variables (remember to include a constant term). Save the residual and fitted values in objects. Show that in both the correctly and incorrectly specified models, the vectors consisting of the residuals and fitted values are orthogonal to each other. Can you provide intuition for this?