In economics and business we are often interested in rates of change: Economic growth, return on assets, etc. Mathematically, this often involves obtaining the derivative of a function.
You should be familiar with differentiation from previous courses, but here is a summary of some basic techniques.
a.) The demand function for a commodity with price P is given by the formula:
$$D(P) = a-bP$$Find $\frac{dD(P)}{dP}$
(From EMEA exercise 6.2.3, see back of book for solution)
b.) Find the derivatives of the following:
i.) $3x^{11}$
ii.) $-4x^{-7}$
iii.) $\frac{-2}{x^2}$
iv.) $\frac{-2}{x\sqrt{x}}$
(From EMEA exercise 6.6.3)
Differentiate the following functions with respect to $x$:
i.) $(2x^2-1)(x^4-1)$
ii.) $\frac{\sqrt{x}-2}{\sqrt{x}+1}$
(From EMEA exercise 6.7.2.b and 6.7.4.)
a.) Find $\frac{dY}{dt}$ when $Y=-3(V+1)^5$ and $V=\frac{1}{3}t^3$
(From EMEA exercise 6.8.2.a)
b.) Find the derivative of $y=x^3e^x$
(From EMEA exercise 6.10.1.f)
Find the second derivative of:
$$y=3x^3 + 2x-1$$And the third derivative of
$$z=120t-(1/3)t^3$$(EMEA exercise 6.9.3)
For the equation $x^2y=1$, find $dy/dx$ and $d^2y/dx^2$ by implicit differentiation. Check by solving the equation for y and then differentiating. (EMEA exercise 7.1.2)
For relatively simple functions we can determine the derivative analytically, as we have shown for a handful of cases above. But in many realistic situations our function will be too complex or ill-defined to obtain an analytic solution. In these cases we may need to approximate derivatives using computational methods, and we will explore a few such techniques in this section.
When we take the derivative of a function, we typically assume that the function is continuous: it is defined for any real number, whether 3.3, 4.893, or anything in between.
But computers don't work that way: computers are discrete. So in order to do numeric differentiation (which is what we call it when we ask a computer to do the differentiation), we need to take a function and break it up into discrete pieces, a process called discretization. Another way of stating this is that we split a function (in 1 dimension, 2 dimensions, etc.) into a grid of discrete values and then evaluate the function at each point on the grid.
One intuitive approach to approximating a derivative is a difference approximation. Basically, we discretize the function into small chunks of width $h$, then divide the difference between the function's values at neighbouring grid points by $h$ to get an approximate slope at a point. The backward difference is defined as:
$$f'(x_j) \approx \frac{f(x_j)-f(x_{j-1})}{h}$$And the forward difference as
$$f'(x_j) \approx \frac{f(x_{j+1})-f(x_{j})}{h}$$PPNM Ch. 20.2 gives a more thorough mathematical explanation.
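To make the formulas concrete, here is a minimal sketch (the function $f(x)=x^2$, the point, and the step size are just illustrative choices) computing the forward and backward differences at a single point and comparing them with the exact derivative $f'(x)=2x$:
import numpy as np

def f(x):
    return x**2              # example function with known derivative f'(x) = 2x

x0 = 1.5                     # point at which we approximate the derivative
h = 0.1                      # step size

forward = (f(x0 + h) - f(x0))/h      # forward difference
backward = (f(x0) - f(x0 - h))/h     # backward difference
print(forward, backward, 2*x0)       # roughly 3.1 and 2.9, versus the exact value 3.0
Both approximations straddle the exact value, and their error shrinks as h gets smaller.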
import numpy as np
import matplotlib.pyplot as plt
First we'll define our x-range from 0 to $2\pi \approx 6.28$. We will discretize this range into chunks (or a grid) that are h=.1 apart:
# step size
h = 0.1
# define grid
x = np.arange(0, 2*np.pi, h)
x
array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. , 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3. , 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4. , 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, 5. , 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9, 6. , 6.1, 6.2])
Then we will define our function as $y=\cos(x)$:
y = np.cos(x)
y
array([ 1. , 0.99500417, 0.98006658, 0.95533649, 0.92106099, 0.87758256, 0.82533561, 0.76484219, 0.69670671, 0.62160997, 0.54030231, 0.45359612, 0.36235775, 0.26749883, 0.16996714, 0.0707372 , -0.02919952, -0.12884449, -0.22720209, -0.32328957, -0.41614684, -0.5048461 , -0.58850112, -0.66627602, -0.73739372, -0.80114362, -0.85688875, -0.90407214, -0.94222234, -0.97095817, -0.9899925 , -0.99913515, -0.99829478, -0.98747977, -0.96679819, -0.93645669, -0.89675842, -0.84810003, -0.79096771, -0.7259323 , -0.65364362, -0.57482395, -0.49026082, -0.40079917, -0.30733287, -0.2107958 , -0.11215253, -0.01238866, 0.08749898, 0.18651237, 0.28366219, 0.37797774, 0.46851667, 0.55437434, 0.63469288, 0.70866977, 0.77556588, 0.83471278, 0.88551952, 0.92747843, 0.96017029, 0.98326844, 0.9965421 ])
fig, ax = plt.subplots()
ax.plot(x,y)
We can then compute a vector of (forward) differences
forward_diff = np.diff(y)/h
forward_diff
array([-0.04995835, -0.14937587, -0.24730089, -0.34275495, -0.43478432, -0.52246947, -0.60493428, -0.68135478, -0.75096741, -0.81307662, -0.86706184, -0.91238367, -0.94858926, -0.97531686, -0.99229941, -0.99936724, -0.99644972, -0.983576 , -0.96087472, -0.9285727 , -0.88699268, -0.83655013, -0.77774904, -0.71117694, -0.637499 , -0.55745138, -0.47183389, -0.38150199, -0.28735824, -0.19034331, -0.09142654, 0.00840374, 0.10815006, 0.20681577, 0.30341505, 0.39698271, 0.48658385, 0.5713232 , 0.65035408, 0.72288683, 0.78819674, 0.84563125, 0.89461649, 0.93466302, 0.96537071, 0.98643272, 0.99763863, 0.99887647, 0.99013386, 0.97149816, 0.94315557, 0.90538929, 0.85857665, 0.8031854 , 0.73976898, 0.66896104, 0.59146906, 0.50806732, 0.41958914, 0.32691856, 0.23098152, 0.13273659])
Since we lose one point on the grid when we take the difference, we need to adjust the x grid:
x_diff = x[:-1]
x_diff
array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. , 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3. , 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4. , 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, 5. , 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9, 6. , 6.1])
The -1 in the index means we count from the end, so x[:-1] keeps everything except the last point. This is equivalent to x[:62], since there are 63 points in the grid.
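As a quick sanity check of this indexing (assuming the 63-point grid defined above):
print(len(x))                           # 63 points on this grid
print(np.array_equal(x_diff, x[:62]))   # True: x[:-1] drops only the last point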
fig, ax = plt.subplots()
ax.plot(x_diff, forward_diff)
As it happens, $y=\cos(x)$ is a function with a well-defined derivative: $y'=-\sin(x)$. So we can compare against the exact solution:
exact_solution = -np.sin(x_diff)
fig, ax = plt.subplots()
ax.plot(x_diff, forward_diff, label="approximation", color="red")
ax.plot(x_diff, exact_solution, label="exact", color="blue")
plt.legend()
We can also check the accuracy by calculating the maximum absolute error between the exact and approximate solutions:
# Compute max error between
# numerical derivative and exact solution
max_error = max(abs(exact_solution - forward_diff))
print(max_error)
0.049984407218554114
We see that there is some error, which comes from the finite size of the discrete step. We can decrease this error by decreasing the step size, h.
Change the step size to .01. What happens with the maximum error?
Let's say that we want to estimate a simple linear relationship for which we have some data. We know with certainty that the intercept is 0 and we also know that the standard deviation of our error term is 1, but we are uncertain of the slope parameter which we wish to estimate. We start by generating some fake data:
import numpy as np
import matplotlib.pyplot as plt
n=50
alpha = 0
beta = .8
#create 50 data points from the linear model alpha + beta*x + epsilon, where epsilon is normally distributed with sigma = 1
rng = np.random.default_rng()
x = np.linspace(0, 1, n)
y = alpha + beta*x + rng.normal(0, 1, n)
#plot data together with our true linear model
plt.plot(x, y, 'o')
plt.plot(x, alpha + beta*x, 'r-')
plt.show()
There are several possible ways of estimating the line - we will look more closely at least squares later when we cover linear algebra. Right now we will use maximum likelihood. The idea is that, assuming the error in the data has a certain distribution (normal in this case), we estimate the parameter that makes the observed data the most likely.
Here is the equation for a normal distribution:
$$f(x) = \frac{1}{\sigma \sqrt{2 \pi}} e^{- \frac{(x-\mu)^2}{2 \sigma^2}}$$
def normalFunction(x, mu=0, sigma=1):
    f = 1/(sigma*np.sqrt(2*np.pi))*np.exp(-(x-mu)**2/(2*sigma**2))
    return f
#create a grid of x values between -4 and 4
h=.01
xs = np.arange(-4, 4, h)
fx = normalFunction(xs)
#plot x vs fx
plt.plot(xs, fx)
plt.show()
Now we can write the likelihood we want to maximize for our 50 data points $y_i$, inserting our linear equation for the mean of each point, $\mu_i = \beta x_i$:
$$ \max_{\beta} l(\beta) = \prod_i \frac{1}{\sigma \sqrt{2 \pi}} e^{- \frac{(y_i-\beta x_i)^2}{2 \sigma^2}}$$If we take the log transform of this we get:
$$ \max_{\beta} \log l(\beta) = \sum_i \log\left(\frac{1}{\sigma \sqrt{2 \pi}} e^{- \frac{(y_i-\beta x_i)^2}{2 \sigma^2}}\right)$$Splitting the log of each term into a sum and collecting the constant terms gives
$$ \max_{\beta} \log l(\beta) = -n \log (\sigma) - \frac{n}{2}\log(2 \pi) - \frac{1}{2 \sigma^2} \sum_i^n (y_i - \beta x_i)^2$$Since we have gotten rid of the exponential, this equation is a bit easier to work with: the log transformation turns multiplication into addition, so we have a sum rather than the original product. The transformation does not affect our final answer, since the log transformation is monotonic, which means the value of $\beta$ that maximizes the original function also maximizes the transformed function.
Let's write a function for our log-likelihood:
def loglikFunction(Y, X, beta, n, sigma):
    #Y and X are vectors
    #n is the number of observations
    #sigma is the standard deviation of the error term
    #compute log likelihood
    loglik = -n*np.log(sigma) - n/2*np.log(2*np.pi) - 1/(2*sigma**2)*np.sum((Y - X*beta)**2)
    return loglik
# test
loglikFunction(y, x, .7, n, 1)
-71.148829064554
# create a grid of beta values between 0 and 2 with h=.01 between each value
betas = np.arange(0, 2, h)
# create a vector to store the log likelihood values
loglik = np.zeros(len(betas))
# loop through all the values of beta and compute the log likelihood
for i in range(len(betas)):
    loglik[i] = loglikFunction(y, x, betas[i], n, 1)
#create a plot of log likelihood vs beta
plt.plot(betas, loglik)
plt.show()
We can now numerically differentiate our log-likelihood function and see where the derivative crosses zero:
d_loglik = np.diff(loglik)/h
betas_diff = betas[:-1]
fig, ax = plt.subplots()
ax.plot(betas_diff, d_loglik)
#draw horizontal line at 0
ax.axhline(y=0, color='r', linestyle='-')
plt.show()
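From this grid we can also read off an approximate maximizer directly, either as the $\beta$ with the largest log-likelihood value or as the first grid point where the numerical derivative turns negative (a rough sketch; because the data are randomly generated, your value will differ from run to run):
# beta with the highest log-likelihood on the grid
beta_argmax = betas[np.argmax(loglik)]
# first grid point where the numerical derivative changes sign from positive to negative
beta_sign_change = betas_diff[np.argmax(d_loglik < 0)]
print(beta_argmax, beta_sign_change)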
We can confirm our results and get a more precise numerical estimate of the parameter that maximizes our log-likelihood by using the optimization tools in scipy.
Here we use minimize_scalar, minimizing the negative of the log-likelihood:
import scipy.optimize as opt
#find the maximum of the log likelihood function
result = opt.minimize_scalar(lambda b: -loglikFunction(y, x, b, n, 1), bounds=(0, 2), method='bounded')
print(result)
 message: Solution found.
 success: True
  status: 0
     fun: 70.66298921039113
       x: 0.9402329821373814
     nit: 6
    nfev: 6
We can verify our results by using a regression routine in the package statsmodels. Here we use the GLM (Generalized Linear Model) routine, which uses maximum likelihood, though in this case using OLS would give the same estimate.
import statsmodels.api as sm
mod1 = sm.GLM(y,x).fit()
mod1.summary()
| Dep. Variable: | y | No. Observations: | 50 |
|---|---|---|---|
| Model: | GLM | Df Residuals: | 49 |
| Model Family: | Gaussian | Df Model: | 0 |
| Link Function: | Identity | Scale: | 1.0088 |
| Method: | IRLS | Log-Likelihood: | -70.661 |
| Date: | Wed, 26 Jul 2023 | Deviance: | 49.432 |
| Time: | 09:54:32 | Pearson chi2: | 49.4 |
| No. Iterations: | 3 | Pseudo R-squ. (CS): | 0.05235 |
| Covariance Type: | nonrobust | | |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| x1 | 0.9402 | 0.245 | 3.841 | 0.000 | 0.460 | 1.420 |
Differentiate the following functions:
i.) $y=x^5-x^{-5}$
ii.) $y=x^{-1} + x^{-2}$
iii.) $h(y) = y(y-1)(y+1)$
iv.) $G(t) = \frac{2t+1}{t^2 + 3}$
vi.) $y=\sqrt{\frac{1}{x}-1}$
vii.) $y=e^x\ln(x^2+2)$
(From EMEA Ch. 6 review exercises 3, 7, 9, 14)
a. Suppose a function for the profits of a firm can be written as $\pi (Q) = QP(Q) - cQ$, where P is a differentiable (inverse demand) function and c is a constant. Find an expression for $\frac{d \pi}{dQ}$.
b. Let $P=10/\ln(Q)$ and $c=2$. Can you write expressions for the marginal revenue and marginal cost of the product? What condition signals the quantity, Q, that maximizes profit? Why?
(EMEA Ch. 6 Review exercise 16)
Say that an inverse demand function has the form:
$$P(Q) = e^{\frac{Q^2}{Q + Q^2}}$$And the cost function is of the form
$$C(Q) = Q\ln(Q)$$a. Write a function for total profits
b. Discretize the profit function and show on a plot between Q=.1 and Q=20
c. Approximate a numerical solution to the derivative of the profit function. Show this function on a plot, again between Q=.1 and Q=20.
d. Find the approximate solution to where profits are maximized
Our maximum likelihood estimation in the lab assumed that our error terms followed a roughly normal distribution and that the model relationship was linear. But we aren't limited to these assumptions when estimating with maximum likelihood. The Generalized Linear Model paradigm also includes estimation for several other types of data. A common case is binary (0,1) data, where we often use logistic regression. Another, somewhat less used, form of regression (also within the GLM paradigm) is the Poisson regression model.
A Poisson regression model is used to model discrete count data, which in practice often comes in the form of rates: for example, the number of customers waiting in a service line at a given time, or the number of times that an oil company drills in a year.
We can write our general Poisson regression model as:
$$y_i \sim \text{Poisson}(\theta_i)$$$$\text{Poisson}(\theta) = \Pr(X=k) = \frac{\theta^k e^{-\theta}}{k!}$$$$\theta_i = e^{X_i \beta}$$Reading the above equations, we would say that the $y_i$ data (which are discrete counts: 1, 2, 3, ...) are Poisson distributed. The second line writes out the probability mass function (pmf) of the Poisson distribution: given a $\theta$, we can calculate the probability of a count, k, occurring.
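As a quick check of the pmf formula (a minimal sketch; the values $\theta=3$ and $k=2$ are arbitrary), we can evaluate it directly and compare it with scipy.stats.poisson.pmf:
from math import factorial
from scipy.stats import poisson

theta, k = 3, 2
pmf_by_hand = theta**k * np.exp(-theta) / factorial(k)   # theta^k e^{-theta} / k!
print(pmf_by_hand, poisson.pmf(k, theta))                # the two values agree (about 0.224)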
Notice that in the Poisson model we have only a single parameter, $\theta$ (rather than two, as in the normal distribution), which fully describes the distribution (both its mean and its variance). We model $\theta_i$ as the exponential of a linear combination of the explanatory variables in $X_i$.
Below we'll simulate some data from a Poisson distribution. To make it concrete, let's say that this data represents the number of exploratory well bores that an oil company makes in a year, and that we assume that this will be positively affected by the oil price.
rng = np.random.default_rng()
x = np.linspace(10,100, 100)
#x = rng.normal(50,20, 100) #let x be the price of oil
beta = .01 #since this is a small value, we can interpret it
#roughly as: if the oil price increases by 1, the expected number of wells drilled increases by about 1%
Now we'll generate data from a Poisson distribution. We'll start by generating 1000 draws with $\theta=3$, just to picture what the distribution looks like:
counts = rng.poisson(3,1000)
plt.hist(counts)
(array([ 47., 159., 214., 235., 144., 108., 62., 20., 8., 3.]), array([0. , 0.9, 1.8, 2.7, 3.6, 4.5, 5.4, 6.3, 7.2, 8.1, 9. ]), <BarContainer object of 10 artists>)
Now let's simulate the drill data
thetas = np.exp(beta*x)
drills = rng.poisson(thetas)
drills
array([1, 0, 0, 0, 0, 0, 2, 0, 1, 2, 3, 1, 0, 3, 3, 0, 4, 2, 1, 0, 0, 1, 1, 0, 1, 2, 2, 0, 4, 1, 1, 0, 3, 1, 3, 1, 0, 2, 1, 1, 0, 0, 0, 3, 1, 2, 0, 0, 1, 4, 1, 4, 1, 4, 2, 3, 1, 3, 1, 2, 1, 1, 5, 1, 4, 2, 4, 1, 4, 2, 4, 1, 4, 1, 1, 3, 2, 2, 1, 4, 0, 3, 2, 4, 1, 4, 1, 3, 0, 2, 2, 1, 1, 1, 5, 3, 4, 4, 1, 2])
Using maximum likelihood, estimate the $\beta$ parameter from the simulated data, assuming that the data $y_i$ are Poisson distributed. (Hint: instead of writing out your own function for the Poisson probability mass function in the likelihood, you can use the built-in function scipy.stats.poisson.pmf().)
from scipy.stats import poisson
#so for example, we can calculate the probability of 3 drills per year
#given that theta=1 as
poisson.pmf(3,1)
0.06131324019524039