You should have seen integration and some basic analytical methods for solving indefinite and definite integrals previously. Most of this material will therefore be review.
There are many more advanced techniques for finding analytic solutions (equations) to integrals. For these I will refer you to any one of many textbooks on the subject. In practice, however, most models of moderate complexity will not have an analytic solution, and the integral must instead be found numerically. So the majority of the new material on this topic will be learning some numerical techniques for approximating the integral of a function.
Find the following (indefinite) integrals:
1.) $\int x^{13} dx$
2.) $ \int \frac{1}{\sqrt{x}} dx$
3.) $\int 2^x dx$
(EMEA Exercise 10.1.1)
Compute the area bounded by the graph of each of the following functions over the indicated intervals:
a.) $f(x) = 3x^2$, in [0,2]
b.) $f(x) = e^x$, in [-1,1]
(EMEA Exercise 10.2.2)
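If you want to check your hand calculations for exercises like these, the sympy package can compute integrals symbolically. (A quick sketch; sympy is an assumption here and is not otherwise used in these notes.)
import sympy as sp
x = sp.symbols('x')
print(sp.integrate(x**13, x)) #x**14/14 (sympy omits the constant of integration)
print(sp.integrate(1/sp.sqrt(x), x)) #2*sqrt(x)
print(sp.integrate(2**x, x)) #2**x/log(2)
print(sp.integrate(3*x**2, (x, 0, 2))) #definite integral: the area on [0,2] is 8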
Just like with derivatives, we can find the analytic integral of some basic functions. But again, for many practical applications it is difficult or impossible to find an analytic solution.
Our approach to numerical differentiation was quite simple: we divided the range of a function into discrete chunks - a grid - then estimated the slopes of the function between points. When we visualize integration in two (or three) dimensions, we are calculating the area under the curve of a function.
When we use numeric approximations to integrals, we again break the range into discrete chunks, estimate the area under the function within each chunk, and then sum up to get our estimated integral.
PPNM provides a more mathematical problem statement.
The simplest way of approximating the integral of a function is to calculate the area of a rectangle on each unit of the grid and then sum over the range; this is called a Riemann sum. With a grid of $n$ points $x_0, \dots, x_{n-1}$ with spacing $h$, using the left endpoint of each unit we can write this as:
$$\int_a^b f(x)\,dx \approx \sum_{i=0}^{n-2} h\, f(x_i)$$
More mathematical details can be found in PPNM Ch. 21.2.
Here is the Khan Academy material on Riemann integrals.
Let's jump right to the computation.
Let us say that we have a function defined as:
$$f(x) = \sin(x)$$
and we want to evaluate the integral on the range from $0$ to $\pi$.
import numpy as np
import matplotlib.pyplot as plt
a = 0 #start point
b = np.pi #end point
n = 11 #the number of grid points (giving n-1 = 10 rectangles)
h = (b - a) / (n - 1) #the width of each grid unit
x = np.linspace(a, b, n)
f = np.sin(x) #define our function over that grid
print(f)
[0.00000000e+00 3.09016994e-01 5.87785252e-01 8.09016994e-01 9.51056516e-01 1.00000000e+00 9.51056516e-01 8.09016994e-01 5.87785252e-01 3.09016994e-01 1.22464680e-16]
Above we define our function and some of our parameters.
The function we are approximating the integral of is the sine curve: $y=\sin(x)$.
n=11 signifies that we break up our range (x values) into 10 discrete units (between 11 points).
We are calculating this integral between $a=0$ and $b=\pi$.
To calculate the width of each grid unit, h, we take $h=\frac{(b-a)}{(n-1)} = \frac{\pi}{10}$.
The object f now holds a list of the heights of the function $\sin(x)$ at the 11 points on our grid. We can visualize these heights as bars, and then calculate three versions of the Riemann sum:
fig, ax = plt.subplots()
ax.bar(x, f)
plt.show()
Left Riemann sum, where the height of each rectangle is taken from the left endpoint of its grid unit.
I_riemannL = h * sum(f[:n-1])
I_riemannL
# or equivalently
I_riemannL = h * sum(f[:-1])
I_riemannL
1.9835235375094546
Right Riemann sum, where the height of each rectangle is taken from the right endpoint of its grid unit.
I_riemannR = h * sum(f[1:])
I_riemannR
1.9835235375094546
As it turns out, for the sine function these two are equivalent: the left and right sums differ only by $h(f(a) - f(b))$, and here $\sin(0) = \sin(\pi) = 0$.
Now we try a third, and perhaps best, method: calculating the height at the midpoint of each interval.
I_mid = h * sum(np.sin((x[:-1] \
+ x[1:])/2))
I_mid
2.0082484079079745
For the sine curve we know the antiderivative analytically:
$$\int \sin(x)\,dx = -\cos(x) + C$$
So we can find the true value of our integral:
np.pi
3.141592653589793
true_int = -np.cos(np.pi) - (-np.cos(0)) #F(b) - F(a)
true_int
2.0
We can then calculate our errors:
err_riemannL = true_int - I_riemannL
err_riemannR = true_int - I_riemannR
err_mid = true_int - I_mid
print(err_riemannL)
print(err_riemannR)
print(err_mid)
0.01647646249054535
0.01647646249054535
-0.008248407907974542
In terms of magnitudes, the midpoint rule clearly gives the best approximation here.
Now change the number of units on the grid to 100 (n=101). How does this change the error for the three different Riemann sums? How does the relative error between the midpoint Riemann sum and the left/right Riemann sums change? That is, with n=11 the left/right sums have errors roughly twice the magnitude of the midpoint sum's. Does this ratio stay constant with a finer grid? You can use the sketch below as a starting point.
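One way to explore this is to wrap the three sums in a function and loop over grid sizes. This is just a sketch to get you started (the helper name riemann_errors is an illustrative choice):
def riemann_errors(n):
    #errors of the left, right, and midpoint Riemann sums for sin on [0, pi]
    h = np.pi / (n - 1)
    x = np.linspace(0, np.pi, n)
    f = np.sin(x)
    left = h * sum(f[:-1])
    right = h * sum(f[1:])
    mid = h * sum(np.sin((x[:-1] + x[1:]) / 2))
    return 2 - left, 2 - right, 2 - mid

for n in (11, 101, 1001):
    print(n, riemann_errors(n))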
A more sophisticated way of numerically approximating integrals is by calculating the area of a trapezoid at each discrete unit. See PPNM Ch. 21.3 for further mathematical explanation.
We jump straight to the calculation, again using $f(x) = sin(x)$ as our example.
We set up our problem exactly as before:
a = 0
b = np.pi
n = 11
h = (b - a) / (n - 1)
x = np.linspace(a, b, n)
f = np.sin(x)
But now, instead of a rectangle with a "flat" top, we calculate the area of each unit using a "sloped" top determined by the heights at its two endpoints:
A simple numerical example: suppose we had only one unit between two points, the width between the two points is h=3, and the heights at the 2 points are 5 and 10. Then, using the formula for the area of a trapezoid:
$$A = \frac{1}{2} h (b_1 + b_2)$$
where $b_1$ and $b_2$ are the heights at the two points of the unit. We then get
$$A = \frac{1}{2}\cdot 3\cdot(10+5) = 22.5$$
Extending this to a sum over multiple trapezoids gives
$$\int_a^b f(x)\,dx \approx \frac{h}{2}\left(f(x_0) + 2\sum_{i=1}^{n-2} f(x_i) + f(x_{n-1})\right)$$
which we can implement directly:
I_trap = (h/2)*(f[0] + 2 * sum(f[1:n-1]) + f[n-1])
I_trap
1.9835235375094546
err_trap = 2 - I_trap
err_trap
0.01647646249054535
Interestingly, in this case we don't get an improvement in the error compared to the Riemann sums. This is no accident: the trapezoid sum is exactly the average of the left and right Riemann sums, and here those two were already equal.
Compute the integral of the function $f(x) = e^{\sin(x)}$ over the range $(0,2\pi)$ using the trapezoid rule. Use a grid with 100 units. Compare the error to a Riemann sum with the midpoint rule.
The final approach we will use to numerically approximate integrals is Simpson's rule. In this approach, instead of modelling the "roof" of each unit as a straight line (either flat, as in Riemann sums, or sloped, as with trapezoid sums), we model the roofs by fitting polynomials through subintervals. For further description and mathematical details see [PPNM Ch. 21.4](https://pythonnumericalmethods.berkeley.edu/notebooks/chapter21.04-Simpsons-Rule.html).
You can see the link for the full derivation, but here I will just reproduce the final result, which is that you can approximate the integral across two units, $x_{i-1}$ to $x_{i}$ and $x_{i}$ to $x_{i+1}$, by:
$$\int_{x_{i-1}}^{x_{i+1}} f(x)\,dx \approx \frac{h}{3}\left(f(x_{i-1}) + 4 f(x_i) + f(x_{i+1})\right)$$
Now we can compute. We use the same set-up as above:
a = 0
b = np.pi
n = 11
h = (b - a) / (n - 1)
x = np.linspace(a, b, n)
f = np.sin(x)
Now we use the rule above, summing over all the pairs of units (note this requires an even number of units, i.e. odd $n$). The interior points shared by two pairs get weight 2, giving the composite rule
$$\int_a^b f(x)\,dx \approx \frac{h}{3}\left(f(x_0) + 4\sum_{i\ \mathrm{odd}} f(x_i) + 2\sum_{\substack{i\ \mathrm{even} \\ 0<i<n-1}} f(x_i) + f(x_{n-1})\right)$$
I_simp = (h/3) * (f[0] + 2*sum(f[2:n-2:2]) + 4*sum(f[1:n-1:2]) + f[n-1])
In the above, the indexing f[2:n-2:2] takes every second value from index 2 up to n-2: these are the interior grid points shared by two adjacent pairs of units, which get weight 2. The indexing f[1:n-1:2] picks out the midpoint of each pair, which gets weight 4. (A common variant writes the first slice as f[:n-2:2]; that also counts f[0] in the weight-2 sum, which happens to be harmless here since $\sin(0)=0$, but it is not correct in general.)
f[2:n-2:2]
array([0.58778525, 0.95105652, 0.95105652, 0.58778525])
We then compute the error
err_simp = 2 - I_simp
print(err_simp)
-0.00010951731500430384
Here we see that using Simpson's rule gives us a much smaller error.
Luckily, we don't need to manually devise our own numeric integration techniques every time we want to evaluate an integral. The Scipy package includes several techniques for numerical integration.
We will define our parameters just once for the techniques below:
a = 0
b = np.pi
n = 11
h = (b - a) / (n - 1)
x = np.linspace(a, b, n)
f = np.sin(x)
Scipy has a built-in trapezoid summation (in recent SciPy versions this function is named scipy.integrate.trapezoid, with trapz as an older, deprecated alias):
from scipy.integrate import trapz
I_trapz = trapz(f,x)
I_trapz
1.9835235375094542
We can also compute the cumulative integral, which returns an array of values where each number represents the cumulative integral (the area underneath the curve) up to each grid point after the first - so n-1 values in total. (In recent SciPy versions this function is named cumulative_trapezoid, with cumtrapz as the older alias.)
from scipy.integrate import cumtrapz
F_approx = cumtrapz(np.sin(x), x)
F_approx
array([0.04854028, 0.18940964, 0.40881883, 0.68529053, 0.99176177, 1.29823301, 1.57470471, 1.79411389, 1.93498326, 1.98352354])
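Since we know the exact antiderivative of $\sin(x)$, we can check these cumulative values against $F(x) = 1 - \cos(x)$ (a minimal sketch using the x and F_approx arrays defined above):
F_exact = 1 - np.cos(x[1:]) #cumtrapz returns values at the grid points after the first
print(np.max(np.abs(F_exact - F_approx))) #largest pointwise error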
We can compute the total integral in a similar way using Simpson's rule (passing x by keyword, which newer SciPy versions may require):
from scipy.integrate import simpson
I_simpson = simpson(f, x=x)
I_simpson
2.0001095173150043
A final alternative method is quad, which returns both an estimate of the integral and an estimate of the error. Unlike the functions above, quad takes a callable function rather than arrays of pre-computed values. The mathematics of this method are a bit beyond the scope of this course.
from scipy.integrate import quad
I_quad, est_err_quad = \
quad(np.sin, 0, np.pi)
I_quad
2.0
err_quad = 2 - I_quad
print(est_err_quad, err_quad)
2.220446049250313e-14 0.0
We see this is very accurate.
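Because quad takes a callable, you can integrate any expression by wrapping it in a lambda. As an illustrative sketch (this example is not part of the notes above), here is an integral with the known exact value $\pi^2 - 4$:
I_example, est_err_example = quad(lambda t: t**2 * np.sin(t), 0, np.pi)
print(I_example, np.pi**2 - 4)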
This is just a taste of the integration tools that scipy provides. You can explore more in the scipy.integrate documentation.
$\int_{1/2}^{e/2} x^3 \ln(2x)\,dx$
(EMEA review exercise 10.4 )
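After working this exercise by hand, you can check your answer numerically with the quad function introduced above (a sketch):
I_check, _ = quad(lambda t: t**3 * np.log(2*t), 0.5, np.e/2)
print(I_check)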
Say that the total inverse demand function for an economy for a certain product can be written:
$$P(Q) = \frac{150}{\ln(Q)}$$
and that the total inverse supply function for an economy can be written:
$$P(Q) = \frac{50}{e^{1/Q}}$$
a.) Discretize and plot these two functions between Q=2 and Q=50, with n=1000. (We start at Q=2 here for convenience, since $\ln(1)=0$ makes the demand price undefined at $Q=1$, and $\ln(Q)$ is negative below that.)
b.) Find the approximate equilibrium price and quantity
c.) Write the consumer and producer surpluses as integrals. Calculate approximations of the consumer and producer surpluses (from Q=2), using both the trapezoid rule and Simpson's rule.
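A possible starting point for parts a.) and b.) is sketched below (the variable names are just suggestions):
Q = np.linspace(2, 50, 1000)
P_demand = 150 / np.log(Q) #inverse demand
P_supply = 50 / np.exp(1 / Q) #inverse supply
i_eq = np.argmin(np.abs(P_demand - P_supply)) #grid point where the curves are closest
print(Q[i_eq], P_demand[i_eq]) #approximate equilibrium quantity and price
fig, ax = plt.subplots()
ax.plot(Q, P_demand, label="demand")
ax.plot(Q, P_supply, label="supply")
ax.legend()
plt.show()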
When we are modelling how a new technology or product enters a market (like smartphones, electric cars, or even a new fashion), a common function we make use of is the logistic function. We can write the logistic function as:
$$f(t_s) = \frac{L}{1+e^{-k(t_s-t_0)}}$$
Here $L$ represents the level of saturation, where everyone who wants or can afford the product has it. $t_0$ represents the midpoint of the diffusion process, $t_s$ is the general time variable, and $k$ controls the rate of diffusion.
Let's say we want to model the rate of diffusion of electric cars, and we devise a logistic model to represent the sales of electric cars in a certain market. We model a period of 5000 days (about 14 years) and assume that daily sales saturate at about 10,000 cars. We then devise the following logistic model:
T = 5000
ts = np.linspace(0,T, T)
L = 10000
k = .002
t0 = ts.mean()
R = L/(1 + np.exp(-k*(ts-t0)))
fig, ax = plt.subplots()
ax.plot(ts, R)
plt.show()
Now we wish to find the total number of electric cars on the road at any given time. This will of course be the area underneath the diffusion curve we created above. Integrate the logistic curve computationally and show the result on a plot. Can you think of ways to improve this model, that is, to make it more realistic?
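One possible starting point is a sketch using SciPy's cumulative trapezoid rule on the ts and R arrays defined above (in older SciPy versions the function is called cumtrapz):
from scipy.integrate import cumulative_trapezoid
total_cars = cumulative_trapezoid(R, ts, initial=0) #cumulative number of cars sold up to each day
fig, ax = plt.subplots()
ax.plot(ts, total_cars)
ax.set_xlabel("day")
ax.set_ylabel("total electric cars on the road")
plt.show()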
What other process or mechanism could a logistic function represent?
Let us say that we are the CFO of a firm that sells a certain product. We wish to estimate revenues for the product for the following fiscal quarter. We have a known demand curve for our product, but prices are stochastic; that is, they are uncertain and represented by a probability distribution - in this case a normal distribution.
The demand curve for our product is:
$$Q(P) = \frac{1000}{\ln(P)}$$
Notice that this restricts us to $P>1$, since $\ln(1)=0$. We could perhaps interpret this to mean that at $P=1$ or lower, demand is infinite.
Then we will say that our price variable, P, is distributed normally with mean 10 and standard deviation 3:
$$P \sim N(10,3)$$
(again with the restriction that $P>1$)
Create a function for the revenues given a price. Then find the expected revenue by integrating over the probability distribution of prices. How does this expected value differ from just taking the mean price (10) and multiplying by the quantity that comes from putting 10 into the demand function?
(Hint: you can use norm.pdf from scipy.stats to evaluate the probability density function of a normally distributed stochastic variable.)
# import normal pdf
from scipy.stats import norm
#For example the density at 3 would be:
norm.pdf(3, loc=10, scale=3)
0.00874062969790316
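One possible way to set up the expected-value integral is sketched below. (The revenue function and the integration bounds are illustrative assumptions: the demand curve is only defined for $P>1$, and some truncation choice must be made near that boundary.)
from scipy.integrate import quad
def revenue(p):
    return p * 1000 / np.log(p) #revenue = price * quantity demanded, Q(P) = 1000/ln(P)
#E[R]: integrate revenue(p) times the price density over admissible prices
#the bounds (1.5 to 30) are an illustrative truncation, not part of the exercise
E_R, _ = quad(lambda p: revenue(p) * norm.pdf(p, loc=10, scale=3), 1.5, 30)
print(E_R)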