Integration and numerical integration¶

MET 430¶

NTNU Business School¶

Johannes Mauritzen¶

You should have seen integration and some basic analytical methods for solving indefinite and definite integrals previously. Most of this material will therefore be review.

There are many more advanced techniques for obtaining analytic solutions (equations) to integrals. For these I will refer you to any one of many textbooks on the subject. In practice, however, most models of moderate complexity will not have an analytic solution; instead the integral needs to be found numerically. So the majority of the new material on this topic will be learning some numerical techniques for approximating the integral of a function.

Learning goals¶

  • Understand the meaning of indefinite and definite integrals.
  • Analytic solutions for integrals of power functions and exponential functions
  • Review rules of integration
  • Introduction to numerical integration

Literature¶

  • Brilliant.org overview of integration

  • PPNM Ch. 21

  • (EMEA Ch. 10.1-10.5)

Integration (Video 8.1)¶

Here is the Khan Academy introduction to integrals.

A handful of integrals (Video 8.2)¶

Here is the Khan Academy video on finding integrals of power functions.

Here is the Khan Academy video on finding the integral of $\frac{1}{x}$.

Here is the Khan Academy video on finding trigonometric and exponential integrals.

Try it¶

Find the following (indefinite) integrals:

1.) $\int x^{13} dx$

2.) $ \int \frac{1}{\sqrt{x}} dx$

3.) $\int 2^x dx$

(EMEA Exercise 10.1.1)

Compute the area bounded by the graph of each of the following functions over the indicated intervals

a.) $f(x) = 3x^2$, in [0,2]

b.) $f(x) = e^x$, in [-1,1]

(EMEA Exercise 10.2.2)

Numerical integration approaches¶

Just as with derivatives, we can find the analytic integral of some basic functions. But for many practical applications, it is difficult or impossible to find an analytic solution.

Our approach to numerical differentiation was quite simple: we divided the range of a function into discrete chunks (a grid), then estimated the slopes between points of the function. When we visualize integration in two (or three) dimensions, we are calculating the area under the curve of a function.

When we try to use numeric approximations to integrals, we then again break up the range into discrete chunks, estimate the area under the function in those discrete chunks, then sum up to get our estimated integral values.

PPNM provides a more mathematical problem statement.

Riemann integral¶

The simplest way of approximating the integral of a function is to calculate the area of a rectangle at each point on the grid and then sum over the range. This is called a Riemann sum. Mathematically, we can write this as:

$$\int_a^b f(x)dx \approx \sum_{i=0}^{n-1} h f(x_i)$$

More mathematical details can be found in PPNM Ch. 21.2

Here is the Khan Academy material on Riemann integrals.

Let's jump right to the computation.

Let us say that we have a function defined as:

$$f(x) = \sin(x)$$

and we want to evaluate the integral on the range from 0 to $\pi$

In [ ]:
import numpy as np
import matplotlib.pyplot as plt

a = 0 #start point
b = np.pi #end point
n = 11 #the number of rectangles
h = (b - a) / (n - 1) #define our grid
x = np.linspace(a, b, n)
f = np.sin(x) #define our function over that grid

print(f)
[0.00000000e+00 3.09016994e-01 5.87785252e-01 8.09016994e-01
 9.51056516e-01 1.00000000e+00 9.51056516e-01 8.09016994e-01
 5.87785252e-01 3.09016994e-01 1.22464680e-16]

Above we define our function and some of our parameters.

The function we are approximating the integral of is the sine curve: $y=\sin(x)$.

n=11 signifies that we break up our range (x values) into 10 discrete units (between 11 points).

We are calculating this integral between $a=0$ and $b=\pi$.

To calculate the width of each grid unit, h, we take $h=\frac{b-a}{n-1} = \frac{\pi}{10}$.

The object f now holds a list of the heights of the function $\sin(x)$ at the 11 points on our grid. Let's visualize these heights with a bar plot, and then calculate three versions of the Riemann sum.

In [ ]:
fig, ax = plt.subplots()
ax.bar(x, f)
plt.show()

Left Riemann sum, where the height of the rectangles is calculated from the start point of the grid unit.

In [ ]:
I_riemannL = h * sum(f[:n-1])
I_riemannL

# or equivalently

I_riemannL = h * sum(f[:-1])
I_riemannL
Out[ ]:
1.9835235375094546

Right Riemann sum, where the height of the rectangles is calculated from the end point of the grid unit.

In [ ]:
I_riemannR = h * sum(f[1:])
I_riemannR
Out[ ]:
1.9835235375094546

As it turns out, for the sine function these two are equivalent. This is because the left and right sums always differ by $h(f(a) - f(b))$, and here $\sin(0) = \sin(\pi) = 0$.

Now we try the third, and perhaps best, method, which is to calculate the height at the midpoint of each interval:

In [ ]:
I_mid = h * sum(np.sin((x[:-1] \
        + x[1:])/2))

I_mid
Out[ ]:
2.0082484079079745
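The midpoint calculation above can be wrapped into a small reusable function, which makes it easy to experiment with different functions and grid sizes. This is just a sketch; the name `riemann_mid` is my own:

```python
import numpy as np

def riemann_mid(func, a, b, n):
    """Midpoint Riemann sum of func over [a, b] using a grid of n points."""
    x = np.linspace(a, b, n)       # grid of n points (n - 1 intervals)
    h = (b - a) / (n - 1)          # width of each interval
    mid = (x[:-1] + x[1:]) / 2     # midpoint of each interval
    return h * np.sum(func(mid))   # sum of the rectangle areas

print(riemann_mid(np.sin, 0, np.pi, 11))  # approximately 2.00825, matching I_mid above
```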

The $\sin(x)$ curve is well defined, and we know its indefinite integral:

$$\int \sin(x)\,dx = -\cos(x) + C$$

So we can find the true value of our integral:

In [ ]:
np.pi
Out[ ]:
3.141592653589793
In [ ]:
true_int = -np.cos(np.pi) - -np.cos(0)
true_int
Out[ ]:
2.0

We can then calculate our errors

In [ ]:
err_riemannL = true_int - I_riemannL
err_riemannR = true_int - I_riemannR
err_mid = true_int - I_mid

print(err_riemannL)
print(err_riemannR)
print(err_mid)
0.01647646249054535
0.01647646249054535
-0.008248407907974542

In terms of magnitudes, the midpoint rule clearly gives the best approximation here.

Try it¶

Now change the number of units on the grid to 100 (n=101). How does this change the error for the three different Riemann sums? How does the relative error between the midpoint Riemann sum and the left/right Riemann sums change? That is, right now the left/right sums have errors roughly twice that of the midpoint sum. Does this ratio stay constant with a finer grid?

Trapezoid rule¶

A more sophisticated way of numerically approximating integrals is by calculating the area of a trapezoid at each discrete unit. See PPNM Ch. 21.3 for further mathematical explanation.

We jump straight to the calculation, again using $f(x) = sin(x)$ as our example.

We set up our problem exactly as before:

In [ ]:
a = 0
b = np.pi
n = 11
h = (b - a) / (n - 1)
x = np.linspace(a, b, n)
f = np.sin(x)

But now, instead of a rectangle with a "flat" top, we calculate an area based on a "sloped" top on each unit based on the heights at the two points:

A simple numerical example: say we have only one unit between two points, the width between the two points is h=3, and the heights at the 2 points are 5 and 10. Then, using the formula for the area of a trapezoid:

$$A = \frac{1}{2} h (b_1 + b_2)$$

Where $b_1$ and $b_2$ are the heights at the two points of the unit. We then get

$$A = \frac{1}{2}\cdot 3\cdot (5+10) = 22.5$$
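As a quick sanity check, this single-trapezoid arithmetic is easy to verify in code (a minimal sketch):

```python
h = 3            # width between the two points
b1, b2 = 5, 10   # heights at the two points
A = 0.5 * h * (b1 + b2)
print(A)  # 22.5
```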

Extending this to summing over multiple trapezoids we get

In [ ]:
I_trap = (h/2)*(f[0] + 2 * sum(f[1:n-1]) + f[n-1])
I_trap
Out[ ]:
1.9835235375094546
In [ ]:
err_trap = 2 - I_trap
err_trap
Out[ ]:
0.01647646249054535

Interestingly, in this case we don't get an improvement in the error compared to the Riemann sums. This is no accident: the trapezoid sum is exactly the average of the left and right Riemann sums, and here those two are equal.

Try it¶

Compute the integral of the function $f(x) = e^{sin(x)}$ over the range $(0,2\pi)$ using the trapezoid rule. Use a grid with 100 units. Compare the error to a Riemann sum with the midpoint rule.

Simpson's Rule¶

The final approach we will use for numerically approximating integrals is Simpson's rule. In this approach, instead of modelling the "roof" of each unit as a straight line (either flat, as in Riemann sums, or sloped, as with the trapezoid rule), we model the roofs by fitting polynomials through subintervals. For further description and mathematical details see PPNM Ch. 21.4 (https://pythonnumericalmethods.berkeley.edu/notebooks/chapter21.04-Simpsons-Rule.html).

You can see the link for the full derivation, but here I will just reproduce the final result, which is that you can approximate the integral across two units, $x_{i-1}$ to $x_{i}$ and $x_{i}$ to $x_{i+1}$, by:

$$\int_{x_{i-1}}^{x_{i+1}} f(x)\,dx \approx \frac{h}{3}\left(f(x_{i-1}) + 4 f(x_i) + f(x_{i+1})\right)$$

Now we can compute. We use the same set-up as above:

In [ ]:
a = 0
b = np.pi
n = 11
h = (b - a) / (n - 1)
x = np.linspace(a, b, n)
f = np.sin(x)

Now we use the rule above, summing over all the units

In [ ]:
I_simp = (h/3) * (f[0] + 2*sum(f[2:n-2:2]) + 4*sum(f[1:n-1:2]) + f[n-1])

In the above, the indexing f[2:n-2:2] means we take every second value from index 2 up to n-2: these are the interior grid points shared between pairs of units, which get a coefficient of 2. The slice f[1:n-1:2] picks out the midpoint of each pair of units, which gets a coefficient of 4. We do this because we are now summing over every two units. (Note that the first slice must start at index 2, not 0, since f[0] already appears separately with coefficient 1.)

In [ ]:
f[2:n-2:2]
Out[ ]:
array([0.58778525, 0.95105652, 0.95105652, 0.58778525])

We then compute the error

In [ ]:
err_simp = 2 - I_simp
print(err_simp)
-0.00010951731500430384

Here we see that using Simpson's rule gives us a much smaller error.
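Part of why Simpson's rule does so well: because it fits a quadratic through each pair of units, it turns out to be exact (up to floating-point rounding) for polynomials up to degree three. A quick sketch, integrating $f(x)=x^3$ over $[0,2]$, where the true value is $2^4/4 = 4$:

```python
import numpy as np

a, b, n = 0, 2, 11
h = (b - a) / (n - 1)
x = np.linspace(a, b, n)
f = x**3  # a cubic; Simpson's rule should integrate it essentially exactly

I_simp = (h/3) * (f[0] + 2*sum(f[2:n-2:2]) + 4*sum(f[1:n-1:2]) + f[n-1])
print(I_simp)  # 4.0, up to floating-point rounding
```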

Built-in tools in Python for numeric integration¶

Luckily, we don't need to manually devise our own numeric integration techniques every time we want to evaluate an integral. The Scipy package includes several techniques for numerical integration.

We will define our parameters just one time for the techniques below

In [ ]:
a = 0
b = np.pi
n = 11
h = (b - a) / (n - 1)
x = np.linspace(a, b, n)
f = np.sin(x)

Scipy has a built-in trapezoid summation (this function was called trapz in older SciPy versions):

In [ ]:
from scipy.integrate import trapezoid

I_trapz = trapezoid(f, x)
I_trapz
Out[ ]:
1.9835235375094542

We can also compute the cumulative integral, which returns an array of values where each number represents the cumulative integral (the area underneath the curve) up to each grid point. Note that it returns n-1 values, one for each grid point after the first. (This function was called cumtrapz in older SciPy versions.)

In [ ]:
from scipy.integrate import cumulative_trapezoid

F_approx = cumulative_trapezoid(np.sin(x), x)
F_approx
Out[ ]:
array([0.04854028, 0.18940964, 0.40881883, 0.68529053, 0.99176177,
       1.29823301, 1.57470471, 1.79411389, 1.93498326, 1.98352354])
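Since we know the exact antiderivative here, $F(x) = 1 - \cos(x)$ (choosing the constant so that $F(0)=0$), we can check how far the cumulative approximation drifts from the truth. A self-contained sketch, using cumulative_trapezoid (the current name for cumtrapz in recent SciPy versions):

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

x = np.linspace(0, np.pi, 11)
F_approx = cumulative_trapezoid(np.sin(x), x)  # cumulative integral at x[1:], 10 values
F_exact = 1 - np.cos(x[1:])                    # exact antiderivative at the same points
max_err = np.max(np.abs(F_approx - F_exact))
print(max_err)  # about 0.016; the error accumulates toward the right endpoint
```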

Simpson's rule¶

We can do a similar computation using Simpson's rule:

In [ ]:
from scipy.integrate import simpson

I_simpson = simpson(f, x=x)
I_simpson
I_simpson
Out[ ]:
2.0001095173150043

Quad¶

A final alternative method is quad, which returns both an estimate of the integral as well as an estimate of the error. The mathematics of this method are a bit beyond the scope of this course.

In [ ]:
from scipy.integrate import quad 

I_quad, est_err_quad = \
          quad(np.sin, 0, np.pi)
I_quad
Out[ ]:
2.0
In [ ]:
err_quad = 2 - I_quad
print(est_err_quad, err_quad)
2.220446049250313e-14 0.0

We see this is very accurate.
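quad shines for integrands with no elementary antiderivative. For example (a sketch), $\int_0^1 e^{-x^2}\,dx$ cannot be written in terms of elementary functions, but quad handles it easily:

```python
import numpy as np
from scipy.integrate import quad

I_gauss, est_err = quad(lambda x: np.exp(-x**2), 0, 1)
print(I_gauss, est_err)  # about 0.7468, with a very small estimated error
```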

Other integration tools¶

This is just a taste of the integration tools that scipy provides. You can explore more in the scipy.integrate documentation.

Assignment¶

1.) Find the following integrals analytically¶

a.) $\int (r-4r^{1/4})dr$

b.) $\int x^2\sqrt{x}\,dx$

c.) $\int (e^{-3x} + e^{3x})dx$

(EMEA Review exercises 10.1 and 10.2)

2. Evaluate the following integrals¶

a.) $\int_0^2 (x-\frac{1}{2}x^2)dx$

b.) $\int_0^4 v \sqrt{v^2 +9}dv$

(EMEA review exercises 10.3)

3. Evaluate the following integral using integration by parts¶

$\int_{1/2}^{e/2} x^3\ln(2x)\,dx$

(EMEA review exercise 10.4 )

4. Computational Integration: Calculating consumer and producer surplus¶

Say that the total inverse demand function for an economy for a certain product can be written:

$$P(Q) = \frac{150}{\ln(Q)}$$

And that the total inverse supply function for an economy can be written:

$$P(Q) = \frac{50}{e^{1/Q}}$$

a.) Discretize and plot these two functions between Q=2 and Q=50, with n=1000. (We start at 2 here for convenience, since $\ln(1)=0$ makes the demand function blow up near $Q=1$.)

b.) Find the approximate equilibrium price and quantity

c.) Write the consumer and producer surpluses as integrals. Calculate approximations of the consumer and producer surpluses (from Q=2), using both the trapezoid rule and Simpson's rule.

5. Computational Integration: Integrating over a logistic diffusion model¶

When we are modelling how a new technology or product enters into a model (like smartphones, electric cars, or even a new fashion), a common function we make use of is the logistic function. We can write the logistic function as:

$$f(t_s) = \frac{L}{1+e^{-k(t_s-t_0)}}$$

Here $L$ represents the level of saturation, where everyone who wants or can afford the product has it. $t_0$ represents the midpoint of the diffusion process, while $t_s$ represents the general time variable. $k$ controls the rate of diffusion.

Let's say we want to model the rate of diffusion of electric cars, and we devise a logistic model to represent the sales of electric cars in a certain market.

We model a period of 5000 days (about 12 years), and we assume that daily sales saturate at about 10,000 cars per day. We then devise the following logistic model:

In [ ]:
T = 5000
ts = np.linspace(0,T, T)
L = 10000
k = .002
t0 = ts.mean()
R = L/(1 + np.exp(-k*(ts-t0)))
In [ ]:
fig, ax = plt.subplots()
ax.plot(ts, R)
Out[ ]:
[<matplotlib.lines.Line2D at 0x7f96b851b250>]

Now we wish to find the total number of electric cars on the road at any given time. This will of course be the area underneath the diffusion curve we created above. Computationally integrate the logistic curve and show the result on a plot. Can you think of ways to improve this model, that is, to make it more realistic?

What other process or mechanism could a logistic function represent?

6.) Calculating the expected value of future revenue¶

Let us say that we are the CFO of a firm that sells a certain product. We wish to estimate the revenues of a product for the following fiscal quarter. We have a known demand curve for our product, but prices are stochastic; that is, they are uncertain and represented by a probability distribution, in this case a normal distribution.

The demand curve for our product is:

$$Q(P) = \frac{1000}{\ln(P)}$$

Notice that this restricts us to $P>1$, since $\ln(1)=0$. We could perhaps interpret this to mean that at $P=1$ or lower, demand is infinite.

Then we will say that our price variable, P, is distributed normally with mean 10 and standard deviation 3:

$$P \sim N(10,3)$$

(again with the restriction that P>1)

Create a function for the revenues given a price. Then find the expected revenue by integrating over the probability distribution of prices. How does this expected value differ from just taking the mean price (10) and multiplying by the quantity that comes from putting 10 into the demand function?

(Hint: you can use norm.pdf from scipy.stats to give you the probability density function of a normally distributed stochastic variable.)

In [ ]:
# import normal pdf
from scipy.stats import norm

#For example the density at 3 would be:
norm.pdf(3, loc=10, scale=3)
Out[ ]:
0.00874062969790316