It is probably unnecessary to say that the Covid-19 virus has disrupted essentially all facets of our lives and economies. The energy industry is no different. In this lab we will have an application towards looking at the effect on electricity consumption in Denmark and Sweden. But more generally, it is an opportunity to explore more general data on the pandemic and to get a basic understanding of the epidemiological modeling that you likely have seen or heard about.
This lab will not have an assignment..
It can also be useful to look at the geographic spread of the virus by mapping the data.
library(ggmap)
## Google's Terms of Service: https://cloud.google.com/maps-platform/terms/.
## Please cite ggmap if you use it! See citation("ggmap") for details.
We can look at Europe.
europeCoord= c(left = -20, bottom = 37, right = 30, top = 60)
europeMap = get_stamenmap(europeCoord, zoom = 5, maptype = "toner-lite")
## Source : http://tile.stamen.com/toner-lite/5/14/9.png
## Source : http://tile.stamen.com/toner-lite/5/15/9.png
## Source : http://tile.stamen.com/toner-lite/5/16/9.png
## Source : http://tile.stamen.com/toner-lite/5/17/9.png
## Source : http://tile.stamen.com/toner-lite/5/18/9.png
## Source : http://tile.stamen.com/toner-lite/5/14/10.png
## Source : http://tile.stamen.com/toner-lite/5/15/10.png
## Source : http://tile.stamen.com/toner-lite/5/16/10.png
## Source : http://tile.stamen.com/toner-lite/5/17/10.png
## Source : http://tile.stamen.com/toner-lite/5/18/10.png
## Source : http://tile.stamen.com/toner-lite/5/14/11.png
## Source : http://tile.stamen.com/toner-lite/5/15/11.png
## Source : http://tile.stamen.com/toner-lite/5/16/11.png
## Source : http://tile.stamen.com/toner-lite/5/17/11.png
## Source : http://tile.stamen.com/toner-lite/5/18/11.png
## Source : http://tile.stamen.com/toner-lite/5/14/12.png
## Source : http://tile.stamen.com/toner-lite/5/15/12.png
## Source : http://tile.stamen.com/toner-lite/5/16/12.png
## Source : http://tile.stamen.com/toner-lite/5/17/12.png
## Source : http://tile.stamen.com/toner-lite/5/18/12.png
Then we make our map.
We can start by filtering out data for europe using lat and long
euro_data = coronavirus %>% dplyr::filter(long>-20, long<40, lat>38, lat<70)
Then aggregating
euro_cum = euro_data %>% group_by(country, type) %>% summarise(
cumCases = sum(cases),
long = mean(long),
lat =mean(lat)
)
## `summarise()` has grouped output by 'country'. You can override using the
## `.groups` argument.
euro_confirmed = euro_cum %>% filter(type=="confirmed")
ggmap(europeMap) +
geom_point(aes(x = long, y = lat, size=cumCases), color="red", data = euro_confirmed)
## Warning: Removed 6 rows containing missing values (geom_point).
The coronavirus package also comes with spatial data and a suggested way of mapping it - you can read more at the corona-package documentation (type in console: ?coronavirus_spatial). When I tried this, I was getting some error messages when it needed to install some supplementary packages. I haven’t been able to figure it out, but maybe some of you will be more successful.
For mapping multiple trends - like cumulative cases in multiple countries, facets can be an effective strategy. Here I will plot the trend line for all european countries
#First exclude provinces/states - only look at national level
euro_data = euro_data %>% filter(is.na(province))
euro_data = euro_data %>% group_by(type, country) %>% mutate(
cumCases = cumsum(cases)
)
euro_data %>% filter(type=="confirmed", date>as.Date("2020-02-15")) %>% ggplot(aes(x=date, y=cumCases)) +
geom_line(color="red") +
facet_wrap(~country, ncol=5, scales="free_y") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90))
For further analysis, one suggestion would be to link the coronavirus data with national economic data from the PENN world tables. For example, here you can find information on population, so that you can get an idea of infections per capita.
I have data on my website (updated in 2019) which can be downloaded with the following line:
penn = read_csv("https://jmaurit.github.io/makro/labs/data/pwt90.csv")
## Warning: One or more parsing issues, see `problems()` for details
## Rows: 11848 Columns: 1
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): countrycode;country;currency_unit;year;rgdpe;rgdpo;pop;emp;avh;hc;c...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
In the news you may have seen models of the spread of the virus under different scenarios. These types of models are generally referred to as SIRS models.
A description of a SIR model (with economic implications) can be found in this recent working paper here, which I base my description here on.
The following code is based on this post on the website r-bloggers.
In this model you divide the population (N) into 4 categories (N is usually set to one, so you are looking at proportion of population, where the categories sum to 1).
S: Number susceptible - that is, those that have not been previously infected and have no immunity
E: The number who are exposed
I: The number who are infected
R: The number who recover or die (and are no longer susceptible)
The model is based on 5 relatively simple differential (or difference) equations which describe in a simple manner the spread of the disease.
We’ll start with the simplest relationship - the change in the number of recovered/dead is a function of the number of infected:
\(dR/dt = \gamma I\)
Here \(\gamma\) is the rate at which people either recover/die from the disease once they have it. So if it takes on average 18 days from infection to get over the disease, then \(\gamma=1/18\).
The rate of infection is determined by the number of people exposed to the disease and the rate of recovery:
\(dI/dt = \sigma E - \gamma I\)
So here \(\sigma\) is interpreted as the rate which people who are exposed to the disease become infected. This is the inverse of the incubation period–how long from exposure to infection. So if that is 5.2 days, then \(\sigma=1/5.2\).
The rate of recovery/death counts negatively towards the total rate of infection
The rate of exposure is in turn a function of the proportion of the population who are susceptible and the number who are infected:
\(dE/dt = \beta_t \frac{S}{N}I - \sigma E\)
\(\beta_t\) is probably the most important parameter in terms of understanding the spread of the disease and the public-health measures put in place. This parameter represents how easily people “bump-in” to one another, and spread the virus. By putting in “social-distancing” measures, the government is trying to reduce the amoung of “bumping in”. We’ll see the effect of this in our simulation below.
We also have a rate \(R_t\) which is the ratio of this “bump-in” ratio to the recovery/death rate, \(\gamma\). This ratio has gotten a lot of attention when the media discusses these models.
\(\rho_t = \frac{\beta_t}{\gamma}\)
(I spent a lot of time being confused about this, because in the paper I referenced, they write \(\rho_t\) as \(R_t\), which is completly distinct from the number of people who are recovered, R. This seems to be a gross violation of good notation.)
Finally, the rate of change of those who are susceptible is negatively proportional to the bump-in rate, the existing proportion who already are susceptible and the proportion who are infected already. \(dS/dt = - \beta_t \frac{S}{N}I\)
#Modelling in R
We can program in a simplified simulation of this model in R. Below we basically ignore the nuance between exposure and infection, ignoring the dynamics that come in the period between being exposed to the infection and showing symptoms. We also fix \(\beta_t\)
The way basically all simulations of continuous models (like the one presented above) work is through discretisation. This basically means that you split the model into tiny time periods, and then approximate the dynamics by updating the model through each of the time periods.
In writing the script, all we need to do is to rewrite the continuous-time formulations above into discrete time. So, for example:
\(dI/dt = \sigma E - \gamma I\)
becomes
\(I_t - I_{t-1} = \sigma E_t - \gamma I_t\)
\(I_t = I_{t-1} + \sigma E_t - \gamma I_t\)
library(gridExtra)
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
First, we create our main function, called SIR. As input, we enter the population N, and the initial number of susceptible (S0), infected (I0), our bump-in parameter (beta), and our recovery rate (gamma), and the number of periods t.
So, walking through, we first initialise our vectors of S and I and set the initial conditions
Then we begin a loop through the time periods.
For each period (following 1) the number susceptible is equal to the previous periods susceptible minus those that have newly become infected.
The number infected is equal to the previous number plus our rate of new infections, less those that have recovered
The model exits early if the the number of infected or susceptible goes below 1 (basically, the case where the virus is eliminated from the population, either because everyone has already recovered or died, or because it has been completly eliminated through public health measures.)
We collect the series into a dataframe and then limit the values to those where S and I are above 0.
We add together the number infected and recovered/died at all time periods - these are those that are no longer susceptible
SIR = function(N, S0, I0, beta, gamma, t) {
S = numeric(t)
I = numeric(t)
S[1] = S0
I[1] = I0
for (i in 2:t) {
S[i] = S[i-1] - beta*S[i-1]/N*I[i-1] #1
I[i] = I[i-1] + beta*S[i-1]/N*I[i-1] - gamma * I[i-1] #2
if (I[i] < 1 || S[i] < 1) #3
break
}
df = data.frame(time=1:t, S=S, I=I, R=N-S-I)
df = df[S>1&I>1,] #4
df$AR = (df$I+df$R)/N #5
nr = nrow(df)
rate = df$I[2:nr]/df$I[1:(nr-1)] #6
df$rate = c(rate[1], rate)
return(df)
}
Then our plotting function:
plotSIR <- function(df) {
nr <- nrow(df)
p1 <- ggplot(df, aes(time, I))+geom_line()
p1 <- p1+ggtitle("No. of Infections")
p1 <- p1+geom_point(x=df[nr, "time"], y=df[nr, "I"], color="red", size=3)
p2 <- ggplot(df, aes(time, AR))+geom_line()+xlab("")
p2 <- p2 + ggtitle("Attack rate increases")
p2 <- p2 + geom_point(x=df[nr, "time"], y=df[nr, "AR"], color="red", size=3)
p3 <- ggplot(df, aes(time, S))+geom_line()
p3 <- p3 +ggtitle("Depletion of susceptibles")
p3 <- p3+ylim(0, max(df$S)) +
geom_point(x=df[nr, "time"], y=df[nr, "S"], color="red", size=3)
p4 <- ggplot(df, aes(time, rate))+geom_line()
p4 <- p4 + ggtitle("R")+ ylim(range(df$rate)+c(-.2, .2)) +
geom_hline(yintercept=1, color="steelblue", linetype="dashed")
p4 <- p4+geom_point(x=df[nr, "time"], y=df[nr, "rate"], color="red", size=3)
grid.arrange(p1,p2,p3,p4, ncol=1)
invisible(list(p1=p1, p2=p2, p3=p3, p4=p4))
}
Finally, we can run our entire model with the given inputs:
Try playing with different values of beta, representing different social-distancing measures.
df <- SIR(1e6, 1e6-1, 1, beta=0.2, gamma=1/18, 300)
plotSIR(df)
The above model is extremely simple. The models that epidemeologists use are obviously going to be much more refined. To get a bit better sense of these models, we can use the R package EpiModel.
The following is based on a post on Tim Churches Health Data Science Blog
install.packages("EpiModel")
#type no when asked if you want to compile from binary version
library("EpiModel")
The thing to notice about the simple model that we started with, was that it was completly deterministic. That is, once we put in our parameters, it will produce exactly the same outputs each time it runs.
In reality, we know that there is lots of uncertainty involved. Not just uncertainty in the modelling of the actual contagion, but the way the contagion spreads is itself uncertain. Whether one person infects 1 or 2 or 3 has a lot to do with random factors–that they sneeze just as they are walking by a group of people.
The model we created above was also limited to assuming we have a single type of human, all of whom have identical risk factors. This is of course far from realistic.
So modelling uncertainty–what is called stochastics–becomes important, as does the ability to create many different “compartments”, that is modelling different effects on different people. This is what the models in the package “EpiModel” do.
First we specify a model type:
control = control.icm(type="SIR", nsteps=100, nsims=10)
Here we initialize a control model of time SIR (susceptible-infected-recovered, another option includes susceptible-infected-susceptible, for example, if you do not gain immunity after exposure.)
We then initialize a starting point for our model using the init.icm command:
init=init.icm(s.num=997, i.num=3, r.num=0)
Here with 3 infected people and 997 susceptible.
Then we model the transitions - that is specifying the parameters which indicate the probabilities of an individual to being infected.
param = param.icm(inf.prob=0.05,
act.rate=10,
rec.rate=1/20,
a.rate=(10.5/365)/1000,
ds.rate=(7/365)/1000,
di.rate=(14/365)/1000,
dr.rate=(7/365)/1000)
Here is an overview of the parameters
inf.prob This is the probability of infection at each exposure. So 1/20 exposures will lead to an infection
act.rate This is the rate that people are exposed to potential infections, here set to 10 per day.
rec.rate This is the recovery rate, so this indicates that it will take on average 20 days for an infected person to recover (or die).
a.rate Stands for arrival rate, it indicates how many new individuals enter a “compartment” (through births or immigration). Here we indicate that we add 10.5 people every year to our initial population of 1000.
ds.rate, di.rate, dr.rate Stands for departures (deaths and emmigration) from each of the categories (s, i, or r).
Then we can run our simulation:
sim = icm(param, init, control)
## Default modules set to appropriate two-group functions.
## See documentation for details.
We can output a summary of our model by just writing in our object:
sim
## EpiModel Simulation
## =======================
## Model class: icm
##
## Simulation Summary
## -----------------------
## Model type: SIR
## No. simulations: 10
## No. time steps: 100
## No. groups: 1
##
## Model Parameters
## -----------------------
## inf.prob = 0.05
## act.rate = 10
## rec.rate = 0.05
## a.rate = 2.876712e-05
## ds.rate = 1.917808e-05
## di.rate = 3.835616e-05
## dr.rate = 1.917808e-05
##
## Model Output
## -----------------------
## Variables: s.num i.num num r.num si.flow ir.flow ds.flow
## di.flow dr.flow a.flow
More useful, the model includes a plotting tool to automatically plot the simulation:
plot(sim)
Now you can play with some of the parameters in the model and see what effects it has on the results.
In terms of modelling the effects of social-distancing, the act.rate is perhaps the most important. By imposing social-distancing, you can drop the number of exposures any individual encounters.
Try, for example, dropping the exposures from 10 to 5–the results are quite dramatic. With an act rate at 10, 60 percent of the population was infected after 30 days. At 5, only about 10% are infected after 30 days. Though, if this were the actual rate of infection, it would still be extremely dramatic.
The above plots the cumulative number of individuals in each category at each period. We also have the ability to plot the flows - like the number of new cases per day:
plot(sim, y="si.flow")