Corona analytics

Covid 19 and the Energy Industry

It is probably unnecessary to say that the Covid-19 virus has disrupted essentially all facets of our lives and economies. The energy industry is no different. In this lab we will have an application towards looking at the effect on electricity consumption in Denmark and Sweden. But more generally, it is an opportunity to explore more general data on the pandemic and to get a basic understanding of the epidemiological modeling that you likely have seen or heard about.

This lab will not have an assignment..

Accessing data on the spread of coronavirus

To access daily-updated data on the coronavirus, we will use the coronavirus package that collects data from the Johns Hopkins University Center for Systems Science

We install it directly from the authors github page, and to do that we need a package called devtools

install.packages("devtools")

Usually, we only need to run the install.packages (in this case install_github) once, but here we want to reinstall every time we want updated data.

library(devtools)

## Loading required package: usethis

devtools::install_github("RamiKrispin/coronavirus")

## Skipping install of 'coronavirus' from a github remote, the SHA1 (836b5aeb) has not changed since last install.
##   Use `force = TRUE` to force installation

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.6     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.4     ✓ stringr 1.4.0
## ✓ readr   2.1.0     ✓ forcats 0.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(coronavirus)
library(lubridate)

## 
## Attaching package: 'lubridate'

## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

Here the package coronavirus just consists of a couple of large data frames (in tibble format). The data frame coronavirus consists of the full history of confirmed corona virus cases.

head(coronavirus)

##         date province country     lat      long      type cases   uid iso2 iso3
## 1 2020-01-22  Alberta  Canada 53.9333 -116.5765 confirmed     0 12401   CA  CAN
## 2 2020-01-23  Alberta  Canada 53.9333 -116.5765 confirmed     0 12401   CA  CAN
## 3 2020-01-24  Alberta  Canada 53.9333 -116.5765 confirmed     0 12401   CA  CAN
## 4 2020-01-25  Alberta  Canada 53.9333 -116.5765 confirmed     0 12401   CA  CAN
## 5 2020-01-26  Alberta  Canada 53.9333 -116.5765 confirmed     0 12401   CA  CAN
## 6 2020-01-27  Alberta  Canada 53.9333 -116.5765 confirmed     0 12401   CA  CAN
##   code3    combined_key population continent_name continent_code
## 1   124 Alberta, Canada    4413146  North America             NA
## 2   124 Alberta, Canada    4413146  North America             NA
## 3   124 Alberta, Canada    4413146  North America             NA
## 4   124 Alberta, Canada    4413146  North America             NA
## 5   124 Alberta, Canada    4413146  North America             NA
## 6   124 Alberta, Canada    4413146  North America             NA

The data set consists of three main types of data:

unique(coronavirus$type)

## [1] "confirmed" "death"     "recovery"

Let’s start by plotting the total number of new confirmed cases and deaths by date.

totals = coronavirus %>% group_by(date, type) %>% 
  summarise(
    total_new_cases = sum(cases)
  )

## `summarise()` has grouped output by 'date'. You can override using the `.groups`
## argument.

#get rid of data error that showed negative recovered

totals = totals %>% filter(total_new_cases>=0)


totals %>% ggplot(aes(x=date, y=total_new_cases)) +
                    geom_line() +
                    facet_wrap(~type, ncol=1, scales="free_y")

We can also make and plot cumulative figures.

totals = totals %>% group_by(type) %>%  mutate(
    total_cum_cases = cumsum(total_new_cases)
  )

totals %>% ggplot(aes(x=date, y=total_cum_cases)) +
                    geom_line() +
                    facet_wrap(~type, ncol=1, scales="free_y")

It might be useful to look at them on log-10 scales

totals %>% ggplot(aes(x=date, y=total_cum_cases)) +
                    geom_line() +
                    facet_wrap(~type, ncol=1, scales="free_y") +
                  scale_y_log10()

Mapping the data

It can also be useful to look at the geographic spread of the virus by mapping the data.

library(ggmap)

## Google's Terms of Service: https://cloud.google.com/maps-platform/terms/.

## Please cite ggmap if you use it! See citation("ggmap") for details.

We can look at Europe.

europeCoord= c(left = -20, bottom = 37, right = 30, top = 60)
europeMap = get_stamenmap(europeCoord, zoom = 5, maptype = "toner-lite")

## Source : http://tile.stamen.com/toner-lite/5/14/9.png

## Source : http://tile.stamen.com/toner-lite/5/15/9.png

## Source : http://tile.stamen.com/toner-lite/5/16/9.png

## Source : http://tile.stamen.com/toner-lite/5/17/9.png

## Source : http://tile.stamen.com/toner-lite/5/18/9.png

## Source : http://tile.stamen.com/toner-lite/5/14/10.png

## Source : http://tile.stamen.com/toner-lite/5/15/10.png

## Source : http://tile.stamen.com/toner-lite/5/16/10.png

## Source : http://tile.stamen.com/toner-lite/5/17/10.png

## Source : http://tile.stamen.com/toner-lite/5/18/10.png

## Source : http://tile.stamen.com/toner-lite/5/14/11.png

## Source : http://tile.stamen.com/toner-lite/5/15/11.png

## Source : http://tile.stamen.com/toner-lite/5/16/11.png

## Source : http://tile.stamen.com/toner-lite/5/17/11.png

## Source : http://tile.stamen.com/toner-lite/5/18/11.png

## Source : http://tile.stamen.com/toner-lite/5/14/12.png

## Source : http://tile.stamen.com/toner-lite/5/15/12.png

## Source : http://tile.stamen.com/toner-lite/5/16/12.png

## Source : http://tile.stamen.com/toner-lite/5/17/12.png

## Source : http://tile.stamen.com/toner-lite/5/18/12.png

Then we make our map.

We can start by filtering out data for europe using lat and long

euro_data = coronavirus %>% dplyr::filter(long>-20, long<40, lat>38, lat<70)

Then aggregating

euro_cum = euro_data %>% group_by(country, type) %>% summarise(
  cumCases = sum(cases),
  long = mean(long),
  lat =mean(lat)
)

## `summarise()` has grouped output by 'country'. You can override using the
## `.groups` argument.

euro_confirmed = euro_cum %>% filter(type=="confirmed")

ggmap(europeMap) +
geom_point(aes(x = long, y = lat, size=cumCases), color="red", data = euro_confirmed)

## Warning: Removed 6 rows containing missing values (geom_point).

The coronavirus package also comes with spatial data and a suggested way of mapping it - you can read more at the corona-package documentation (type in console: ?coronavirus_spatial). When I tried this, I was getting some error messages when it needed to install some supplementary packages. I haven’t been able to figure it out, but maybe some of you will be more successful.

Facets

For mapping multiple trends - like cumulative cases in multiple countries, facets can be an effective strategy. Here I will plot the trend line for all european countries

#First exclude provinces/states - only look at national level
euro_data = euro_data %>% filter(is.na(province))

euro_data = euro_data %>% group_by(type, country) %>% mutate(
  cumCases = cumsum(cases)
)

euro_data %>% filter(type=="confirmed", date>as.Date("2020-02-15")) %>% ggplot(aes(x=date, y=cumCases)) +
  geom_line(color="red") +
  facet_wrap(~country, ncol=5, scales="free_y") +
  theme_classic() +
   theme(axis.text.x = element_text(angle = 90))

For further analysis, one suggestion would be to link the coronavirus data with national economic data from the PENN world tables. For example, here you can find information on population, so that you can get an idea of infections per capita.

I have data on my website (updated in 2019) which can be downloaded with the following line:

penn = read_csv("https://jmaurit.github.io/makro/labs/data/pwt90.csv")

## Warning: One or more parsing issues, see `problems()` for details

## Rows: 11848 Columns: 1
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): countrycode;country;currency_unit;year;rgdpe;rgdpo;pop;emp;avh;hc;c...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Electricity demand and Corona-related restrictions

One of the most dramatic effects of the Corona epidemic on the energy industry is the fall in electricity consumption, primarily caused by the social restrictions put in place in order to stem the virus. Studies of countries from Italy to India have found dramatic falls in electricity consumption, often in the magnitude of 20-30% immediately following social restriction policies.

Here we take a look at consumption data from the Nordic market. We will be particularly interested in data from Denmark and southern Sweden, since their economies are closely intertwined, but also had very different responses to the Corona virus.

We need the fpp3 package for some time series work:

library(fpp3)

## ── Attaching packages ──────────────────────────────────────────── fpp3 0.4.0 ──

## ✓ tsibble     1.1.1     ✓ feasts      0.2.2
## ✓ tsibbledata 0.4.0     ✓ fable       0.3.1

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## x lubridate::date()    masks base::date()
## x dplyr::filter()      masks stats::filter()
## x tsibble::intersect() masks base::intersect()
## x tsibble::interval()  masks lubridate::interval()
## x dplyr::lag()         masks stats::lag()
## x tsibble::setdiff()   masks base::setdiff()
## x tsibble::union()     masks base::union()

Below you can download hourly data I have put together on consumption, prices and wind power for a selection of price areas in the Nordic market

wtDF = read_csv("https://jmaurit.github.io/analytics/labs/data/wtDF.csv")

## Rows: 440893 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): variable, area
## dbl  (1): value
## dttm (1): time
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

unique(wtDF$variable)

## [1] "Consumption" "Wind"        "Prices"

unique(wtDF$area)

## [1] "DK1"     "DK2"     "DK"      "SE1"     "SE2"     "SE3"     "SE4"    
## [8] "SE"      "Kr.sand"

Lets look at consumption first

consDF = wtDF %>% filter(variable=="Consumption", area %in% c("DK1", "DK2", "SE4"))

consDF %>% ggplot(aes(x=time, y=value)) +
  geom_line() +
  geom_smooth() +
  geom_vline(xintercept=ymd_h("2020-03-12 00"), color="red") +
  facet_grid(area ~ ., scales="free_y") +
  ylab("Consumption")

## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

## Warning: Removed 54 rows containing non-finite values (stat_smooth).

## Warning: Removed 17 row(s) containing missing values (geom_path).

From this figure, we can see consumption fell following the imposition of restrictions on March 12th. However, that fall is also in line with seasonal patterns.

Lets create another chart, where we overlay the yearly time series and add a smoothed line (in this case, a cubic regression spline):

consDF["year"] = year(consDF$time)
consDF["doy"] = yday(consDF$time)

#Which day of year is march 12th
yday(as_date(ymd("2020-03-12")))

## [1] 72

consDF %>% ggplot(aes(x=doy, y=value, group=factor(year))) +
  geom_line(alpha=.1) +
  geom_smooth(span=.2, aes(color=factor(year))) +
  geom_vline(xintercept=yday(ymd("2020-03-12")), color="red") +
  facet_grid(area ~ ., scales="free_y") +
  ylab("Consumption") +
  xlab("Day of year") +
  labs(color="year")

## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

## Warning: Removed 54 rows containing non-finite values (stat_smooth).

## Warning: Removed 22 row(s) containing missing values (geom_path).

Here the 2020 data is the blue line, and surprisingly, there doesn’t seem to be much of an aberration compared to the normal seasonal patterns from previous years.

Compared to the dramatic falls in consumption that have been seen in other areas, just the visual presentation of the data should be enough to convince us that there was very little fall in Denmark and Southern Sweden. But we can also do some more formal, regression based tests.

We start by aggregating the data up to daily values since the hourly frequency doesn’t really aid us in any way.

#aggregate to daily data
consDF["date"] = as_date(consDF$time)
consDF_daily = consDF%>% group_by(area, date) %>% summarise(
  consumption = sum(value, na.rm=TRUE)
)

## `summarise()` has grouped output by 'area'. You can override using the `.groups`
## argument.

Then I want to define a period with the most severe societal restrictions. The start date is rather easy to pinpoint - March 12th was when restrictions were announced in Denmark. I somewhat arbitrary chose june 1st as when restrictions eased, but in reality there was a gradual easing of restrictions throughout the spring.

# test to see if consumption decreased during corona

consDF_daily = consDF_daily %>% filter(!is.na(date))
consDF_daily["corona"] =  ifelse(consDF_daily$date>as.Date("2020-03-12") & consDF_daily$date<as.Date("2020-06-01"), 1, 0)

A simple way to test for a chance in consumption while taking into account the strong yearly seasonality is to take a yearly difference. This is fairly brute force, but there is nothing inherently wrong with it:

consDF_daily = consDF_daily %>%group_by(area) %>% mutate(
  yearDiff = ((consumption - lag(consumption, 364))/lag(consumption, 364))*100
)

And then convert to tsibble format and then plot:

consTSibble =consDF_daily %>% filter(!is.na(yearDiff)) %>% as_tsibble(key=area)

## Using `date` as index variable.

consTSibble =consDF_daily %>% filter(!is.na(consumption)) %>% as_tsibble(key=area)

## Using `date` as index variable.

consTSibble %>% filter(date>=as_date("2020-01-01") & date<as_date("2020-06-01")) %>% ggplot(aes(x=date, y=yearDiff, color=factor(corona, labels =c("No", "Yes")))) +
  geom_line() +
  geom_smooth(span=.3) +
  facet_wrap(~area, ncol=1) +
  labs(color="Post restrictions") +
  xlab("") +
  ylab("Difference in el. consumption, % change yoy")

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Here we can more clearly see that there was no visible fall in consumption following the lockdowns. If anything, there may have been a bit of a jump in the SE4 area.

Now we do a somewhat more sophisticated analysis using seasonal adjustment ([STL rather than differencing.

#We use the fill_gaps() tool to fill any missing data
consTSibble = consTSibble %>% fill_gaps()


#do seasonal adjustment if using level data
STL_components_daily = consTSibble %>%
  model(STL(consumption ~ trend(window=7) + season(window="periodic"),
            robust = TRUE)) %>%
  components()

STL_components_daily %>% autoplot()

consTSibble["cons_sa"] = STL_components_daily$season_adjust
consTSibble = consTSibble %>% as_tsibble(key=area)

## Using `date` as index variable.

Finally, we can run a simple ARIMA model with an exogenous dummy representing our period with societal restrictions

#use arima to model errors for differenced
mod1_cons= consTSibble %>% model(ARIMA(yearDiff~corona))
mod1_cons %>% filter(area=="DK1") %>% report()

## Series: yearDiff 
## Model: LM w/ ARIMA(5,1,0)(1,0,0)[7] errors 
## 
## Coefficients:
##           ar1      ar2      ar3      ar4      ar5     sar1  corona
##       -0.4107  -0.4264  -0.2907  -0.2199  -0.1722  -0.2536  1.8175
## s.e.   0.0474   0.0498   0.0520   0.0502   0.0498   0.0493  3.2167
## 
## sigma^2 estimated as 21.3:  log likelihood=-2082.98
## AIC=4181.95   AICc=4182.1   BIC=4221.39

mod1_cons %>% filter(area=="DK2") %>% report()

## Series: yearDiff 
## Model: LM w/ ARIMA(4,0,0)(1,0,0)[7] errors 
## 
## Coefficients:
##          ar1      ar2     ar3     ar4     sar1  corona  intercept
##       0.6819  -0.0221  0.0558  0.1062  -0.1137  1.6402    -1.9757
## s.e.  0.0503   0.0607  0.0608  0.0504   0.0524  2.1578     0.9291
## 
## sigma^2 estimated as 13.29:  log likelihood=-1930.51
## AIC=3877.02   AICc=3877.16   BIC=3916.46

mod1_cons %>% filter(area=="SE4") %>% report()

## Series: yearDiff 
## Model: LM w/ ARIMA(1,0,0)(2,0,0)[7] errors 
## 
## Coefficients:
##          ar1    sar1    sar2  corona  intercept
##       0.7342  0.0712  0.0010  2.8264    -1.6887
## s.e.  0.0291  0.0446  0.0448  2.6135     1.0352
## 
## sigma^2 estimated as 25.52:  log likelihood=-2146.2
## AIC=4304.4   AICc=4304.48   BIC=4333.98

A simple SIRS model of contagion

In the news you may have seen models of the spread of the virus under different scenarios. These types of models are generally referred to as SIRS models.

A description of a SIR model (with economic implications) can be found in this recent working paper here, which I base my description here on.

The following code is based on this post on the website r-bloggers.

In this model you divide the population (N) into 4 categories (N is usually set to one, so you are looking at proportion of population, where the categories sum to 1).

S: Number susceptible - that is, those that have not been previously infected and have no immunity

E: The number who are exposed

I: The number who are infected

R: The number who recover or die (and are no longer susceptible)

The model is based on 5 relatively simple differential (or difference) equations which describe in a simple manner the spread of the disease.

We’ll start with the simplest relationship - the change in the number of recovered/dead is a function of the number of infected:

\(dR/dt = \gamma I\)

Here \(\gamma\) is the rate at which people either recover/die from the disease once they have it. So if it takes on average 18 days from infection to get over the disease, then \(\gamma=1/18\).

The rate of infection is determined by the number of people exposed to the disease and the rate of recovery:

\(dI/dt = \sigma E - \gamma I\)

So here \(\sigma\) is interpreted as the rate which people who are exposed to the disease become infected. This is the inverse of the incubation period–how long from exposure to infection. So if that is 5.2 days, then \(\sigma=1/5.2\).

The rate of recovery/death counts negatively towards the total rate of infection

The rate of exposure is in turn a function of the proportion of the population who are susceptible and the number who are infected:

\(dE/dt = \beta_t \frac{S}{N}I - \sigma E\)

\(\beta_t\) is probably the most important parameter in terms of understanding the spread of the disease and the public-health measures put in place. This parameter represents how easily people “bump-in” to one another, and spread the virus. By putting in “social-distancing” measures, the government is trying to reduce the amoung of “bumping in”. We’ll see the effect of this in our simulation below.

We also have a rate \(R_t\) which is the ratio of this “bump-in” ratio to the recovery/death rate, \(\gamma\). This ratio has gotten a lot of attention when the media discusses these models.

\(\rho_t = \frac{\beta_t}{\gamma}\)

(I spent a lot of time being confused about this, because in the paper I referenced, they write \(\rho_t\) as \(R_t\), which is completly distinct from the number of people who are recovered, R. This seems to be a gross violation of good notation.)

Finally, the rate of change of those who are susceptible is negatively proportional to the bump-in rate, the existing proportion who already are susceptible and the proportion who are infected already. \(dS/dt = - \beta_t \frac{S}{N}I\)

#Modelling in R

We can program in a simplified simulation of this model in R. Below we basically ignore the nuance between exposure and infection, ignoring the dynamics that come in the period between being exposed to the infection and showing symptoms. We also fix \(\beta_t\)

The way basically all simulations of continuous models (like the one presented above) work is through discretisation. This basically means that you split the model into tiny time periods, and then approximate the dynamics by updating the model through each of the time periods.

In writing the script, all we need to do is to rewrite the continuous-time formulations above into discrete time. So, for example:

\(dI/dt = \sigma E - \gamma I\)

becomes

\(I_t - I_{t-1} = \sigma E_t - \gamma I_t\)

\(I_t = I_{t-1} + \sigma E_t - \gamma I_t\)

library(gridExtra)

## 
## Attaching package: 'gridExtra'

## The following object is masked from 'package:dplyr':
## 
##     combine

First, we create our main function, called SIR. As input, we enter the population N, and the initial number of susceptible (S0), infected (I0), our bump-in parameter (beta), and our recovery rate (gamma), and the number of periods t.

So, walking through, we first initialise our vectors of S and I and set the initial conditions

Then we begin a loop through the time periods.

For each period (following 1) the number susceptible is equal to the previous periods susceptible minus those that have newly become infected.
The number infected is equal to the previous number plus our rate of new infections, less those that have recovered
The model exits early if the the number of infected or susceptible goes below 1 (basically, the case where the virus is eliminated from the population, either because everyone has already recovered or died, or because it has been completly eliminated through public health measures.)
We collect the series into a dataframe and then limit the values to those where S and I are above 0.
We add together the number infected and recovered/died at all time periods - these are those that are no longer susceptible

SIR = function(N, S0, I0, beta, gamma, t) {
    S = numeric(t)
    I = numeric(t)
    S[1] = S0
    I[1] = I0
 
    for (i in 2:t) {
        S[i] = S[i-1] - beta*S[i-1]/N*I[i-1] #1
        I[i] = I[i-1] + beta*S[i-1]/N*I[i-1] - gamma * I[i-1] #2
        if (I[i] < 1 || S[i] < 1) #3
            break
    }
 
  df = data.frame(time=1:t, S=S, I=I, R=N-S-I)
  df = df[S>1&I>1,] #4

  df$AR = (df$I+df$R)/N #5
  nr = nrow(df)
  rate = df$I[2:nr]/df$I[1:(nr-1)] #6
  
  df$rate = c(rate[1], rate)
  return(df)
}

Then our plotting function:

plotSIR <- function(df) {
    nr <- nrow(df)
    p1 <- ggplot(df, aes(time, I))+geom_line()
 
    p1 <- p1+ggtitle("No. of Infections")
    p1 <- p1+geom_point(x=df[nr, "time"], y=df[nr, "I"], color="red", size=3)
    p2 <- ggplot(df, aes(time, AR))+geom_line()+xlab("")
 
    p2 <- p2 + ggtitle("Attack rate increases")
    p2 <- p2 + geom_point(x=df[nr, "time"], y=df[nr, "AR"], color="red", size=3)
    p3 <- ggplot(df, aes(time, S))+geom_line()
    p3 <- p3 +ggtitle("Depletion of susceptibles")
    p3 <- p3+ylim(0, max(df$S)) +
        geom_point(x=df[nr, "time"], y=df[nr, "S"], color="red", size=3)
 
    p4 <- ggplot(df, aes(time, rate))+geom_line()
    p4 <- p4 + ggtitle("R")+ ylim(range(df$rate)+c(-.2, .2)) + 
        geom_hline(yintercept=1, color="steelblue", linetype="dashed")
    p4 <- p4+geom_point(x=df[nr, "time"], y=df[nr, "rate"], color="red", size=3)
 
    grid.arrange(p1,p2,p3,p4, ncol=1)
    invisible(list(p1=p1, p2=p2, p3=p3, p4=p4))
}

Finally, we can run our entire model with the given inputs:

Try playing with different values of beta, representing different social-distancing measures.

df <- SIR(1e6, 1e6-1, 1, beta=0.2, gamma=1/18, 300)
plotSIR(df)

The EpiModel package

The above model is extremely simple. The models that epidemeologists use are obviously going to be much more refined. To get a bit better sense of these models, we can use the R package EpiModel.

The following is based on a post on Tim Churches Health Data Science Blog

install.packages("EpiModel") 
#type no when asked if you want to compile from binary version

library("EpiModel")

The thing to notice about the simple model that we started with, was that it was completly deterministic. That is, once we put in our parameters, it will produce exactly the same outputs each time it runs.

In reality, we know that there is lots of uncertainty involved. Not just uncertainty in the modelling of the actual contagion, but the way the contagion spreads is itself uncertain. Whether one person infects 1 or 2 or 3 has a lot to do with random factors–that they sneeze just as they are walking by a group of people.

The model we created above was also limited to assuming we have a single type of human, all of whom have identical risk factors. This is of course far from realistic.

So modelling uncertainty–what is called stochastics–becomes important, as does the ability to create many different “compartments”, that is modelling different effects on different people. This is what the models in the package “EpiModel” do.

First we specify a model type:

control = control.icm(type="SIR", nsteps=100, nsims=10)

Here we initialize a control model of time SIR (susceptible-infected-recovered, another option includes susceptible-infected-susceptible, for example, if you do not gain immunity after exposure.)

We then initialize a starting point for our model using the init.icm command:

init=init.icm(s.num=997, i.num=3, r.num=0)

Here with 3 infected people and 997 susceptible.

Then we model the transitions - that is specifying the parameters which indicate the probabilities of an individual to being infected.

param = param.icm(inf.prob=0.05, 
                  act.rate=10, 
                  rec.rate=1/20, 
                  a.rate=(10.5/365)/1000, 
                  ds.rate=(7/365)/1000,
                  di.rate=(14/365)/1000,
                  dr.rate=(7/365)/1000)

Here is an overview of the parameters

inf.prob This is the probability of infection at each exposure. So 1/20 exposures will lead to an infection
act.rate This is the rate that people are exposed to potential infections, here set to 10 per day.
rec.rate This is the recovery rate, so this indicates that it will take on average 20 days for an infected person to recover (or die).
a.rate Stands for arrival rate, it indicates how many new individuals enter a “compartment” (through births or immigration). Here we indicate that we add 10.5 people every year to our initial population of 1000.
ds.rate, di.rate, dr.rate Stands for departures (deaths and emmigration) from each of the categories (s, i, or r).

Then we can run our simulation:

sim = icm(param, init, control)

## Default modules set to appropriate two-group functions.
##               See documentation for details.

We can output a summary of our model by just writing in our object:

sim

## EpiModel Simulation
## =======================
## Model class: icm
## 
## Simulation Summary
## -----------------------
## Model type: SIR
## No. simulations: 10
## No. time steps: 100
## No. groups: 1
## 
## Model Parameters
## -----------------------
## inf.prob = 0.05
## act.rate = 10
## rec.rate = 0.05
## a.rate = 2.876712e-05
## ds.rate = 1.917808e-05
## di.rate = 3.835616e-05
## dr.rate = 1.917808e-05
## 
## Model Output
## -----------------------
## Variables: s.num i.num num r.num si.flow ir.flow ds.flow 
## di.flow dr.flow a.flow

More useful, the model includes a plotting tool to automatically plot the simulation:

plot(sim)

Now you can play with some of the parameters in the model and see what effects it has on the results.

In terms of modelling the effects of social-distancing, the act.rate is perhaps the most important. By imposing social-distancing, you can drop the number of exposures any individual encounters.

Try, for example, dropping the exposures from 10 to 5–the results are quite dramatic. With an act rate at 10, 60 percent of the population was infected after 30 days. At 5, only about 10% are infected after 30 days. Though, if this were the actual rate of infection, it would still be extremely dramatic.

The above plots the cumulative number of individuals in each category at each period. We also have the ability to plot the flows - like the number of new cases per day:

plot(sim, y="si.flow")