Background reading

An excellent book in general on regression, especially from a Bayesian view point is Regression and Other Stories (ROS) by Gelman, Hill and Avehtari. You can download a copy of the pdf from the link provided for personal use.

For information on the Poisson regression model, see Ch. 15 in ROS.
For information on Bayesian analysis, posterior prediction and decision analysis, see Ch. 9 in ROS.

Introduction to lab 9

In this final lab we continue working with non-linear models. We start by taking up the subject we briefly looked at in the first labs: exploratory petroleum drilling. But now we go deeper into an analysis, applying a Poisson regression model. This is a non-linear regression model of the Generalized Linear Model type, where we apply a non-linear transformation to a continuous linear predictor for a discrete outcome (the number of drilling events per licence-year). Here we can also introduce a non-parametric explanatory function, like we saw in lab 8.

Topically, the question we will explore is how a competitiveness reform in the mid-2000s changed the dynamics of exploratory well drilling.This part of the lab is based on a working paper with the title Drill! Maybe? Drill.

We switch tracks in the second part of the lab and consider non-linear relationships between the probabilistic predictions from our model to some real-life decision we are interested in. This is a field called decision analysis, and it is an extension of a Bayesian interpretation of regression models that we will briefly touch on. Here we will make use of a simplified example about deciding whether to invest in a large wind farm.

The decision to drill

In labs 2 and 3 we looked at company-level data on drilling decisions on the Norwegian Continental Shelf. We combined data about which companies had drilling licenses to financial data of those companies. We discussed how the drilling decision, when aggregated, is an important element in understanding the dynamics of oil production. Exploratory drilling is perhaps the most important decision in establishing future oil production. It is also the most risky and price-sensitive investment decision.

In this lab we will continue our analysis of exploratory oil drilling, but crucially, we will look at the problem from another perspective. Instead of aggregating to the company level, we will look at the problem from a licence-year level. We will ask the question, for each licence for a given year, what can explain why a licence-holder decided to drill or not.

Putting together our data set

A big part of our analysis, as usual, is putting the data in the correct form. In the interest of time, I provide a ready-cleaned and formatted data set. The data set is organised so that the unit of observation is licence-years. So one row might correspond to licence 001 in the year 2000 for a certain company x, where we would have a column indicating whether and how many drilling events happened in that period.

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.7     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.0
## ✔ readr   2.1.2     ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

library(lubridate)

## 
## Attaching package: 'lubridate'
## 
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

library(mgcv)

## Loading required package: nlme
## 
## Attaching package: 'nlme'
## 
## The following object is masked from 'package:dplyr':
## 
##     collapse
## 
## This is mgcv 1.8-41. For overview type 'help("mgcv-package")'.

library(knitr)
library(texreg)

## Version:  1.38.6
## Date:     2022-04-06
## Author:   Philip Leifeld (University of Essex)
## 
## Consider submitting praise using the praise or praise_interactive functions.
## Please cite the JSS article in your publications -- see citation("texreg").
## 
## Attaching package: 'texreg'
## 
## The following object is masked from 'package:tidyr':
## 
##     extract

glm_df = read_csv("http://jmaurit.github.io/analytics/labs/data/glm_df.csv")

## Rows: 12603 Columns: 38
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr   (1): LongName
## dbl  (35): year, NpdidLicence, orgnr, nDrills, nDry, revenues, oper_revenues...
## date  (2): ValidFrom, ValidTo
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

#glm_df = read_csv("data/glm_df.csv")

We would like to add measures of the oil price. Since we are looking at a relatively long time period, we should also adjust the oil price to inflation, so we also import data from GDP deflator. We are most interested in Brent crude price, but this only goes back to 1988, so we also import the WTI price. All these macro datasets are from the Federal Reserve FRED database:

#merge with oil
wti_url = "https://fred.stlouisfed.org/graph/fredgraph.csv?bgcolor=%23e1e9f0&chart_type=line&drp=0&fo=open%20sans&graph_bgcolor=%23ffffff&height=450&mode=fred&recession_bars=on&txtcolor=%23444444&ts=12&tts=12&width=1168&nt=0&thu=0&trc=0&show_legend=yes&show_axis_titles=yes&show_tooltip=yes&id=WTISPLC&scale=left&cosd=1946-01-01&coed=2022-09-01&line_color=%234572a7&link_values=false&line_style=solid&mark_type=none&mw=3&lw=2&ost=-99999&oet=99999&mma=0&fml=a&fq=Monthly&fam=avg&fgst=lin&fgsnd=2020-02-01&line_index=1&transformation=lin&vintage_date=2022-11-01&revision_date=2022-11-01&nd=1946-01-01"

brent_url = "https://fred.stlouisfed.org/graph/fredgraph.csv?bgcolor=%23e1e9f0&chart_type=line&drp=0&fo=open%20sans&graph_bgcolor=%23ffffff&height=450&mode=fred&recession_bars=off&txtcolor=%23444444&ts=12&tts=12&width=1168&nt=0&thu=0&trc=0&show_legend=yes&show_axis_titles=yes&show_tooltip=yes&id=POILBREUSDM&scale=left&cosd=1990-01-01&coed=2022-08-01&line_color=%234572a7&link_values=false&line_style=solid&mark_type=none&mw=3&lw=2&ost=-99999&oet=99999&mma=0&fml=a&fq=Monthly&fam=avg&fgst=lin&fgsnd=2020-02-01&line_index=1&transformation=lin&vintage_date=2022-11-01&revision_date=2022-11-01&nd=1990-01-01"
#Gross domestic product (implicit price deflator)

deflator_url = "https://fred.stlouisfed.org/graph/fredgraph.csv?bgcolor=%23e1e9f0&chart_type=line&drp=0&fo=open%20sans&graph_bgcolor=%23ffffff&height=450&mode=fred&recession_bars=on&txtcolor=%23444444&ts=12&tts=12&width=1168&nt=0&thu=0&trc=0&show_legend=yes&show_axis_titles=yes&show_tooltip=yes&id=A191RD3A086NBEA&scale=left&cosd=1929-01-01&coed=2021-01-01&line_color=%234572a7&link_values=false&line_style=solid&mark_type=none&mw=3&lw=2&ost=-99999&oet=99999&mma=0&fml=a&fq=Annual&fam=avg&fgst=lin&fgsnd=2020-02-01&line_index=1&transformation=lin&vintage_date=2022-11-01&revision_date=2022-11-01&nd=1929-01-01"

wti = read_csv(wti_url)

## Rows: 921 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl  (1): WTISPLC
## date (1): DATE
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

brent = read_csv(brent_url)

## Rows: 392 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl  (1): POILBREUSDM
## date (1): DATE
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

deflator = read_csv(deflator_url)

## Rows: 93 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl  (1): A191RD3A086NBEA
## date (1): DATE
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

deflator["year"] =  format(deflator$DATE,"%Y")
colnames(deflator)[2] = "deflator"
deflator = deflator[c("deflator", "year")]

colnames(wti)[1] = "date"
colnames(brent)[1] = "date"

Then we join and aggregate up to yearly values

oilDF = wti %>% left_join(brent, by="date")

oilDF["year"] = format(oilDF$date,"%Y")

oilYearDF = oilDF %>% group_by(year) %>% summarize(
  brent = mean(POILBREUSDM, na.rm=TRUE),
  wti = mean(WTISPLC, na.rm=TRUE),
  brent_sd = sd(POILBREUSDM, na.rm=TRUE),
  wti_sd = sd(WTISPLC, na.rm=TRUE)
)

Then merge with the gdp deflator variable and then create the real oil price measure.

oilYearDF = left_join(oilYearDF, deflator, by="year")
oilYearDF["oilC"] = ifelse(is.na(oilYearDF$brent), oilYearDF$wti, oilYearDF$brent)
oilYearDF["oilReal"] = oilYearDF$oilC/(oilYearDF$deflator/100)

Now we’ll combine with our data set of drilling

#get rid of data with missing orgnr

glm_df = glm_df %>% filter(!is.na(orgnr))

#create lag variable as well:
#postive and negative values of our real oil price

oilYearDF = oilYearDF %>% mutate(
  oilReal_l1 = lag(oilReal, 1),
  oilReal_l2 = lag(oilReal, 2),
  d_oilReal = c(NA, diff(log(oilReal))),
  d_oilReal_l1 = lag(d_oilReal, 1),
  d_oilReal_l2 = lag(d_oilReal, 2),
)

oilYearDF["year"] = as.numeric(oilYearDF$year)

glm_df = glm_df %>% left_join(oilYearDF %>% select(year, oilReal:d_oilReal_l2), by="year")

Now we create some timing variables for the licencees. The idea is that when we are modelling drilling, then drilling in relation to the licence dates will be an important predictor.

#create time from start of licence
#create time from new licence operator
#indicator of number of companies on shelf
#time since last drill in licence?

glm_df = glm_df %>% group_by(NpdidLicence) %>% mutate(
  licenceStart = min(ValidFrom),
  licenceYears = year - year(licenceStart),
  licenceOwnerYears = year - year(ValidFrom)
)

Finally, we’ll create a dummy variable indicating whether the data is before or after 2005, when pro-competition reforms were introduced on the Norwegian Continental Shelf.

glm_df = glm_df %>% mutate(
  d2005 = ifelse(year>2004, 1, 0)
)

A few descriptives

Before we jump into the modelling, we’ll do some exploration of the data with some visualizations.

First let’s aggregate the number of drilling events to each company each year. We’ll start by looking at the number of firms on the shelf each year.

nLicByComp_df = glm_df %>% group_by(year, orgnr) %>% summarize(
  nLicence = n()
)

## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.

nLicByComp_df %>% filter(year<2020) %>% ggplot(aes(x=year)) +
    geom_bar() + 
    xlab("") +
    ylab("Number of Companies on NCS")

    theme_light()

## List of 93
##  $ line                      :List of 6
##   ..$ colour       : chr "black"
##   ..$ size         : num 0.5
##   ..$ linetype     : num 1
##   ..$ lineend      : chr "butt"
##   ..$ arrow        : logi FALSE
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_line" "element"
##  $ rect                      :List of 5
##   ..$ fill         : chr "white"
##   ..$ colour       : chr "black"
##   ..$ size         : num 0.5
##   ..$ linetype     : num 1
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_rect" "element"
##  $ text                      :List of 11
##   ..$ family       : chr ""
##   ..$ face         : chr "plain"
##   ..$ colour       : chr "black"
##   ..$ size         : num 11
##   ..$ hjust        : num 0.5
##   ..$ vjust        : num 0.5
##   ..$ angle        : num 0
##   ..$ lineheight   : num 0.9
##   ..$ margin       : 'margin' num [1:4] 0points 0points 0points 0points
##   .. ..- attr(*, "unit")= int 8
##   ..$ debug        : logi FALSE
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ title                     : NULL
##  $ aspect.ratio              : NULL
##  $ axis.title                : NULL
##  $ axis.title.x              :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : NULL
##   ..$ hjust        : NULL
##   ..$ vjust        : num 1
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       : 'margin' num [1:4] 2.75points 0points 0points 0points
##   .. ..- attr(*, "unit")= int 8
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ axis.title.x.top          :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : NULL
##   ..$ hjust        : NULL
##   ..$ vjust        : num 0
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       : 'margin' num [1:4] 0points 0points 2.75points 0points
##   .. ..- attr(*, "unit")= int 8
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ axis.title.x.bottom       : NULL
##  $ axis.title.y              :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : NULL
##   ..$ hjust        : NULL
##   ..$ vjust        : num 1
##   ..$ angle        : num 90
##   ..$ lineheight   : NULL
##   ..$ margin       : 'margin' num [1:4] 0points 2.75points 0points 0points
##   .. ..- attr(*, "unit")= int 8
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ axis.title.y.left         : NULL
##  $ axis.title.y.right        :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : NULL
##   ..$ hjust        : NULL
##   ..$ vjust        : num 0
##   ..$ angle        : num -90
##   ..$ lineheight   : NULL
##   ..$ margin       : 'margin' num [1:4] 0points 0points 0points 2.75points
##   .. ..- attr(*, "unit")= int 8
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ axis.text                 :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : chr "grey30"
##   ..$ size         : 'rel' num 0.8
##   ..$ hjust        : NULL
##   ..$ vjust        : NULL
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       : NULL
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ axis.text.x               :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : NULL
##   ..$ hjust        : NULL
##   ..$ vjust        : num 1
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       : 'margin' num [1:4] 2.2points 0points 0points 0points
##   .. ..- attr(*, "unit")= int 8
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ axis.text.x.top           :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : NULL
##   ..$ hjust        : NULL
##   ..$ vjust        : num 0
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       : 'margin' num [1:4] 0points 0points 2.2points 0points
##   .. ..- attr(*, "unit")= int 8
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ axis.text.x.bottom        : NULL
##  $ axis.text.y               :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : NULL
##   ..$ hjust        : num 1
##   ..$ vjust        : NULL
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       : 'margin' num [1:4] 0points 2.2points 0points 0points
##   .. ..- attr(*, "unit")= int 8
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ axis.text.y.left          : NULL
##  $ axis.text.y.right         :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : NULL
##   ..$ hjust        : num 0
##   ..$ vjust        : NULL
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       : 'margin' num [1:4] 0points 0points 0points 2.2points
##   .. ..- attr(*, "unit")= int 8
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ axis.ticks                :List of 6
##   ..$ colour       : chr "grey70"
##   ..$ size         : 'rel' num 0.5
##   ..$ linetype     : NULL
##   ..$ lineend      : NULL
##   ..$ arrow        : logi FALSE
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_line" "element"
##  $ axis.ticks.x              : NULL
##  $ axis.ticks.x.top          : NULL
##  $ axis.ticks.x.bottom       : NULL
##  $ axis.ticks.y              : NULL
##  $ axis.ticks.y.left         : NULL
##  $ axis.ticks.y.right        : NULL
##  $ axis.ticks.length         : 'simpleUnit' num 2.75points
##   ..- attr(*, "unit")= int 8
##  $ axis.ticks.length.x       : NULL
##  $ axis.ticks.length.x.top   : NULL
##  $ axis.ticks.length.x.bottom: NULL
##  $ axis.ticks.length.y       : NULL
##  $ axis.ticks.length.y.left  : NULL
##  $ axis.ticks.length.y.right : NULL
##  $ axis.line                 : list()
##   ..- attr(*, "class")= chr [1:2] "element_blank" "element"
##  $ axis.line.x               : NULL
##  $ axis.line.x.top           : NULL
##  $ axis.line.x.bottom        : NULL
##  $ axis.line.y               : NULL
##  $ axis.line.y.left          : NULL
##  $ axis.line.y.right         : NULL
##  $ legend.background         :List of 5
##   ..$ fill         : NULL
##   ..$ colour       : logi NA
##   ..$ size         : NULL
##   ..$ linetype     : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_rect" "element"
##  $ legend.margin             : 'margin' num [1:4] 5.5points 5.5points 5.5points 5.5points
##   ..- attr(*, "unit")= int 8
##  $ legend.spacing            : 'simpleUnit' num 11points
##   ..- attr(*, "unit")= int 8
##  $ legend.spacing.x          : NULL
##  $ legend.spacing.y          : NULL
##  $ legend.key                :List of 5
##   ..$ fill         : chr "white"
##   ..$ colour       : logi NA
##   ..$ size         : NULL
##   ..$ linetype     : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_rect" "element"
##  $ legend.key.size           : 'simpleUnit' num 1.2lines
##   ..- attr(*, "unit")= int 3
##  $ legend.key.height         : NULL
##  $ legend.key.width          : NULL
##  $ legend.text               :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : 'rel' num 0.8
##   ..$ hjust        : NULL
##   ..$ vjust        : NULL
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       : NULL
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ legend.text.align         : NULL
##  $ legend.title              :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : NULL
##   ..$ hjust        : num 0
##   ..$ vjust        : NULL
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       : NULL
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ legend.title.align        : NULL
##  $ legend.position           : chr "right"
##  $ legend.direction          : NULL
##  $ legend.justification      : chr "center"
##  $ legend.box                : NULL
##  $ legend.box.just           : NULL
##  $ legend.box.margin         : 'margin' num [1:4] 0cm 0cm 0cm 0cm
##   ..- attr(*, "unit")= int 1
##  $ legend.box.background     : list()
##   ..- attr(*, "class")= chr [1:2] "element_blank" "element"
##  $ legend.box.spacing        : 'simpleUnit' num 11points
##   ..- attr(*, "unit")= int 8
##  $ panel.background          :List of 5
##   ..$ fill         : chr "white"
##   ..$ colour       : logi NA
##   ..$ size         : NULL
##   ..$ linetype     : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_rect" "element"
##  $ panel.border              :List of 5
##   ..$ fill         : logi NA
##   ..$ colour       : chr "grey70"
##   ..$ size         : 'rel' num 1
##   ..$ linetype     : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_rect" "element"
##  $ panel.spacing             : 'simpleUnit' num 5.5points
##   ..- attr(*, "unit")= int 8
##  $ panel.spacing.x           : NULL
##  $ panel.spacing.y           : NULL
##  $ panel.grid                :List of 6
##   ..$ colour       : chr "grey87"
##   ..$ size         : NULL
##   ..$ linetype     : NULL
##   ..$ lineend      : NULL
##   ..$ arrow        : logi FALSE
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_line" "element"
##  $ panel.grid.major          :List of 6
##   ..$ colour       : NULL
##   ..$ size         : 'rel' num 0.5
##   ..$ linetype     : NULL
##   ..$ lineend      : NULL
##   ..$ arrow        : logi FALSE
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_line" "element"
##  $ panel.grid.minor          :List of 6
##   ..$ colour       : NULL
##   ..$ size         : 'rel' num 0.25
##   ..$ linetype     : NULL
##   ..$ lineend      : NULL
##   ..$ arrow        : logi FALSE
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_line" "element"
##  $ panel.grid.major.x        : NULL
##  $ panel.grid.major.y        : NULL
##  $ panel.grid.minor.x        : NULL
##  $ panel.grid.minor.y        : NULL
##  $ panel.ontop               : logi FALSE
##  $ plot.background           :List of 5
##   ..$ fill         : NULL
##   ..$ colour       : chr "white"
##   ..$ size         : NULL
##   ..$ linetype     : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_rect" "element"
##  $ plot.title                :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : 'rel' num 1.2
##   ..$ hjust        : num 0
##   ..$ vjust        : num 1
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       : 'margin' num [1:4] 0points 0points 5.5points 0points
##   .. ..- attr(*, "unit")= int 8
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ plot.title.position       : chr "panel"
##  $ plot.subtitle             :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : NULL
##   ..$ hjust        : num 0
##   ..$ vjust        : num 1
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       : 'margin' num [1:4] 0points 0points 5.5points 0points
##   .. ..- attr(*, "unit")= int 8
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ plot.caption              :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : 'rel' num 0.8
##   ..$ hjust        : num 1
##   ..$ vjust        : num 1
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       : 'margin' num [1:4] 5.5points 0points 0points 0points
##   .. ..- attr(*, "unit")= int 8
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ plot.caption.position     : chr "panel"
##  $ plot.tag                  :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : 'rel' num 1.2
##   ..$ hjust        : num 0.5
##   ..$ vjust        : num 0.5
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       : NULL
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ plot.tag.position         : chr "topleft"
##  $ plot.margin               : 'margin' num [1:4] 5.5points 5.5points 5.5points 5.5points
##   ..- attr(*, "unit")= int 8
##  $ strip.background          :List of 5
##   ..$ fill         : chr "grey70"
##   ..$ colour       : logi NA
##   ..$ size         : NULL
##   ..$ linetype     : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_rect" "element"
##  $ strip.background.x        : NULL
##  $ strip.background.y        : NULL
##  $ strip.placement           : chr "inside"
##  $ strip.text                :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : chr "white"
##   ..$ size         : 'rel' num 0.8
##   ..$ hjust        : NULL
##   ..$ vjust        : NULL
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       : 'margin' num [1:4] 4.4points 4.4points 4.4points 4.4points
##   .. ..- attr(*, "unit")= int 8
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ strip.text.x              : NULL
##  $ strip.text.y              :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : NULL
##   ..$ hjust        : NULL
##   ..$ vjust        : NULL
##   ..$ angle        : num -90
##   ..$ lineheight   : NULL
##   ..$ margin       : NULL
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ strip.switch.pad.grid     : 'simpleUnit' num 2.75points
##   ..- attr(*, "unit")= int 8
##  $ strip.switch.pad.wrap     : 'simpleUnit' num 2.75points
##   ..- attr(*, "unit")= int 8
##  $ strip.text.y.left         :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : NULL
##   ..$ hjust        : NULL
##   ..$ vjust        : NULL
##   ..$ angle        : num 90
##   ..$ lineheight   : NULL
##   ..$ margin       : NULL
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  - attr(*, "class")= chr [1:2] "theme" "gg"
##  - attr(*, "complete")= logi TRUE
##  - attr(*, "validate")= logi TRUE

We notice that number of firms increases after the reforms are implimented.

Now we create a comparison between the oil price and the amount of drilling. We start by aggregating and counting the number of drills per year then combining this with our data on oil prices. We then reshape the data set so that it is in long format. Then we are ready to plot using ggplot:

#compare number of drills and oil price

totalDrills = glm_df %>% group_by(year) %>% summarize(
  totalDrills = sum(nDrills, na.rm=TRUE)
)

#join our yearly drilling data with oil price data
oilYearDF = oilYearDF %>% left_join(totalDrills, by="year")


oilDrillsDF = oilYearDF %>% select(year, oilReal, totalDrills) %>% pivot_longer(cols=c("oilReal", "totalDrills"))

#Change the name of the categories in the "name" variable
oilDrillsDF$name = factor(oilDrillsDF$name, levels=c("oilReal", "totalDrills"), labels=c("Real Oil Price, 2012 dollars", "Drills per year"))

 oilDrillsDF %>% ggplot(aes(x=year, y=value)) + 
  geom_line() + 
  facet_wrap(~name, ncol=1, scales = "free") + 
  xlim(1970, 2019) + 
  ylab("") +
  theme_light()

## Warning: Removed 27 row(s) containing missing values (geom_path).

Poisson modelling

Now we’ll start the formal modelling. We will be using a Poisson regression model–a form of Generalized Linear Modeling–which we use when modelling discrete rates. In this case, the rate we are modelling is the number of drilling events per licence-year.

We can write our model mathematically below:

\[\begin{align} y_{i,t} & \sim Poisson(\theta_{i,t} \mid I^{2005})\\ \theta_{i,t} \mid I^{2005} & = exp(X_i \beta, \omega)\\ X & = \{t-T_i^I, t-T_i^O,price_t, price_{t-1},...,year_t, year_t^2\} \end{align}\]

The first line indicates that we are modelling our \(y_{i,t}\) data as a poisson model, with a single parameter \(\theta_{i,t}\). the \(I^{2005}\) indicator is meant to show that we have separate models for pre- and post data. In the following line, we show that that the \(\theta_{i,t}\) parameter is modeled as an exponential function of linear predictors (the exponent is called the “link” function in GLM-talk), with explanatory variables in a vector \(X_i\). Usually, a poisson regression has only a single parameter, \(\theta\) that represents both mean and variance in the distribution. However, that can be restricting, especially in cases where there are a lot of 0’s in the real data. Luckily, we can adopt this by adding a parameter, \(\omega\) for extra variance in the term - this is called “over-dispersion”.

Finally, the last line indicates the explanatory variables we include. This includes two variables that represent the timing of drilling relative to licencing. The term \(t-T_i^I\) represents the time since the licence was initially granted. The term \(t-T_i^O\) represents the time since the current owner took over the licence. \(price_t\) and \(price_{t-1}\) represents the oil price and its lag. \(year_t\) and \(year_t^2\) represent the effects of calendar time (and a squared) measured in years.

To show why we include the licence terms, lets first start with a couple of naive models where we only include calendar year effects and terms for the the real price of oil.

Then we run the model with the glm command in R. We specify the “family” to be quassipoisson(), which means we have a overdispersed poisson model. The function assumes an exponential “link” function:

# naive model

naiveMod1 = glm(nDrills ~ year + I(year**2) + oilReal + oilReal_l1 + oilReal_l2, family=quasipoisson(), data=glm_df)
summary(naiveMod1)

## 
## Call:
## glm(formula = nDrills ~ year + I(year^2) + oilReal + oilReal_l1 + 
##     oilReal_l2, family = quasipoisson(), data = glm_df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.0800  -0.4350  -0.3787  -0.3337   4.6889  
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  1.658e+02  6.598e+02   0.251   0.8016  
## year        -1.170e-01  6.617e-01  -0.177   0.8597  
## I(year^2)    1.643e-05  1.659e-04   0.099   0.9211  
## oilReal     -5.395e-03  2.169e-03  -2.488   0.0129 *
## oilReal_l1   6.335e-03  2.851e-03   2.223   0.0263 *
## oilReal_l2   2.782e-03  2.216e-03   1.256   0.2093  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 1.265644)
## 
##     Null deviance: 6323.6  on 12551  degrees of freedom
## Residual deviance: 5820.7  on 12546  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 6

Notice the coefficient on the concurrent oil price. It is estimated to be positive. Interpreting this literally, we would say that that as the oil price increases, the rate of drilling decreases. Theoretically, this does not make much sense, and our model is likely misspecified.

For a more serious attempt at modelling, we start by splitting our dataset in two: for pre- and post-2005 periods.

glm_df_pre2005 = glm_df %>% filter(year>1990,year<2005)
glm_df_post2005 = glm_df %>% filter(year>=2005)

We then run models with the licence timing variables, licenceYears and licenceOwnerYears. We also include company fixed effects (that is dummy indicators for each individual company) by adding the term factor(orgnr). Notice though that we do not include year fixed effects. The reason is that this would subsume all the information in the yearly oil price data, which we would like to estimate. Instead, we include calendar year as a quadratic trend term.

mod4_pre = glm(nDrills ~ year + I(year**2) + factor(orgnr) +  licenceYears + licenceOwnerYears + I(licenceOwnerYears**2) + oilReal + oilReal_l1 + oilReal_l2, family=quasipoisson(), data=glm_df_pre2005)
summary(mod4_pre)

## 
## Call:
## glm(formula = nDrills ~ year + I(year^2) + factor(orgnr) + licenceYears + 
##     licenceOwnerYears + I(licenceOwnerYears^2) + oilReal + oilReal_l1 + 
##     oilReal_l2, family = quasipoisson(), data = glm_df_pre2005)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.9919  -0.5346  -0.3960  -0.2886   3.9323  
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            -1.397e+05  5.407e+04  -2.583  0.00984 ** 
## year                    1.400e+02  5.415e+01   2.586  0.00977 ** 
## I(year^2)              -3.509e-02  1.356e-02  -2.588  0.00970 ** 
## factor(orgnr)914048990 -2.168e-01  4.079e-01  -0.531  0.59519    
## factor(orgnr)914807077 -4.502e-01  4.129e-01  -1.090  0.27566    
## factor(orgnr)918110127 -7.161e-01  6.823e-01  -1.050  0.29400    
## factor(orgnr)919160675 -5.071e-02  4.442e-01  -0.114  0.90913    
## factor(orgnr)921473052 -9.731e-02  4.415e-01  -0.220  0.82555    
## factor(orgnr)923254080 -7.284e-01  8.080e-01  -0.901  0.36741    
## factor(orgnr)923609016 -1.915e-01  3.059e-01  -0.626  0.53143    
## factor(orgnr)923702962  6.160e-01  8.345e-01   0.738  0.46049    
## factor(orgnr)923829822 -1.570e+01  5.587e+02  -0.028  0.97759    
## factor(orgnr)925111406 -2.534e-01  3.399e-01  -0.745  0.45606    
## factor(orgnr)926620207  1.384e-01  6.080e-01   0.228  0.81990    
## factor(orgnr)927066440 -1.080e+00  4.567e-01  -2.364  0.01814 *  
## factor(orgnr)927533081 -3.635e-02  4.989e-01  -0.073  0.94194    
## factor(orgnr)928008398 -3.604e-01  4.746e-01  -0.759  0.44778    
## factor(orgnr)929559428 -1.252e-01  1.114e+00  -0.112  0.91055    
## factor(orgnr)930187240 -2.105e-01  3.155e-01  -0.667  0.50464    
## factor(orgnr)930322784  6.675e-01  8.148e-01   0.819  0.41273    
## factor(orgnr)930459321 -2.649e-01  5.232e-01  -0.506  0.61267    
## factor(orgnr)931713671 -1.446e+01  1.149e+03  -0.013  0.98996    
## factor(orgnr)948138646 -8.620e-01  6.198e-01  -1.391  0.16441    
## factor(orgnr)953935996 -1.448e+01  1.011e+03  -0.014  0.98858    
## factor(orgnr)971505346 -7.723e-01  1.102e+00  -0.701  0.48342    
## factor(orgnr)977026792 -1.439e+01  1.394e+03  -0.010  0.99176    
## factor(orgnr)981355210 -1.205e+00  6.134e-01  -1.965  0.04951 *  
## factor(orgnr)983430783 -1.465e+01  2.572e+03  -0.006  0.99546    
## factor(orgnr)984131941  6.172e-01  1.120e+00   0.551  0.58159    
## factor(orgnr)985706050 -1.450e+01  1.094e+03  -0.013  0.98943    
## factor(orgnr)986209409 -1.455e+01  1.642e+03  -0.009  0.99293    
## factor(orgnr)986615806 -1.445e+01  2.131e+03  -0.007  0.99459    
## licenceYears            2.054e-02  8.852e-03   2.320  0.02039 *  
## licenceOwnerYears      -1.367e-01  3.585e-02  -3.814  0.00014 ***
## I(licenceOwnerYears^2)  2.964e-03  1.340e-03   2.211  0.02711 *  
## oilReal                 3.504e-02  2.391e-02   1.465  0.14301    
## oilReal_l1              6.649e-02  1.636e-02   4.063 4.98e-05 ***
## oilReal_l2              1.817e-03  2.281e-02   0.080  0.93651    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 1.132379)
## 
##     Null deviance: 1528.0  on 2802  degrees of freedom
## Residual deviance: 1373.9  on 2765  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 15

mod4_post = glm(nDrills ~ year + I(year**2) + factor(orgnr) +  licenceYears + licenceOwnerYears + I(licenceOwnerYears**2) + oilReal + oilReal_l1 + oilReal_l2, family=quasipoisson(), data=glm_df_post2005)

summary(mod4_post)

## 
## Call:
## glm(formula = nDrills ~ year + I(year^2) + factor(orgnr) + licenceYears + 
##     licenceOwnerYears + I(licenceOwnerYears^2) + oilReal + oilReal_l1 + 
##     oilReal_l2, family = quasipoisson(), data = glm_df_post2005)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.0252  -0.3966  -0.3149  -0.2442   4.3798  
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             9.792e+04  2.655e+04   3.688 0.000227 ***
## year                   -9.733e+01  2.638e+01  -3.689 0.000226 ***
## I(year^2)               2.419e-02  6.554e-03   3.690 0.000226 ***
## factor(orgnr)891797702 -1.521e+01  1.385e+03  -0.011 0.991240    
## factor(orgnr)913561473  2.114e+00  8.911e-01   2.372 0.017693 *  
## factor(orgnr)913905881 -1.557e+01  1.371e+03  -0.011 0.990943    
## factor(orgnr)914048990  2.487e-01  9.327e-01   0.267 0.789745    
## factor(orgnr)914515300 -1.964e-01  1.351e+00  -0.145 0.884436    
## factor(orgnr)914807077  2.508e-01  8.724e-01   0.288 0.773715    
## factor(orgnr)915419062 -1.535e+01  1.143e+03  -0.013 0.989288    
## factor(orgnr)918110127  4.969e-01  8.573e-01   0.580 0.562222    
## factor(orgnr)919160675  3.857e-01  8.393e-01   0.460 0.645850    
## factor(orgnr)919496649 -1.551e+01  6.296e+03  -0.002 0.998035    
## factor(orgnr)919603771  5.410e-01  1.013e+00   0.534 0.593402    
## factor(orgnr)921473052 -1.475e+01  3.088e+03  -0.005 0.996189    
## factor(orgnr)923609016  1.376e-01  8.329e-01   0.165 0.868762    
## factor(orgnr)923702962  6.086e-01  9.027e-01   0.674 0.500191    
## factor(orgnr)926620207  7.659e-01  8.839e-01   0.867 0.386206    
## factor(orgnr)927066440  2.088e-01  8.342e-01   0.250 0.802379    
## factor(orgnr)928008398 -1.452e+01  4.436e+03  -0.003 0.997388    
## factor(orgnr)929559428 -1.446e+01  3.136e+03  -0.005 0.996321    
## factor(orgnr)930187240  2.158e-01  9.062e-01   0.238 0.811746    
## factor(orgnr)930322784 -1.465e+01  1.423e+03  -0.010 0.991786    
## factor(orgnr)930459321 -1.475e+01  1.588e+03  -0.009 0.992592    
## factor(orgnr)931713671  5.693e-02  9.227e-01   0.062 0.950803    
## factor(orgnr)934651758  1.297e+00  8.413e-01   1.542 0.123126    
## factor(orgnr)940376645  4.397e-01  1.351e+00   0.326 0.744803    
## factor(orgnr)946680591  2.372e+00  1.354e+00   1.752 0.079839 .  
## factor(orgnr)953133210  8.230e-01  1.350e+00   0.609 0.542225    
## factor(orgnr)953935996 -7.501e-01  1.357e+00  -0.553 0.580492    
## factor(orgnr)977026792  1.434e+00  1.121e+00   1.279 0.200759    
## factor(orgnr)981355210 -1.326e+00  1.353e+00  -0.980 0.327104    
## factor(orgnr)983426417  6.719e-01  8.623e-01   0.779 0.435907    
## factor(orgnr)983430783 -1.490e+01  2.095e+03  -0.007 0.994327    
## factor(orgnr)984131941 -1.479e+01  2.374e+03  -0.006 0.995029    
## factor(orgnr)985224323  5.558e-01  8.089e-01   0.687 0.492035    
## factor(orgnr)985706050  7.054e-01  8.447e-01   0.835 0.403703    
## factor(orgnr)985740909  3.658e-01  9.215e-01   0.397 0.691437    
## factor(orgnr)986209409  1.203e+00  7.960e-01   1.511 0.130906    
## factor(orgnr)986478434  9.999e-01  9.216e-01   1.085 0.277942    
## factor(orgnr)986597840 -1.506e+01  4.425e+03  -0.003 0.997285    
## factor(orgnr)986615806  1.058e+00  8.502e-01   1.244 0.213462    
## factor(orgnr)987008644  2.616e-01  1.349e+00   0.194 0.846271    
## factor(orgnr)987581883 -1.475e+01  2.542e+03  -0.006 0.995370    
## factor(orgnr)987863749 -1.501e+01  1.149e+03  -0.013 0.989580    
## factor(orgnr)987904275  1.108e+00  8.605e-01   1.287 0.198064    
## factor(orgnr)987989297  9.131e-01  1.008e+00   0.906 0.364989    
## factor(orgnr)988120545  6.499e-01  1.008e+00   0.645 0.518993    
## factor(orgnr)988400025  5.016e-01  8.999e-01   0.557 0.577252    
## factor(orgnr)988682039 -1.516e+01  2.071e+03  -0.007 0.994160    
## factor(orgnr)988712612 -1.492e+01  1.605e+03  -0.009 0.992582    
## factor(orgnr)989050052  2.633e-01  1.104e+00   0.238 0.811536    
## factor(orgnr)989362100 -1.754e-02  1.006e+00  -0.017 0.986091    
## factor(orgnr)989399071  1.468e-01  9.240e-01   0.159 0.873763    
## factor(orgnr)989490168 -1.502e+01  1.108e+03  -0.014 0.989180    
## factor(orgnr)989563343 -1.534e+01  1.329e+03  -0.012 0.990796    
## factor(orgnr)989795848  6.417e-01  7.984e-01   0.804 0.421579    
## factor(orgnr)990089655  5.104e-01  8.471e-01   0.603 0.546854    
## factor(orgnr)990888213  7.238e-01  7.858e-01   0.921 0.357055    
## factor(orgnr)991193030 -1.509e+01  1.285e+03  -0.012 0.990632    
## factor(orgnr)991317155  1.634e-01  1.350e+00   0.121 0.903669    
## factor(orgnr)991735194 -4.815e-01  1.349e+00  -0.357 0.721086    
## factor(orgnr)991870830  2.442e-01  9.219e-01   0.265 0.791119    
## factor(orgnr)993787787 -1.794e-02  9.241e-01  -0.019 0.984512    
## factor(orgnr)994434403  1.064e+00  1.107e+00   0.961 0.336521    
## factor(orgnr)997015231 -1.524e+01  3.614e+03  -0.004 0.996636    
## factor(orgnr)998709326 -1.546e+01  3.609e+03  -0.004 0.996582    
## factor(orgnr)998852722 -9.364e-01  1.350e+00  -0.694 0.487946    
## licenceYears            1.207e-02  5.120e-03   2.358 0.018413 *  
## licenceOwnerYears      -3.856e-01  7.161e-02  -5.385 7.45e-08 ***
## I(licenceOwnerYears^2)  2.119e-02  9.390e-03   2.257 0.024061 *  
## oilReal                 1.964e-03  3.160e-03   0.621 0.534383    
## oilReal_l1              5.730e-03  3.371e-03   1.700 0.089217 .  
## oilReal_l2              1.436e-02  3.888e-03   3.694 0.000222 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 1.212689)
## 
##     Null deviance: 3051.2  on 8307  degrees of freedom
## Residual deviance: 2837.2  on 8234  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 16

Scanning through the results, notice that the licence timing variables are large and meaningful. The coefficient on the licence-owner length (the length of time that a given owner has had a licence) is negative and large in magnitude. This makes sense. An owner will tend to drill more often at the beginning of their licence. The licence time (time since initial granting of licence) is estimated to be positive. Be careful about how you interpret this variable. Usually we interpret a coefficient in a multipple regression as a unit change holding all other variables at a constant level. But here a change in the licence time will necessarily change at the same time and unit as licence owner time. So really we need to interpret these two variables together (more on this below).

Let’s work now on visualising the prediction from this model. This will involve choosing the variable of interest and seeing how the predicted values change with this variable, and then choosing what values to hold the other variables at.

We’ll save our data for our prediction in the newData_pre and newData_post dataframes. Below notice that we vary both the liceneOwnerYears and licenceYears variables while holding the other variables in our model at a constant level. We set the oil price at 70 (dollars per barrel).

newData_pre = tibble(
  licenceOwnerYears = seq(10)-1, 
  year = rep(2010, 10),
  licenceYears = seq(10)-1, 
  oilReal = mean(70, na.rm=TRUE),
  oilReal_l1 = mean(70, na.rm=TRUE),
  oilReal_l2 = mean(70, na.rm=TRUE),
  orgnr = factor(919160675) #Eni - we choose a somewhat random representative oil firm for the factor. This is in any case just a shifting faction. 
  )

newData_post = tibble(
  licenceOwnerYears = seq(10)-1, 
  year = rep(2010, 10),
  licenceYears = seq(10)-1, 
  oilReal = mean(70, na.rm=TRUE),
  oilReal_l1 = mean(70, na.rm=TRUE),
  oilReal_l2 = mean(70, na.rm=TRUE),
  orgnr = factor(919160675) #Eni
)

Now we’ll create prediction dataframes using the predict() function. Notice the extra parameters: se.fit=TRUE tells our model to include standard errors for our predictions. type=“response” gives us our predictions on the scale of the response (rate of drilling), rather than the scale of the linear predictor (the response is just the exponent of the linear predictor.)

pred_pre2005 = as_tibble(predict.glm(mod4_pre, newdata = newData_pre, se.fit=TRUE, type="response"))
pred_pre2005["period"] = "Pre-2005"
pred_pre2005["Ownership_length"] = seq(10)-1
pred_post2005 = as_tibble(predict.glm(mod4_post, newdata = newData_post, se.fit=TRUE, type="response"))
pred_post2005["period"] = "Post-2005"
pred_post2005["Ownership_length"] = seq(10)-1

Then I’ll create approximate 70% confidence intervalls (+/- 1 standard error)

pred_comb = rbind(pred_pre2005, pred_post2005)
pred_comb["min70"] = pred_comb["fit"] - pred_comb["se.fit"]
pred_comb["max70"] = pred_comb["fit"] + pred_comb["se.fit"]

Now we can plot our predictions

pred_comb %>% 
  ggplot(aes(x=Ownership_length, y=fit, linetype=period)) +
  geom_line() +
  geom_ribbon(aes(x=Ownership_length, ymin=min70, ymax=max70), alpha=.3) +
  ylab("Expected drilling") +
  xlab("Licence ownership length") +
  theme_classic()

The licenceOwnerYears variable is a strong predictor of drilling and also non-linear, so it might be worth modelling this as a non-parametric function as in lab 8. The model then becomes a GAM. We can run the models below (they can take some time to run)

#pre and post-2005

gam_mod4_pre = gam(nDrills ~ s(licenceOwnerYears) + year + I(year**2) + factor(orgnr) +  licenceYears + oilReal + oilReal_l1 + oilReal_l2, family=quasipoisson(), data=glm_df_pre2005)

## Warning in newton(lsp = lsp, X = G$X, y = G$y, Eb = G$Eb, UrS = G$UrS, L =
## G$L, : Fitting terminated with step failure - check results carefully

## Warning in newton(lsp = lsp, X = G$X, y = G$y, Eb = G$Eb, UrS = G$UrS, L =
## G$L, : gam.fit3 algorithm did not converge

summary(gam_mod4_pre)

## 
## Family: quasipoisson 
## Link function: log 
## 
## Formula:
## nDrills ~ s(licenceOwnerYears) + year + I(year^2) + factor(orgnr) + 
##     licenceYears + oilReal + oilReal_l1 + oilReal_l2
## 
## Parametric coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            -1.350e+05  5.548e+04  -2.433 0.015040 *  
## year                    1.353e+02  5.557e+01   2.435 0.014943 *  
## I(year^2)              -3.392e-02  1.392e-02  -2.438 0.014847 *  
## factor(orgnr)914048990 -1.080e-01  4.183e-01  -0.258 0.796313    
## factor(orgnr)914807077 -3.548e-01  4.317e-01  -0.822 0.411182    
## factor(orgnr)918110127 -6.608e-01  7.032e-01  -0.940 0.347446    
## factor(orgnr)919160675  1.586e-02  4.609e-01   0.034 0.972556    
## factor(orgnr)921473052 -9.139e-03  4.583e-01  -0.020 0.984091    
## factor(orgnr)923254080 -6.510e-01  8.314e-01  -0.783 0.433679    
## factor(orgnr)923609016 -1.109e-01  3.212e-01  -0.345 0.729846    
## factor(orgnr)923702962  5.889e-01  8.599e-01   0.685 0.493530    
## factor(orgnr)923829822 -2.776e+01  2.498e+05   0.000 0.999911    
## factor(orgnr)925111406 -1.593e-01  3.554e-01  -0.448 0.654088    
## factor(orgnr)926620207  2.287e-01  6.277e-01   0.364 0.715556    
## factor(orgnr)927066440 -1.061e+00  4.697e-01  -2.259 0.023965 *  
## factor(orgnr)927533081  1.003e-01  5.162e-01   0.194 0.845899    
## factor(orgnr)928008398 -2.786e-01  4.920e-01  -0.566 0.571302    
## factor(orgnr)929559428 -1.362e-01  1.146e+00  -0.119 0.905365    
## factor(orgnr)930187240 -1.513e-01  3.300e-01  -0.459 0.646518    
## factor(orgnr)930322784  6.764e-01  8.388e-01   0.806 0.420077    
## factor(orgnr)930459321 -2.007e-01  5.411e-01  -0.371 0.710780    
## factor(orgnr)931713671 -1.610e+01  2.670e+03  -0.006 0.995188    
## factor(orgnr)948138646 -7.639e-01  6.395e-01  -1.195 0.232382    
## factor(orgnr)953935996 -1.989e+01  1.545e+04  -0.001 0.998973    
## factor(orgnr)971505346 -6.443e-01  1.132e+00  -0.569 0.569314    
## factor(orgnr)977026792 -1.606e+01  3.268e+03  -0.005 0.996078    
## factor(orgnr)981355210 -1.145e+00  6.330e-01  -1.810 0.070479 .  
## factor(orgnr)983430783 -1.866e+01  1.877e+04  -0.001 0.999207    
## factor(orgnr)984131941  5.996e-01  1.151e+00   0.521 0.602369    
## factor(orgnr)985706050 -2.635e+01  4.132e+05   0.000 0.999949    
## factor(orgnr)986209409 -1.606e+01  3.436e+03  -0.005 0.996270    
## factor(orgnr)986615806 -1.616e+01  4.927e+03  -0.003 0.997384    
## licenceYears            2.162e-02  9.101e-03   2.376 0.017579 *  
## oilReal                 3.270e-02  2.453e-02   1.333 0.182668    
## oilReal_l1              6.453e-02  1.682e-02   3.836 0.000128 ***
## oilReal_l2              1.607e-03  2.340e-02   0.069 0.945240    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                        edf Ref.df     F p-value   
## s(licenceOwnerYears) 8.767  8.964 2.742 0.00355 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.0329   Deviance explained = 11.1%
## GCV = 0.49988  Scale est. = 1.1904    n = 2803

plot(gam_mod4_pre)

gam_mod4_post = gam(nDrills ~ s(licenceOwnerYears) + year + I(year**2) + factor(orgnr) +  licenceYears + oilReal + oilReal_l1 + oilReal_l2, family=quasipoisson(), data=glm_df_post2005)

## Warning in newton(lsp = lsp, X = G$X, y = G$y, Eb = G$Eb, UrS = G$UrS, L =
## G$L, : Fitting terminated with step failure - check results carefully

## Warning in newton(lsp = lsp, X = G$X, y = G$y, Eb = G$Eb, UrS = G$UrS, L =
## G$L, : gam.fit3 algorithm did not converge

summary(gam_mod4_post)

## 
## Family: quasipoisson 
## Link function: log 
## 
## Formula:
## nDrills ~ s(licenceOwnerYears) + year + I(year^2) + factor(orgnr) + 
##     licenceYears + oilReal + oilReal_l1 + oilReal_l2
## 
## Parametric coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             1.025e+05  2.721e+04   3.767 0.000166 ***
## year                   -1.019e+02  2.704e+01  -3.768 0.000165 ***
## I(year^2)               2.533e-02  6.719e-03   3.769 0.000165 ***
## factor(orgnr)891797702 -1.423e+01  8.637e+02  -0.016 0.986859    
## factor(orgnr)913561473  2.096e+00  9.014e-01   2.326 0.020069 *  
## factor(orgnr)913905881 -2.094e+01  1.870e+04  -0.001 0.999107    
## factor(orgnr)914048990  3.158e-01  9.416e-01   0.335 0.737329    
## factor(orgnr)914515300 -2.475e-01  1.367e+00  -0.181 0.856305    
## factor(orgnr)914807077  2.346e-01  8.825e-01   0.266 0.790385    
## factor(orgnr)915419062 -2.050e+01  1.478e+04  -0.001 0.998893    
## factor(orgnr)918110127  4.938e-01  8.670e-01   0.570 0.569025    
## factor(orgnr)919160675  3.717e-01  8.490e-01   0.438 0.661556    
## factor(orgnr)919496649 -1.527e+01  5.226e+03  -0.003 0.997668    
## factor(orgnr)919603771  4.966e-01  1.025e+00   0.484 0.628069    
## factor(orgnr)921473052 -1.787e+01  1.393e+04  -0.001 0.998977    
## factor(orgnr)923609016  1.811e-01  8.430e-01   0.215 0.829913    
## factor(orgnr)923702962  6.317e-01  9.130e-01   0.692 0.489041    
## factor(orgnr)926620207  7.580e-01  8.941e-01   0.848 0.396574    
## factor(orgnr)927066440  2.124e-01  8.435e-01   0.252 0.801236    
## factor(orgnr)928008398 -2.182e+01  1.693e+05   0.000 0.999897    
## factor(orgnr)929559428 -2.324e+01  2.521e+05   0.000 0.999926    
## factor(orgnr)930187240  2.094e-01  9.167e-01   0.228 0.819336    
## factor(orgnr)930322784 -2.118e+01  3.753e+04  -0.001 0.999550    
## factor(orgnr)930459321 -2.342e+01  1.211e+05   0.000 0.999846    
## factor(orgnr)931713671  5.997e-02  9.333e-01   0.064 0.948764    
## factor(orgnr)934651758  1.303e+00  8.510e-01   1.531 0.125804    
## factor(orgnr)940376645  4.434e-01  1.366e+00   0.325 0.745532    
## factor(orgnr)946680591  2.362e+00  1.370e+00   1.724 0.084712 .  
## factor(orgnr)953133210  8.330e-01  1.366e+00   0.610 0.541995    
## factor(orgnr)953935996 -7.053e-01  1.373e+00  -0.514 0.607560    
## factor(orgnr)977026792  1.461e+00  1.134e+00   1.289 0.197533    
## factor(orgnr)981355210 -1.364e+00  1.369e+00  -0.996 0.319122    
## factor(orgnr)983426417  6.714e-01  8.723e-01   0.770 0.441525    
## factor(orgnr)983430783 -2.300e+01  1.301e+05   0.000 0.999859    
## factor(orgnr)984131941 -2.300e+01  1.475e+05   0.000 0.999876    
## factor(orgnr)985224323  5.625e-01  8.183e-01   0.687 0.491851    
## factor(orgnr)985706050  7.131e-01  8.545e-01   0.835 0.403980    
## factor(orgnr)985740909  3.740e-01  9.321e-01   0.401 0.688276    
## factor(orgnr)986209409  1.189e+00  8.052e-01   1.476 0.139937    
## factor(orgnr)986478434  9.955e-01  9.322e-01   1.068 0.285566    
## factor(orgnr)986597840 -1.349e+01  2.015e+03  -0.007 0.994662    
## factor(orgnr)986615806  1.069e+00  8.601e-01   1.243 0.214043    
## factor(orgnr)987008644  2.456e-01  1.365e+00   0.180 0.857177    
## factor(orgnr)987581883 -1.339e+01  1.338e+03  -0.010 0.992017    
## factor(orgnr)987863749 -1.421e+01  7.869e+02  -0.018 0.985593    
## factor(orgnr)987904275  1.096e+00  8.706e-01   1.259 0.208206    
## factor(orgnr)987989297  8.893e-01  1.020e+00   0.872 0.383127    
## factor(orgnr)988120545  6.397e-01  1.019e+00   0.628 0.530280    
## factor(orgnr)988400025  4.848e-01  9.104e-01   0.532 0.594404    
## factor(orgnr)988682039 -1.417e+01  1.257e+03  -0.011 0.991006    
## factor(orgnr)988712612 -2.247e+01  7.101e+04   0.000 0.999748    
## factor(orgnr)989050052  2.855e-01  1.117e+00   0.256 0.798233    
## factor(orgnr)989362100 -2.059e-03  1.018e+00  -0.002 0.998385    
## factor(orgnr)989399071  1.157e-01  9.348e-01   0.124 0.901537    
## factor(orgnr)989490168 -2.423e+01  1.108e+05   0.000 0.999825    
## factor(orgnr)989563343 -2.423e+01  1.101e+05   0.000 0.999824    
## factor(orgnr)989795848  6.191e-01  8.076e-01   0.767 0.443300    
## factor(orgnr)990089655  5.159e-01  8.568e-01   0.602 0.547136    
## factor(orgnr)990888213  6.957e-01  7.950e-01   0.875 0.381535    
## factor(orgnr)991193030 -1.423e+01  8.347e+02  -0.017 0.986403    
## factor(orgnr)991317155  1.439e-01  1.366e+00   0.105 0.916062    
## factor(orgnr)991735194 -4.638e-01  1.364e+00  -0.340 0.733898    
## factor(orgnr)991870830  2.264e-01  9.326e-01   0.243 0.808160    
## factor(orgnr)993787787 -6.745e-02  9.349e-01  -0.072 0.942489    
## factor(orgnr)994434403  1.028e+00  1.119e+00   0.918 0.358417    
## factor(orgnr)997015231 -1.786e+01  1.300e+04  -0.001 0.998904    
## factor(orgnr)998709326 -2.484e+01  3.859e+05   0.000 0.999949    
## factor(orgnr)998852722 -9.470e-01  1.366e+00  -0.693 0.488088    
## licenceYears            1.312e-02  5.181e-03   2.533 0.011340 *  
## oilReal                 2.439e-03  3.277e-03   0.744 0.456653    
## oilReal_l1              4.929e-03  3.485e-03   1.414 0.157337    
## oilReal_l2              1.514e-02  4.022e-03   3.765 0.000168 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                        edf Ref.df     F p-value    
## s(licenceOwnerYears) 6.181  6.378 8.313  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.0186   Deviance explained = 7.45%
## GCV = 0.34641  Scale est. = 1.2407    n = 8308

plot(gam_mod4_post)

Using the plot function will automatically plot the non-linear functions, though the scales used make these hard to interpret.

It may be more useful for us to again plot our predictions based on the model.

pred_pre2005 = as_tibble(predict.gam(gam_mod4_pre, newdata = newData_pre, se.fit=TRUE, type="response"))
pred_pre2005["period"] = "Pre-2005"
pred_pre2005["Ownership_length"] = seq(10)-1
pred_post2005 = as_tibble(predict.gam(gam_mod4_post, newdata = newData_post, se.fit=TRUE, type="response"))
pred_post2005["period"] = "Post-2005"
pred_post2005["Ownership_length"] = seq(10)-1

Then I’ll create approximate 70% confidence intervalls (+/- 1 standard error)

pred_comb = rbind(pred_pre2005, pred_post2005)
pred_comb["min70"] = pred_comb["fit"] - pred_comb["se.fit"]
pred_comb["max70"] = pred_comb["fit"] + pred_comb["se.fit"]

Now we can plot our predictions for our gam model:

pred_comb %>% 
  ggplot(aes(x=Ownership_length, y=fit, linetype=period)) +
  geom_line() +
  geom_ribbon(aes(x=Ownership_length, ymin=min70, ymax=max70), alpha=.3) +
  ylab("Expected drilling") +
  xlab("Licence ownership length") +
  theme_classic()

Decision analysis

Now we’ll shift tracks and consider a decision analysis problem. The idea of a decision analysis problem is that often times you are using a regression model to answer some question or make a decision which may be related to the regression results in a non-linear way.

An example many have seen are election forecasting models. Here you often have an underlying regression model (or set of models) that might give predictions on how much vote share a candidate may get. But this is not necessarily the outcome that is important, instead we are interested in whether that vote share is enough to win the election. We might then also be interested in an aggregated measure of winning - like the likelihood of a party getting a majority in parliament (or in the US case congress).

Since we want a probabilistic answer to our question of who will win the regression, then we usually need to make use of simulation from our model results. This involves treating our regression results as uncertain, that is considering the coefficients and predictions from our regression model not as point estimates, but as distributions that we sample from - what we might call posterior distributions. This is a bit of a subtle point, but when we treat the model results as uncertain, then we are in the realm of Bayesian statistics. If you want a gentle introduction to some Bayesian ideas you could start with this lab from an applied statistics course I give at NTNU, or take a closer look at the ROS.

We’ll demonstrate how we can use decision analysis with real data on wind power and electricity markets, but where the modelling is very simplified–you should consider this a toy model to demonstrate the principles rather than something to emulate in a real analysis.

We will be using the Bayesian regression and modelling package rstanarm. You will probably have to install this ahead of time with the command install.packages(“rstanarm”).

library(tidyverse)
library(rstanarm)

## Loading required package: Rcpp

## This is rstanarm version 2.21.3

## - See https://mc-stan.org/rstanarm/articles/priors for changes to default priors!

## - Default priors may change, so it's safest to specify priors, even if equivalent to the defaults.

## - For execution on a local, multicore CPU with excess RAM we recommend calling

##   options(mc.cores = parallel::detectCores())

Here we will show an example using electricity data and a made-up decision on whether an electricity monopolist should invest in more wind power.

We start by importing our electricity data

elDF = read_csv("http://jmaurit.github.io/analytics/labs/data/wt_data2.csv")

## Rows: 26304 Columns: 90
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr   (3): hour, hour_ind, month
## dbl  (85): wind_SE1, wind_SE2, wind_SE3, wind_SE4, wind_DK1, wind_DK2, SE_nx...
## dttm  (2): date, time
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

lets run the simplest of regressions using the stan_glm() command. This will run a Bayesian regression using a simulation method called Markov-Chain Monte Carlo (MCMC). The details are technical and not that important. The important thing to know is that under most simple cases the Bayesian regression will give similar point estimates to maximum likelihood estimation we have been using (something we will show.) But the Bayesian regression provides results explicitly as a probability distribution which allows us simulate from the results (the so called “posterior”). If we wish, we could also include information not in the data through a prior distribution.

windMod1 = stan_glm(DK1EurMW~wind_DK1, data=elDF)

## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0.000129 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.29 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.04591 seconds (Warm-up)
## Chain 1:                1.64913 seconds (Sampling)
## Chain 1:                1.69504 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 9e-06 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.09 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 0.04679 seconds (Warm-up)
## Chain 2:                1.6403 seconds (Sampling)
## Chain 2:                1.68709 seconds (Total)
## Chain 2: 
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 3).
## Chain 3: 
## Chain 3: Gradient evaluation took 7e-06 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.07 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3: 
## Chain 3: 
## Chain 3: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 3: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 3: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 3: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 3: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 3: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 3: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 3: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 3: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 3: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 3: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 3: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 3: 
## Chain 3:  Elapsed Time: 0.0392 seconds (Warm-up)
## Chain 3:                1.64046 seconds (Sampling)
## Chain 3:                1.67966 seconds (Total)
## Chain 3: 
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 4).
## Chain 4: 
## Chain 4: Gradient evaluation took 4e-06 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.04 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4: 
## Chain 4: 
## Chain 4: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 4: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 4: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 4: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 4: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 4: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 4: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 4: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 4: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 4: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 4: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 4: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 4: 
## Chain 4:  Elapsed Time: 0.037394 seconds (Warm-up)
## Chain 4:                1.64644 seconds (Sampling)
## Chain 4:                1.68383 seconds (Total)
## Chain 4:

print(windMod1)

## stan_glm
##  family:       gaussian [identity]
##  formula:      DK1EurMW ~ wind_DK1
##  observations: 26301
##  predictors:   2
## ------
##             Median MAD_SD
## (Intercept) 3958.0   13.8
## wind_DK1      -0.5    0.0
## 
## Auxiliary parameter(s):
##       Median MAD_SD
## sigma 1346.8    5.9
## 
## ------
## * For help interpreting the printed output see ?print.stanreg
## * For info on the priors used see ?prior_summary.stanreg

The chains and iteration information you get has to do with the MCMC simulation behind the scenes. The results give us a negative point estimate, telling us that at the median of the distribution, a one megawatt-hour increase in wind power, will reduce prices by .5 Euro per MWH.

We can compare this to the results from the build-in glm function that uses maximum likelihood

windMod1_ml = glm(DK1EurMW~wind_DK1, data=elDF)
summary(windMod1_ml)

## 
## Call:
## glm(formula = DK1EurMW ~ wind_DK1, data = elDF)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -8648.4   -875.1   -176.4    694.0  10528.2  
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.959e+03  1.380e+01   286.8   <2e-16 ***
## wind_DK1    -5.077e-01  9.349e-03   -54.3   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 1813589)
## 
##     Null deviance: 5.3044e+10  on 26300  degrees of freedom
## Residual deviance: 4.7696e+10  on 26299  degrees of freedom
##   (3 observations deleted due to missingness)
## AIC: 453662
## 
## Number of Fisher Scoring iterations: 2

We can see that results for the point estimates are very similar.

Defining the business problem

Let us say that a Danish monopolist electricity utility is deciding whether to invest in a new large off-shore wind park.

The size of the wind park is 400 megawatts
The average capacity factor is 50%. Thus in a typical hour, the park would be producing 400*.5 = 200 MWH of electricity.
The investment cost of the park is 300,000,000 (300 million euro)
We assume no running costs (marginal costs) for the wind power plant
The utility currently operates wind power of 500 megawatts (with the same capacity factor as above, and with no operating costs).
The utility also owns 1000 MW of thermal capacity (gas, coal, nuclear), which we assume has a 100% capacity factor
We consider a period of 1000 hours, without any NPV discount

We want to run an analysis to determine whether the monopoly should make the investment (we assume they are profit maximizing).

(all these numbers are completely unrealistic)

Posterior simulations

We can start by considering the model results from our Bayesian regression as they actually are: a set of simulations that represent a distribution:

windMod1DF = as_tibble(windMod1)
windMod1DF

## # A tibble: 4,000 × 3
##    `(Intercept)` wind_DK1 sigma
##            <dbl>    <dbl> <dbl>
##  1         3959.   -0.503 1352.
##  2         3964.   -0.512 1340.
##  3         3955.   -0.502 1342.
##  4         3962.   -0.509 1346.
##  5         3936.   -0.500 1348.
##  6         3952.   -0.509 1338.
##  7         3971.   -0.514 1356.
##  8         3970.   -0.513 1355.
##  9         3963.   -0.511 1342.
## 10         3943.   -0.506 1350.
## # … with 3,990 more rows

hist(windMod1DF$wind_DK1, breaks=100)

So here we see the distribution of possible values of the coefficient on the wind term. It is centered around -.50, but with some uncertainty.

Below we can establish some parameters for our business modelling. Here we calculate a mean value for the amount of wind from the current wind power, and then we get a parameter for the mean amount of wind if the investment is made:

meanWind = mean(elDF$wind_DK1, na.rm=TRUE)
print(meanWind) #mean amount of wind

## [1] 1179.467

newWind = .5*400
expectedWind = meanWind + newWind
print(expectedWind)

## [1] 1379.467

Now we wish to generate prediction distributions from our Bayesian model. But there are actually two types of prediction distribution we could use. We could get the distribution of the mean value (what we can call the inferential distribution.) This is the distribution of what we expect the mean value to be. To get this we use the function posterior_linpred()

#we'll use the current mean wind number for our prediction
newData = tibble(wind_DK1=meanWind)
#this is the amount of wind given an investment
newData2 = tibble(wind_DK1=expectedWind)
windDist1 = posterior_linpred(windMod1, newdata=newData)
hist(windDist1, breaks=100)

So above we get the distribution of the mean price given current wind. But this is not exactly what we are interested in. We are interested in the distribution of actual predicted prices: what we call the prediction distribution. The difference is the extra uncertainty in the error term, \(\epsilon\), summarized by the standard deviation term \(\sigma\). Here we use the posterior_predict() function.

simPrices = posterior_predict(windMod1, newdata=newData)
simPricesHighWind = posterior_predict(windMod1, newdata=newData2)
simsDF = tibble(simPrices[,1], simPricesHighWind[,1])
colnames(simsDF) = c("simPrices", "simPricesHighWind")
ggplot(simsDF) +
  geom_histogram(aes(simPrices), bins=100, fill="red", alpha=.5) +
  geom_histogram(aes(simPricesHighWind), bins=100, fill="blue", alpha=.5)

The profit function

Now we define our profit function, which will take the simulated prices and output an expected profit for the two scenarios.

First we define some parameters

InvestmentCost = 300000000 #investment cost

ExistingWindCap = 500 #How much existing wind capacity the company owns
NewWind = 400 #Size in capacity of new investment
cfWind = .5 #capacity factor of wind power
#no running/marginal cost for wind

thermalCap = 1000 #How much thermal capacity the company owns
mcThermal = 2000 #The marginal cost of the thermal capacity owned by the company
n = 1000 #1000 periods, relevant economic period to evaluate profitability. 

#only run thermal if price>mc

Now we are ready to create a profit function.

The function takes as input a row of the simulation data frame.
We calculate a variable for operating profit for both scenarios (operProfit, operProfitHighWind). The operating profit is market price times the expected amount of wind power produced. In addition, if the price is above the marginal cost of the thermal generation, then the company also gets operating profit from the difference between the market price and the marginal cost.
The total profit for the no-invesment scenario is simply a multiple of the operating profit times the number of periods that we consider (n=1000).
For the investment scenario, we must subtract the Investment cost variable.
The function returns a row with two new columns representing the expected profit for both scenarios

We use the split-apply-combine tools in r, this time making use of the rowwise() function, which applies the functions in mutate() to each row of the data frame.

newSimsDF = simsDF %>% rowwise() %>% mutate(
  operProfit = simPrices*ExistingWindCap*cfWind,
  operProfit = if_else(simPrices > mcThermal, 
                       operProfit + (simPrices-mcThermal)*thermalCap, 
                       operProfit),
  profit = operProfit*n,
  operProfitHighWind = simPricesHighWind*(ExistingWindCap+newWind)*cfWind,
  operProfitHighWind = if_else(simPricesHighWind > mcThermal, 
  operProfitHighWind + (simPricesHighWind-mcThermal)*thermalCap, 
  operProfitHighWind),
  profitHighWind = operProfitHighWind*n - InvestmentCost
)

Now we’ll take a look at our profit distribution with and without our investment:

ggplot(newSimsDF) +
  geom_histogram(aes(profit), fill="red", bins=100) +
  geom_histogram(aes(profitHighWind), fill="blue", bins=100)

# Decision rule

Now that we have a distribution of possible profits for the two scenarios, we need to devise a criteria for making our decision.

A straight-forward criteria would be to compare the expectation (aka mean) profit, and see which is higher.

expectedProfit = mean(newSimsDF$profit)
expectedProfitHighWind = mean(newSimsDF$profitHighWind)
print(c("expected profit, no investment", expectedProfit))

## [1] "expected profit, no investment" "2263268885.60045"

print(c("expected profit, investment", expectedProfitHighWind))

## [1] "expected profit, investment" "2195619734.76251"

print(expectedProfitHighWind>expectedProfit)

## [1] FALSE

Under this rule we would not make the investment. But this is by no means the only viable decision rule. We could, for example, also look at which decision will lead to smallest probability of loss.

nsim = dim(newSimsDF)[1]
nLoss  = sum(newSimsDF$profit<0)/nsim
nLossHighWind = sum(newSimsDF$profitHighWind<0)/nsim
print(c("Probability of loss, no investment", nLoss*100))

## [1] "Probability of loss, no investment" "0.8"

print(c("Probability of loss, investment", nLossHighWind*100))

## [1] "Probability of loss, investment" "3.875"

The probability of a loss is higher if we invest.

Is the wind park in itself profitable?

Now imagine a scenario where an independent power producer wanted to come into the market and invest in the wind park. Then we could analyse the profitability of the wind park itself, without taking into account the effect on the other generators.

We do this below

newSimsDF["windParkNetProfits"] = newSimsDF$simPricesHighWind*NewWind*cfWind*n-InvestmentCost

ggplot(newSimsDF) +
  geom_histogram(aes(windParkNetProfits)) +
  geom_vline(xintercept=0, color="red")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Problems

Consider the Poisson model of well drilling (either with linear or non-parametric representation of the licence-timing.) Take the model results and make predictions that vary by the price of oil: between 30 and 100 dollars a year. Show the predictions in a chart.

A non-linear probability model of drilling and decision analysis

Johannes Mauritzen

Spring 2023