Note: If you are experienced with R, feel free to skim through this prelab quickly.
Increasingly, being able to access, organise, transform, and model data is an important skill in a host of different industries. With the energy industry in a period of rapid adjustment towards low-carbon sources and increasing regulation, the ability to use data to analyse the market and make decisions is especially important.
You may have used other data and statistical software in previous courses, such as Excel, SAS, or Stata. Why do we use R in this course?
Because R is more than just a statistics package: it is a programming language and an open-source platform. It is very widely used by academic statisticians and empirical researchers.
Importantly, R is also used by analysts and researchers at:
Amazon
Statoil
Norges Bank
New York Times
And thousands of other businesses and organisations
R has become one of the main tools of an entirely new professional category, often called a Data Analyst or Data Scientist. In the labor market there is a large demand for candidates with solid data skills.
The lectures, labs and assignments of this course will only scratch the surface of what you can do with R. In completing the labs and assignments you will need to find information on commands, on writing scripts, and on troubleshooting bugs. Luckily, there are many excellent sources on the internet.
You can start with An Introduction to R, from the R-project.
UCLA has a useful website on using R.
But perhaps the most common (and often best) way of getting help is simply to use Google or another search engine. You can usually find a good solution to a problem if you type in something like “How to plot histogram in R”.
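As a rough illustration, a search like that will typically lead to something along the lines of the sketch below. It assumes some made-up example data (100 random draws, stored in a variable we have chosen to call x) and uses R's built-in hist() function. The last line opens R's own help page for hist(), which is another quick way of getting help without leaving RStudio.

```r
# Made-up example data: 100 random draws from a standard normal distribution
set.seed(1)  # fixes the random numbers so the result is reproducible
x <- rnorm(100)

# Draw the histogram; hist() also returns the bin counts, which we store in h
h <- hist(x, main = "Histogram of x", xlab = "x")

# Open the built-in help page for hist()
help("hist")
```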
For the purposes of completing all the varied tasks we will cover in this course, a possible alternative would be using Python with the companion packages NumPy, pandas, scipy.stats and statsmodels. I have had students who have completed the course using Python, but I will be less able to provide support and feedback.
R is a powerful tool. It is also free and open source and can be downloaded with no restrictions on use from the web.
We will also be downloading a helper program called RStudio that makes using R easier.
Instructions are below:
After you have opened RStudio, you should see 4 different windows on the screen.
At the bottom left there is a window called “console”.
Try typing the following after the “>” symbol:
print("Hello World!")
## [1] "Hello World!"
What happened? The console is where we communicate with R. We can think of R here as the motor: it does the calculating in the background. Here we simply told it to print “Hello World!” and it did.
Now try writing the following in the console:
#This is a comment
Nothing should happen. R ignores everything that follows a “#” on a line, so the symbol is used to write comments that explain the code.
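A comment can also sit at the end of a line of code. In the short sketch below (the variable name x is just an illustration), R runs the code before the “#” and ignores the rest of the line.

```r
# A full-line comment: R ignores this entire line
x <- 2 + 3  # a trailing comment: the code before the # still runs
print(x)    # prints [1] 5
```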
For the most part, we will not be directly writing code into the console. This turns out to be pretty inefficient.
What we want to do is write our code in a “script file”, which is basically just a text file, and then run that code in the console. That way we save the code we have written.
We can open a new script file, which in R and RStudio has the extension .R, by going to File > New File > R Script. A new empty script page should open in the upper left window. You can save this file in a convenient place (like an energy analytics folder).
Now you can write the following few lines into your script file.
print("hello")
print("R is fun")
print("holy cow")
If you highlight a line and press the “Run” button at the top of the script window, that line will be sent to the console. If you highlight all three lines, all three will be sent to the console and executed in order.
You will probably end up using a keyboard shortcut instead: on a Mac press Command + Return, and on Windows or Linux press Ctrl + Enter.
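The order of execution matters as soon as one line depends on another. The sketch below uses made-up numbers and variable names (price, volume, revenue) purely for illustration. If you run only the third line in a fresh session, R will complain that it cannot find price, because the first two lines have not been run yet.

```r
# Made-up illustrative values
price <- 40     # price per unit
volume <- 25    # number of units sold

# This line only works once price and volume exist
revenue <- price * volume
print(revenue)  # prints [1] 1000
```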
What makes R really powerful is that it also serves as a platform: developers can write their own “apps” for R, called “packages”. This greatly expands the capabilities of R.
One such package we will make use of is called ggplot2, which is used for visualising data. To use this package we need to do two things.
First, we need to download the package from the central R repository, called CRAN (the R app store, if you will).
install.packages("ggplot2")
We only need to do this once (unless you switch to a new computer, or a new version of R or of the package becomes available).
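If you are unsure whether a package is already installed, you can ask R. The sketch below is one possible way of checking, using the base R function requireNamespace(); the variable name is_installed is just our own choice.

```r
# TRUE if ggplot2 is already installed on this computer, FALSE otherwise
is_installed <- requireNamespace("ggplot2", quietly = TRUE)

if (!is_installed) {
  # Only a reminder is shown here; the actual download is done with install.packages()
  message("ggplot2 is not installed yet. Run install.packages(\"ggplot2\") first.")
}
```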
Every time we start a new session and we want to use the package, we need to load it (equivalent to starting up an app). This we do by:
library(ggplot2)
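Once the package is loaded, its functions are available. As a small taste, and assuming ggplot2 has been installed, the sketch below draws a scatter plot from mtcars, a small example data set on cars that comes built into R. The choice of data set and variables is ours and is only meant as an illustration.

```r
library(ggplot2)

# mtcars is built into R: wt is the car's weight, mpg its fuel economy
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Printing the plot object draws it in the plot window
print(p)
```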
In lab 1 we will see how R works in a little more detail, including how to import data.