You may have heard of or used other programming languages or specialty programs for mathematical analysis and computation: Matlab, Octave, R and Julia, to name a few. So why do we use Python in this course?
A few reasons:
Finally, Python (like R and Julia) is open source. This has some obvious benefits: you do not need to pay licence fees to get full versions, nor fiddle around with under-powered student versions.
A more subtle advantage is that the open-source nature of the programs has helped establish some strong network effects. There are now thousands of people working on packages for Python and millions making use of those packages. This also leads to a quite robust network of courses, documentation and question-and-answer forums.
A Qualified Yes.
But then you accept that you may not be able to get the same degree of help, support and feedback from the instructor.
Base Python is an excellent general programming language. But we need to make use of a set of extra add-on packages to really get the most out of Python for mathematical analysis. The main packages we will be using are:
Luckily, you won't need to download and install these packages independently. They are all included in a Python-based platform for data science called Anaconda.
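If you want to confirm that the packages are available after installing Anaconda, a small check like the one below can help. This is only a sketch: the helper name `check_packages` is made up for illustration, and the package list is taken from the import statements used later in this lab.

```python
from importlib.util import find_spec

# Packages imported later in this lab.
PACKAGES = ["pandas", "numpy", "matplotlib"]


def check_packages(names):
    """Return a dict mapping each package name to True if Python can find it."""
    return {name: find_spec(name) is not None for name in names}


for name, found in check_packages(PACKAGES).items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

If a package is reported as missing, the Anaconda installation is the first thing to check.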
You can download and install Anaconda by going to this address:
https://www.anaconda.com/products/individual
You can download versions for Windows, Mac or Linux.
The download is free and with no restrictions on use.
If you have technical problems, you can consult the documentation:
In this course, we will work within Jupyter notebooks to go through labs, complete assignments and create the project assignment.
Jupyter notebooks run in a web browser and allow you to run Python commands.
But they also allow you to write and format text and mathematical notation, and then save the combination of text, code and mathematical notation as a PDF or HTML file.
Jupyter is not always the best choice for doing analysis. For complex projects I often prefer to use a development environment, like Spyder, which also comes along with the Anaconda platform. But for our purposes, Jupyter functions very well.
Before opening a Jupyter notebook, you should create a folder on your computer where you will save your notebooks and other materials for this course, preferably on the local drive under your username. For example, on my Mac I have created a folder under:
Macintosh HD/Users/johannesmauritzen/anv_statistikk
Exactly where doesn't matter, as long as you can easily find it. I would, however, avoid cloud drives like iCloud, OneDrive, etc., since these can lead to problems.
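If you prefer, the folder can also be created from Python itself. The sketch below assumes you want the folder directly under your home (user) folder. The function name `make_course_folder` is hypothetical, and the call is commented out so that nothing is created until you choose to run it.

```python
from pathlib import Path


def make_course_folder(name, base=None):
    """Create a folder called `name` under `base` (default: your home folder)."""
    base = Path.home() if base is None else Path(base)
    folder = base / name
    folder.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return folder


# "anv_statistikk" is the folder name from the example path above.
# Uncomment the next line to create it under your home folder:
# print(make_course_folder("anv_statistikk"))
```

The returned path tells you exactly where the folder ended up, which is useful later when loading data files.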
Once you have installed Anaconda and created a folder, you can open Jupyter notebooks in two ways.
If you are familiar with using the terminal/command line on your system, you can open up the terminal app, navigate to the folder you have created and type in the following command:
jupyter notebook
Alternatively, you can find the Anaconda Navigator app installed on your machine and click on the Launch button under the Jupyter Notebook icon. You can then navigate to the folder you created from within Jupyter.
To start a new Jupyter notebook, click on the New button, then Python 3 in the drop-down menu.
Some people have trouble finding their local disk and base user folder, which are important for finding your Jupyter notebook files and loading data from your local disk. Here is a video that walks through the process on a Mac. The process on a Windows PC is similar.
Hopefully, installing Anaconda and getting a Jupyter notebook up and running has not been a problem. But if you run into problems that you cannot easily fix, you might consider using Google's Colab:
https://colab.research.google.com
With a Google account, you can easily run Python in the cloud. Google provides a user-friendly tutorial.
You should now have a pretty blank looking screen in front of you, with a single box with In written on the left, something like this:
This is our computation/interpreter field. "Behind" this field is IPython, basically a version of the Python interpreter with some special tools designed for interactive workflows.
If we wanted, we could use it as a simple calculator. We could, for example, ask what 5+5 is:
5+5
10
Here we run the computation by pressing the Run button or pressing Shift + Return.
We can also run other Python commands:
print("Hello World!")
Hello World!
Or create an object, update that value and print it:
myFavoriteNumber = 21
myFavoriteNumber += 5
# same as myFavoriteNumber = myFavoriteNumber + 5
print(myFavoriteNumber)
26
(Notice the use of # to comment)
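The += shortcut has siblings for the other arithmetic operations. As a small illustration (the variable name `count` is just an example):

```python
count = 10
count += 5   # same as count = count + 5  -> 15
count -= 3   # same as count = count - 3  -> 12
count *= 2   # same as count = count * 2  -> 24
count /= 4   # same as count = count / 4  -> 6.0 (division always gives a float)
print(count)
```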
Basically, most things you can do with Python, you can do in the IPython interpreter fields.
You can also easily create formatted text and equations instead of code. While in a field, you can do one of two things:
But this will get annoying after doing it a few times.
Instead, learn the hotkey combo: Ctrl + M followed by a quick extra M.
Notice now that the In [ ] label on the left is gone.
You are now ready to write text, and when you run the field, the text will be displayed, nicely formatted (see the next subsection).
You can format your text using markdown notation. For example:
Wrapping text in single stars, as in `*italics*`, makes it italic.
Wrapping text in two stars, as in `**bold**`, makes it bold.
You can see more with the markdown cheatsheet: https://jupyterbook.org/reference/cheatsheet.html
or here:
In the text fields you can also write LaTeX-style equations by wrapping the equation in \$
so
$\sum_0^N x_i$
becomes
$\sum_0^N x_i$
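To connect the notation to code, the sum in the equation can be computed in a code field. This is a minimal sketch with made-up example values for x; the loop spells out what the summation sign means, and the built-in sum() gives the same result.

```python
# Hypothetical example values x_0, ..., x_N (here N = 3).
x = [2.0, 4.0, 6.0, 8.0]
N = len(x) - 1

# Explicit version of the sum: add x_i for i = 0, 1, ..., N.
total = 0.0
for i in range(N + 1):
    total += x[i]

# The built-in sum() does the same thing in one step.
assert total == sum(x)
print(total)
```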
You will experience - quite often - not knowing how to do something or running into some error. There are a few quick ways of getting help.
One is accessing the built-in documentation in Python. We might wonder what the built-in command max does, for example. Then we can simply write:
?max
This should cause some documentation to pop up at the bottom of your browser window. Having read it, we can try the command:
max(5,4,2,7)
7
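The ?max syntax is specific to IPython and Jupyter. In plain Python the same documentation is available through the built-in help() function, as in the sketch below. The list of package names is just example input for max.

```python
# help() prints the documentation for a function in any Python session.
help(max)

# The text itself is stored in the function's __doc__ attribute.
first_line = max.__doc__.splitlines()[0]
print(first_line)

# max also accepts a single list, and a key function that decides
# what "largest" means (here: the longest string).
longest = max(["pandas", "numpy", "matplotlib"], key=len)
print(longest)
```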
But sometimes you don't even know which command or set of commands you should use to do a certain task. This is something that happens constantly to even the most experienced programmer/analyst.
The solution is often to get on the net and start googling.
If you are using certain packages (like Pandas), you might also check their online documentation.
Now we should be ready to get started with Python. If you installed Anaconda, then the basic packages we need should be ready to go. All we need to do is to load them into memory. Here we will load in some data and plot it.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
Now we have loaded pandas, numpy and the sub-package pyplot from matplotlib. You can think of matplotlib as the parent package giving the basic tools and techniques for all sorts of visualisation, where pyplot works on top of these tools to give an easier way of quickly creating charts.
We use the as keyword to give these packages short names: pd, np and plt.
We can see what types of functions are available in each package by typing in the shortcut followed by a "." and then pressing tab. For example
pd. [+tab]
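Tab completion only works interactively. To get the same list from code, you can use the built-in dir() function. The sketch below uses the standard-library statistics module with the made-up short name `st`, so it runs even without Anaconda; the same pattern applies to pd, np and plt.

```python
import statistics as st  # same "import ... as ..." pattern as pd, np and plt

# All public names (those not starting with an underscore) in the module.
public_names = [name for name in dir(st) if not name.startswith("_")]
print(public_names)

# Call a function through the short name.
print(st.mean([1, 2, 3, 4]))
```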
We'll start by loading a data set of carbon emissions by country directly from my website. You could also load the data from your local drive, preferably from within the same folder as your notebook; in that case, use the commented-out command below instead. We will discuss loading data in more detail in the next lab. What we can notice now, however, is that we used a pandas command (pd.read_csv()) to load the data (which is in csv format).
utslipp = pd.read_csv("https://jmaurit.github.io/anv_statistikk/data/carbonEmissions.csv", sep=";", decimal=",")
#utslipp = pd.read_csv("data/carbonEmissions.csv", sep=";", decimal=",")
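To see what the sep and decimal arguments do, it can help to try them on a tiny piece of text instead of a file. The sketch below assumes, based on the arguments used above, that the file separates columns with semicolons and writes decimals with a comma. The variable names `raw`, `df` and `as_text` are made up for illustration, and the numbers are copied from the head() output shown in this lab.

```python
import io

import pandas as pd

# Two rows in the assumed style of the course file:
# semicolons between columns, commas as decimal marks.
raw = "year;Canada;Mexico\n1965;260,3;62,1\n1966;271,7;65,1\n"

df = pd.read_csv(io.StringIO(raw), sep=";", decimal=",")
print(df)
print(df.dtypes)  # Canada and Mexico are read as floating-point numbers

# Without decimal=",", values like "260,3" are not recognised as numbers.
as_text = pd.read_csv(io.StringIO(raw), sep=";")
print(as_text.dtypes)
```

If the numeric columns of a data set show up as text, a mismatched decimal mark is one common cause worth checking.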
The data has been loaded into a pandas object (a DataFrame) called utslipp. Think of this as an Excel sheet with data, together with the ability to apply a range of functions. We can look at the first few rows with head():
utslipp.head()
| | year | Canada | Mexico | US | Total North America | Argentina | Brazil | Chile | Colombia | Ecuador | ... | Sri Lanka | Taiwan | Thailand | Vietnam | Other Asia Pacific | Total Asia Pacific | Total World | OECD | Non-OECD | European Union |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1965 | 260.3 | 62.1 | 3480.1 | 3802.6 | 77.3 | 51.5 | 16.7 | 21.3 | 2.1 | ... | 1.5 | 18.8 | 7.4 | 8.2 | 52.6 | 1425.2 | 11207.7 | 7703.2 | 3504.6 | 3304.5 |
1 | 1966 | 271.7 | 65.1 | 3675.5 | 4012.2 | 79.6 | 56.2 | 17.9 | 23.1 | 2.2 | ... | 1.5 | 20.0 | 8.7 | 13.9 | 56.3 | 1548.3 | 11725.3 | 8015.8 | 3709.5 | 3344.1 |
2 | 1967 | 285.5 | 66.6 | 3772.6 | 4124.8 | 81.8 | 58.0 | 18.3 | 24.4 | 2.3 | ... | 1.5 | 21.8 | 9.9 | 19.0 | 61.7 | 1614.7 | 12084.7 | 8281.0 | 3803.8 | 3412.1 |
3 | 1968 | 308.3 | 72.2 | 3994.2 | 4374.7 | 84.2 | 68.1 | 18.7 | 25.2 | 2.8 | ... | 1.6 | 23.9 | 13.0 | 19.8 | 67.2 | 1720.4 | 12743.1 | 8804.5 | 3938.6 | 3613.6 |
4 | 1969 | 320.4 | 79.1 | 4170.1 | 4569.7 | 86.9 | 74.3 | 19.9 | 25.3 | 2.9 | ... | 1.6 | 25.2 | 14.1 | 23.3 | 71.9 | 1956.9 | 13530.9 | 9334.1 | 4196.8 | 3855.7 |
5 rows × 104 columns
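Besides head(), a few other commands are handy for getting a quick overview of a data set. The sketch below builds a small stand-in table called `small` (a made-up name) from three of the rows shown above, so it runs without downloading anything; the same commands work on utslipp.

```python
import pandas as pd

# Values copied from the first rows shown above (a stand-in for utslipp).
small = pd.DataFrame({
    "year": [1965, 1966, 1967],
    "Canada": [260.3, 271.7, 285.5],
    "Mexico": [62.1, 65.1, 66.6],
})

print(small.shape)            # (number of rows, number of columns)
print(list(small.columns))    # column names
print(small.tail(2))          # last two rows, the counterpart of head()
print(small["Canada"].max())  # largest value in one column
```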
You can ignore the code below for the moment if you want. I am basically just adjusting some of the settings for the plots, so if I want to make a series of charts with the same style, I only need to change the settings this one time. The plotting will still work fine without doing anything here:
from cycler import cycler
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12
plt.rcParams["axes.labelsize"]= 12
plt.rcParams["figure.facecolor"] = "#f2f2f2"
#plt.rcParams['savefig.dpi'] = 100
plt.rcParams['savefig.edgecolor'] = "#f2f2f2"
plt.rcParams['savefig.facecolor'] ="#f2f2f2"
plt.rcParams["figure.figsize"] = [16,10]
plt.rcParams['savefig.bbox'] = "tight"
plt.rcParams['font.size'] = 14
greens = ['#66c2a4','#41ae76','#238b45','#006d2c','#00441b']
multi =['#66c2a4','#1f78b4','#a6cee3','#b2df8a','#33a02c','#fb9a99','#e31a1c','#fdbf6f']
plt.rcParams["axes.prop_cycle"] = cycler(color=multi)
Now we can plot emissions in Sweden, Denmark and Norway since 1965:
fig, ax = plt.subplots()
ax.plot(utslipp["year"], utslipp["Norway"])
ax.plot(utslipp["year"], utslipp["Sweden"])
ax.plot(utslipp["year"], utslipp["Denmark"])
plt.show()
It would be good to get some labels on this chart, both for the individual series and for the y-axis:
fig, ax = plt.subplots()
ax.plot(utslipp["year"], utslipp["Norway"])
ax.annotate("Norway", xy=(2000, 30))
ax.plot(utslipp["year"], utslipp["Sweden"])
ax.annotate("Sweden", xy=(1975, 80))
ax.plot(utslipp["year"], utslipp["Denmark"])
ax.annotate("Denmark", xy=(1975, 50))
#ax.plot(utslipp["year"], utslipp["European Union"])
#ax.annotate("Air and sea traffic, fishing", xy=(2010, 6000))
ax.set_ylabel("Greenhouse gas emissions, million tonnes CO2 equivalent")
plt.show()
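Placing annotations by hand means guessing coordinates for each label. An alternative, sketched below, is to give each line a label and let matplotlib draw a legend. To keep the example self-contained it uses the Canada and Mexico values from the head() output earlier in this lab rather than the full data set; the list names are made up for illustration.

```python
import matplotlib.pyplot as plt

# Values copied from the head() output earlier in this lab.
years = [1965, 1966, 1967, 1968, 1969]
canada = [260.3, 271.7, 285.5, 308.3, 320.4]
mexico = [62.1, 65.1, 66.6, 72.2, 79.1]

fig, ax = plt.subplots()
ax.plot(years, canada, label="Canada")  # the label is picked up by the legend
ax.plot(years, mexico, label="Mexico")
ax.set_xlabel("Year")
ax.legend()  # draws a box listing each labelled line
```

As in the cells above, finish with plt.show() to display the figure. The legend is quicker to set up, while annotations give more control over exactly where each label sits.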