Pre-lab: Getting started with Python and Jupyter¶

MET 430 Mathematics II¶

NTNU Business School¶

Literature: PDS Ch 1 ¶

(Not everything is equally relevant or important in this chapter, so feel free to skim through quickly.)

Learning goals in this lab¶

Introducing IPython interpreter and Jupyter notebooks
Introducing markdown
Introducing the main Python packages used in mathematical analysis
How to get help with Python and mathematical analysis in Python

Why Python¶

You may have heard of or used other programming languages or specialty programs for mathematical analysis and computation: Matlab, Octave, R and Julia, to name a few. So why do we use Python in this course?

A few reasons:

The mathematical capabilities in Python and associated packages have grown quickly over the last years, and Python has become a standard in a lot of applied mathematical work.
Python, as a stand-alone programming language, is more flexible and in some ways more powerful than specialty programs.
Within the growing field of Machine Learning and AI, usually associated with engineering departments and computer science (but increasingly business schools), and tech companies, Python is often the tool of choice.
As NTNU bachelor students, you have already had an introduction to Python, and if you continue with courses in statistics, data analysis, machine learning, or computational mathematics there is a high probability you will get more exposure to Python.

Finally, Python (like R and Julia) is open source. This has some obvious benefits: you do not need to pay licence fees to get full versions, nor fiddle around with under-powered student versions.

A more subtle advantage is that the open-source nature of the programs has helped establish some strong network effects. There are now thousands of people working on packages for Python and millions making use of those packages. This also leads to a quite robust network of courses, documentation and question-and-answer forums.

Can I use other languages/programs in this course if I want?¶

A Qualified Yes.

But then you accept that you may not be able to get the same degree of help, support and feedback from the instructor.

Main packages¶

Base Python is an excellent general programming language. But we need to make use of a set of add-on packages to really get the most out of Python for mathematical analysis. The main packages we will be using are:

NumPy for numerical calculations
Jupyter for creating "computational documents" (these labs are created using Jupyter, and you will be working within Jupyter for your assignments and term project)
Matplotlib for plotting and visualisation
SciPy for scientific computing
Statsmodels for statistical routines and calculations

Luckily, you won't need to download and install these packages independently. They are all included in a Python-based platform for data science called Anaconda.
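
Once Anaconda is installed, a quick check like the one below can confirm that the packages are available. This is a minimal sketch: the helper name `installed_versions` is made up for illustration, and it relies only on `importlib.metadata` from Python's standard library.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(package_names):
    """Return a dict mapping each package name to its installed version, or None if missing."""
    found = {}
    for name in package_names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# The packages listed above, written with their lowercase install names
course_packages = ["numpy", "jupyter", "matplotlib", "scipy", "statsmodels"]
for name, ver in installed_versions(course_packages).items():
    print(name, ver if ver is not None else "not installed")
```

If a package shows up as "not installed", the Anaconda installation is probably incomplete or the notebook is running in a different Python environment.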

Installing Anaconda¶

You can download and install Anaconda by going to this address:

https://www.anaconda.com/products/individual

You can download versions for Windows, Mac or Linux.

The download is free and with no restrictions on use.

If you have technical problems, you can consult the documentation:

https://docs.anaconda.com/anaconda/install/index.html

Opening a Jupyter notebook¶

In this course, we will use Jupyter notebooks to work through labs, complete assignments and create the project assignment.

Jupyter notebooks run in a web browser and allow you to run Python commands.

They also allow you to write and format text and mathematical notation, and then save the combination of text, code and mathematical notation as a pdf or html file.

Jupyter is not always the best choice for doing analysis. For complex projects I often prefer to use a development environment, like Spyder, which also comes along with the Anaconda platform. But for our purposes, Jupyter functions very well.

Before opening a Jupyter notebook, you should create a folder on your computer where you will save your notebooks and other materials for the course, preferably on the local drive under your username. For example, on my Mac I have created a folder under:

Macintosh HD/Users/johannesmauritzen/anv_statistikk

Exactly where doesn't matter, as long as you can easily find it. I would, however, avoid cloud drives like iCloud, OneDrive, etc., since these can lead to problems.

Once you have installed Anaconda and created a folder, you can open Jupyter notebooks in two ways.

If you are familiar with using the terminal/command line on your system, you can open up the terminal app, navigate to the folder you have created and type in the following command:
```
 jupyter notebook
```
Alternatively, you can find the Anaconda Navigator app installed on your machine and click on the Launch button under the Jupyter Notebook icon. You can then navigate to the folder you created from within Jupyter.

To start a new Jupyter notebook, click on the New button, then choose Python 3 in the drop-down menu.

Finding your local disc and user folder¶

Some people experience problems finding their local disc and base user folder, which are important for finding your Jupyter notebook files and loading data from your local disc. Here is a video that walks through the process on a Mac. The process on a Windows PC is similar.
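
You can also ask Python itself where these folders are. The sketch below uses `pathlib` from the standard library; the folder name `anv_statistikk` is just the example used earlier, so replace it with whatever you called your own folder.

```python
from pathlib import Path

home = Path.home()    # your base user folder
current = Path.cwd()  # the folder the notebook is currently running in

# Example folder name from earlier in this lab; replace it with your own
course_folder = home / "anv_statistikk"

print("Home folder:", home)
print("Current folder:", current)
print("Course folder exists:", course_folder.exists())
```

Running this in a notebook cell shows the full paths, which you can then look up in Finder or File Explorer.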

Big problems with installing Jupyter notebook?¶

Hopefully, installing Anaconda and getting a Jupyter notebook up and running has not been a problem. But if you are running into problems which you cannot easily fix, you might consider using Google's Colab:

https://colab.research.google.com

With a Google account, you can easily run Python in the cloud. Google provides a user-friendly tutorial.

Working with IPython and a Jupyter notebook¶

You should now have a pretty blank looking screen in front of you, with a single box with In written on the left, something like this:

In [ ]:

This is our computation/interpreter field. "Behind" this field is IPython, essentially a version of the Python interpreter with some special tools designed for interactive workflows.

If we wanted, we could use it as a simple calculator. We could, for example, ask what 5+5 is:

In [ ]:

5+5

Out[ ]:

10

We run the computation by pressing the Run button or pressing Shift + Return.

We can also run other python commands:

In [ ]:

print("Hello World!")

Hello World!

Or create an object, update its value and print it:

In [ ]:

myFavoriteNumber = 21
myFavoriteNumber += 5
# same as myFavoriteNumber = myFavoriteNumber + 5
print(myFavoriteNumber)

(Notice the use of # to comment)

Basically, most things you can do with Python, you can do in the IPython interpreter fields.
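
As a slightly larger illustration of using the interpreter as a calculator, the sketch below works out the value of a deposit with compound interest. The numbers and variable names are made up for the example.

```python
import math

deposit = 1000.0  # hypothetical initial amount
rate = 0.05       # hypothetical yearly interest rate
years = 10

# ** is the power operator in Python
future_value = deposit * (1 + rate) ** years
print(round(future_value, 2))

# Years needed to double the deposit: solve (1 + rate)**t = 2 for t
doubling_time = math.log(2) / math.log(1 + rate)
print(round(doubling_time, 1))
```

The `math` module is part of base Python, so nothing extra needs to be installed for this to run.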

Text, markdown and equations¶

You can also easily create formatted text and equations instead of code. While in a field, you can do one of two things:

Go to the Cell menu -> Cell Type -> Markdown

But this will get annoying after doing it a few times.

Instead, learn the hotkey combo: Ctrl + M followed by a quick extra M.

Notice that the In [ ] label on the left is now gone.

You are now ready to write text. When you run the field, the text will be displayed, nicely formatted (see the next subsection).

Markdown¶

You can format your text using markdown notation. For example:

# One hash sign followed by a space gives a large header¶

## Two hash signs followed by a space give a smaller header¶

Wrapping text in single stars ("*") makes it *italic*.
Wrapping text in double stars ("**") makes it **bold**.

You can see more with the markdown cheatsheet: https://jupyterbook.org/reference/cheatsheet.html

or here:

https://www.markdownguide.org/basic-syntax

In the text fields you can also write LaTeX-style equations by wrapping the equation in dollar signs ($).

So

    $\sum_{i=0}^N x_i$

becomes

$\sum_{i=0}^N x_i$
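
More elaborate expressions follow the same pattern. As an illustration (the formula for the sample mean is chosen only as an example), fractions are written with `\frac` and a bar over a symbol with `\bar`. Wrapping the equation in double dollar signs places it on its own line:

```latex
$$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$$
```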

Getting help¶

You will experience - quite often - not knowing how to do something or running into some error. There are a few quick ways of getting help.

One is to access the documentation built into Python. We might wonder, for example, what the built-in command max does. Then we can simply write:

In [ ]:

?max

This should make some documentation pop up at the bottom of your browser window. We can then try the command:

In [ ]:

max(5,4,2,7)

Out[ ]:

7
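
The `?` syntax is specific to IPython. In plain Python, the same documentation is available through the built-in `help` function or the standard-library `inspect` module. A brief sketch:

```python
import inspect

# help() prints the documentation in any Python interpreter
help(max)

# inspect.getdoc returns the same text as a string, which we can slice or search
doc = inspect.getdoc(max)
print(doc.splitlines()[0])
```

This works for functions from packages as well, once the package has been imported.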

But sometimes you don't even know which command or set of commands you should use to do a certain task. This is something that happens constantly to even the most experienced programmer/analyst.

The solution is often to get on the net and start googling.

If you are using certain packages (like Pandas), you might also check their online documentation.

Exercise with data¶

Now we should be ready to get started with Python. If you installed Anaconda, then the basic packages we need should be ready to go. All we need to do is to load them into memory. Here we will load in some data and plot it.

In [ ]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Now we have loaded pandas, numpy and the sub-package pyplot from matplotlib. You can think of matplotlib as the parent package providing the basic tools and techniques for all sorts of visualisation, while pyplot works on top of these tools to give an easier way of quickly creating charts.

We use the as keyword to give these packages short names: pd, np and plt.

We can see what types of functions are available in each package by typing in the shortcut followed by a "." and then pressing tab. For example

    pd. [+tab]

We'll start by loading a data set of carbon emissions by country directly from my website. You could also load the data from your local drive, preferably from within the same folder as your notebook; in that case the commented-out command below can be used. We will discuss loading data in more detail in the next lab. For now, notice that we use a pandas command (pd.read_csv()) to load the data, which is in csv format.

In [ ]:

utslipp = pd.read_csv("https://jmaurit.github.io/anv_statistikk/data/carbonEmissions.csv", sep=";", decimal=",")
#utslipp = pd.read_csv("data/carbonEmissions.csv", sep=";", decimal=",")
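
To see what the `sep` and `decimal` arguments do, here is a small self-contained sketch. The text in `raw` is a made-up fragment in the same style as the file (semicolons between columns, commas as decimal marks), not the real data set.

```python
import io

import pandas as pd

# Made-up example data: ";" separates columns, "," is the decimal mark
raw = "year;Norway;Sweden\n2000;1,5;2,25\n2001;1,75;2,5\n"

# With the right settings the numbers are parsed as floats
parsed = pd.read_csv(io.StringIO(raw), sep=";", decimal=",")
print(parsed.dtypes)

# Without decimal=",", values like "1,5" are kept as text instead of numbers
unparsed = pd.read_csv(io.StringIO(raw), sep=";")
print(unparsed.dtypes)
```

If a column that should be numeric shows up as text after loading, a mismatched decimal mark is a likely cause.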

The data has been loaded into a pandas object called utslipp. Think of this as an Excel sheet with data, together with the ability to apply a range of functions. We can look at the first few rows with the head() method:

In [ ]:

utslipp.head()

Out[ ]:

	year	Canada	Mexico	US	Total North America	Argentina	Brazil	Chile	Colombia	Ecuador	...	Sri Lanka	Taiwan	Thailand	Vietnam	Other Asia Pacific	Total Asia Pacific	Total World	OECD	Non-OECD	European Union
0	1965	260.3	62.1	3480.1	3802.6	77.3	51.5	16.7	21.3	2.1	...	1.5	18.8	7.4	8.2	52.6	1425.2	11207.7	7703.2	3504.6	3304.5
1	1966	271.7	65.1	3675.5	4012.2	79.6	56.2	17.9	23.1	2.2	...	1.5	20.0	8.7	13.9	56.3	1548.3	11725.3	8015.8	3709.5	3344.1
2	1967	285.5	66.6	3772.6	4124.8	81.8	58.0	18.3	24.4	2.3	...	1.5	21.8	9.9	19.0	61.7	1614.7	12084.7	8281.0	3803.8	3412.1
3	1968	308.3	72.2	3994.2	4374.7	84.2	68.1	18.7	25.2	2.8	...	1.6	23.9	13.0	19.8	67.2	1720.4	12743.1	8804.5	3938.6	3613.6
4	1969	320.4	79.1	4170.1	4569.7	86.9	74.3	19.9	25.3	2.9	...	1.6	25.2	14.1	23.3	71.9	1956.9	13530.9	9334.1	4196.8	3855.7

5 rows × 104 columns
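
A brief sketch of the kind of functions such an object offers. To keep the example self-contained, it uses a small frame typed in by hand from the first rows and columns of the output above; the name `small` is made up for illustration.

```python
import pandas as pd

# First five rows of three columns, copied from the head() output above
small = pd.DataFrame({
    "year": [1965, 1966, 1967, 1968, 1969],
    "Canada": [260.3, 271.7, 285.5, 308.3, 320.4],
    "Mexico": [62.1, 65.1, 66.6, 72.2, 79.1],
})

print(small.shape)                        # (number of rows, number of columns)
print(small["Canada"].mean())             # average of one column
print(small[["Canada", "Mexico"]].max())  # largest value in each column

# Rows can be filtered with a condition on a column
print(small[small["year"] >= 1968])
```

The same calls work on utslipp itself, for example `utslipp.shape` or `utslipp["Canada"].mean()`.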

You can ignore the code below for the moment if you want. I am basically just adjusting some of the settings for the plots, so that if I want to make a series of charts with the same style, I only need to change the settings once. The plotting will still work fine without doing anything here:

In [ ]:

from cycler import cycler

plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12
plt.rcParams["axes.labelsize"]= 12
plt.rcParams["figure.facecolor"] = "#f2f2f2"
#plt.rcParams['figure.savefig.dpi'] = 100
plt.rcParams['savefig.edgecolor'] = "#f2f2f2"
plt.rcParams['savefig.facecolor'] ="#f2f2f2"
plt.rcParams["figure.figsize"] = [16,10]
plt.rcParams['savefig.bbox'] = "tight"
plt.rcParams['font.size'] = 14
greens = ['#66c2a4','#41ae76','#238b45','#006d2c','#00441b']
multi =['#66c2a4','#1f78b4','#a6cee3','#b2df8a','#33a02c','#fb9a99','#e31a1c','#fdbf6f']
plt.rcParams["axes.prop_cycle"] = cycler(color=multi)
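
Settings assigned through `plt.rcParams` stay in effect for the rest of the session. If you only want to change a setting temporarily, matplotlib's `rc_context` can be used instead. A minimal sketch, with the font size chosen arbitrarily:

```python
import matplotlib.pyplot as plt

original_size = plt.rcParams["font.size"]

# Inside the with-block the setting is changed ...
with plt.rc_context({"font.size": 20}):
    size_inside = plt.rcParams["font.size"]

# ... and afterwards it is automatically restored
size_after = plt.rcParams["font.size"]

print(original_size, size_inside, size_after)
```

Any plotting commands placed inside the with-block would use the temporary setting.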

Now we can plot emissions in Sweden, Denmark and Norway since 1965:

In [ ]:

fig, ax = plt.subplots()
ax.plot(utslipp["year"], utslipp["Norway"])
ax.plot(utslipp["year"], utslipp["Sweden"])
ax.plot(utslipp["year"], utslipp["Denmark"])
plt.show()

It would be good to get some labels on this chart, both for the individual series and for the y-axis:

In [ ]:

fig, ax = plt.subplots()
ax.plot(utslipp["year"], utslipp["Norway"])
ax.annotate("Norway", xy=(2000, 30))
ax.plot(utslipp["year"], utslipp["Sweden"])
ax.annotate("Sweden", xy=(1975, 80))
ax.plot(utslipp["year"], utslipp["Denmark"])
ax.annotate("Denmark", xy=(1975, 50))
#ax.plot(utslipp["year"], utslipp["European Union"])
#ax.annotate("Air and sea traffic, fishing", xy=(2010, 6000))
ax.set_ylabel("Greenhouse gas emissions, million tonnes CO2 equivalent")
plt.show()
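
An alternative to placing annotations by hand is to give each line a `label` and let matplotlib draw a legend. The sketch below takes that approach and uses a small hand-typed frame (Canada and Mexico, copied from the `head()` output earlier) so that it runs on its own. The names `sample` and `plot_countries` are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Small stand-in for utslipp, copied from the head() output
sample = pd.DataFrame({
    "year": [1965, 1966, 1967, 1968, 1969],
    "Canada": [260.3, 271.7, 285.5, 308.3, 320.4],
    "Mexico": [62.1, 65.1, 66.6, 72.2, 79.1],
})


def plot_countries(data, countries):
    """Plot one line per country against year and add a legend."""
    fig, ax = plt.subplots()
    for country in countries:
        ax.plot(data["year"], data[country], label=country)
    ax.set_xlabel("Year")
    ax.legend()  # builds the legend from the label of each line
    return fig, ax


fig, ax = plot_countries(sample, ["Canada", "Mexico"])
```

With the full data set, `plot_countries(utslipp, ["Norway", "Sweden", "Denmark"])` would draw the three series from above.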
