Applied Statistics: Python Labs

Johannes Mauritzen

About Applied Statistics: Python Labs

I created the following 14 labs for a course in applied statistics at the NTNU Business School. This is a third-year course in the bachelor's degree. Students have already taken an introductory course in statistics and probability, so I assume some prior knowledge.

The goal of this course is to prepare students for all phases of the data analysis process: importing, transforming and merging data; doing a descriptive data analysis and creating visualisations; running and interpreting regression models; and checking the validity of a model. The first four labs focus primarily on data handling and computing. In these labs, I partially follow Jake VanderPlas's Python Data Science Handbook [PDS].

I use a simulation approach to understanding and estimating statistics, and I rely heavily on the excellent Regression and Other Stories [ROS] by Gelman, Hill and Vehtari. By using simulation and treating estimated parameters as uncertain, I am implicitly introducing a Bayesian approach to regression modelling (though the actual estimation is traditional ordinary least squares and maximum likelihood). I also follow ROS in de-emphasizing hypothesis testing.
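To give a flavour of what this looks like in practice, here is a minimal sketch (not taken from the labs themselves). It assumes only that numpy is installed. The data are made up, and the names `beta_hat`, `cov_beta` and `beta_sims` are my own choices for illustration. The model is estimated by ordinary least squares, and uncertainty in the coefficients is then represented by drawing simulated coefficient vectors from the approximate normal sampling distribution. As a simplification, the residual standard deviation is held fixed at its estimate rather than simulated as well.

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up data for illustration: y = 2 + 0.5 * x + noise
n = 200
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, size=n)

# Ordinary least squares estimate with an intercept column
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Estimated residual variance and covariance matrix of the coefficients
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - X.shape[1])
cov_beta = sigma2_hat * np.linalg.inv(X.T @ X)

# Represent uncertainty by simulating coefficient vectors from the
# approximate sampling distribution N(beta_hat, cov_beta)
n_sims = 4000
beta_sims = rng.multivariate_normal(beta_hat, cov_beta, size=n_sims)

# Summarise the simulations, here with a 95% interval for the slope
slope_interval = np.percentile(beta_sims[:, 1], [2.5, 97.5])
print("Estimated slope:", beta_hat[1])
print("95% simulation interval for slope:", slope_interval)
```

Once the simulated coefficients are in hand, any quantity of interest (a prediction, a difference, a ratio) can be computed for each draw and summarised in the same way, without deriving a separate formula for its standard error.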

Time-series techniques are often left out of introductory courses in statistics, but they are important, especially for economics and business students. The last two labs provide an essential introduction to time series and ARIMA modelling. Here I rely on the online book by Hyndman and Athanasopoulos. Hyndman and Athanasopoulos use R and some custom packages, so I have translated some examples into Python using the tools available in the statsmodels package.

Finally, I make use of some examples from Allen Downey's Think Stats and Think Bayes online books.

Bachelor students at NTNU have also taken an introductory course in computer science using Python, which is the main reason I chose Python over R for this course. I still provide a light introduction to Python for beginners, and walk through how to download the Anaconda distribution and open a Jupyter notebook, which is both how the labs were created and my recommended tool for working through them.

You can cite these labs as:

Mauritzen, Johannes (2022). Applied Statistics: Python Labs. Last accessed (date)

ISBN 978-82-693160-0-1

Introducing Python

Part I: Data Management and Descriptive Statistics

Part II: Probability and Regression Through Simulation

Part III: Extensions

Part IV: How-To's. Short notebooks on common tasks