We start this analysis like we start almost every analysis: by loading the packages we want to use and giving them short names. Here we load numpy (as np), which gives us some basic data types to use. We also import matplotlib for its plotting tools.
import numpy as np
import matplotlib.pyplot as plt
Data, in the broadest sense, is just a collection of pieces of information. Often it is numbers, like stock prices, but not always. You can also have categories, names, or written text.
The standard Python object for storing multiple data points is a list. Lists are super easy: just a collection of entries, separated by commas, put between square brackets [ ].
Below we will make two lists. The first consists of electricity price areas that trade on the Nord Pool electricity markets (this includes all of the Nordic countries except Iceland). So, for example, SE4 is the southernmost part of Sweden, where Malmö is the largest city. The six Norwegian price areas are labeled by each area's biggest city.
Then I create a list of integers from 1 to 17 indicating the first 17 days of 2022:
areas = ["SYS", "SE1", "SE2", "SE3", "SE4", "FI", "DK1", "DK2", "Oslo"," Kr.sand", "Bergen", "Molde", "Tr.heim", "Tromsø", "EE", "LV", "LT", "AT", "BE", "DE-LU", "FR", "NL"]
# first 17 days in January
dates = list(range(1, 18))
print(areas, dates)
['SYS', 'SE1', 'SE2', 'SE3', 'SE4', 'FI', 'DK1', 'DK2', 'Oslo', ' Kr.sand', 'Bergen', 'Molde', 'Tr.heim', 'Tromsø', 'EE', 'LV', 'LT', 'AT', 'BE', 'DE-LU', 'FR', 'NL'] [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]
Now that we have these data in lists we can do all sorts of fun programming things with them - often involving iterating through them with a for loop:
for area in areas:
    print("My favorite price area is", area)
My favorite price area is SYS
My favorite price area is SE1
My favorite price area is SE2
My favorite price area is SE3
My favorite price area is SE4
My favorite price area is FI
My favorite price area is DK1
My favorite price area is DK2
My favorite price area is Oslo
My favorite price area is Kr.sand
My favorite price area is Bergen
My favorite price area is Molde
My favorite price area is Tr.heim
My favorite price area is Tromsø
My favorite price area is EE
My favorite price area is LV
My favorite price area is LT
My favorite price area is AT
My favorite price area is BE
My favorite price area is DE-LU
My favorite price area is FR
My favorite price area is NL
We can use indexing to pull out a single entry from our list (remember that Python indexing always starts with 0 - confusingly R indexing starts with 1).
areas[2]
'SE2'
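Indexing also works from the end of a list, and a slice pulls out several entries at once. Here is a minimal sketch; `some_areas` is a shortened, hypothetical list used only for illustration, not the full list above:

```python
# A shortened list of price areas, used only for illustration
some_areas = ["SYS", "SE1", "SE2", "SE3", "SE4"]

first = some_areas[0]      # the first entry has index 0
last = some_areas[-1]      # negative indices count from the end
middle = some_areas[1:3]   # a slice: indices 1 and 2 (index 3 is excluded)

print(first, last, middle)
```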
Lists can include any data types as well as any combination of types:
mixList = [1, "Cats", 3.14, "Dogs"]
You can also have lists of objects or functions:
def convert_to_dog_years(human_years):
    return human_years / 7
def convert_to_cat_years(human_years):
    return human_years / 6
funcList = [convert_to_dog_years, convert_to_cat_years]
So then I can use a for-loop to loop through the functions and apply each in turn:
for func in funcList:
    print(func(40))
5.714285714285714
6.666666666666667
The flexibility of lists is nice as a programming tool, but it becomes inefficient when working with larger amounts of data. This is where the NumPy array comes in. In a NumPy array you are limited to one type of data in a given array: integers, reals/floats, characters, etc.
np.array([1,2,5])
array([1, 2, 5])
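To see what "one type of data" means in practice, the sketch below checks the dtype attribute of a few small arrays. The variable names are made up for illustration. When the inputs are mixed, NumPy converts everything to a single common type rather than keeping each entry as it was:

```python
import numpy as np

ints = np.array([1, 2, 5])
mixed_numbers = np.array([1, 2.5, 5])       # the integers are converted to floats
mixed_types = np.array([1, "Cats", 3.14])   # everything is converted to strings

print(ints.dtype)            # an integer type (exact size depends on the platform)
print(mixed_numbers.dtype)   # float64
print(mixed_types)           # all entries are now strings
```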
NumPy includes special functions to create arrays from scratch. For example, to create an array of 10 zeros you would simply write:
np.zeros(10)
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
Two such tools that we will often use in this course to generate fake data are the function np.linspace and the module np.random.
np.linspace(0,10,100)
array([ 0. , 0.1010101 , 0.2020202 , 0.3030303 , 0.4040404 , 0.50505051, 0.60606061, 0.70707071, 0.80808081, 0.90909091, 1.01010101, 1.11111111, 1.21212121, 1.31313131, 1.41414141, 1.51515152, 1.61616162, 1.71717172, 1.81818182, 1.91919192, 2.02020202, 2.12121212, 2.22222222, 2.32323232, 2.42424242, 2.52525253, 2.62626263, 2.72727273, 2.82828283, 2.92929293, 3.03030303, 3.13131313, 3.23232323, 3.33333333, 3.43434343, 3.53535354, 3.63636364, 3.73737374, 3.83838384, 3.93939394, 4.04040404, 4.14141414, 4.24242424, 4.34343434, 4.44444444, 4.54545455, 4.64646465, 4.74747475, 4.84848485, 4.94949495, 5.05050505, 5.15151515, 5.25252525, 5.35353535, 5.45454545, 5.55555556, 5.65656566, 5.75757576, 5.85858586, 5.95959596, 6.06060606, 6.16161616, 6.26262626, 6.36363636, 6.46464646, 6.56565657, 6.66666667, 6.76767677, 6.86868687, 6.96969697, 7.07070707, 7.17171717, 7.27272727, 7.37373737, 7.47474747, 7.57575758, 7.67676768, 7.77777778, 7.87878788, 7.97979798, 8.08080808, 8.18181818, 8.28282828, 8.38383838, 8.48484848, 8.58585859, 8.68686869, 8.78787879, 8.88888889, 8.98989899, 9.09090909, 9.19191919, 9.29292929, 9.39393939, 9.49494949, 9.5959596 , 9.6969697 , 9.7979798 , 9.8989899 , 10. ])
Here we have created an array with 100 entries evenly spaced between 0 and 10, with both endpoints included.
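The odd-looking spacing of 0.1010101 follows from including both endpoints: 100 points leave 99 gaps between them. A small sketch to check this (the names `grid` and `step` are just for illustration):

```python
import numpy as np

grid = np.linspace(0, 10, 100)

# With both endpoints included, 100 points leave 99 gaps between them
step = (10 - 0) / (100 - 1)

print(len(grid), grid[0], grid[-1])
print(step, grid[1] - grid[0])
```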
np.random.normal(0,1,100)
array([-5.85188441e-01, 6.52532715e-01, 1.04734210e+00, -1.61826504e+00, -1.64357286e+00, 9.47162956e-02, -8.26588661e-01, -4.75864621e-02, 3.75905482e-01, 5.78914035e-01, -1.34099015e-01, 9.47292804e-01, 1.96788280e+00, -1.56122722e+00, 1.07456098e+00, -1.72671056e-01, -1.24136175e+00, -1.57555541e+00, -1.77597973e+00, -1.98034390e-01, -6.36080909e-02, -3.94085718e-01, -1.46336465e+00, -1.03808406e+00, -1.44090258e+00, -4.41776690e-01, -1.10150245e+00, -2.82165528e-01, -1.58951726e+00, -3.12862695e-01, -1.39090606e+00, -5.17333255e-01, 9.48289761e-01, -9.51649056e-01, -6.28005596e-01, 2.44556852e-01, 2.18233902e+00, -8.54715035e-01, -8.72975087e-01, -1.52732663e+00, 2.98879123e-02, 6.81811509e-01, -1.98722814e+00, -5.19292377e-01, 1.21913645e+00, -1.19260914e+00, -5.02168284e-01, 5.09487254e-01, 1.04283897e+00, 1.30459210e-01, 4.39045295e-01, -1.76311090e-01, 9.18261369e-01, -5.45596773e-01, 1.30719057e+00, -1.33353153e+00, 1.96787732e-01, -7.26120770e-01, -7.51513844e-01, 2.95118320e+00, 1.85661289e-01, -1.10900932e+00, -5.77134371e-01, 2.88859949e-01, 7.55932762e-01, 2.56777770e-01, -3.45768030e-01, 2.35853552e-01, 5.85196499e-01, -2.54575433e+00, 8.70288740e-01, 1.31679022e+00, 3.76704615e-01, -3.31644171e-01, 8.96960910e-01, 4.39117165e-01, 1.52836393e-01, 1.19588306e-01, 2.10534185e+00, 1.45764607e+00, -2.35390431e+00, -6.86557153e-02, 6.33613633e-01, 3.31735208e-01, -1.47373613e+00, -5.48864617e-01, -1.46942554e-01, 1.64115674e+00, 2.45992377e-03, -1.41412295e+00, 6.75161197e-01, -8.37819560e-01, 1.17116684e+00, 7.12763478e-03, -1.67876758e-01, -9.73283874e-01, 2.64936043e-01, -1.34616168e+00, 9.51352288e-02, 1.19564079e+00])
This generated an array with 100 random numbers drawn from a normal distribution with mean 0 and standard deviation 1.
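Every run of the line above gives different numbers. If you want draws that can be reproduced, one option is to create a seeded generator with np.random.default_rng. This is a sketch of that alternative, not what the cell above does, and the seed value 42 is arbitrary:

```python
import numpy as np

# Seeding the generator makes the "random" draws repeatable
rng = np.random.default_rng(seed=42)
draws = rng.normal(loc=0, scale=1, size=100)

# A second generator with the same seed produces exactly the same draws
rng_again = np.random.default_rng(seed=42)
draws_again = rng_again.normal(loc=0, scale=1, size=100)

print(draws.shape)
print(np.array_equal(draws, draws_again))
print(draws.mean(), draws.std())   # close to 0 and 1, but not exactly
```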
For more examples of NumPy functions used to create arrays, see PDS.
Going back to our electricity price example, we can easily convert our areas and dates lists to NumPy arrays:
areas = np.array(areas)
dates = np.array(dates)
print(areas, dates)
['SYS' 'SE1' 'SE2' 'SE3' 'SE4' 'FI' 'DK1' 'DK2' 'Oslo' ' Kr.sand' 'Bergen' 'Molde' 'Tr.heim' 'Tromsø' 'EE' 'LV' 'LT' 'AT' 'BE' 'DE-LU' 'FR' 'NL'] [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17]
NP arrays can have more than one dimension. We are maybe most familiar with a 2-dimensional array (like an Excel sheet), which holds rows and columns of data.
Here we are going to import data on electricity prices from a text file (a CSV file that uses semicolons as the delimiter, hence delimiter=';') into a NP array:
elprices = np.genfromtxt('https://jmaurit.github.io/anv_statistikk/data/elspot_prices.csv', delimiter=';')
elprices
array([[ 762., 465., 465., 821., 821., 819., 823., 821., 1270., 1270., 1270., 418., 418., 418., 824., 824., 824.], [ 575., 438., 438., 533., 533., 573., 557., 533., 1143., 1143., 1143., 341., 341., 341., 577., 577., 577.], [ 811., 394., 394., 735., 735., 804., 782., 718., 1292., 1292., 1292., 355., 355., 355., 918., 918., 918.], [1161., 450., 450., 1243., 1331., 1243., 1411., 1331., 1474., 1474., 1474., 441., 441., 441., 1457., 1457., 1464.], [1192., 516., 516., 1087., 1087., 1218., 1088., 1087., 1336., 1336., 1336., 516., 516., 512., 1228., 1228., 1228.], [1443., 527., 527., 1701., 1709., 1696., 1879., 1801., 1859., 1859., 1559., 527., 527., 510., 1711., 1711., 1724.], [1424., 528., 528., 1585., 1585., 1672., 1585., 1585., 1594., 1594., 1513., 503., 503., 503., 1672., 1672., 1672.], [1369., 500., 500., 1462., 1462., 1604., 1461., 1462., 1458., 1458., 1458., 455., 455., 453., 1604., 1604., 1604.], [1237., 541., 541., 1317., 1317., 1460., 1317., 1317., 1427., 1427., 1427., 427., 427., 427., 1460., 1460., 1460.], [1704., 507., 507., 2471., 2471., 2491., 2449., 2471., 1940., 1940., 1639., 479., 479., 479., 2491., 2491., 2491.], [1337., 354., 354., 1629., 2245., 1629., 1561., 2245., 1515., 1515., 1514., 304., 304., 278., 1948., 2113., 2450.], [ 967., 215., 215., 864., 1512., 800., 1296., 1512., 1386., 1386., 1386., 202., 202., 202., 1819., 1921., 2097.], [ 337., 176., 176., 184., 184., 186., 184., 184., 1264., 1264., 1264., 176., 176., 176., 1525., 1525., 1525.], [ 596., 181., 181., 549., 549., 601., 874., 752., 1289., 1289., 1289., 181., 181., 181., 991., 991., 991.], [1128., 193., 193., 1246., 1261., 933., 1694., 1694., 1551., 1551., 1467., 193., 193., 193., 1445., 1445., 1445.], [ 288., 174., 174., 184., 184., 184., 953., 409., 1277., 1277., 1277., 174., 174., 174., 1047., 1047., 1047.], [ 716., 172., 172., 810., 810., 801., 833., 810., 1323., 1323., 1323., 172., 172., 172., 1047., 1047., 1047.], [1170., 178., 178., 1446., 1481., 1446., 1619., 1518., 1609., 1609., 
1490., 178., 178., 178., 1686., 1740., 1740.], [ 458., 152., 152., 338., 338., 625., 524., 338., 1353., 1353., 1353., 152., 152., 152., 1509., 1509., 1509.]])
In this data, each row is the daily price series for one area that trades on the Nord Pool market, and each column is one day.
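To see how the delimiter argument works without downloading anything, here is a sketch that reads a tiny piece of semicolon-separated text from memory. The text in `raw_text` is made up for illustration and only mimics the layout of the real file:

```python
import io
import numpy as np

# Two rows of made-up prices, separated by semicolons
raw_text = "762;465;821\n575;438;533\n"

# io.StringIO lets genfromtxt read the string as if it were a file
toy_prices = np.genfromtxt(io.StringIO(raw_text), delimiter=";")

print(toy_prices)
print(toy_prices.shape)   # (rows, columns)
```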
If we want to access a specific value, then we can use the index in the form of
myArray[row, column]
elprices[3,4]
1331.0
1331 is the value on the 4th row and 5th column (remember, indexing starts at 0!)
If we want the whole 4th row, we can write:
elprices[3,:]
array([1161., 450., 450., 1243., 1331., 1243., 1411., 1331., 1474., 1474., 1474., 441., 441., 441., 1457., 1457., 1464.])
Or the 4th row, from column index 5 up to and including column index 8:
elprices[3,5:9]
array([1243., 1411., 1331., 1474.])
Notice the convention for slicing with 5:9: the element at index 5 is included, and the slice goes up to but does not include the element at index 9. That is to say, we get the elements at indices 5, 6, 7 and 8 by slicing 5:9.
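The same convention can be checked on a small array where each value reveals its own index. In this sketch, `row` is a made-up array whose entry at index i is 10 times i:

```python
import numpy as np

row = np.arange(10) * 10   # 0, 10, 20, ..., 90

part = row[5:9]            # indices 5, 6, 7 and 8
print(part)
print(len(part))           # stop minus start: 9 - 5 = 4
```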
So let's say I want to see the price series for Trondheim. Then I can make use of the array of price areas we created earlier, which we called areas, together with the np.where command:
np.where(areas=='Tr.heim')
(array([12]),)
np.where tells us which index is associated with the 'Tr.heim' entry. Our price series for the Trondheim area is then:
elprices[12,:]
array([ 337., 176., 176., 184., 184., 186., 184., 184., 1264., 1264., 1264., 176., 176., 176., 1525., 1525., 1525.])
We could do this in one line:
elprices[np.where(areas=='Tr.heim'),:]
array([[[ 337., 176., 176., 184., 184., 186., 184., 184., 1264., 1264., 1264., 176., 176., 176., 1525., 1525., 1525.]]])
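Notice the extra brackets in this output: np.where returns a tuple of index arrays, and indexing with that tuple adds dimensions. One way to get back a plain one-dimensional series is to pull the integer out of the tuple first. The sketch below uses small made-up arrays (`toy_areas` and `toy_prices` are hypothetical stand-ins for areas and elprices):

```python
import numpy as np

toy_areas = np.array(["Oslo", "Bergen", "Tr.heim"])
toy_prices = np.array([[10., 11., 12.],
                       [20., 21., 22.],
                       [30., 31., 32.]])

hits = np.where(toy_areas == "Tr.heim")   # a tuple holding one index array
row_index = hits[0][0]                    # the plain integer index

series = toy_prices[row_index, :]         # an ordinary 1-d array
print(row_index, series, series.shape)
```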
We could compare this to our Oslo data
elprices[np.where(areas=='Oslo'),:]
array([[[1237., 541., 541., 1317., 1317., 1460., 1317., 1317., 1427., 1427., 1427., 427., 427., 427., 1460., 1460., 1460.]]])
We can see that prices have been quite a bit lower in Trondheim compared to Oslo in the first days of 2022, but it would help to get a chart. More on charting in a later lab, but here is a quick and dirty chart we will make now:
plt.plot(dates, elprices[8, :], "r", dates, elprices[12, :], "b")
[<matplotlib.lines.Line2D at 0x7ff428089190>, <matplotlib.lines.Line2D at 0x7ff428097130>]
Prices were much higher in Oslo over the first days in the year, but converged towards the middle of the month. (We will learn how to make much prettier charts in the next couple of labs).
NP arrays make it easy to do computations on the arrays.
Doing computation and transformation on an entire array, rather than, for example, on individual elements in a for-loop, also speeds up computation significantly.
This way of doing computation on whole arrays is called vectorization.
Here is a simple example:
Our data on electricity prices is in NOK/MWh, where NOK is Norwegian kroner and MWh is megawatt-hours.
Most people get their electricity bills in kWh (kilowatt-hours; 1 kWh is 1/1000 of a MWh), so it can be helpful to divide all elements by 1000:
elprices/1000
array([[0.762, 0.465, 0.465, 0.821, 0.821, 0.819, 0.823, 0.821, 1.27 , 1.27 , 1.27 , 0.418, 0.418, 0.418, 0.824, 0.824, 0.824], [0.575, 0.438, 0.438, 0.533, 0.533, 0.573, 0.557, 0.533, 1.143, 1.143, 1.143, 0.341, 0.341, 0.341, 0.577, 0.577, 0.577], [0.811, 0.394, 0.394, 0.735, 0.735, 0.804, 0.782, 0.718, 1.292, 1.292, 1.292, 0.355, 0.355, 0.355, 0.918, 0.918, 0.918], [1.161, 0.45 , 0.45 , 1.243, 1.331, 1.243, 1.411, 1.331, 1.474, 1.474, 1.474, 0.441, 0.441, 0.441, 1.457, 1.457, 1.464], [1.192, 0.516, 0.516, 1.087, 1.087, 1.218, 1.088, 1.087, 1.336, 1.336, 1.336, 0.516, 0.516, 0.512, 1.228, 1.228, 1.228], [1.443, 0.527, 0.527, 1.701, 1.709, 1.696, 1.879, 1.801, 1.859, 1.859, 1.559, 0.527, 0.527, 0.51 , 1.711, 1.711, 1.724], [1.424, 0.528, 0.528, 1.585, 1.585, 1.672, 1.585, 1.585, 1.594, 1.594, 1.513, 0.503, 0.503, 0.503, 1.672, 1.672, 1.672], [1.369, 0.5 , 0.5 , 1.462, 1.462, 1.604, 1.461, 1.462, 1.458, 1.458, 1.458, 0.455, 0.455, 0.453, 1.604, 1.604, 1.604], [1.237, 0.541, 0.541, 1.317, 1.317, 1.46 , 1.317, 1.317, 1.427, 1.427, 1.427, 0.427, 0.427, 0.427, 1.46 , 1.46 , 1.46 ], [1.704, 0.507, 0.507, 2.471, 2.471, 2.491, 2.449, 2.471, 1.94 , 1.94 , 1.639, 0.479, 0.479, 0.479, 2.491, 2.491, 2.491], [1.337, 0.354, 0.354, 1.629, 2.245, 1.629, 1.561, 2.245, 1.515, 1.515, 1.514, 0.304, 0.304, 0.278, 1.948, 2.113, 2.45 ], [0.967, 0.215, 0.215, 0.864, 1.512, 0.8 , 1.296, 1.512, 1.386, 1.386, 1.386, 0.202, 0.202, 0.202, 1.819, 1.921, 2.097], [0.337, 0.176, 0.176, 0.184, 0.184, 0.186, 0.184, 0.184, 1.264, 1.264, 1.264, 0.176, 0.176, 0.176, 1.525, 1.525, 1.525], [0.596, 0.181, 0.181, 0.549, 0.549, 0.601, 0.874, 0.752, 1.289, 1.289, 1.289, 0.181, 0.181, 0.181, 0.991, 0.991, 0.991], [1.128, 0.193, 0.193, 1.246, 1.261, 0.933, 1.694, 1.694, 1.551, 1.551, 1.467, 0.193, 0.193, 0.193, 1.445, 1.445, 1.445], [0.288, 0.174, 0.174, 0.184, 0.184, 0.184, 0.953, 0.409, 1.277, 1.277, 1.277, 0.174, 0.174, 0.174, 1.047, 1.047, 1.047], [0.716, 0.172, 0.172, 0.81 , 0.81 , 0.801, 0.833, 0.81 , 
1.323, 1.323, 1.323, 0.172, 0.172, 0.172, 1.047, 1.047, 1.047], [1.17 , 0.178, 0.178, 1.446, 1.481, 1.446, 1.619, 1.518, 1.609, 1.609, 1.49 , 0.178, 0.178, 0.178, 1.686, 1.74 , 1.74 ], [0.458, 0.152, 0.152, 0.338, 0.338, 0.625, 0.524, 0.338, 1.353, 1.353, 1.353, 0.152, 0.152, 0.152, 1.509, 1.509, 1.509]])
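To make the contrast with a for-loop concrete, the sketch below does the same division both ways on a few made-up prices and checks that the results agree. The array `prices_mwh` is hypothetical; on an array this small the speed difference does not matter, but the vectorized line is shorter and scales much better:

```python
import numpy as np

prices_mwh = np.array([762., 465., 821., 1270.])   # a few made-up NOK/MWh values

# Loop version: handle one element at a time
loop_result = []
for p in prices_mwh:
    loop_result.append(p / 1000)
loop_result = np.array(loop_result)

# Vectorized version: one expression for the whole array
vector_result = prices_mwh / 1000

print(vector_result)
print(np.allclose(loop_result, vector_result))
```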
Perhaps we want to add a fixed fee for transmission to each price, let's say 100 kr. Then we could just calculate:
elprices+100
array([[ 862., 565., 565., 921., 921., 919., 923., 921., 1370., 1370., 1370., 518., 518., 518., 924., 924., 924.], [ 675., 538., 538., 633., 633., 673., 657., 633., 1243., 1243., 1243., 441., 441., 441., 677., 677., 677.], [ 911., 494., 494., 835., 835., 904., 882., 818., 1392., 1392., 1392., 455., 455., 455., 1018., 1018., 1018.], [1261., 550., 550., 1343., 1431., 1343., 1511., 1431., 1574., 1574., 1574., 541., 541., 541., 1557., 1557., 1564.], [1292., 616., 616., 1187., 1187., 1318., 1188., 1187., 1436., 1436., 1436., 616., 616., 612., 1328., 1328., 1328.], [1543., 627., 627., 1801., 1809., 1796., 1979., 1901., 1959., 1959., 1659., 627., 627., 610., 1811., 1811., 1824.], [1524., 628., 628., 1685., 1685., 1772., 1685., 1685., 1694., 1694., 1613., 603., 603., 603., 1772., 1772., 1772.], [1469., 600., 600., 1562., 1562., 1704., 1561., 1562., 1558., 1558., 1558., 555., 555., 553., 1704., 1704., 1704.], [1337., 641., 641., 1417., 1417., 1560., 1417., 1417., 1527., 1527., 1527., 527., 527., 527., 1560., 1560., 1560.], [1804., 607., 607., 2571., 2571., 2591., 2549., 2571., 2040., 2040., 1739., 579., 579., 579., 2591., 2591., 2591.], [1437., 454., 454., 1729., 2345., 1729., 1661., 2345., 1615., 1615., 1614., 404., 404., 378., 2048., 2213., 2550.], [1067., 315., 315., 964., 1612., 900., 1396., 1612., 1486., 1486., 1486., 302., 302., 302., 1919., 2021., 2197.], [ 437., 276., 276., 284., 284., 286., 284., 284., 1364., 1364., 1364., 276., 276., 276., 1625., 1625., 1625.], [ 696., 281., 281., 649., 649., 701., 974., 852., 1389., 1389., 1389., 281., 281., 281., 1091., 1091., 1091.], [1228., 293., 293., 1346., 1361., 1033., 1794., 1794., 1651., 1651., 1567., 293., 293., 293., 1545., 1545., 1545.], [ 388., 274., 274., 284., 284., 284., 1053., 509., 1377., 1377., 1377., 274., 274., 274., 1147., 1147., 1147.], [ 816., 272., 272., 910., 910., 901., 933., 910., 1423., 1423., 1423., 272., 272., 272., 1147., 1147., 1147.], [1270., 278., 278., 1546., 1581., 1546., 1719., 1618., 1709., 
1709., 1590., 278., 278., 278., 1786., 1840., 1840.], [ 558., 252., 252., 438., 438., 725., 624., 438., 1453., 1453., 1453., 252., 252., 252., 1609., 1609., 1609.]])
Logarithms are a function we will work with quite a bit in this course. To take the (natural) logarithm of every element we would simply write:
np.log(elprices)
array([[6.63594656, 6.14203741, 6.14203741, 6.71052311, 6.71052311, 6.70808408, 6.7129562 , 6.71052311, 7.14677218, 7.14677218, 7.14677218, 6.03548143, 6.03548143, 6.03548143, 6.71417053, 6.71417053, 6.71417053], [6.35437004, 6.08221891, 6.08221891, 6.27852142, 6.27852142, 6.35088572, 6.32256524, 6.27852142, 7.04141166, 7.04141166, 7.04141166, 5.83188248, 5.83188248, 5.83188248, 6.35784227, 6.35784227, 6.35784227], [6.69826805, 5.97635091, 5.97635091, 6.5998705 , 6.5998705 , 6.68959927, 6.66185474, 6.57646957, 7.16394668, 7.16394668, 7.16394668, 5.87211779, 5.87211779, 5.87211779, 6.82219739, 6.82219739, 6.82219739], [7.05703698, 6.10924758, 6.10924758, 7.12528309, 7.19368582, 7.12528309, 7.25205395, 7.19368582, 7.29573507, 7.29573507, 7.29573507, 6.08904488, 6.08904488, 6.08904488, 7.28413481, 7.28413481, 7.28892769], [7.08338785, 6.24610677, 6.24610677, 6.99117689, 6.99117689, 7.10496545, 6.99209643, 6.99117689, 7.19743535, 7.19743535, 7.19743535, 6.24610677, 6.24610677, 6.23832463, 7.11314211, 7.11314211, 7.11314211], [7.27447956, 6.26720055, 6.26720055, 7.43897159, 7.44366368, 7.43602782, 7.538495 , 7.49609735, 7.52779399, 7.52779399, 7.35179987, 6.26720055, 6.26720055, 6.23441073, 7.44483327, 7.44483327, 7.45240245], [7.26122509, 6.26909628, 6.26909628, 7.36833969, 7.36833969, 7.42177579, 7.36833969, 7.36833969, 7.37400186, 7.37400186, 7.32184971, 6.22059017, 6.22059017, 6.22059017, 7.42177579, 7.42177579, 7.42177579], [7.22183583, 6.2146081 , 6.2146081 , 7.28756064, 7.28756064, 7.38025579, 7.28687641, 7.28756064, 7.28482091, 7.28482091, 7.28482091, 6.12029742, 6.12029742, 6.11589213, 7.38025579, 7.38025579, 7.38025579], [7.12044437, 6.29341928, 6.29341928, 7.1831117 , 7.1831117 , 7.28619171, 7.1831117 , 7.1831117 , 7.26332962, 7.26332962, 7.26332962, 6.05678401, 6.05678401, 6.05678401, 7.28619171, 7.28619171, 7.28619171], [7.44073371, 6.228511 , 6.228511 , 7.81237821, 7.81237821, 7.82043952, 7.80343506, 7.81237821, 7.57044325, 7.57044325, 7.40184158, 
6.1717006 , 6.1717006 , 6.1717006 , 7.82043952, 7.82043952, 7.82043952], [7.19818358, 5.86929691, 5.86929691, 7.39572161, 7.7164608 , 7.39572161, 7.35308192, 7.7164608 , 7.32317072, 7.32317072, 7.32251043, 5.7170277 , 5.7170277 , 5.62762111, 7.57455848, 7.65586402, 7.8038433 ], [6.8741985 , 5.37063803, 5.37063803, 6.76157277, 7.32118856, 6.68461173, 7.16703788, 7.32118856, 7.23417718, 7.23417718, 7.23417718, 5.3082677 , 5.3082677 , 5.3082677 , 7.50604218, 7.56060116, 7.64826303], [5.82008293, 5.170484 , 5.170484 , 5.21493576, 5.21493576, 5.22574667, 5.21493576, 5.21493576, 7.14203657, 7.14203657, 7.14203657, 5.170484 , 5.170484 , 5.170484 , 7.32974969, 7.32974969, 7.32974969], [6.39024067, 5.19849703, 5.19849703, 6.30809844, 6.30809844, 6.39859493, 6.77308038, 6.62273632, 7.161622 , 7.161622 , 7.161622 , 5.19849703, 5.19849703, 5.19849703, 6.89871453, 6.89871453, 6.89871453], [7.02820143, 5.26269019, 5.26269019, 7.1276937 , 7.13966034, 6.8384052 , 7.43484788, 7.43484788, 7.34665516, 7.34665516, 7.29097478, 5.26269019, 5.26269019, 5.26269019, 7.2758646 , 7.2758646 , 7.2758646 ], [5.66296048, 5.1590553 , 5.1590553 , 5.21493576, 5.21493576, 5.21493576, 6.8596149 , 6.01371516, 7.15226886, 7.15226886, 7.15226886, 5.1590553 , 5.1590553 , 5.1590553 , 6.95368421, 6.95368421, 6.95368421], [6.57368017, 5.14749448, 5.14749448, 6.69703425, 6.69703425, 6.68586095, 6.72503364, 6.69703425, 7.18765716, 7.18765716, 7.18765716, 5.14749448, 5.14749448, 5.14749448, 6.95368421, 6.95368421, 6.95368421], [7.06475903, 5.18178355, 5.18178355, 7.2765564 , 7.30047281, 7.2765564 , 7.38956395, 7.32514896, 7.38336815, 7.38336815, 7.3065314 , 5.18178355, 5.18178355, 5.18178355, 7.43011414, 7.46164039, 7.46164039], [6.12686918, 5.02388052, 5.02388052, 5.8230459 , 5.8230459 , 6.43775165, 6.26149168, 5.8230459 , 7.21007963, 7.21007963, 7.21007963, 5.02388052, 5.02388052, 5.02388052, 7.31920246, 7.31920246, 7.31920246]])
For more operations that can be applied to an array, called universal functions (ufuncs), see PDS.
Numpy allows us to easily compute summary statistics for our data.
We could, for example, find the highest price in our data:
elprices.max()
2491.0
Or the mean price for Trondheim vs. the mean price in Oslo:
print("Trondheim mean ", elprices[12,:].mean())
print("Oslo mean ", elprices[8,:].mean())
Trondheim mean 618.0
Oslo mean 1117.0
Or the mean price for each of the areas:
elprices.mean(axis=1)
array([ 784.29411765, 609.58823529, 768.70588235, 1102.52941176, 1001.58823529, 1368.82352941, 1277.52941176, 1198.17647059, 1117. , 1735.29411765, 1370.29411765, 1057.76470588, 618. , 686.23529412, 1048.52941176, 590.82352941, 750. , 1143.76470588, 703.94117647])
Here axis=1 takes the mean across the columns, giving one value per row (area), while axis=0 would take the mean across the rows, giving one value per column (day).
Standard deviations are also something we will work with, and these are just as easy to compute:
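The axis argument is easy to mix up, so it can help to try it on an array small enough to check by hand. In this sketch, `toy` is a made-up array with 2 rows (areas) and 3 columns (days):

```python
import numpy as np

# 2 areas (rows) x 3 days (columns), made-up numbers
toy = np.array([[1., 2., 3.],
                [4., 5., 6.]])

per_row = toy.mean(axis=1)      # one mean per row (area)
per_column = toy.mean(axis=0)   # one mean per column (day)

print(per_row)      # two values, one for each row
print(per_column)   # three values, one for each column
```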
elprices.std()
602.862199002882
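One detail worth knowing: by default, the std method divides by the number of observations n (the population formula). If you want the sample standard deviation, which divides by n - 1, you can pass ddof=1. A small sketch with made-up values:

```python
import numpy as np

values = np.array([2., 4., 4., 4., 5., 5., 7., 9.])

population_std = values.std()     # divides by n (ddof=0 is the default)
sample_std = values.std(ddof=1)   # divides by n - 1

print(population_std)
print(sample_std)                 # slightly larger than the population version
```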
For more built-in summary statistics that you can compute on NP arrays, see PDS
How many days in Trondheim did we see prices above 1000kr/MWH?
To calculate this, we first need to determine whether a given entry is or is not above 1000 - basically a True/False test. This is how we do it:
elprices[12,:]>1000
array([False, False, False, False, False, False, False, False, True, True, True, False, False, False, True, True, True])
The greater-than sign, >, is what we refer to as a comparison (or boolean) operator: it returns either True or False.
Another common comparison operator is X==Y, which asks: is X equal to Y?
Here is an error I make all the time: I want to ask whether X is equal to Y, but I write X=Y, which in Python assigns the value of Y to X.
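The difference between the two is easiest to see side by side. In this sketch all variable names are made up for illustration:

```python
import numpy as np

x = 5
y = 7

before = (x == y)   # comparison: asks a question, changes nothing
x = y               # assignment: x now holds the value of y
after = (x == y)

print(before, after)

# On arrays, == compares element by element
codes = np.array(["SE1", "FI", "SE1"])
matches = (codes == "SE1")
print(matches)
```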
For my price example, I could count the number of "True" responses to get the number of days with prices over 1000 kr/MWH, but a better way to do it would be:
np.sum(elprices[12,:]>1000)
6
Here I use the np.sum function, which sums up all the values in the array. The way Python works is that a True counts as 1 and False counts as 0.
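Because True counts as 1, the same trick gives the share of days above a threshold (via the mean), and the True/False array can also be used to pick out the matching values. A sketch with a few made-up prices:

```python
import numpy as np

prices = np.array([337., 176., 1264., 1264., 176., 1525.])
above = prices > 1000             # array of True/False values

count_above = np.sum(above)       # number of True entries
share_above = np.mean(above)      # fraction of True entries
high_prices = prices[above]       # keep only the entries where above is True

print(count_above, share_above, high_prices)
```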
Did any price areas experience a price over 2000 kr/MWH?
We can answer that with:
np.any(elprices>2000)
True
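np.any gives a single answer for the whole array. To find out which areas had a price above the threshold, one option is to apply it row by row with axis=1. The sketch below uses made-up data (`toy_areas` and `toy_prices` are hypothetical) and also shows np.all, which asks whether every entry passes the test:

```python
import numpy as np

toy_areas = np.array(["A", "B", "C"])
toy_prices = np.array([[ 900., 2100., 1500.],
                       [ 800.,  700.,  600.],
                       [2500., 2600., 1900.]])

any_high = np.any(toy_prices > 2000)               # one answer for the whole array
high_by_area = np.any(toy_prices > 2000, axis=1)   # one answer per row (area)
all_high = np.all(toy_prices > 2000, axis=1)       # True only if every day is above

print(any_high)
print(toy_areas[high_by_area])   # names of the areas with at least one high price
print(all_high)
```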
To read more about comparisons in Numpy arrays, see PDS
1.) Generating arrays - random order generator. Make a NumPy array that includes the names of your immediate family as well as uncles, aunts and cousins. Now you are all staying at a small cabin with a single bathroom and shower, and you need a fair way of randomly ordering who will get to use the bathroom/shower during the morning. So you offer to create a random ordering. Write a program that creates a random ordering of the names in the array. (Hint: you should probably use the function np.random.choice().)
2.) Generating random values
a. Generate 100 random values from a uniform distribution (you can choose the range yourself, 0-1, 0-10, whatever). Plot the values in a histogram
b. Now generate a 100x100 2-d array of random values from a uniform distribution.
c. Take the mean across each column and store this in a 1-d array. Plot a histogram of this array. Does it look like the histogram in a?
3.) Working with a small data set
a. Find a small data set on a website somewhere and import it as a numpy array. You can either copy and paste the data directly into Jupyter, or copy it into a txt document then read it in as we did above. (For those who really want a challenge, you could experiment with using a web-scraping package like [BeautifulSoup](https://beautiful-soup-4.readthedocs.io/en/latest/) to import data directly from a website.)
b. Create a simple plot of the data - it doesn't need to be pretty.