Intro to numpy#
Review and Outline#
Great work! We have made it this far: we know some basic calculations, the built-in data types and structures (lists, tuples, strings, dictionaries), and some key control-flow tools such as if/else conditionals and for loops.
Where are we going now? We will get into one of the key scientific computing packages in python: numpy.
What is numpy? It is short for Numerical Python, and it is used for high-performance computing and data analysis.
Efficiency: it provides ndarray, a highly efficient data structure for this type of computing. Imagine needing to run calculations on more than 200k rows with 10k columns over and over again.
Data analysis: although numpy itself does not provide data-analysis functions as high-level as those in pandas, understanding it will help us use the tools in pandas with less pain.
Python#
First we need to import the numpy package.
Then we will learn the key data structures in numpy and their attributes and methods. Moreover, we will learn how to select data in ndarray and then do computations afterwards.
Buzzwords. ndarray
Basics#
The line import numpy as np imports the numpy package and gives it the alias np.
The alias simply saves us from typing numpy every time; we just type np.
If you’re lost on this, go back to our chapter on importing packages.
Let’s first get to know the most important data structure in numpy.
import numpy as np
Array#
The ndarray is the primary building block of numpy. It lets us perform mathematical computations efficiently, with syntax similar to the scalar operations we learned in python fundamental notebook 1. So let’s create an array object with numpy’s array function.
#create an array
# Let's create another array
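The code for these two cells is not shown above, so here is a minimal sketch of what array creation can look like. The name arr1 follows the later text (which implies five elements); the name arr2 and all of the values are assumptions made for illustration.

```python
import numpy as np

# Hypothetical values: the notebook's original numbers are not shown.
arr1 = np.array([1, 2, 3, 4, 5])        # 1-d array built from a python list
arr2 = np.array([10, 20, 30, 40, 50])   # a second array of the same length

print(arr1)
print(arr2)
```

np.array accepts any list-like input, so a list we already have can be converted directly.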
Now we can do some simple computations like we’ve done for scalars in python fundamental notebook 1.
#add the arrays
#multiply the arrays
#look at shape
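As a rough sketch of the three cells above, the following adds and multiplies two arrays and then inspects the shape. The arrays arr1 and arr2 and their values are illustrative assumptions, not the notebook's original data.

```python
import numpy as np

# Hypothetical arrays of equal length.
arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array([10, 20, 30, 40, 50])

total = arr1 + arr2      # elementwise sum: 11, 22, 33, 44, 55
product = arr1 * arr2    # elementwise product (not a dot product): 10, 40, 90, 160, 250

print(total)
print(product)
print(arr1.shape)        # (5,) : one dimension of length 5
```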
The shape has nothing after the comma. Why? Is it wrong or undefined?
No, it is not wrong: the array is 1-d, so it has only one dimension to report. However, a 1-d array can sometimes lead to unexpected results in computations, especially when it is combined with matrices (2-d arrays). So we recommend using the reshape method in numpy to set the second dimension to 1.
#reshape arr1
# reshape b
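A minimal sketch of the reshape step, assuming a five-element arr1 with made-up values. The names col and col_auto are hypothetical.

```python
import numpy as np

arr1 = np.array([1, 2, 3, 4, 5])   # hypothetical values, shape (5,)

col = arr1.reshape(5, 1)           # explicit column vector, shape (5, 1)
col_auto = arr1.reshape(-1, 1)     # -1 asks numpy to infer the number of rows

print(col.shape)
print(col_auto.shape)
```

Using -1 is convenient when the length of the array is not known in advance.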
Three more ways to initialize 1-d or 2-d arrays:
# Initialize an array with zeros
# Initialize an array with ones
# Initialize an array with ones only on the diagonal
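The three initializers can be sketched as follows. The shapes chosen here are assumptions for illustration only.

```python
import numpy as np

zeros = np.zeros((2, 3))   # 2 by 3 array filled with 0.0
ones = np.ones((3, 1))     # 3 by 1 array filled with 1.0
identity = np.eye(3)       # 3 by 3 array with ones on the diagonal, zeros elsewhere

print(zeros)
print(ones)
print(identity)
```

Note that np.zeros and np.ones take the shape as a single tuple, while np.eye takes the number of rows as a plain integer.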
In fundamental notebook 2, we learned about the range object when using for loops. Here we present the numpy array versions of it.
#arange
#linspace
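A short sketch of the two functions, with start, stop and step values picked only for illustration. The main difference is that np.arange takes a step size and excludes the stop value, while np.linspace takes the number of points and includes the stop value by default.

```python
import numpy as np

steps = np.arange(0, 10, 2)    # 0, 2, 4, 6, 8 (stop value 10 is excluded)
grid = np.linspace(0, 1, 5)    # 0.0, 0.25, 0.5, 0.75, 1.0 (5 evenly spaced points)

print(steps)
print(grid)
```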
Transpose an array#
In numpy, transposing a 2-d array is easy and fast via .T. (For a 1-d array, .T returns the array unchanged, which is one more reason to reshape it into 2-d first.)
#transpose
arr3 = np.array([[1, 2, 3], [4, 5, 6]])
#shorthand method
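The two ways of transposing can be sketched with the arr3 defined above. The names t_func and t_short are hypothetical.

```python
import numpy as np

arr3 = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)

t_func = np.transpose(arr3)   # function form
t_short = arr3.T              # shorthand attribute, same result

print(t_short)
print(t_short.shape)          # (3, 2): rows and columns are swapped
```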
A Gentle Touch on Broadcasting#
In strict elementwise arithmetic, arrays with different shapes cannot be added, subtracted, or otherwise combined.
A way to overcome this is to (conceptually) duplicate the smaller array so that it has the same dimensionality and size as the larger array. This is called array broadcasting. Numpy applies it automatically when performing array arithmetic, which can greatly reduce and simplify your code.
For example, what will the result be?
# add 2
It broadcasts the scalar value 2 five times and adds it to each value in arr1.
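To make the scalar case concrete, here is a minimal sketch assuming a five-element arr1 with made-up values. The second computation spells out what broadcasting does implicitly; the name explicit is hypothetical.

```python
import numpy as np

arr1 = np.array([1, 2, 3, 4, 5])   # hypothetical values

shifted = arr1 + 2                 # the scalar 2 is broadcast to every element

# The same result written out by hand: build an array of five 2s first.
explicit = arr1 + np.full(5, 2)

print(shifted)                     # values 3, 4, 5, 6, 7
```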
Time to practice#
Exercises. Initialize a 4 by 1 array filled with the number 2 and name it arrE1.
arrE1 = ''
Exercises. Initialize a 1 by 4 array filled with the number 3 and name it arrE2.
arrE2 = ''
Exercises. Can you perform an elementwise addition of arrE1 and arrE2?
Exercises (challenging). How would you create a 3 by 3 array with zeros on the diagonal and 2 everywhere else?
arrE3 = ''
Slicing#
Slicing a numpy array works much like slicing a list. Let’s first define a two-dimensional array and then review what we have learned.
arr4=np.array([[2,3,4],[8,5,7]])
arr4
array([[2, 3, 4],
[8, 5, 7]])
How do we get the number 3 from the 2-dimensional array above?
#slice row, col
#slice row, then col
#slice using :
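The three approaches named in the comments above might look like the following sketch. The exact expressions in the original cells are not shown, so these are assumptions; the result names are hypothetical.

```python
import numpy as np

arr4 = np.array([[2, 3, 4], [8, 5, 7]])

by_pair = arr4[0, 1]       # row 0, column 1 in one step: the scalar 3
by_chain = arr4[0][1]      # take row 0 first, then element 1: also 3
by_slice = arr4[0, 1:2]    # slice of columns 1 up to (not including) 2

print(by_pair, by_chain)
print(by_slice, by_slice.shape)   # a 1-d array holding one element, shape (1,)
```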
Can you figure out why the slice returns only one number instead of both 3 and 4? A slice excludes its end index, so 1:2 selects only column 1. The pandas iloc method follows the same end-exclusive rule but returns a labeled Series rather than an array, while label-based loc slicing includes the end label. Be careful with these indexing hassles across different data structures; they can cause errors that are hard to identify.
Let’s see an example first; we will cover more details in the next “intro to pandas” notebook.
import pandas as pd
arr4_datafram=pd.DataFrame(arr4)
arr4_datafram.iloc[0,1:2]
1 3
Name: 0, dtype: int64
In addition, we can keep using forward indices, backward (negative) indices, and the : operator when selecting data, just as we did with lists and strings.
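A brief sketch of these selection styles on arr4. The specific selections are chosen only for illustration.

```python
import numpy as np

arr4 = np.array([[2, 3, 4], [8, 5, 7]])

last = arr4[-1, -1]        # backward counters: last row, last column -> 7
first_col = arr4[:, 0]     # ':' keeps every row, column 0 -> 2, 8
tail = arr4[0, 1:]         # row 0, from column 1 to the end -> 3, 4

print(last)
print(first_col)
print(tail)
```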
Useful Math Methods in Numpy#
Elementwise Methods#
Remember that in python fundamental notebook 1, computing the log of a scalar raised an error saying that log was not defined. That is expected: plain python has no built-in log, and math operations such as log and exp are provided by packages like numpy (the standard math module also offers scalar versions).
Let’s see the following examples…
#log
#e^x
#sqrt
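The three elementwise functions can be sketched as below. The input array x and its values are assumptions chosen so the results are easy to check by hand.

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0])   # hypothetical input

logs = np.log(x)     # natural log of each element; log(1) is 0
exps = np.exp(x)     # e raised to each element
roots = np.sqrt(x)   # square root of each element: 1, 2, 3

print(logs)
print(exps)
print(roots)
```

Each function returns a new array with the same shape as its input.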
Array-wise Operation#
arr3
array([[1, 2, 3],
[4, 5, 6]])
What will we get in the following?
#sum the array
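A minimal sketch of this cell, using the arr3 shown above. The name total is hypothetical.

```python
import numpy as np

arr3 = np.array([[1, 2, 3], [4, 5, 6]])

total = np.sum(arr3)   # no axis given: every element is added together
print(total)           # 21
```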
Interesting, it only returns one number which is the sum of all the elements of the array.
But can we perform row or column sum?
Yes, we can…
arr3
array([[1, 2, 3],
[4, 5, 6]])
# Column Sum -- axis 0
# Row Sum -- axis 1
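The column and row sums can be sketched with arr3 as follows. The result names are hypothetical.

```python
import numpy as np

arr3 = np.array([[1, 2, 3], [4, 5, 6]])

col_sum = np.sum(arr3, axis=0)   # one sum per column: 5, 7, 9
row_sum = np.sum(arr3, axis=1)   # one sum per row: 6, 15

print(col_sum)
print(row_sum)
```

With axis=0 the result has one entry per column (length 3 here); with axis=1 it has one entry per row (length 2 here).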
The meaning of axis in the function calls above may seem confusing right now. Let’s remember one principle: when setting axis, always think about the operation first, whether it will be done across columns or across rows. If the former, set axis = 1 (a row sum adds across the columns); otherwise, set axis = 0.
We’ll see more examples of this in the next “intro to pandas” notebook.
Time to practice#
Exercises. How would you compute the mean of each column?
Exercises. How would you compute the column means of only the second and third columns?
Exercises. How would you compute the mean of each row?
Random number generator#
We can use the randn random number generator to create a numpy array of a specified shape, filled with samples from a “standard normal” distribution.
For example, we generate a 2 by 4 array of random numbers…
#random 2 x 4 array
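A sketch of this cell using np.random.randn. The seed is an assumption added here only so that the output is repeatable; it is not part of the original notebook.

```python
import numpy as np

np.random.seed(0)                  # assumption: fixed seed for repeatable output
sample = np.random.randn(2, 4)     # 2 by 4 array of standard normal draws

print(sample)
print(sample.shape)                # (2, 4)
```

Note that randn takes the dimensions as separate arguments rather than as a tuple.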
Saving array objects#
#create a large array X
#save the array
#load back in
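The three steps above can be sketched with np.save and np.load. The array contents, its size, and the file name X.npy are assumptions; a temporary folder is used so that the sketch leaves no files behind.

```python
import os
import tempfile

import numpy as np

# Hypothetical "large" array: one million integers in a 1000 by 1000 grid.
X = np.arange(1_000_000).reshape(1000, 1000)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "X.npy")   # hypothetical file name
    np.save(path, X)                       # write in numpy's binary .npy format
    X_loaded = np.load(path)               # read the array back into memory

print(np.array_equal(X, X_loaded))         # True when the round trip is exact
```

In a real project you would save to a regular path instead of a temporary folder so the file can be reused later.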
Summary#
Congratulations! First, it’s amazing that you have made it this far. Reflect on what you knew before working through this notebook, namely what we did in python fundamental notebooks. Now reflect on what you can do…AMAZING!!! Let us summarize some key things that we covered.
Numpy Core Objects: An array with one dimension is essentially just a vector of data, while an array with two dimensions can be thought of as a table of data with rows and columns. We will not cover more than 2 dimensions in this course.
Understanding the 2-d Array:
Learn how to initialize an array with desired values and dimensions.
Become familiar with python built-in computations, e.g., +, -, among 1-d or 2-d arrays and the implicit use of broadcasting in them.
Know how to grab elements from an array; the result can be a single number or a part of the original array.
Two types of useful mathematical methods for arrays:
Operations performed on each individual element of the array, e.g., np.log.
Operations across columns or rows, e.g., np.sum. These require correctly setting the axis parameter in the numpy methods.
Axis Understanding: when setting axis, always think about the operation first, whether it will be done across columns or across rows. If the former, set axis = 1; if the latter, set axis = 0. For this course, the axis will always be 0 or 1. We will cover more examples in the “intro to pandas” notebook.