Homework 2: Introduction to numpy and pandas#
Problem 1: Import the libraries#
Below, import both numpy and pandas using the standard aliases we used in class.
Problem 2: Create an array#
Below, create a one-dimensional array of the integer values from 0 through 100 (inclusive).
Problem 3: Reshape an array#
Reshape the array from Problem 2 so that it has 101 rows and 1 column (the integers 0 through 100 make 101 values, so a shape of 100 x 1 would not fit).
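A reshape only succeeds when the new shape holds exactly as many elements as the original array. The following is a minimal sketch on a toy six-element array; the names a and col are illustrative and not part of the assignment.

```python
import numpy as np

# Toy array with 6 elements (not the homework array).
a = np.arange(6)            # array([0, 1, 2, 3, 4, 5])

# -1 asks NumPy to infer that dimension from the element count.
col = a.reshape(-1, 1)      # shape (6, 1)

# A shape whose size does not match the element count raises ValueError.
try:
    a.reshape(5, 1)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```

Checking the .shape attribute before and after a reshape is a quick way to confirm the result.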
Problem 4: Create an array of 4s#
Below, create a 10 x 10 array filled with the value 4.
Problem 5: Create a random array#
Below, create a 4 x 8 array of values drawn from a standard normal distribution (mean = 0 and standard deviation = 1).
Problem 6: Slicing an array#
Below, create a 5 x 7 array of random values from a standard normal distribution and assign it to a variable named ans6a. Then, select the last two rows and last two columns of the array and assign the result to ans6b.
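Negative indices count from the end of an axis, which is convenient for selecting trailing rows or columns. A small sketch on a toy 3 x 4 array (the names toy, last_row, and corner are illustrative):

```python
import numpy as np

# Toy 3 x 4 array:
# [[ 0,  1,  2,  3],
#  [ 4,  5,  6,  7],
#  [ 8,  9, 10, 11]]
toy = np.arange(12).reshape(3, 4)

# Index -1 is the last row; ":" keeps every column.
last_row = toy[-1, :]        # array([ 8,  9, 10, 11])

# "-2:" means "from the second-to-last position to the end" on each axis.
corner = toy[-2:, -2:]       # array([[ 6,  7], [10, 11]])
```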
Problem 7: Mathematical Operations on an array#
Below, create a 10 x 5 array of random values from a binomial distribution with n = 1000 and p = 0.3. Then, use numpy functions to find the following summary statistics for each column:
mean
variance
standard deviation
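NumPy reductions take an axis argument that controls whether a statistic is computed down the columns or across the rows. A hedged sketch on a small toy array, shown for the mean only; the seed, shape, and variable names here are assumptions made for illustration.

```python
import numpy as np

# Toy 4 x 3 array of binomial draws (smaller than the homework array).
rng = np.random.default_rng(0)
toy = rng.binomial(n=10, p=0.5, size=(4, 3))

# axis=0 collapses the rows, giving one value per column.
col_means = toy.mean(axis=0)   # shape (3,)

# axis=1 collapses the columns, giving one value per row.
row_means = toy.mean(axis=1)   # shape (4,)
```

Other reductions accept the same axis argument.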
Problem 8: Using arrays in functions#
Below, define a function f that takes a numeric input and outputs the square of that input.
Then, create a list named x_list and an array named x_array, each containing the integers 1 through 20. Finally, call the function on each, f(x_list) and f(x_array), and compare and discuss the difference in the results.
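Python lists and NumPy arrays can respond very differently to the same arithmetic operator. As a related illustration (using multiplication rather than the squaring asked for above, and toy names that are not part of the assignment):

```python
import numpy as np

toy_list = [1, 2, 3]
toy_array = np.array([1, 2, 3])

# For a list, * means repetition: the list is concatenated with itself.
doubled_list = toy_list * 2      # [1, 2, 3, 1, 2, 3]

# For an array, * is applied elementwise.
doubled_array = toy_array * 2    # array([2, 4, 6])
```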
Problem 9: Filtering an array#
Below, use the array ex9 and describe what happens with the comparison:
ex9 > 10
What type of objects are returned?
ex9 = np.random.randint(low = 8, high = 13, size = 100)
ex9 > 10
array([False, False, True, False, False, False, False, False, True,
False, True, False, False, False, True, False, False, True,
False, True, True, False, False, False, False, True, False,
False, False, True, False, False, False, True, False, True,
False, False, True, False, True, True, False, False, False,
True, False, False, False, True, True, True, False, True,
False, True, False, False, False, True, False, False, True,
False, True, False, False, False, False, False, True, False,
False, True, False, False, True, True, False, False, False,
False, False, True, True, True, False, False, False, False,
False, False, False, True, False, False, True, True, False,
False])
Problem 10: Filtering an array II#
Now, pass the results of the comparison to the array with
ex9[ex9 > 10]
Explain what happened.
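The same pattern can be tried on a small array whose contents are easy to check by eye. This is only a sketch with made-up values; toy, mask, and selected are illustrative names.

```python
import numpy as np

toy = np.array([3, 12, 7, 15])

# The comparison is evaluated elementwise and yields an array of booleans.
mask = toy > 10          # array([False,  True, False,  True])

# Indexing with a boolean array keeps the positions where the mask is True.
selected = toy[mask]     # array([12, 15])
```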
Problem 11: Using np.linalg#
Below, use the function np.linalg.norm to compute the distance between two vectors, v1 and v2.
Do this by computing the norm of the vectors' difference.
v1 = np.random.normal(size = 100)
v2 = np.random.normal(loc = 5, size = 100)
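With its default arguments, np.linalg.norm returns the Euclidean (L2) norm of a one-dimensional array. A short sketch on two toy vectors where the distance is easy to verify by hand (a and b are illustrative names, not the homework vectors):

```python
import numpy as np

a = np.array([3.0, 4.0])
b = np.array([0.0, 0.0])

# Euclidean distance as the norm of the difference (a 3-4-5 triangle).
dist = np.linalg.norm(a - b)             # 5.0

# The same quantity written out: square root of the sum of squared differences.
manual = np.sqrt(np.sum((a - b) ** 2))
```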
Problem 12: Multiplying matrices#
Below, given matrices m1 and m2, perform matrix multiplication using both the np.matmul function and the @ shorthand.
m1 = np.random.normal(size = (4, 3))
m2 = np.random.normal(size = (3, 4))
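Matrix multiplication requires the inner dimensions to agree: an (n, k) matrix times a (k, m) matrix gives an (n, m) result. A minimal sketch with small integer matrices (A, B, P1, and P2 are illustrative names) shows that the two spellings produce the same product:

```python
import numpy as np

A = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
B = np.arange(6).reshape(3, 2)   # [[0, 1], [2, 3], [4, 5]]

# (2, 3) times (3, 2) gives a (2, 2) result.
P1 = np.matmul(A, B)
P2 = A @ B                       # operator form of the same product
```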
Problem 13: Mean Squared Error#
In predictive algorithms where the target is a continuous number, we may use the mean squared error to measure the difference between the true and predicted values in the model.
Below, you are given two arrays representing a model's predictions, y_pred, and the true values from the data, y_true. Use these to write a function called mean_squared_error that takes two arrays and returns the mean squared error:
\[\text{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y_i})^2\]
where \(\hat{y_i}\) is the prediction of the \(i^{th}\) value, \(y_i\) is the true value, and \(n\) is the number of values.
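Assuming the standard definition of mean squared error (the average of the squared differences between true and predicted values), a small worked example with made-up numbers, taking true values \(y = (1, 2, 3)\) and predictions \(\hat{y} = (2, 1, 5)\):

```latex
\[
\frac{1}{3}\left[(1 - 2)^2 + (2 - 1)^2 + (3 - 5)^2\right]
  = \frac{1 + 1 + 4}{3}
  = 2
\]
```

A hand-computed case like this is a convenient check for the function once it is written.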
x = np.linspace(0, 10, 100)
y_true = 3*x + 4 + np.random.normal(size = 100)
y_pred = 3.2*x + 4.1
def mean_squared_error(y_true, y_pred):
    pass
Pandas#
The problems below involve using the pandas library.
Problem 14: Make a DataFrame from dict#
Use the dictionary ex_14_data to build a pandas DataFrame named dog_df below.
ex_14_data = {'name': ['Lenny', 'Hardy', 'Oden', 'Hennessy', 'Cliff'],
              'breed': ['wildman', 'lab', 'lab', 'pitbull', 'boxer'],
              'weight': [40, 70, 65, 73, 46],
              'age': [3, 5, 13, 15, 4]}
Problem 15: Columns of a DataFrame#
Below, extract the columns from the DataFrame dog_df.
Problem 16: Basic Information#
Use the .info() method to display basic information about dog_df.
A Second Dataset#
Below, we load a built-in dataset from the seaborn library. We will use this library later in the course for data visualization, but for now just note that the object titanic_df is a DataFrame and that we can use the load_dataset function to access it.
import seaborn as sns
titanic_df = sns.load_dataset('titanic')
Problem 17: Numerical Summaries#
Use the .describe() method to display numerical summaries for the columns in titanic_df.
Problem 17b: Categorical Summaries#
Use the .describe() method to display summaries of only the categorical columns. (Hint: what does the include argument do?)
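By default, .describe() summarizes only numeric columns when a DataFrame contains any; the include argument selects columns by dtype instead. A hedged sketch on a toy DataFrame (toy_df and its columns are made up; the dtypes in titanic_df may differ, so the value passed to include is an assumption to adapt):

```python
import pandas as pd

toy_df = pd.DataFrame({
    "city": ["Oslo", "Lima", "Oslo"],
    "temp": [3.5, 22.0, 4.1],
})
# Store the text column with the categorical dtype.
toy_df["city"] = toy_df["city"].astype("category")

# Default behavior: only the numeric column is summarized.
numeric_summary = toy_df.describe()

# include="category" restricts the summary to categorical columns,
# reporting count, unique, top (most frequent value), and freq.
category_summary = toy_df.describe(include="category")
```

Inspecting toy_df.dtypes first shows which dtype names are available to pass to include.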
Problem 18: Shape of the data#
How many rows and columns are in the DataFrame titanic_df?
Problem 19: Summaries#
What are the mean and standard deviation of the age column?
Problem 20: Summaries by sex#
Now, determine the mean and standard deviation of the male passengers' ages.
Problem 21: Young people in first class#
Subset the data to passengers under the age of 18. How many of these passengers traveled first class?
Problem 22: Older people embarked#
Subset the data titanic_df to people over the age of 70. What was the most common point of departure for them (embark_town)?
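Row subsetting in pandas uses the same boolean-mask idea as NumPy, and .value_counts() tallies how often each value occurs in a column. A small sketch on a made-up DataFrame (toy_df, older, and most_common are illustrative names):

```python
import pandas as pd

toy_df = pd.DataFrame({
    "age": [25, 71, 80, 15],
    "town": ["A", "B", "B", "A"],
})

# Keep only the rows where the condition holds.
older = toy_df[toy_df["age"] > 70]

# Count each value in the subset and take the label with the largest count.
most_common = older["town"].value_counts().idxmax()   # "B"
```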
Problem 23: Survival Rate#
Use the survived column to compute the survival rate for all passengers.
Problem 24: Renaming Columns#
Rename the column sibsp to num_siblings.
Problem 25: Dropping Columns#
Drop the column adult_male from titanic_df.
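Both .rename() and .drop() return a new DataFrame by default and leave the original unchanged, so the result usually needs to be assigned to a name. A minimal sketch on a toy DataFrame with made-up column names:

```python
import pandas as pd

toy_df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# rename takes a mapping from old column names to new ones.
renamed = toy_df.rename(columns={"a": "alpha"})

# drop with columns= removes the listed columns.
dropped = toy_df.drop(columns=["c"])

# toy_df itself still has its original columns: a, b, c.
```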
Code Wars#
It is important to continue practicing the fundamentals as we move into using more and more libraries. You will always need to fall back on the basics, so we want you to spend time each week working on some coding challenges. Again, head over to Codewars and find two practice problems that you are able to solve. Below, clearly state each problem, your solution, and an alternative solution that you did not think of but that was displayed once you completed the challenge.