Climb the Ladder!#

Our class moves quickly! Sometimes, it feels like we make leaps in logic that are a bit too big. In this ladder challenge, we’ll learn some core math concepts, some linear algebra, and the numpy library. Problems in this notebook start out easy and progressively get harder, so that the next rung of the Python ladder is always within reach.

Additionally, since not all of the topics discussed in this ladder challenge are explicitly taught in our course, these problems come with many more hints, tips, suggestions, and even sometimes a mini-lesson. You are encouraged to Google frequently throughout the lesson. In many ways, this ladder is meant to be a challenge as well as educational in its own right.

Remember our one rule: NO LOOPS! None of the exercises in this notebook require a loop. If you use a loop to solve any of these problems, you are solving the problem incorrectly.

  1. Import numpy in the usual way

import numpy as np

Section III: Simulation#

In the next section, we’ll use functions within the np.random submodule. You can find documentation here.

  1. Generate 10,000 random numbers between 0 and 1 and assign them to a variable. To verify you’ve simulated the data properly, make sure the mean is approximately 0.5.

  1. What proportion of these numbers is below 0.1?

  1. What proportion of these numbers is above 0.8?

  1. What proportion of these numbers is above 0.2 but below 0.3?
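One possible sketch of the four problems above. It assumes np.random.rand as the generator (the exercise does not name a specific function) and sets a seed only so the run is repeatable. The key idea is that comparing an array to a number gives a boolean array, and the mean of a boolean array is the proportion of True values.

```python
import numpy as np

np.random.seed(42)  # assumption: seeded only so the sketch is repeatable

u = np.random.rand(10_000)  # uniform draws on [0, 1)
mean_u = u.mean()           # should land close to 0.5

# The mean of a boolean mask is the fraction of True entries.
below_01 = (u < 0.1).mean()
above_08 = (u > 0.8).mean()
between_02_03 = ((u > 0.2) & (u < 0.3)).mean()  # & combines masks elementwise

print(mean_u, below_01, above_08, between_02_03)
```

For a uniform distribution, each proportion should be close to the width of its interval: roughly 0.1, 0.2, and 0.1.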

  1. Generate 10,000 random numbers between 2 and 4. To verify you’ve simulated the data properly, find the mean and make sure it is approximately what you expect.

  • Hint: Try to do this with the function you just used rather than a new one. How can you scale and shift its output so that it covers the interval from 2 to 4?

  1. What proportion of these numbers is between 2.4 and 2.6?
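A hedged sketch of the scale-and-shift idea, again assuming np.random.rand and an arbitrary seed. Multiplying by the interval width (4 - 2) stretches [0, 1) to [0, 2), and adding 2 moves it to [2, 4). NumPy also provides np.random.uniform(low, high, size), which would produce the same kind of sample directly.

```python
import numpy as np

np.random.seed(42)  # assumption: seeded only so the sketch is repeatable

# Stretch [0, 1) by the interval width, then shift the left edge to 2.
v = 2 + (4 - 2) * np.random.rand(10_000)
mean_v = v.mean()  # expected to be near the midpoint, 3

# The interval (2.4, 2.6) has width 0.2 out of a total width of 2.
between_24_26 = ((v > 2.4) & (v < 2.6)).mean()

print(mean_v, between_24_26)
```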

  1. Generate 100,000 random standard normal (i.e., mean 0 and standard deviation 1) numbers. Again, find the mean to verify you’ve done this properly.

  1. What proportion of these numbers is negative?

35a) What proportion of these numbers is between -1 and 1?

35b) What proportion of these numbers is between -2 and 2?

35c) What proportion of these numbers is between -3 and 3? Have you seen your last 3 solutions before? (If you’ve taken an intro stats course in college before, you will have.)
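A minimal sketch of the standard normal problems, assuming np.random.randn and an arbitrary seed. Rather than writing three separate comparisons, it broadcasts the absolute values against a small array of bounds (a hypothetical name, bounds, chosen for illustration), which keeps the solution loop-free.

```python
import numpy as np

np.random.seed(42)  # assumption: seeded only so the sketch is repeatable

z = np.random.randn(100_000)  # standard normal draws
mean_z = z.mean()             # should be close to 0
negative = (z < 0).mean()     # should be close to 0.5 by symmetry

# Broadcasting: a (100000, 1) column against 3 bounds gives a (100000, 3) mask.
bounds = np.array([1, 2, 3])
within = (np.abs(z)[:, None] < bounds).mean(axis=0)

print(mean_z, negative, within)
```

The three values in within should be near 0.68, 0.95, and 0.997, which is the empirical (68-95-99.7) rule from introductory statistics.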

For the next few problems, we will be playing Rock-Paper-Scissors#

If you are unfamiliar with the game Rock-Paper-Scissors, it features two combatants choosing one of three hand motions: rock, paper, or scissors. Rock beats scissors, scissors beats paper, and paper beats rock. Two friends are playing: Karen and Tom. Unbeknownst to them, you’ve been studying and recording both of their play patterns. Karen chooses rock 40% of the time, paper 10% of the time, and scissors 50% of the time. For Tom, it’s rock 10%, paper 60%, scissors 30%. Who wins the most often?

This is an extremely difficult question. We will get to the answer in a few guided steps. You’ll want to use np.random.choice() to help you through this.

  1. Create vectors p_karen and p_tom that represent their respective probabilities for rock, paper, and scissors.

# Pools of ten moves whose frequencies match each player's probabilities.
# Karen chooses rock 40% of the time, paper 10%, and scissors 50%.
pkaren = np.array(['R', 'R', 'R', 'R', 'P', 'S', 'S', 'S', 'S', 'S'])
# Tom chooses rock 10% of the time, paper 60%, and scissors 30%.
ptom = np.array(['R', 'P', 'P', 'P', 'P', 'P', 'P', 'S', 'S', 'S'])

  1. Simulate 5 games. Who wins the majority of them? Just eyeball this one. (No one wins a draw.)

  1. Let’s write a function to handle one game at a time. Write a function called rps that takes two arguments, karen and tom, each of which will be "R", "P", or "S". The function will return "K", "T", or "D", representing a win for Karen, a win for Tom, or a draw. That is, the function should give the following results:

  • rps("R", "P") ==> "T"

  • rps("R", "S") ==> "K"

  • rps("R", "R") ==> "D"

  • Hint: Your answer will be a mess of if/elif statements.
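One way the function could look, as a sketch rather than the only valid answer. It assumes both inputs are always one of "R", "P", or "S" and does no validation. Checking for a draw first and then listing Karen's three winning combinations leaves Tom's wins as the only remaining case.

```python
def rps(karen, tom):
    """Return "K" if Karen wins, "T" if Tom wins, or "D" for a draw."""
    if karen == tom:
        return "D"
    elif karen == "R" and tom == "S":  # rock beats scissors
        return "K"
    elif karen == "P" and tom == "R":  # paper beats rock
        return "K"
    elif karen == "S" and tom == "P":  # scissors beats paper
        return "K"
    else:
        # Not a draw and not a Karen win, so Tom wins.
        return "T"


print(rps("R", "P"), rps("R", "S"), rps("R", "R"))
```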

  1. As it stands now, the function you have written cannot handle vector data. Luckily, numpy gives us a function that allows us to vectorize any function we want. Use np.vectorize to create rps_vectorized, the vectorized version of rps. Skim the docs here.

  1. Simulate 1,000,000 (yes, one million) games. How often does Karen win? You can find the results by:

  1. Replicating your solution to problems 37 and 38, but with one million games instead of 5.

  2. Using the function you made in problem 39.

  • Note 1: These probabilities are relatively difficult to figure out by hand. Some probabilities are best discovered by simulation. Another way of asking the above question is “What is the probability of Karen winning?”

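  • Note 2: Look what we did! We used vectorization to completely eliminate the need to write a loop! Keep in mind that np.vectorize is mainly a convenience: according to its documentation, it is implemented essentially as a for loop, so it makes the code shorter but should not be expected to make it much faster.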

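# Sample one million moves per player from the pools defined earlier.
# Each pool entry is equally likely, so the draws follow the stated probabilities.
karen = np.random.choice(pkaren, replace=True, size=1_000_000)
tom = np.random.choice(ptom, replace=True, size=1_000_000)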
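A self-contained sketch of the full simulation. It is an alternative to the pool-based sampling: here the probabilities are passed through the p argument of np.random.choice, and a compact version of rps is repeated so the block runs on its own. The seed and the names moves and n_games are assumptions made for illustration.

```python
import numpy as np


def rps(karen, tom):
    """Return "K", "T", or "D" for one game (inputs assumed to be R, P, or S)."""
    if karen == tom:
        return "D"
    karen_wins = (karen, tom) in {("R", "S"), ("P", "R"), ("S", "P")}
    return "K" if karen_wins else "T"


rps_vectorized = np.vectorize(rps)  # applies rps elementwise to arrays

np.random.seed(42)  # assumption: seeded only so the sketch is repeatable
n_games = 1_000_000
moves = np.array(["R", "P", "S"])

karen = np.random.choice(moves, size=n_games, p=[0.4, 0.1, 0.5])
tom = np.random.choice(moves, size=n_games, p=[0.1, 0.6, 0.3])

results = rps_vectorized(karen, tom)
karen_rate = (results == "K").mean()
tom_rate = (results == "T").mean()
draw_rate = (results == "D").mean()

print(karen_rate, tom_rate, draw_rate)
```

The simulated rates can be checked against the exact values, which follow from multiplying the independent probabilities: Karen wins with probability 0.4(0.3) + 0.1(0.1) + 0.5(0.6) = 0.43, Tom wins with probability 0.32, and the remaining 0.25 are draws.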

Regression Simulation#

Next, suppose we’re trying to simulate some fake data for a regression problem we wish to give to our students. We wish to simulate data for the equation:

\[y = 1000 + 200x + \epsilon\]

where \( \epsilon \sim N(0, 20) \) (that is, \(\epsilon\) is normally distributed with mean 0 and standard deviation 20).

  1. Generate 10 \(x\)s from the \(N(50, 10)\) distribution. (That is, the normal distribution with mean 50 and standard deviation 10).

  1. Generate 10 \(\epsilon\)s from the appropriate distribution described above.

  1. Simulate the \(y\)s as described above using the two vectors you made in the previous two problems.
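The three steps above can be sketched as follows, assuming np.random.normal (whose loc and scale arguments are the mean and standard deviation) and an arbitrary seed. Because x and eps are arrays of the same length, the equation can be written exactly as it appears in the math, with no loop.

```python
import numpy as np

np.random.seed(42)  # assumption: seeded only so the sketch is repeatable
n = 10

x = np.random.normal(loc=50, scale=10, size=n)   # x ~ N(50, 10)
eps = np.random.normal(loc=0, scale=20, size=n)  # epsilon ~ N(0, 20)

# y = 1000 + 200x + epsilon, applied elementwise.
y = 1000 + 200 * x + eps

print(y)
```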

Section IV: Matrices#

Tiny note: In general, we have told you it’s against Python’s style guide to use capital letters when defining variables. The one exception we make is when our variables represent mathematical objects. So feel free to name the following variables things like A, B, etc.!

  1. Create the following matrix:

\[\begin{split} A = \begin{bmatrix} 3 & -1 & 5 \\ -2 & 0 & 8 \\ 4 & 5 & -7 \end{bmatrix} \end{split}\]

  1. Create the following matrix:

\[\begin{split} B = \begin{bmatrix} -3 & 8 & 2 \\ 2 & -3 & 5 \\ 0 & 6 & -2 \end{bmatrix} \end{split}\]

  1. Use np.eye() to create the following matrix:

\[\begin{split} I = I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \end{split}\]

  1. Triple every element of B. (Do not reassign.)

  1. Index A in order to get me -7.

  1. Index A in order to get me 0.

  1. Index A to get me \(\begin{bmatrix} -2 & 0 & 8 \end{bmatrix}\)

  1. Index A to get me \( \begin{bmatrix} -1 \\ 0 \\ 5 \end{bmatrix} \)

(You can ignore the fact that it may print out as a row vector)

  1. Index A to get me \( \begin{bmatrix} 0 & 8 \\ 5 & -7 \end{bmatrix} \)

  1. Redefine the middle column of A to be \( \begin{bmatrix} -2 \\ 1 \\ 7 \end{bmatrix} \)

  1. Index B to define the following matrix: \( C = \begin{bmatrix} 8 & 2 \\ -3 & 5 \\ 6 & -2 \end{bmatrix} \)
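A sketch of the creation and indexing problems above, written as one self-contained block. The variable names on the left (tripled, bottom_right, and so on) are hypothetical and chosen only for illustration. Slices of a NumPy array are views, so the sketch copies them before A is modified, which keeps the earlier answers from changing.

```python
import numpy as np

A = np.array([[3, -1, 5],
              [-2, 0, 8],
              [4, 5, -7]])
B = np.array([[-3, 8, 2],
              [2, -3, 5],
              [0, 6, -2]])
I = np.eye(3)  # 3x3 identity matrix

tripled = 3 * B  # a new array; B itself is not reassigned

bottom_right = A[2, 2]           # -7 (row 2, column 2, counting from 0)
center = A[1, 1]                 # 0
middle_row = A[1].copy()         # [-2, 0, 8]
middle_col = A[:, 1].copy()      # [-1, 0, 5], shown as a 1-D array
lower_block = A[1:, 1:].copy()   # [[0, 8], [5, -7]]

A[:, 1] = [-2, 1, 7]  # overwrite the middle column of A in place
C = B[:, 1:]          # every row of B, last two columns

print(A)
print(C)
```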

  1. What is \(A + B\)?

  1. What is \(2A - 3B\)?

  1. What is the elementwise product of \(A\) and \(B\)?

  1. What is \(AB\)?

  1. What is \(BA\)? And why isn’t it the same as \(AB\)?

  1. What is \(AI\), and is it equal to \(IA\)? Does this product look familiar?
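A hedged sketch of the arithmetic and product problems above. It assumes A already has its middle column redefined to (-2, 1, 7), as requested earlier. The important distinction is that * multiplies elementwise, while @ performs matrix multiplication.

```python
import numpy as np

# Assumption: A after its middle column was redefined to (-2, 1, 7).
A = np.array([[3, -2, 5],
              [-2, 1, 8],
              [4, 7, -7]])
B = np.array([[-3, 8, 2],
              [2, -3, 5],
              [0, 6, -2]])
I = np.eye(3)

total = A + B          # elementwise sum
combo = 2 * A - 3 * B  # scalar multiples, then elementwise difference
elementwise = A * B    # elementwise (Hadamard) product
AB = A @ B             # matrix product
BA = B @ A             # matrix product in the other order

products_match = np.array_equal(AB, BA)  # matrix multiplication is not commutative
AI = A @ I
IA = I @ A

print(AB)
print(BA)
print(products_match)
```

Multiplying by the identity on either side returns A itself, which is why I plays the role of the number 1 for matrices.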

  1. What is \(AC\)?

# C is the 3x2 matrix built from B in the earlier indexing problem; define it before computing AC.

  1. Why do we get an error when calculating \(CA\)?
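A small sketch of why the order matters here, assuming C is built from B as in the earlier indexing problem and A has its redefined middle column. A matrix product is only defined when the number of columns on the left equals the number of rows on the right. The try/except is used only so the block runs to completion; the name ca_error is hypothetical.

```python
import numpy as np

# Assumption: A after its middle column was redefined to (-2, 1, 7).
A = np.array([[3, -2, 5],
              [-2, 1, 8],
              [4, 7, -7]])
B = np.array([[-3, 8, 2],
              [2, -3, 5],
              [0, 6, -2]])
C = B[:, 1:]  # shape (3, 2)

AC = A @ C  # (3, 3) @ (3, 2): inner dimensions match, result is (3, 2)

try:
    C @ A   # (3, 2) @ (3, 3): inner dimensions 2 and 3 do not match
    ca_error = None
except ValueError as err:
    ca_error = str(err)

print(AC.shape)
print(ca_error)
```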

# C has shape (3, 2) and A has shape (3, 3), so the inner dimensions do not match.

  1. What is \(A^TA\)? Note that your answer will be a symmetric matrix, that is, a matrix that is equal to its transpose.

  1. What is \(A^TB\)?

  1. What is \(A^{-1}\) (the inverse of \(A\))?

  1. What is \(AA^{-1}\)? Does it look familiar?

  • Hint: Maybe call an np.round() on your result.
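A sketch of the transpose and inverse problems, assuming A has its redefined middle column and using np.linalg.inv. Floating-point arithmetic leaves tiny errors in the product of A and its inverse, which is why rounding makes the result easier to recognize. The number of decimals passed to np.round is an arbitrary choice.

```python
import numpy as np

# Assumption: A after its middle column was redefined to (-2, 1, 7).
A = np.array([[3, -2, 5],
              [-2, 1, 8],
              [4, 7, -7]])

AtA = A.T @ A                              # .T gives the transpose
is_symmetric = np.array_equal(AtA, AtA.T)  # equal to its own transpose

A_inv = np.linalg.inv(A)  # raises LinAlgError if A is singular
product = A @ A_inv       # close to the identity, up to rounding error
rounded = np.round(product, 10)

print(AtA)
print(is_symmetric)
print(rounded)
```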

  1. What is \((B + A^TA)^{-1}A^TC\)?
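A sketch of the final expression under the same assumptions as before (A with its redefined middle column, C taken from the last two columns of B). It computes the result twice: once literally with np.linalg.inv, and once with np.linalg.solve, which solves the linear system without forming the inverse explicitly. The names M, via_inv, and via_solve are hypothetical.

```python
import numpy as np

# Assumption: A after its middle column was redefined to (-2, 1, 7).
A = np.array([[3, -2, 5],
              [-2, 1, 8],
              [4, 7, -7]])
B = np.array([[-3, 8, 2],
              [2, -3, 5],
              [0, 6, -2]])
C = B[:, 1:]

M = B + A.T @ A  # the 3x3 matrix to be inverted

# Literal translation of the formula.
via_inv = np.linalg.inv(M) @ A.T @ C

# Same result: solve M X = A^T C for X.
via_solve = np.linalg.solve(M, A.T @ C)

print(via_inv)
print(np.allclose(via_inv, via_solve))
```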