Python Fundamentals 1#

Outline#

Time to start programming! We work our way through some of the essentials of Python’s core language. We will do this within a Jupyter Notebook and, along the way, become familiar with Markdown as well as other properties of the notebook environment.

OBJECTIVES

  • Use variables to represent different Python data types

  • Identify and use int, float, bool, and str data types in Python

  • Identify, differentiate between, and use list, dict, set, and tuple collections in Python

  • Iterate over collections using for loops

  • Use conditional statements to control the flow of programs

Markdown Essentials#

Markdown is a lightweight markup language that gets converted to HTML (“hypertext markup language”), the language used to construct basic websites. It has a zen-like simplicity and beauty.

  • Headings. Large bold headings are marked by hashes (#). One hash for first level (very large), two for second level (a little smaller), three for third level (smaller still), four for fourth (the smallest). Try these in a Markdown cell to see how they look:

    # Data Bootcamp sandbox
    ## Data Bootcamp sandbox
    ### Data Bootcamp sandbox
    

    Be sure to run the cell when you’re done (shift enter).

  • Bold and italics. If we put a word or phrase between double asterisks, it’s displayed in bold. Thus **bold** displays as bold. If we use single asterisks, we get italics: *italics* displays as italics.

  • Bullet lists. If we want a list of items marked by bullets, we start with a blank line and mark each item with an asterisk on a new line:

    * something
    * something else
    
  • Links. We construct a link with the text in square brackets and the url in parentheses immediately afterwards. Try this one:

    [Data Bootcamp course](http://nyu.data-bootcamp.com/)
    

We can find more information about Markdown under Help. Or use your Google fu.

Exercise. Ask questions if you find any of these steps mysterious:

  • Close Jupyter.

  • Start Jupyter.

  • In Jupyter, open a new IPython notebook within your Data_Bootcamp directory/folder. Point to the code cell, the name of the notebook, and the help button.

  • Save the file as bootcamp_class_pyfun1. This file will serve as your notes for this class.

  • Create a description cell in Markdown at the top of your notebook. It should include your name and a description of what you’re doing in the notebook. For example: “Mike Waugh’s first notebook for Python fundamentals 1” and a date.


Simple Calculations and Assignment#

Simple calculations are the “bread and butter” of scientific computation…let’s get started:

test = 2*3 # simple multiplication
print(test)
# What about division...
test = 2/3 
print(test)
# What about modulus...
test = 2%3
print(test)

Side note. Note how the cell contains comments that are not interpreted by Python. To create a comment, simply type # followed by whatever comment you want to make. Comments are important because they help make your code readable.

test = 2^3 # This is what you would do in excel (matlab too)
print(test)
print("is this a 8???")
test = 2**3 # Now what happens...
print(test)
print("is this a 8???")
test = log(3) # what do you think will happen here...log is not defined
              # log is not a built-in function; built-ins show up in a different color in the notebook
print(test)

Note how the computer just stopped. It did not compute. Remember, Python and the computer are DUMB! You gave it an instruction that it did not know how to carry out, so it stopped and did not proceed. A couple of points about this:

  • A program runs from top to bottom (within a code cell), simply following instructions/commands one at a time.

  • When you run this, note how the error message (after some stuff) tells you where the problem is (the line containing log) and what it is: the name log is not defined.
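The three calculations in the cell above can be checked side by side. This is a minimal sketch, assuming the natural logarithm from Python's standard math module is what we want in place of the undefined log; the variable names are only for illustration.

```python
import math  # standard library module; log lives here, not among the built-ins

xor_result = 2 ^ 3        # ^ is bitwise "exclusive or" in Python, not a power
power_result = 2 ** 3     # ** is the power operator
log_result = math.log(3)  # natural logarithm of 3

print(xor_result)    # 1
print(power_result)  # 8
print(log_result)
```

So 2^3 does run, it just does not compute what Excel would; only the log line stops the program, and importing math is one way to make it work.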

test = 4**2 # Now what happens...
print(test)
print("is this a 16???")

Assignment Above I’ve been assigning variables… but let’s look at this more closely…

x = 2 

Nice, so the thing on the left is the “variable” named “x”, the thing on the right is the value that this variable is assigned, and the = sign is the operator that assigns that value.

print(x)

y = 3 # lets do it again...
print(y)

Now we are getting somewhere: we take these variables and perform an operation. Notice that (like Excel) the value assigned to z is computed from the values assigned to x and y. But there is a difference with Excel…what is it?

z = x/y 

print(z)

Here is a place where you might want to figure out what variables there are within the environment. For example, what is x: its type and size (ignore these for now) and its value? How do you do this? With the whos command, which will provide this information.

whos

This is a nice feature in that it is a way for you to always understand what variables are in your environment at any point in time, their type, etc.


Time to practice#

We will do this a lot. Here is the deal: below is a set of exercises. Take a couple of minutes and (i) create a code cell below each one and (ii) try to answer them as best as possible. If we don’t cover them all in class, try to attempt them later as you review.

Exercise. Type w = 7 in a cell. In the same cell, next line below, type w = w + 2. In the next line below type w (so we can see the output). What does this code do? Why is this not a violation of basic mathematics?

Exercise. In another code cell type w = w + 2 and then w below it (again so we can see the output). Evaluate this cell once. Do it again. Do it again. What is going on here?

Exercise. Suppose we borrow 200 for one year at an interest rate of 5 percent. If we pay interest plus principal at the end of the year, what is our total payment? Compute this using the variables principal = 200 and i = 0.05.

Exercise. Real GDP in the US (the total value of things produced) was 15.58 trillion in 2013 and 15.96 trillion in 2014. What was the growth rate? Express it as an annual percentage.

Exercise (challenging). Suppose we have two variables, x and y. How would you switch their values, so that x takes on y’s value and y takes on x’s?

Exercise (challenging). Type x = 6 in a cell. We’ve reassigned x so that its value is now 6, not 2. If we type and submit z, we see

In [10]: z
Out[10]: 0.6666666666666666

But wait, if z is supposed to be x/y, and x now equals 6, then shouldn’t z be 2? What do you think is going on?


Printing (and help)#

Printing is important in the sense that if we don’t tell the computer to report or “print” the results, then generally we will not see them.

First, let’s practice using the help command by typing print?

print?

A window should pop up showing, among other things, (i) that the values to print must be separated by commas, (ii) how to separate them when they are printed (sep), and (iii) what to put at the end (end).

print(x, y, sep='---')
print(x,y, end='\n \n \n \n \n')

Notice all the white space. This is what the character \n does: it stands for a newline, that is, a return or jump to the next line.
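To see exactly which characters sep and end produce, we can print into an in-memory buffer and inspect the result. This is a small sketch, assuming x = 2 and y = 3 as above; the io.StringIO buffer and the file argument of print are standard Python but go beyond what we have covered, and the '!' in end is just a visible marker chosen for illustration.

```python
import io

x = 2
y = 3

# Print into an in-memory buffer so we can look at the exact characters
buffer = io.StringIO()
print(x, y, sep='---', file=buffer)   # sep goes between the values
print(x, y, end='!\n', file=buffer)   # end replaces the default newline

captured = buffer.getvalue()
print(repr(captured))  # repr shows the \n characters instead of jumping lines
```

The first print uses the default end (a single \n) and the second uses the default sep (a single space).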


Strings#

This is where I think Python is VERY POWERFUL…lots of environments can do numerical calculations and plotting well, but handling and manipulating strings is less common…

  • Lesson 1: A string is a collection of characters between quotation marks

  • Lesson 2: A string may look like a number, but it is not. ‘12’/3 is not going to work: ‘12’ is a string, so Python does not see it as a number. It is being asked to perform a numerical computation on something that is not a number, thus an error message.

a = "some"
b = "thing"
c = a + b # this is awesome....so natural and intuitive... suppose you tried
            # this in excel?? what would happen.
print(c)
# Back to print, we can do some cool things with this...
print("the value of z is", z)

# or even do something like this
message = "the value of z is"
print(message, z)
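
Lesson 2 can be seen in action with a short sketch. The try/except construct used here is a standard Python tool we have not covered yet; it catches the error so the rest of the cell keeps running. The variable names are only for illustration.

```python
number_like = "12"   # looks like a number, but it is a string

try:
    result = number_like / 3   # dividing a string by a number
except TypeError as error:
    result = None
    error_message = str(error)
    print("Python refused:", error_message)

print(number_like + "3")  # + glues strings together: 123
```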

Time to practice#

Below is a set of exercises. Take a couple of minutes and (i) create a code cell below each one and (ii) try to answer them as best as possible. If we don’t cover them all in class, try to attempt them later as you review.

Exercise. What happens if we run the statement: ‘Chase’/2? Why?

Exercise. This one’s a little harder. Assign your first name as a string to the variable firstname and your last name to the variable lastname. Use them to construct a new variable equal to your first name, a space, then your last name. Hint: Think about how you would express a space as a string.

Exercise. Set s = ‘string’. What is s + s? 2*s? s*2? What is the logic here?


Quotation Marks#

Here is the thing: you’ll notice that sometimes I use single quotation marks and sometimes double quotation marks…

  • First, both are valid ways to define a string. The real issue is my inconsistent use; partly this is a problem within the NYU Data Bootcamp team…I actually prefer double.

  • Second, the fact that both are valid is not an accident. In fact, double quotation marks and even triple quotation marks play important roles.

a = 'string'
b = "string"
print(a,b) # We should see the same thing....

# This is one instance where double helps...
message = "I don't know what I'm doing"
print(message)

Note how in the last line of code I can use the apostrophe. This is the value added of double quotation marks: a string wrapped in them can contain single quotation marks (apostrophes). Now what about this…

longstring = """
Four score and seven years ago
Our fathers brought forth. """

print(longstring) 

Here triple quotation marks allow us to have multiple lines.
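What the triple quotation marks actually store is a single string with newline characters inside it. A small sketch, reusing the same longstring, makes the hidden \n characters visible; repr and the count method are standard Python, used here only for inspection.

```python
longstring = """
Four score and seven years ago
Our fathers brought forth. """

print(repr(longstring))               # the line breaks appear as \n
line_breaks = longstring.count("\n")  # count the newline characters
print(line_breaks)                    # 2
```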


Time to practice#

Exercise. In the Four score etc code, replace the triple double quotes with triple single quotes. What happens?

Exercise. Fix this code:

bad_string = 'Sarah's code'

print(bad_string)

Exercise. Which of these are strings? Which are not? Edit the markdown cell and type next to each one string, not string.

  • apple

  • “orange”

  • ‘lemon84’

  • “1”

  • string

  • 4

  • 15.6

  • ‘32.5’


Lists#

Key concept: A list is an ordered collection of items. This will obviously be important as data will naturally come in a list or list-like form. Moreover, this will also give us our first taste of “slicing”, or grabbing specific elements of a list.

# Some examples of lists...
numberlist = [1, 5, -3] # Note also the use of square brackets...this is what
                        # defines a list, () are tuples, {} are dictionaries or sets...

print(numberlist)

stringlist = ['hi', 'hello', 'hey']

a = 'some'

b = 'thing'

c = a + b

variablelist = [a, b, c]
print("\n")
print(variablelist, end = "\n \n")

Now what is really cool is that you can have a list with different types…

randomlist = [1, "hello", a]
print(randomlist)

So notice that the first part of the list is an integer, then a string, then the variable a (which currently is a string as well). Then there is the combining of lists…so here is this awesome example:

big_list_one = randomlist + stringlist
print(big_list_one, end = "\n \n")

So notice what this did here: it literally took randomlist and added stringlist to the end of it, so we have a new list that combines all of the items.

What do you think this does…

big_list_two = [randomlist, stringlist]

print(big_list_two, end = "\n \n")

VERY INTERESTING….This took the two lists and then created another list which is composed of two lists!!! A “List of Lists”

Final point: it’s worth understanding the “ordered” part…this means that for each item in the list we can call that item by its position in the list, or index. Key: Python starts counting from 0, so the first item in a list is item number zero…Let’s try some stuff

print(randomlist[0], "Should print the first value, a one", end = "\n \n")
print(randomlist[2], " Should print the last value, 'some'", end = "\n \n")

# Now let's do this with big_list_two, the "list of lists"
print(big_list_two[1], " Should be the list 'hi', 'hello', 'hey'")
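
Indexing can be chained and extended to grab several items at once. The sketch below rebuilds the lists from above so that it runs on its own; negative indices and the 0:2 slice notation are standard Python that we have not formally introduced, so treat this as a first taste of slicing rather than the full story.

```python
a = 'some'
randomlist = [1, "hello", a]
stringlist = ['hi', 'hello', 'hey']
big_list_two = [randomlist, stringlist]

inner_item = big_list_two[1][0]   # second list, then its first item
last_item = stringlist[-1]        # negative numbers count from the end
first_two = stringlist[0:2]       # a slice: items 0 and 1 (item 2 is excluded)

print(inner_item, last_item, first_two)
```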

Time to practice#

Exercise. How would you explain a list to a classmate?

Exercise. Add print(numberlist) and print(variablelist) to your code and note the format of the output. What do the square brackets tell us? The single quotes around some entries?

Exercise. What is the output? How would you explain it to a classmate?

mixedlist = [a, b, c, numberlist]
print(mixedlist)

Exercise. Suppose x = [1, 2, 3] is a list. What is x + x? 2*x? Try them and see.

x = [1,2,3]
print(x+x) # This is going to make another list that is just 1,2,3,1,2,3
print(2*x) # Same thing...amazing!!!

More than we need (mtwn). One thing that is interesting here (for those of you with prior experience in math or computing) is that you may be thinking that a numerical operation on a list will be like a vector operation (a list looks similar to a vector). But it is not. In this sense the list operations are more general, usable for doing things like expanding the list, combining lists, etc. If we want to perform vector-like operations, then we need to change the type and import the numpy package. We will see this later.
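To make the contrast concrete, here is a sketch that puts list repetition next to an element-by-element calculation. It uses a plain Python list comprehension as a stand-in, not the numpy approach mentioned above; the variable names are only for illustration.

```python
x = [1, 2, 3]

doubled_list = 2 * x                         # repetition: [1, 2, 3, 1, 2, 3]
doubled_values = [2 * value for value in x]  # element by element: [2, 4, 6]
pair_sums = [a + b for a, b in zip(x, x)]    # element-wise version of x + x: [2, 4, 6]

print(doubled_list)
print(doubled_values)
print(pair_sums)
```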


Tuples#

Tuples are very similar to lists, but the key difference is that once a tuple is set, the entries in it cannot be changed. This is what they call the “immutability” of a tuple. In contrast, in a list you can change individual elements. Let’s see this.

test_tuple = (1,2,3) # Similar to a list, but round brackets...
print(numberlist)
numberlist[0] = 328
print(numberlist) # Note how I changed the first entry in the list!!!

Now here is a tuple…like with a list, individual elements are separated by a comma. Unlike a list, we see round brackets, not square.

test_tuple = (1,2,3) # Similar to a list, but round brackets...
print(test_tuple)

And again, notice that when it is printed out we see the round brackets indicating that it is a tuple.

test_tuple[0] = 328 # UNCOMMENT THIS PART OF THE CODE TO UNDERSTAND 

This won’t run: “TypeError: ‘tuple’ object does not support item assignment”, meaning that once a tuple is set you cannot assign new values to its entries…This is the immutability property of a tuple.
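If we do need a tuple with a different entry, one option is to build a new tuple rather than change the old one. The sketch below catches the error with try/except (standard Python, not yet covered in these notes) and then constructs new_tuple, a name chosen only for illustration.

```python
test_tuple = (1, 2, 3)

try:
    test_tuple[0] = 328          # not allowed: tuples are immutable
except TypeError as error:
    print("TypeError:", error)

# The tuple is unchanged; to get different entries we build a new tuple
new_tuple = (328,) + test_tuple[1:]
print(test_tuple, new_tuple)
```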


Dictionaries#

Dictionaries are collections of key-value pairs defined by curly brackets {}, with the pairs separated by commas and the key and value in each pair separated by a colon. (Since Python 3.7 a dictionary remembers the order in which pairs were inserted, but we look items up by key, not by position.) We use them a fair amount for several reasons. First, they are great for creating (small) table-like datasets that can be converted to a DataFrame. Second, they are great for converting some value in your data. In one example, I had a dataset that had education categorized by a string like [“primary”, “secondary”] and I wanted to convert it into years of schooling. So I used a dictionary that mapped the words into years. Third, we often come across them as a data structure generated by APIs.
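The education example might look something like the sketch below. The year counts and the sample data are made up for illustration (they are not the values from the original dataset), and the list comprehension is a compact form of a loop.

```python
# Hypothetical mapping: the year counts are made up for illustration
years_of_schooling = {"primary": 6, "secondary": 12}

education = ["primary", "secondary", "primary"]   # made-up data column
years = [years_of_schooling[level] for level in education]
print(years)  # [6, 12, 6]
```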

For example, here is a dictionary that maps first names to last names:

names = {'Jacob': 'Koehler', 'Lenny': 'Koehler', 'Maayan': 'Munchkin'}
print(names)
names["Jacob"]

What about this…

names["Koehler"]

What happened here…we can’t go from values to keys with a dictionary. The main idea is that a dictionary maps each key to one value. That means we must have unique keys…but different keys can have the same value (a many-to-one relationship). This implies that going backwards from a value to a single key is not possible in general.
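If we really do want to go backwards, we have to search the whole dictionary and accept that the answer may contain several keys. A minimal sketch, using the items method and a list comprehension (both standard Python; first_names is a name chosen for illustration):

```python
names = {'Jacob': 'Koehler', 'Lenny': 'Koehler', 'Maayan': 'Munchkin'}

# Collect every key whose value matches; there may be zero, one, or many
first_names = [key for key, value in names.items() if value == 'Koehler']
print(first_names)  # ['Jacob', 'Lenny']
```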

We can, however, add a new key-value pair…

names["Backus"] = 'Backus'
print(names)

Time to practice#

Exercise. Print the names. Does it come out in the same order we typed it?

Exercise. Construct a dictionary whose keys are the integers 1, 2, and 3 and whose values are the same numbers as words: one, two, three. How would you get the word associated with the key 2?

Exercise. Enter the code

d = {'Donald': 'Duck', 'Mickey': 'Mouse', 'Donald': 'Trump'}

print(d)

What happened?

Exercise. Consider the dictionary

data = {'Year': [1990, 2000, 2010], 'GDP':  [8.95, 12.56, 14.78]}

What are the keys here? The values? What do you think this dictionary represents?


Built in Functions#

We have already seen several, e.g., whos (strictly speaking an IPython command rather than a Python built-in), which tells us what variables are defined and some of their properties (type, value). We have also seen the print function, which prints results on the screen. Here are some others we will use all the time:

  • len tells us the length of a list, string, dictionary, etc.

  • type tells us the type of a variable, e.g. a list, string, float, integer, dictionary… Here are some examples

print(len('hello world')) # note how len is "counting how many characters" 
                          # And it is including the white space...
    
print(len([1, 5, -3])) # How many items in the list

print(len((1, 5, -3))) # how many items in the tuple

print(len('1234')) # String, so how many characters

print(len('12.34')) # Again, a string....
len(12.34) # This raises a TypeError...a floating point number has no length

Then this is the type command…

print(type(2)) # Its saying an integer

print(type(2.5)) # A floating point number, or float

print(type('2.5')) # Looks like a number, but a string

print(type('something')) # String

print(type([1, 5, -3])) # Here it sees that this is a list...

print(type((1, 5, -3))) # A tuple

What is a floating point number…this is more than we need (mtwn), but the basic idea is that numbers cannot be stored in the computer’s memory with infinite precision (why? a computer’s memory is not infinite), so they are approximated. With double-precision (64-bit) arithmetic, this is about 16 significant digits.
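The approximation shows up in even the simplest sums. The sketch below uses math.isclose from the standard library as one common way to compare floats; the variable name total is only for illustration.

```python
import math

total = 0.1 + 0.2
print(total)                      # 0.30000000000000004
print(total == 0.3)               # False
print(math.isclose(total, 0.3))   # True
```

The practical lesson is to compare floats with a tolerance instead of with ==.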


Changing types#

Types matter a lot in Python, but sometimes we will want to change the type of a variable. This is something that will come up often in our data work….

s = '12.34' # This is a string (check it to verify...)

f = float(s) # This builtin function will convert the string to a float

print(type(f)) # It should now tell us that f is a float...

s = "12"

i = int(s) # This should convert the string to an integer...what if we did the 
            # string "12.34"??? 

print(type(i)) # This should be a type integer...

Then we can always convert it back….

s = str(12) # So start with an integer and go to a string...
print('s has type', type(s))

t = str(12.34) # Or start with a float and go to a string as well
print('t has type', type(t))

Big picture. This is again a super powerful aspect of Python that makes it very applicable for working with data…the ability to go from numbers to strings and back.

# This is cool...start with a string and make it a list by the command list
x = 'abc'

y = list(x)

print(y) # So now y should be a list of a, b, c
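
The question raised in the comment above (what if we apply int to the string "12.34"?) can be settled with a short sketch. It uses try/except, standard Python not yet covered here, so that the error does not stop the cell; the variable names are only for illustration.

```python
s = "12.34"

try:
    i = int(s)               # int() does not accept a string with a decimal point
except ValueError as error:
    i = None
    print("ValueError:", error)

i_two_step = int(float(s))   # go through float first; int() then drops the .34
print(i_two_step)            # 12
```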

Time to practice#

Exercise. What happens if we apply the function float to the string ‘some’?

Exercise. What is the result of list(str(int(float('12.34'))))? Why? Hint: Start in the middle (the string ‘12.34’) and work your way out, one step at a time. This is similar to question 13 and 14 on Code Practice #1.

Exercise. How would you convert the integer i = 1234 to the list l = [‘1’, ‘2’, ‘3’, ‘4’]? This is similar to question 18 on Code Practice #1. Let’s do that one instead.

Exercise (challenging). This one is tricky, but it came up in some work we were doing. Suppose year is a string containing the year of a particular piece of data; for example, year = ‘2015’. How would we construct a string for the following year?


Programming Errors#

Fact of life: you will make errors. Many errors. The key to programming is (i) not getting discouraged and living with that fact and (ii) learning how to make sense of error messages and self-correct those errors.

Point (i) is a life skill that takes years to learn. However, we can help you with (ii): below we talk through some very common error messages and how to identify them.

Name Error#

A NameError happens when we use something that is not defined; it could be a variable or a function. The associated output is an error message that includes (i) what line the issue is occurring in and (ii) the name that could not be found. Here are two examples:

# Using a variable that is not defined
print(NotDefined)
# Another situation: here we are
# using a function that is not defined.
log(3) 

So you see in both of these instances that there is an arrow pointing to a line within each code cell. In the first instance it is pointing to line 2. This is where the issue is. In the second instance, it is pointing to line number 3.

And after pointing to the issue, below that it says NameError: and stuff. In the first instance, it tells us NotDefined is, well…not defined. In the second instance, it’s saying the same thing. It just does not know what log is.

Type Error#

This one happens all the time too. It happens when an operation or function is applied to a data type that does not support it. Here are some examples:

x = "2"
y = 2

z = x + y

Like above, it tells us that line number 4 is the issue, i.e. where we are trying to add “2” and 2. And the issue is a type issue: we can’t add a string and an integer.

Here is another example relating to tuples. Recall that with a tuple, the data type is immutable. That is, it can’t be changed. But let’s try and change it…

tuple_error=(2,3)
tuple_error[0]=5

Here it says there is a problem on line 2. A TypeError problem. And what is the specific issue? Well, the tuple object does not support this kind of operation.
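For the first TypeError example, the fix depends on what we meant. A small sketch of the two options, using the conversion functions from the Changing types section (the names as_number and as_text are only for illustration):

```python
x = "2"
y = 2

as_number = int(x) + y   # treat both as numbers: 4
as_text = x + str(y)     # treat both as strings: '22'
print(as_number, as_text)
```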

Important. A lot of the problem in interpreting an error message lies in deciphering cryptic messages like “‘tuple’ object does not support item assignment”. So how do we do this…use Google. Often the first result will be a question posted to www.stackoverflow.com, which is a place for programmers to ask and answer questions. This is a good place to be comfortable with and seek help from.

Exercise: In the Google search area type “‘tuple’ object does not support item assignment”. What did you find?

Invalid Syntax#

Syntax errors can be detected before your program begins to run. These types of errors are usually typing mistakes (fat fingers), but might be hard to find at first. Here we give two examples:

# Define a simple list and let's call the first one in the list
randomlist =[1,8,3,7]
randomlist[0]]

I know this example may seem easy to identify, but imagine you write a long piece of code like the one below; it could be hard. Can you find what is missing?

goal_model_data = pandas.concat([train[['HomeTeam','AwayTeam','HomeGoals']].assign(home=1).rename(
                columns={'HomeTeam':'team', 'AwayTeam':'opponent','HomeGoals':'goals'}),
               train[['AwayTeam','HomeTeam','AwayGoals']].assign(home=0).rename(
                columns={'AwayTeam':'team', 'HomeTeam':'opponent','AwayGoals':'goals'}])
# Or when we define a string
bad_string = 'code"
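
The claim that syntax errors are caught before anything runs can be checked directly. This sketch hands the first example to Python's built-in compile function as text; compile only parses the code and executes nothing, so the error shows up even though the first line of the snippet is perfectly valid. The names source, syntax_ok, and bad_line are only for illustration.

```python
source = "randomlist = [1,8,3,7]\nrandomlist[0]]"

try:
    compile(source, "<example>", "exec")   # parse only; nothing is executed
    syntax_ok = True
    bad_line = None
except SyntaxError as error:
    syntax_ok = False
    bad_line = error.lineno                # line where the parser gave up
    print("SyntaxError on line", bad_line)
```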

Key Error#

Python raises a KeyError whenever we request an item from a dictionary (using the format a = adict[key]) and the key is not in the dictionary.

names = {'Dave': 'Backus', 'Chase': 'Coleman', 'Spencer': 'Lyon', 'Glenn': 'Okun'}
names['David']
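
Two standard ways to avoid a KeyError are to check for the key first or to ask for a default. A minimal sketch with the same dictionary; the in operator and the get method are standard Python, and 'unknown' is just a placeholder default chosen for illustration.

```python
names = {'Dave': 'Backus', 'Chase': 'Coleman', 'Spencer': 'Lyon', 'Glenn': 'Okun'}

has_david = 'David' in names               # membership test on the keys
fallback = names.get('David', 'unknown')   # get returns a default instead of raising
print(has_david, fallback)
```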

“No Idea” Errors#

These are errors where you have no idea what is going on. A couple of tips:

  • Ask a friend. While the movie vision of a coder is some guy in a hoodie in a dark room by himself, this is not how we work. Working together, as a team, is an important aspect of data analysis and coding in general. So if you have a problem, ask for help. Explain to them what you were trying to do (often just this process helps solve the issue) and then show them the output.

  • Google fu. Use Google. Chances are you are not the first one to have this problem. Just cut and paste the error message into Google and track down what other people have to say about it.


Summary#

Congratulations! First, it’s amazing that you have made it this far. Reflect on what you knew before working through this notebook. Now reflect on what you can do…AMAZING!!! Let us summarize some key things that we covered.

  • Assignments and variables: We say we assign what’s on the right to the thing on the left: x = 17.4 assigns the number 17.4 to the variable x.

  • Data types and structures:

    • Strings. Strings are collections of characters in quotes: ‘this is a string’.

    • Lists. Lists are collections of things in square brackets: [1, ‘help’, 3.14159].

    • Number types: integers vs. floats. Examples of integers include -1, 2, 5, 42. They cannot involve fractions. Floats use decimal points: 12.34. Thus 2 is an integer and 2.0 is a float.

    • Dictionary. Dictionaries are collections of key-value pairs in {}: names = {‘Dave’: ‘Backus’, ‘Chase’: ‘Coleman’}.

  • Built-in functions:

    • The print() function. Use print('something', x) to display the value(s) of the object(s) in parentheses.

    • The type() function. The command type(x) tells us what kind of object x is. Past examples include integers, floating point numbers, strings, and lists.

  • Type conversions:

    • Use str() to convert a float or integer to a string.

    • Use float() or int() to convert a string into a float or integer.

    • Use list() to convert a string to a list of its characters.

  • Error message types:

    • NameError. Usually happens when we use something that is not defined, which could be a variable or a function.

    • TypeError. Raised when an operation or function is applied to an object of inappropriate type. For example, tuples do not support item assignment and numbers have no len.

    • Invalid syntax. Syntax errors can be detected before your program begins to run. So first thing to do is to check typos, parentheses, etc.

    • KeyError. It happens when you refer to a key that is not in the dictionary.

Useful Tricks and Programming Tools#

  • Comments. Use the hash symbol # to add comments to your code and explain what you’re doing. Don’t underestimate the importance of creating well commented code. Here are some thoughts on this…

  • Help. We can get help for a function or method foo by typing foo? in the IPython console or foo in the Object explorer. Try each of them with the type() function to remind yourself how this works.

  • Error messages. Look at the message, (i) read where the issue is and (ii) track down what the message says about that line. Or ask a friend!