Python Fundamentals 1#

Outline#

Time to start programming! We work our way through some of the essentials of Python’s core language. We will do this within a Jupyter Notebook and, along the way, become familiar with Markdown as well as other properties of the notebook environment.

OBJECTIVES

  • Use variables to represent different Python data types

  • Identify and use int, float, bool, and str data types in Python

  • Identify, differentiate between, and use list, dict, set, and tuple collections in Python

  • Iterate over collections using for loops

  • Use conditional statements to control the flow of programs

\[\sum_{i = 1}^n i\]

Markdown Essentials#

Markdown is a lightweight markup language that is converted to html (“hypertext markup language”), the language used to construct basic websites. It has a zen-like simplicity and beauty.

  • Headings. Large bold headings are marked by hashes (#). One hash for first level (very large), two for second level (a little smaller), three for third level (smaller still), four for fourth (the smallest). Try these in a Markdown cell to see how they look:

    # Data Bootcamp sandbox
    ## Data Bootcamp sandbox
    ### Data Bootcamp sandbox
    

    Be sure to run the cell when you’re done (shift enter).

  • Bold and italics. If we put a word or phrase between double asterisks, it’s displayed in bold. Thus **bold** displays as bold. If we use single asterisks, we get italics: *italics* displays as italics.

  • Bullet lists. If we want a list of items marked by bullets, we start with a blank line and mark each item with an asterisk on a new line:

    * something
    * something else
    
  • Links. We construct a link with the text in square brackets and the url in parentheses immediately afterwards. Try this one:

    [Data Bootcamp course](http://nyu.data-bootcamp.com/)
    

We can find more information about Markdown under Help. Or use your Google fu.

Exercise. Ask questions if you find any of these steps mysterious:

  • Close Jupyter.

  • Start Jupyter.

  • In Jupyter, open a new IPython notebook within your Data_Bootcamp directory/folder. Point to the code cell, the name of the notebook, and the help button.

  • Save the file as bootcamp_class_pyfun1. This file will serve as your notes for this class.

  • Create a description cell in Markdown at the top of your notebook. It should include your name and a description of what you’re doing in the notebook. For example: “Mike Waugh’s first notebook for Python fundamentals 1” and a date.


Simple Calculations and Assignment#

These are the “bread and butter” of scientific computation… let’s get started:

test = 2*3 # simple multiplication
print(test)
6
# What about division...
test = 2/3 
print(test)
0.6666666666666666
# What about modulus...
test = 2%3
print(test)
2
12//5
2

Side note: Note how the cell contains comments that are not interpreted by Python. To create a comment, simply type # followed by whatever comment you want to make. Comments are important because they help make your code readable.

test = 2^3 # This is what you would do in excel (matlab too)
print(test)
print("is this a 8???")
1
is this a 8???
test = 2**3 # Now what happens...
print(test)
print("is this a 8???")
8
is this a 8???
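The surprising 1 above has a simple explanation: in Python, ^ is the bitwise XOR operator, not exponentiation. A minimal sketch of the difference (the names xor_result and power_result are chosen here for illustration only):

```python
# ^ is bitwise XOR: it compares the binary digits of the two numbers
print(bin(2), bin(3))    # the binary forms of 2 and 3

xor_result = 2 ^ 3       # binary 10 XOR 11 gives 01, which is 1
power_result = 2 ** 3    # ** is exponentiation: 2 * 2 * 2

print(xor_result, power_result)
```

So whenever you want "to the power of", reach for ** rather than ^.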
test = log(3) # what do you think will happen here...log is not defined
              # log is not a built-in function (built-ins show up in a different color in the notebook)
print(test)

Note how the computer just stopped. It did not compute. Remember, Python and the computer are DUMB! You gave it an instruction that it did not know how to carry out, so it stopped and did not proceed. A couple of points about this:

  • A program runs from top to bottom (within a code cell), simply following instructions/commands one at a time.

  • When you run this, note how the error message (after some other output) tells you where the problem is: line 1, and then that the name log is not defined.
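One way to make the failing log(3) call work, assuming the natural logarithm is what was intended, is to import it from Python's standard math module. This is only a sketch of one possible fix:

```python
import math  # standard-library module with mathematical functions

test = math.log(3)  # natural logarithm of 3
print(test)
```

The name log exists only inside the math module, which is why Python could not find it before the import.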

test = 4**2 # Now what happens...
print(test)
print("is this a 8???")

Assignment Above I’ve been assigning variables… but let’s look at this more closely…

x = 2 

Nice, so the thing on the left is the “variable” named “x”, and the thing on the right is the value that this variable is assigned. The = sign is the operator that assigns that value.

print(x)

y = 3 # lets do it again...
print(y)
2
3

Now we are getting somewhere, we take these variables and perform an operation. Notice that (like excel) the value assigned to z will change as we change the values assigned to x or y. But there is a difference with excel…what is it?

z = x/y 

print(z)
0.6666666666666666

Here is a place where you might want to figure out what variables exist within the environment. For example, what is x: its type and its value (ignore size for now)? You do this with the whos command (an IPython feature available in the notebook), which provides this information.

whos
Variable   Type     Data/Info
-----------------------------
test       int      16
x          int      2
x_val      int      2
y          int      3
z          float    0.6666666666666666

This is a nice feature in that it is a way for you to always understand what variables are in your environment at any point in time, their type, etc.
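The whos command only works inside IPython or Jupyter. As a rough stand-in for plain Python, and assuming we only care about a handful of names that we list ourselves, something like the following sketch prints similar information:

```python
x = 2
y = 3
z = x / y

# Print name, type, and value for the variables we list by hand
for name, value in {"x": x, "y": y, "z": z}.items():
    print(name, type(value).__name__, value)
```

This is not a replacement for whos, just a way to see the same three columns using only print() and type().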


Time to practice#

We will do this a lot. Here is the deal: below is a set of exercises. Take a couple of minutes and (i) create a code cell below each one and (ii) try to answer them as best as possible. If we don’t cover them all in class, try to attempt them later as you review.

Exercise. Type w = 7 in a cell. In the same cell, next line below, type w = w + 2. In the next line below type w (so we can see the output). What does this code do? Why is this not a violation of basic mathematics?

w = 7
w = w+2
w
9

Exercise. In another code cell type w = w + 2 and then w below it (again so we can see the output). Evaluate this cell once. Do it again. Do it again. What is going on here?

w = w + 2
w
17
w += 2
w
19

Exercise. Suppose we borrow 200 for one year at an interest rate of 5 percent. If we pay interest plus principal at the end of the year, what is our total payment? Compute this using the variables principal = 200 and i = 0.05.

principal = 200
i = 0.05
principal = principal * (1 + i)
principal
210.0

Exercise. Real GDP in the US (the total value of things produced) was 15.58 trillion in 2013 and 15.96 trillion in 2014. What was the growth rate? Express it as an annual percentage.

t13 = 15.58
t14 = 15.96
gr = (t14 - t13)/t13*100
print(gr, '%')
2.4390243902439073 %

Exercise (challenging). Suppose we have two variables, x and y. How would you switch their values, so that x takes on y’s value and y takes on x’s?

x = 1
y = 2
a = x
b = y
x = b
y = a
print(x,y)
2 1
a, b, c = 1, 2, 3
x = 1
y = 2
y,x = x,y
print(x, y)
2 1
x = 1
y = 2
x = x + y
y = x - y
x = x - y
print(x, y)
2 1
x = 1
y = 2
temp = x
x = y
y = temp
print(x,y)
2 1

Exercise (challenging). Type x = 6 in a cell. We’ve reassigned x so that its value is now 6, not 2. If we type and submit z, we see

In [10]: z
Out[10]: .6666666666

But wait, if z is supposed to be x/y, and x now equals 6, then shouldn’t z be 2? What do you think is going on?

x = 2
print(x) #explain my code
print(z)
2
0.6666666666666666
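A small sketch of what is going on in the exercise above: an assignment stores the number computed at that moment, not a formula. The names p, q, and ratio are hypothetical, chosen so that running this does not overwrite the notebook's x, y, and z.

```python
p = 2
q = 3
ratio = p / q      # ratio stores the number computed right now

p = 6              # reassigning p does not touch ratio
print(ratio)       # still the old value

ratio = p / q      # re-running the assignment updates ratio
print(ratio)
```

This is the difference from Excel: a spreadsheet cell holds a formula that recalculates, while a Python variable holds a value until we assign it again.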

Printing (and help)#

Printing is important in the sense that if we don’t tell the computer to report or “print” the results, then generally we will not see them.

First, let’s practice using the help command by typing print?

print?
Signature: print(*args, sep=' ', end='\n', file=None, flush=False)
Docstring:
Prints the values to a stream, or to sys.stdout by default.

sep
  string inserted between values, default a space.
end
  string appended after the last value, default a newline.
file
  a file-like object (stream); defaults to the current sys.stdout.
flush
  whether to forcibly flush the stream.
Type:      builtin_function_or_method

A window should pop up showing (i) that values must be separated by commas, (ii) how to separate them when they are printed (sep), (iii) what to print at the end (end), etc.

print(x, y, sep='---')
2---1
print(x,y, end='\n \n \n \n \n')
2 1
 
 
 
 
#f-strings: Used to inject variables into a print statement
print(f"The variable x is {x}")
The variable x is 2

Notice all the white space after the second print statement. This is what the character \n does: it stands for a newline, that is, a jump to the next line.
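The f-strings used above can also control how a number is displayed. As a small sketch, a format specification such as :.3f after the variable name rounds the displayed value to three decimal places (the variable name rounded is just for illustration):

```python
z = 2 / 3

# {z:.3f} means: show z as a float with 3 digits after the decimal point
rounded = f"The value of z is {z:.3f}"
print(rounded)
```

The value stored in z is unchanged; only the text that gets printed is rounded.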


Strings#

This is where I think Python is VERY POWERFUL… lots of environments can do numerical calculations and plotting well, but handling and manipulating strings is less common…

  • Lesson 1: A string is a collection of characters between quotation marks

  • Lesson 2: A string may look like a number, but it is not. ‘12’/3 is not going to work: ‘12’ is a string, so Python does not see it as a number. Asking it to perform a numerical computation on something that is not a number produces an error message.
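Lesson 2 can be sketched in code as follows. The try/except construct is not covered in these notes; it is used here only so the example keeps running after the error. The conversion with int() is discussed in the section on changing types.

```python
text_number = "12"   # looks like a number, but it is a string

try:
    result = text_number / 3       # dividing a string fails
except TypeError as err:
    print("TypeError:", err)

result = int(text_number) / 3      # convert to an integer first
print(result)
```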

a = "some"
b = "thing"
c = a + b # this is awesome....so natural and intuitive... suppose you tried
            # this in excel?? what would happen.
print(c)
something
# Back to print, we can do some cool things with this...
print("the value of z is", z)

# or even do something like this
message = "the value of z is"
print(message, z)
the value of z is 0.6666666666666666
the value of z is 0.6666666666666666
print('Lenny's favorite thing to do is play with his bone.')
  Cell In[49], line 1
    print('Lenny's favorite thing to do is play with his bone.')
                                                              ^
SyntaxError: unterminated string literal (detected at line 1)
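The error above arises because the apostrophe in Lenny's ends the string early, leaving the rest of the line dangling. Two possible fixes, sketched with illustrative variable names: wrap the string in double quotation marks, or escape the apostrophe with a backslash.

```python
# Option 1: double quotation marks on the outside
with_double = "Lenny's favorite thing to do is play with his bone."

# Option 2: keep single quotation marks and escape the apostrophe
with_escape = 'Lenny\'s favorite thing to do is play with his bone.'

print(with_double)
print(with_double == with_escape)  # both define the same string
```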

Time to practice#

Below is a set of exercises. Take a couple of minutes and (i) create a code cell below each one and (ii) try to answer them as best as possible. If we don’t cover them all in class, try to attempt them later as you review.

Exercise. What happens if we run the statement: ‘Chase’/2? Why?

"chase"/2
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[50], line 1
----> 1 "chase"/2

TypeError: unsupported operand type(s) for /: 'str' and 'int'
#strings can't be combined with integers using division

Exercise. This one’s a little harder. Assign your first name as a string to the variable firstname and your last name to the variable lastname. Use them to construct a new variable equal to your first name, a space, then your last name. Hint: Think about how you would express a space as a string.

firstname = 'Jacob'
lastname = 'Koehler'
fullname = firstname + ' ' + lastname
#or with f strings
fullname = f'{firstname} {lastname}'
print(fullname)
Jacob Koehler

Exercise. Set s = ‘string’. What is s + s? 2*s? s*2? What is the logic here?

s = 'string'
print(s + s)
print(2*s)
print(s*2)
stringstring
stringstring
stringstring
#addition is concatenating and multiplication is repetition

Quotation Marks#

Here is the thing, you’ll notice that sometimes I use single quotation, double quotation marks…

  • First, both are valid ways to define a string. The real issue is my inconsistent use; partly this is a problem within the NYU Data Bootcamp team… I actually prefer double.

  • Second, the fact that both are valid is not an accident. In fact, double quotation marks and even triple quotation marks play important roles.

a = 'string'
b = "string"
print(a,b) # We should see the same thing....

# This is one instance where double helps...
message = "I don't know what I'm doing"
print(message)
string string
I don't know what I'm doing

Note how in the last line of code I can use the apostrophe. This is the value added of double quotation marks: a string delimited by them can contain apostrophes (single quotation marks). Now what about this…

longstring = """
Four score and seven years ago
Our fathers brought forth. """

print(longstring) 

Here triple quotation marks allow us to have multiple lines.
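To see how the line breaks are stored, we can look at the string's raw representation. This sketch uses the built-in repr() function and the string method splitlines(), neither of which appears elsewhere in these notes:

```python
longstring = """
Four score and seven years ago
Our fathers brought forth. """

print(repr(longstring))          # shows the \n characters explicitly

lines = longstring.splitlines()  # break the string at each newline
print(len(lines))
```

Each line break typed inside the triple quotation marks becomes a \n character in the string, including the one right after the opening quotation marks.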


Time to practice#

Exercise. In the Four score etc code, replace the triple double quotes with triple single quotes. What happens?

Exercise. Fix this code:

bad_string = 'Sarah's code'

print(bad_string)

Exercise. Which of these are strings? Which are not? Edit the markdown cell and type next to each one string, not string.

  • apple

  • “orange”

  • ‘lemon84’

  • “1”

  • string

  • 4

  • 15.6

  • ‘32.5’


Changing types#

Types matter a lot in Python, but sometimes we will want to change the type of a variable. This is something that will come up often in our data work…

s = '12.34' # This is a string (check it to verify...)

f = float(s) # This built-in function will convert the string to a float

print(type(f)) # It should now tell us that f is a float...

s = "12"

i = int(s) # This should convert the string to an integer...what if we did the 
            # string "12.34"??? 

print(type(i)) # This should be a type integer...
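The comment above asks what happens if we apply int() to the string "12.34". A short sketch of the answer (try/except is used only so that the example keeps running after the error):

```python
s = "12.34"

try:
    i = int(s)                 # int() cannot parse a string with a decimal point
except ValueError as err:
    print("ValueError:", err)

i = int(float(s))              # go through float first; int() then drops the fraction
print(i, type(i))
```

Note that int() truncates rather than rounds, so 12.34 becomes 12.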

Then we can always convert it back….

s = str(12) # So start with an integer and go to a string...
print('s has type', type(s))

t = str(12.34) # Or start with a float and go to a string as well
print('t has type', type(t))

Big picture This is again a super powerful aspect of python that makes it very applicable for working with data…the ability to go from numbers to strings and back.

# This is cool...start with a string and make it a list by the command list
x = 'abc'

y = list(x)

print(y) # So now y should be a list of a, b, c
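Going back from a list of characters to a string is also possible. One way, sketched here with the string method join() (not otherwise covered in these notes), is to glue the elements together with an empty string in between:

```python
x = 'abc'
y = list(x)            # ['a', 'b', 'c']

back = ''.join(y)      # join the characters with nothing between them
print(back)
```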

Time to practice#

Exercise. What happens if we apply the function float to the string ‘some’?

float('some')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[57], line 1
----> 1 float('some')

ValueError: could not convert string to float: 'some'

Exercise. What is the result of list(str(int(float('12.34'))))? Why? Hint: Start in the middle (the string ‘12.34’) and work your way out, one step at a time. This is similar to question 13 and 14 on Code Practice #1.

#['1', '2']: float('12.34') is 12.34, int() truncates it to 12, str() gives '12', and list() splits that into characters
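Following the hint, the nested expression can be unpacked one step at a time. The names step1 through step4 are hypothetical and exist only for this walk-through:

```python
step1 = float('12.34')   # string to float
step2 = int(step1)       # float to integer (the fractional part is dropped)
step3 = str(step2)       # integer back to string
step4 = list(step3)      # string to list of its characters

print(step1, step2, step3, step4)
```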

Exercise. How would you convert the integer i = 1234 to the list l = [‘1’, ‘2’, ‘3’, ‘4’]?

i = 1234
i = list(str(i))
i
['1', '2', '3', '4']

Exercise (challenging). This one is tricky, but it came up in some work we were doing. Suppose year is a string containing the year of a particular piece of data; for example, year = ‘2015’. How would we construct a string for the following year?

year = '2015'
nextyear = str(int(year) + 1)
print(nextyear)
print(type(nextyear))
2016
<class 'str'>

Programming Errors#

Fact of life: you will make errors. Many errors. The key to programming is (i) not getting discouraged and living with that fact and (ii) learning how to make sense of error messages and self-correct those errors.

Point (i) is a life skill that takes years to learn. However, we can help you with (ii). Below we talk through some very common error messages and how to identify them.

Name Error#

A NameError happens when we use something that is not defined; it could be a variable or a function. The associated output is an error message that includes (i) the line the issue occurs on and (ii) the name that could not be found. Here are two examples:

# Using not defined variable
print(NotDefined)
# Another situation, here we are 
# using function that is not defined.
log(3) 

So you see in both of these instances that there is an arrow pointing to a line within each code cell. In the first instance, it is pointing to line 2. This is where the issue is. In the second instance, it is pointing to line 3.

And after pointing to the issue, below that it says NameError: followed by a message. In the first instance, it tells us NotDefined is, well… not defined. In the second instance, it’s saying the same thing. It just does not know what log is.
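The message part of a NameError can also be captured and inspected in code. This is a minimal sketch using try/except, which these notes do not otherwise cover; the variable name message is chosen for illustration:

```python
try:
    print(NotDefined)            # this name was never assigned
except NameError as err:
    message = str(err)           # the text after "NameError:"
    print("NameError:", message)
```

The captured text names the offending variable, which is exactly the piece of the error output to look for first.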

Type Error#

This one happens all the time too. It happens when an operation or function is applied to a data type that does not support it. Here are some examples:

x = "2"
y = 2

z = x + y

Like above, it tells us that line number 4 is the issue, i.e. where we are trying to add “2” and 2. And the issue is a type issue: we can’t add a string and an integer.
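How to fix this depends on what was intended. A sketch of the two obvious options, with illustrative variable names: convert the string to a number and add, or convert the number to a string and concatenate.

```python
x = "2"
y = 2

as_number = int(x) + y     # treat both as numbers
as_text = x + str(y)       # treat both as strings

print(as_number, as_text)
```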

Here is another example relating to tuples. Recall that a tuple is immutable. That is, it can’t be changed. But let’s try to change it…

tuple_error=(2,3)
tuple_error[0]=5

Here it says there is a problem on line 2: a TypeError. And what is the specific issue? The tuple object does not support this kind of operation (item assignment).
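If we really do need a changed version of the data, two common workarounds are sketched below (the names fixed and as_list are illustrative): build a new tuple, or convert to a list, which is mutable.

```python
tuple_error = (2, 3)

# Option 1: build a brand-new tuple from the pieces we want
fixed = (5,) + tuple_error[1:]

# Option 2: convert to a list, which does support item assignment
as_list = list(tuple_error)
as_list[0] = 5

print(fixed, as_list)
```

In both cases the original tuple is left untouched.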

Important: A lot of the problems in interpreting the error message lie in deciphering cryptic messages like “‘tuple’ object does not support item assignment”. So how do we do this? Use Google. Often the first result will be a question posted to www.stackoverflow.com, which is a place for programmers to ask and answer questions. This is a good place to get comfortable with and seek help from.

Exercise: In the Google search area, type “‘tuple’ object does not support item assignment”. What did you find?

Invalid Syntax#

Syntax errors can be detected before your program begins to run. These types of errors are usually typing mistakes (fat fingers), but might be hard to find at first. Here we give two examples:

# Define a simple list and let's call the first one in the list
randomlist =[1,8,3,7]
randomlist[0]]

I know this example may seem easy to identify, but imagine you write a long piece of code like the one below; then it could be hard. Can you find what is missing?

goal_model_data = pandas.concat([train[['HomeTeam','AwayTeam','HomeGoals']].assign(home=1).rename(
                columns={'HomeTeam':'team', 'AwayTeam':'opponent','HomeGoals':'goals'}),
               train[['AwayTeam','HomeTeam','AwayGoals']].assign(home=0).rename(
                columns={'AwayTeam':'team', 'HomeTeam':'opponent','AwayGoals':'goals'}])
# Or when we define a string
bad_string = 'code"
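The claim that syntax errors are caught before anything runs can be checked with Python's built-in compile() function, which parses source code without executing it. This is only an illustrative sketch: the source text is stored in a string, and error_line is a hypothetical name.

```python
# The first example from above, stored as text so we can parse it safely
source = "randomlist = [1, 8, 3, 7]\nrandomlist[0]]"

error_line = None
try:
    compile(source, "<example>", "exec")   # parse only, nothing is executed
except SyntaxError as err:
    error_line = err.lineno                # the line Python complains about
    print("SyntaxError on line", error_line)
```

Even though the first line of the source is perfectly valid, it never runs, because the whole cell is rejected at the parsing stage.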

Key Error#

Python raises a KeyError whenever an item is requested from a dict object (using the format a = adict[key]) and the key is not in the dictionary.

names = {'Dave': 'Backus', 'Chase': 'Coleman', 'Spencer': 'Lyon', 'Glenn': 'Okun'}
names['David']
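Two ways to avoid the KeyError, sketched here with illustrative variable names: check for the key with the in operator, or use the dictionary method get(), which returns a fallback value instead of raising an error.

```python
names = {'Dave': 'Backus', 'Chase': 'Coleman', 'Spencer': 'Lyon', 'Glenn': 'Okun'}

has_david = 'David' in names                 # check before looking up
lookup = names.get('David', 'not found')     # fallback instead of KeyError

print(has_david, lookup)
print(names['Dave'])                         # an existing key works as usual
```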

“No Idea” Errors#

These are errors where you have no idea what is going on. A couple of tips:

  • Ask a friend. While the movie vision of a coder is some guy in a hoodie in a dark room by himself, this is not how we work. Working together, as a team, is an important aspect of data analysis and coding in general. So if you have a problem, ask for help. Explain to them what you were trying to do (often just this process helps solve the issue) and then show them the output.

  • Google fu. Use Google. Chances are you are not the first one to have this problem. Just copy and paste the error message into Google and track down what other people have to say about it.


Summary#

Congratulations! First, it’s amazing that you have made it this far. Reflect on what you knew before working through this notebook. Now reflect on what you can do…AMAZING!!! Let us summarize some key things that we covered.

  • Assignments and variables: We say we assign what’s on the right to the thing on the left: x = 17.4 assigns the number 17.4 to the variable x.

  • Data types and structures:

    • Strings. Strings are collections of characters in quotes: ‘this is a string’.

    • Lists. Lists are collections of things in square brackets: [1, ‘help’, 3.14159].

    • Number types: integers vs. floats. Examples of integers include -1, 2, 5, 42. They cannot involve fractions. Floats use decimal points: 12.34. Thus 2 is an integer and 2.0 is a float.

    • Dictionary. Dictionaries are collections of key-value pairs in {}: names = {‘Dave’: ‘Backus’, ‘Chase’: ‘Coleman’}.

  • Built-in functions:

    • The print() function. Use print('something', x) to display the value(s) of the object(s) in parentheses.

    • The type() function. The command type(x) tells us what kind of object x is. Past examples include integers, floating point numbers, strings, and lists.

  • Type conversions:

    • Use str() to convert a float or integer to a string.

    • Use float() or int() to convert a string into a float or integer.

    • Use list() to convert a string to a list of its characters.

  • Error message types:

    • NameError. Usually happens when using something that is not defined, which could be a variable or a function.

    • TypeError. Raised when an operation or function is applied to an object of inappropriate type. For example, tuples do not support item assignment, and numbers have no len().

    • Invalid syntax. Syntax errors can be detected before your program begins to run. So the first thing to do is to check for typos, unmatched parentheses, etc.

    • KeyError. It happens when you refer to a key that is not in the dictionary.

Useful Tricks and Programming Tools#

  • Comments. Use the hash symbol # to add comments to your code and explain what you’re doing. Don’t underestimate the importance of creating well commented code. Here are some thoughts on this…

  • Help. We can get help for a function or method foo by typing foo? in the IPython console or foo in the Object explorer. Try each of them with the type() function to remind yourself how this works.

  • Error Messages. Look at the message: (i) read where the issue is and (ii) track down what the message says about that line. Or ask a friend!