Data Bootcamp

Data Bootcamp is about nuts and bolts data analysis. You will learn how to analyze economic and business data and enough about computer programming to work with it efficiently. We will use Python, a popular high-level computer language that is widely used in finance, consulting, technology, and other parts of the business world. “High-level” means it is less painful to use than most languages (the language does much of the hard work for you), but it is a serious language with extensive capabilities.

“Data analysis” means descriptions that summarize data in ways that are helpful and informative. “Bootcamp” is a reminder that expertise takes work. Don’t worry, it’s worth it. You will be more valuable to current and future employers. And you will be able to do things more effectively than friends who rely on Excel.

Instructor and TA

Instructor: Jacob Koehler, PhD

Teaching Assistant: Anusha Khandupar

Requirements

There are no prerequisites. We welcome students with no prior programming experience and have designed the course with them in mind. What you need is the courage to take on a challenge and the patience to fix computer programs that don’t work. That’s a regular occurrence, even for experts.

Getting help

This course has a strong support system to help you when you run into a problem, and anyone who codes runs into problems. You can turn to me and to the teaching assistant.

You can also ask your classmates for help by posting questions on our Slack discussion group. The teaching assistant and I will monitor it.

The bottom line: If you’re stuck, ask for help.

Course Content

I will continue building out our course book as we go, but in the first half of the course I will also be heavily referencing and taking directly from the text below:

All assignments will be distributed and collected using Brightspace, and class recordings will be available there as well.

Deliverables and grades

This course divides naturally into two parts. The first part is an introduction to those aspects of the Python programming language useful for data analysis. We cover this material with as many applications to real data as we can think of. The second part covers advanced topics and ends with a project of your own. The goal is for you to have a piece of work you can show potential employers to illustrate your quantitative skill set. Both parts include a number of graded deliverables. The idea is to do some work all the time rather than lots of work once in a while.

Graded work includes:

  • Code Practice: There are eight assignments over the course of the semester. They are a great way to develop your skills and to come to the next class with questions about anything you do not yet understand.

  • Midterm Project: Midway through the semester, you will be asked to derive a unique insight from a dataset that you create or construct.

  • Final Project: We work our way up to the project one step at a time, starting with idea generation and ending with a professional piece of data collection and analysis that you can share with potential employers. We have found that the quality of the final project has a surprisingly low correlation with prior programming experience. A little thought and effort go a long way in creating an interesting project.

Due dates are posted on the course Brightspace site.

Dates are not negotiable. Anything handed in late will get a grade of zero.

All your work should be clean and professional. Your grade depends on it.

Final grades will be computed from the following components:

  • Code practice 40%

  • Midterm project 15%

  • Final project 15%

  • In-class activities and participation 30%
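As a worked example of these weights, the short Python sketch below turns component scores into a raw weighted score. The component names and scores are hypothetical, it assumes every component is scored on a 0 to 100 scale (the syllabus does not specify a scale), and it does not model the Stern curve, which is what actually determines letter grades.

```python
from __future__ import annotations

# Weights from the grading breakdown; they sum to 1.0.
WEIGHTS = {
    "code_practice": 0.40,
    "midterm_project": 0.15,
    "final_project": 0.15,
    "participation": 0.30,
}


def weighted_score(scores: dict[str, float]) -> float:
    """Return the weighted average of component scores.

    Assumes (hypothetically) that every score is on a 0-100 scale.
    Raises ValueError if any component is missing.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores for one student (not real data).
example = {
    "code_practice": 90,
    "midterm_project": 80,
    "final_project": 85,
    "participation": 100,
}

print(round(weighted_score(example), 2))  # prints 90.75
```

Here 0.40 × 90 + 0.15 × 80 + 0.15 × 85 + 0.30 × 100 = 36 + 12 + 12.75 + 30 = 90.75. Keeping the weights in a single dictionary means the same function works if the breakdown changes.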

Final grades are subject to the Stern grading curve, under which approximately 35% of students should receive an A or A-. If you make a good-faith effort, we expect it to be hard to get less than a B. We are the sole judges of what constitutes good-faith effort.

Policies

Ethics, disabilities, and many other things are governed by NYU and Stern policies. If you have questions about them, please ask.

On graded work: You may discuss assignments with anyone (in fact, we encourage it), but anything you submit, including your code, should be your own. Exams should be entirely your own work.

On disabilities: If you have a qualified disability that requires academic accommodation, please contact the Moses Center for Students with Disabilities (CSD, 212-998-4980) and ask them to send us a letter verifying your registration and outlining the accommodation they recommend. If you need to take an exam at the CSD, you must submit a completed Exam Accommodations Form to them at least one week prior to the scheduled exam time to be assured accommodation.

Course Outline

Each week is listed with its topics and the assignment that goes with it.

Introduction to Python and Coding Basics

  • Week 1: Installs and Setup, Bash Basics, Git and GitHub, Variables, Collections, Control Flow. Assignment: Homework I

  • Week 2: NumPy and Pandas. Assignment: Homework II

  • Week 3: Accessing Data: APIs and Web Scraping. Assignment: Homework III

Working with Data

  • Week 4: Pandas and Plotting. Assignment: Homework IV

  • Week 5: Probability and Statistics. Assignment: Midterm Assigned

Predictive Models with Data

  • Week 6: Linear Regression and Modeling Introduction. Assignment: Homework V

  • Week 7: Classification: Logistic Regression and KNN. Assignment: Homework VI

  • Week 8: Unsupervised Learning: Clustering. Assignment: Homework VII

  • Week 9: Time Series. Assignment: Homework VIII

Artificial Neural Networks

  • Week 10: ANNs for Regression and Classification. Assignment: Homework IX

  • Week 11: ANNs for Text and Language. Assignment: Homework X

  • Week 12: Streamlit and Deployment. Assignment: Final Project Proposal

  • Week 13: Final Topics. Assignment: Final Project