Midterm Project

Your midterm project will be a thematic exploration of a dataset that you choose or construct. To begin, identify a topical area of interest and indicate whether you would like to work with a small group of your peers here.

Determining the Data Sources

For your data, you will either build a dataset using an API or identify one or more websites to scrape. These sources should relate to your topic of interest. As a starting point, each group member should identify 2-3 candidate resources (APIs or websites) that look useful for your exploration. In the next class, your group will have time to narrow the options you bring down to specifics.
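To make the two routes concrete, here is a minimal sketch using only Python's standard library. It assumes a page that marks each product with hypothetical `product`, `name`, and `price` CSS classes; real retailer sites use their own markup, so the class names and the `ProductParser` and `fetch_json` helpers are illustrative assumptions, not a recipe for any particular site. Libraries such as requests and BeautifulSoup are common alternatives for the same job.

```python
import json
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch_json(url):
    """API route (hypothetical helper): download and decode a JSON response. Needs network access."""
    with urlopen(url) as response:
        return json.load(response)


class ProductParser(HTMLParser):
    """Scraping route: collect name/price records from hypothetical CSS classes."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished records
        self._current = {}   # record currently being built
        self._field = None   # field that the next text node belongs to

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "product" in classes:
            self._current = {}      # a new product card starts
        elif "name" in classes:
            self._field = "name"
        elif "price" in classes:
            self._field = "price"

    def handle_data(self, data):
        text = data.strip()
        if self._field is None or not text:
            return                  # ignore whitespace and unlabeled text
        self._current[self._field] = text
        self._field = None
        if "name" in self._current and "price" in self._current:
            # Convert "$1,234.50"-style strings to a float before storing.
            price = float(self._current["price"].lstrip("$").replace(",", ""))
            self.rows.append({"name": self._current["name"], "price": price})
            self._current = {}


# Made-up markup standing in for a downloaded page.
SAMPLE_HTML = """
<div class="product"><span class="name">Deck A</span><span class="price">$59.99</span></div>
<div class="product"><span class="name">Wheels B</span><span class="price">$34.00</span></div>
"""

parser = ProductParser()
parser.feed(SAMPLE_HTML)
print(parser.rows)
```

Whichever route you choose, aim to end up with one record per item and one field per feature you plan to explore; that shape converts directly into a pandas DataFrame.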

Exploratory Questions

Come up with a few driving questions that you believe the data can give you insight into. This is intentionally open-ended, and the examples below are projects I believe would be meaningful. Your questions are likely to change somewhat as you dig into the data, but work to be as specific as possible about feature names and about the relationships or differences to be explored.

Some Example Tasks

  • EXAMPLE 1: A company has hired your group as consultants to explore market competition in the skateboard industry. They are interested in opening a new retail location that also maintains an online ecommerce presence.

    • Data Source: Local retailer websites (depending on geography) and national ecommerce retailers (CCS and Zumiez)

    • Driving Questions:

      • How is inventory weighted across product types (shoes, pants, shirts, hats, socks, boards, trucks, wheels, bearings, and accessories)?

      • What is the brand distribution for each retailer: how many different brands does it carry, and which categories do they dominate?

      • What is the price distribution by brand for each retailer and in total?

  • EXAMPLE 2: You and some friends want to start a small investment fund, pooling your resources to make some basic investments in publicly traded companies. You aim to use basic balance sheet and cash flow analysis to evaluate each company's investment potential.

    • Data Source: the yfinance library and the SEC API

    • Problem Statement:

      • Generalize a valuation strategy and implement it on a selection of publicly listed companies

      • Identify top candidates for investment based on your valuations

      • Modularize the valuation strategy with functions or classes that can be reused for later analysis

  • EXAMPLE 3: A small real estate development company has engaged your group to explore the real estate market in the Capital District of New York.

    • Data Source: MLS listings via Zillow’s API

    • Driving Questions:

      • What kinds of homes have been dominating the marketplace?

      • How do prices compare by municipality?

      • What features make the biggest difference in home prices?
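For Example 1, the driving questions reduce to counting and grouping once the scraped listings are in tabular form. The sketch below is a minimal version under stated assumptions: each listing has been reduced to a record with hypothetical `retailer`, `brand`, `category`, and `price` fields, the records are made up, and inventory "weight" is measured as the share of listed items in each category.

```python
from collections import Counter, defaultdict
from statistics import median

# Made-up records; real ones would come from the scraped retailer pages.
products = [
    {"retailer": "shop_a", "brand": "Brand A", "category": "shoes", "price": 70.0},
    {"retailer": "shop_a", "brand": "Brand B", "category": "wheels", "price": 38.0},
    {"retailer": "shop_a", "brand": "Brand A", "category": "shirts", "price": 30.0},
    {"retailer": "shop_b", "brand": "Brand A", "category": "shoes", "price": 75.0},
    {"retailer": "shop_b", "brand": "Brand C", "category": "shirts", "price": 28.0},
]


def category_share(rows):
    """Fraction of listed items in each product category (inventory weight)."""
    counts = Counter(r["category"] for r in rows)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def brands_per_retailer(rows):
    """Number of distinct brands each retailer carries."""
    brands = defaultdict(set)
    for r in rows:
        brands[r["retailer"]].add(r["brand"])
    return {retailer: len(names) for retailer, names in brands.items()}


def top_category_by_brand(rows):
    """Category with the most listings for each brand."""
    by_brand = defaultdict(Counter)
    for r in rows:
        by_brand[r["brand"]][r["category"]] += 1
    return {brand: counts.most_common(1)[0][0] for brand, counts in by_brand.items()}


def median_price(rows, *fields):
    """Median price per group, keyed by a tuple of the given field values."""
    prices = defaultdict(list)
    for r in rows:
        prices[tuple(r[f] for f in fields)].append(r["price"])
    return {group: median(values) for group, values in prices.items()}


print(category_share(products))
print(brands_per_retailer(products))
print(top_category_by_brand(products))
print(median_price(products, "brand"))               # in total
print(median_price(products, "retailer", "brand"))   # for each retailer
```

With pandas, the same questions map onto `value_counts`, `groupby(...).nunique()`, and `groupby(...).median()`.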
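For Example 2, one way to generalize a valuation strategy and keep it reusable is to wrap it in a small class. This is a sketch, not a prescribed method: it assumes a basic discounted cash flow (DCF) model on free cash flow with a Gordon-growth terminal value, and the class, parameter names, tickers, and figures are all hypothetical. In the project the inputs would come from the yfinance library or SEC filings rather than being typed in.

```python
from dataclasses import dataclass


@dataclass
class DCFValuation:
    """Hypothetical reusable valuation strategy: a basic discounted cash flow model."""

    discount_rate: float    # required annual return, e.g. 0.10
    growth_rate: float      # assumed free-cash-flow growth during the forecast window
    terminal_growth: float  # assumed perpetual growth after the window
    years: int = 5          # length of the explicit forecast window

    def enterprise_value(self, free_cash_flow: float) -> float:
        """Discounted forecast cash flows plus a Gordon-growth terminal value."""
        if self.discount_rate <= self.terminal_growth:
            raise ValueError("discount_rate must exceed terminal_growth")
        value = 0.0
        fcf = free_cash_flow
        for year in range(1, self.years + 1):
            fcf *= 1 + self.growth_rate                    # FCF in this forecast year
            value += fcf / (1 + self.discount_rate) ** year
        terminal = fcf * (1 + self.terminal_growth) / (self.discount_rate - self.terminal_growth)
        return value + terminal / (1 + self.discount_rate) ** self.years

    def value_per_share(self, free_cash_flow: float, net_debt: float, shares: float) -> float:
        """Equity value per share: (enterprise value - net debt) / shares outstanding."""
        return (self.enterprise_value(free_cash_flow) - net_debt) / shares


def rank_candidates(companies, model):
    """Sort companies by estimated upside (intrinsic value / market price - 1)."""
    scored = []
    for c in companies:
        intrinsic = model.value_per_share(c["fcf"], c["net_debt"], c["shares"])
        scored.append((c["ticker"], intrinsic / c["price"] - 1))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up inputs; in the project these would come from yfinance or SEC filings.
companies = [
    {"ticker": "AAA", "fcf": 100.0, "net_debt": 200.0, "shares": 10.0, "price": 40.0},
    {"ticker": "BBB", "fcf": 100.0, "net_debt": 0.0, "shares": 10.0, "price": 200.0},
]
model = DCFValuation(discount_rate=0.10, growth_rate=0.05, terminal_growth=0.02)
for ticker, upside in rank_candidates(companies, model):
    print(f"{ticker}: {upside:+.0%}")
```

Because the strategy lives in one class, swapping in a different model (for example, one based on earnings multiples) only requires a new class with the same `value_per_share` method, and `rank_candidates` keeps working unchanged.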