APIs and Data Visualization

This week we introduced working with an API to access data, along with additional plotting functionality from the seaborn library. In the assignment, you will extract data from an API and use matplotlib and seaborn to visualize it. Alternatively, you can use Bokeh for an extra challenge.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import requests

NOTE: You will need to sign up for Alpha Vantage’s API and receive a key. Also, you will need to navigate the documentation for the specified time series or sentiment data.

Problem 1

Make sure to sign up for a new API Key from Alpha Vantage here. Assign this key to the variable api_key below.

api_key = ''

Problem 2

Extract the TIME_SERIES_DAILY data for Tesla and GM for 2019 through the present. Draw side-by-side line plots using matplotlib. Add appropriate titles and labels, and adjust the figure size to (20, 5).

base_url = 'https://www.alphavantage.co/query'
req = requests.get(
    base_url,
    params={
        "function": "",
        "symbol": "",
        "apikey": ""
    }
)
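As a sketch of one way to proceed (the parameter and key names below are taken from the Alpha Vantage documentation for TIME_SERIES_DAILY — e.g. `function="TIME_SERIES_DAILY"`, `symbol="TSLA"`, the `"Time Series (Daily)"` key, and the `"4. close"` field — and should be double-checked against it), the JSON response can be reshaped into a plottable Series. The payload below is illustrative, not real market data:

```python
import pandas as pd

def daily_closes(results):
    """Convert a TIME_SERIES_DAILY JSON payload into a Series of closes,
    restricted to 2019-present. Assumes the documented response shape:
    {"Time Series (Daily)": {"YYYY-MM-DD": {"4. close": "...", ...}, ...}}
    """
    series = results["Time Series (Daily)"]
    df = pd.DataFrame.from_dict(series, orient="index").astype(float)
    df.index = pd.to_datetime(df.index)
    df = df.sort_index()
    return df["4. close"].loc["2019":]

# Illustrative payload (not real market data):
sample = {"Time Series (Daily)": {
    "2019-01-02": {"4. close": "310.12"},
    "2018-12-31": {"4. close": "305.00"},
}}
closes = daily_closes(sample)
```

For the side-by-side plots, `fig, axes = plt.subplots(1, 2, figsize=(20, 5))` gives one axis per ticker.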

Problem 3

Extract the TIME_SERIES_MONTHLY data for Home Depot and Lowe's. Create a boxplot using seaborn where the \(x\)-axis is the month and the \(y\)-axis is the closing price of each stock, respectively.

base_url = 'https://www.alphavantage.co/query'
req = requests.get(
    base_url,
    params={
        "function": "",
        "symbol": "",
        "apikey": ""
    }
)
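One way to prepare the data for the boxplot is to reshape each monthly payload into long format with a `month` column. The `"Monthly Time Series"` key and `"4. close"` field below are assumptions based on the Alpha Vantage documentation, and the sample payload is illustrative:

```python
import pandas as pd

def monthly_close_frame(results, symbol):
    """Reshape a TIME_SERIES_MONTHLY JSON payload into long format
    suitable for sns.boxplot (one row per month, tagged with the symbol)."""
    series = results["Monthly Time Series"]
    df = pd.DataFrame.from_dict(series, orient="index").astype(float)
    df.index = pd.to_datetime(df.index)
    out = pd.DataFrame({"close": df["4. close"]})
    out["month"] = out.index.month
    out["symbol"] = symbol
    return out

# Illustrative payload (not real market data):
sample = {"Monthly Time Series": {
    "2023-01-31": {"4. close": "320.0"},
    "2023-02-28": {"4. close": "300.0"},
}}
hd = monthly_close_frame(sample, "HD")
```

Concatenating both tickers' frames and calling `sns.boxplot(data=..., x="month", y="close", hue="symbol")` then draws one box per month per stock.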

Problem 4

Extract the NEWS_SENTIMENT for 200 articles related to Tesla stock. Create a histogram of the sentiment scores from each article. This boils down to extracting the overall_sentiment_score from each entry and plotting the results!

# update the parameters
base_url = 'https://www.alphavantage.co/query'
req = requests.get(
    base_url,
    params={
        
    }
)
results = req.json()
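A minimal sketch of the extraction step, assuming (per the Alpha Vantage NEWS_SENTIMENT documentation) that the request takes parameters like `function="NEWS_SENTIMENT"`, `tickers="TSLA"`, and `limit=200`, and that the articles sit in a `"feed"` list in the response. The sample payload is illustrative:

```python
def sentiment_scores(results):
    """Pull overall_sentiment_score from each article in a NEWS_SENTIMENT
    response; the article list is assumed to live under the "feed" key."""
    return [float(a["overall_sentiment_score"]) for a in results.get("feed", [])]

# Illustrative payload (not a real API response):
sample = {"feed": [
    {"title": "first article", "overall_sentiment_score": 0.21},
    {"title": "second article", "overall_sentiment_score": -0.05},
]}
scores = sentiment_scores(sample)
```

With the real response, `plt.hist(scores, bins=20)` finishes the problem.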

Problem 5

Extract data related to retail sales from the last decade. Create a side-by-side line plot and a boxplot for each month.

# update the parameters
base_url = 'https://www.alphavantage.co/query'
req = requests.get(
    base_url,
    params={
        
    }
)
results = req.json()
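A hedged sketch of parsing an economic-indicator payload such as RETAIL_SALES: the Alpha Vantage docs show the observations under a `"data"` key as `{"date": ..., "value": ...}` records, but verify this against the documentation. The sample payload is illustrative:

```python
import pandas as pd

def econ_series(results):
    """Turn an Alpha Vantage economic-indicator payload into a Series
    indexed by date, sorted from earliest to latest."""
    df = pd.DataFrame(results["data"])
    s = pd.Series(pd.to_numeric(df["value"], errors="coerce").values,
                  index=pd.to_datetime(df["date"]))
    return s.sort_index()

# Illustrative payload (not real figures):
sample = {"data": [
    {"date": "2020-02-01", "value": "483"},
    {"date": "2020-01-01", "value": "471"},
]}
sales = econ_series(sample)
```

For the monthly boxplot, `sales.index.month` supplies the \(x\)-axis grouping, as in Problem 3.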

Problem 6

Extract REAL_GDP_PER_CAPITA and fix the data so that it has a datetime index sorted from the earliest to the latest date. Create a line plot using seaborn with appropriate labels and titles.
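The index fix follows a standard pandas pattern — parse the date strings with `pd.to_datetime`, set them as the index, and sort ascending — sketched here on illustrative values standing in for the API response:

```python
import pandas as pd

# Illustrative values standing in for the REAL_GDP_PER_CAPITA response.
df = pd.DataFrame({"date": ["2021-01-01", "2019-01-01", "2020-01-01"],
                   "value": [3.0, 1.0, 2.0]})
df["date"] = pd.to_datetime(df["date"])      # strings -> datetimes
df = df.set_index("date").sort_index()       # earliest to latest
```

After this, `sns.lineplot(data=df, x=df.index, y="value")` plots oldest to newest.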

Problem 7

Use the yfinance library to extract balance sheet data from two companies and determine the better investment using criteria of your choice.
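One possible comparison criterion is the current ratio (current assets over current liabilities), a quick liquidity measure where higher is generally healthier. The sketch below uses illustrative placeholder numbers, not real filings; with yfinance, the inputs would be read off `yf.Ticker("MSFT").balance_sheet` (a network call):

```python
def current_ratio(current_assets, current_liabilities):
    """Current ratio: a simple liquidity criterion for comparing
    two companies' balance sheets (higher is generally healthier)."""
    return current_assets / current_liabilities

# With yfinance (network required):
#   import yfinance as yf
#   bs = yf.Ticker("MSFT").balance_sheet
# then read the relevant rows off `bs` for each company and compare.
# Illustrative placeholder numbers, not real filings:
ratio = current_ratio(184_000_000_000, 104_000_000_000)
```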

Problem 8

Using the yfinance library, extract data for five ticker symbols from 2018 through the present. Create a grid of scatterplots with regression lines, using seaborn's regplot, of the different tickers' closing prices.
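The key data-prep step is a wide DataFrame with one column of closing prices per ticker. The sketch below uses synthetic random walks standing in for yfinance downloads; the five ticker names are arbitrary examples:

```python
import pandas as pd
import numpy as np

# Synthetic closing prices standing in for yfinance downloads.
rng = np.random.default_rng(0)
dates = pd.date_range("2018-01-01", periods=100, freq="B")
closes = pd.DataFrame({t: 100 + rng.normal(0, 1, size=100).cumsum()
                       for t in ["AAPL", "MSFT", "GOOG", "AMZN", "META"]},
                      index=dates)
# A PairGrid mapped to sns.regplot gives the scatter-plus-regression grid:
#   g = sns.PairGrid(closes)
#   g.map(sns.regplot)
```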

Problem 9

Read through the documentation on the resample method in pandas here. Use the resample method to extract the first closing price of the month for Apple stock since 2012.
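The resample pattern looks like this on synthetic daily closes (with real data the Series would come from a yfinance download of Apple since 2012); `"MS"` bins by month and labels each bin with the month start, and `.first()` keeps the first observation in each bin:

```python
import pandas as pd
import numpy as np

# Synthetic business-day closes standing in for Apple data since 2012.
idx = pd.date_range("2012-01-02", periods=60, freq="B")
close = pd.Series(np.arange(60, dtype=float), index=idx)

# "MS" = month-start bins; .first() takes the first close in each month.
first_of_month = close.resample("MS").first()
```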

Problem 10

Read through the user guide on the rolling methods in pandas here. Use this to create side-by-side line plots of the closing price of NVIDIA stock since 2018 and the rolling 20-day mean of the closing price. What effect does the rolling mean have on the plot?
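The rolling computation itself is one line; the sketch below uses synthetic closes in place of the NVIDIA download. Note the first 19 entries are NaN because a 20-observation window is not yet full, which is why the rolling line starts later and looks smoother than the raw series:

```python
import pandas as pd
import numpy as np

# Synthetic closes standing in for NVIDIA data since 2018.
idx = pd.date_range("2018-01-01", periods=50, freq="B")
close = pd.Series(np.arange(50, dtype=float), index=idx)

# 20-day rolling mean: NaN until the window has 20 observations.
rolling20 = close.rolling(20).mean()
# fig, axes = plt.subplots(1, 2, figsize=(20, 5)) for the side-by-side plots
```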

Problem 11

arXiv is an open-access repository for academic papers. It has a freely accessible API here. To parse the responses, you will need to use the BeautifulSoup library and turn the text of the response into a soup object that can then be searched.

Your objective is to write a function that takes in a search term and returns a DataFrame with the article date, title, authors, summary, and article url as columns of the DataFrame.

def arXiv_data(search_terms):
    pass
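A hedged sketch of the parsing half of the function: the Atom tag names used here (`entry`, `published`, `title`, `author`/`name`, `summary`, `id`) follow the arXiv API documentation but should be verified against it, and the sample feed below is illustrative, not a real response. `arXiv_data` would first call `requests.get("http://export.arxiv.org/api/query", params={"search_query": ..., "max_results": ...})` and hand `response.text` to this helper:

```python
import pandas as pd
from bs4 import BeautifulSoup

def parse_arxiv_feed(xml_text):
    """Turn an arXiv Atom response into a DataFrame with one row per <entry>,
    with date, title, authors, summary, and article url columns."""
    soup = BeautifulSoup(xml_text, "html.parser")
    rows = []
    for entry in soup.find_all("entry"):
        rows.append({
            "date": entry.find("published").text,
            "title": entry.find("title").text,
            "authors": ", ".join(n.text for n in entry.find_all("name")),
            "summary": entry.find("summary").text.strip(),
            "url": entry.find("id").text,
        })
    return pd.DataFrame(rows)

# Tiny illustrative feed, not a real API response:
sample = """<feed><entry>
  <published>2020-01-01</published><title>Sample Title</title>
  <author><name>A. Author</name></author>
  <summary>An abstract.</summary>
  <id>http://arxiv.org/abs/0000.00000</id>
</entry></feed>"""
df = parse_arxiv_feed(sample)
```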

Problem 12

The World Bank has a Python wrapper for its API called wbgapi. Examine the documentation here and choose endpoints to query. Find at least two endpoints of interest and create visualizations of this data. Write a sentence or two about what you’ve found.

Problem 13

Use requests and BeautifulSoup to scrape and format the most recent 100 album reviews from Pitchfork. Create a DataFrame that includes the album, artist, genre, reviewer, score, and review text for each of these albums. Write your DataFrame to a .csv file called pitchfork_reviews.csv.

HINT: An important part of this will be to extract a url to the full review and use it to make another request from which you can pull the score and review text.

Problem 14

Find an API of interest to you. Pose a specific question that you want to answer with data from the API, make appropriate requests of its endpoints, and do your best to answer the question.


For example, maybe you’re interested in finding recent artists similar to Rod Stewart; you could use the Last.fm API for this. Perhaps you’re interested in a lyrical analysis of Drake vs. Kendrick Lamar and want to compare the lexical diversity of different tracks; you could use the Genius API. Or maybe you want to build an app that shows a random cat picture with a dad joke; a cat API and a jokes API might work here.

Problem 15

Use the praw library here to extract posts from the r/nyu subreddit. Which posts are getting the most activity?

BONUS

Using the Dog API, create a 2 × 5 grid of images of random dogs. You will need to create subplots, and you can use each axis’s .imshow() method.
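The grid mechanics can be sketched with placeholder arrays in place of downloaded images (the real images would come from the Dog API; one common choice is `https://dog.ceo/api/breeds/image/random`, which returns a JSON `"message"` URL to fetch — verify against the API you pick):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless-safe backend
import matplotlib.pyplot as plt

# Placeholder RGB arrays standing in for fetched dog images.
rng = np.random.default_rng(0)
fig, axes = plt.subplots(2, 5, figsize=(15, 6))
for ax in axes.flat:
    ax.imshow(rng.random((10, 10, 3)))  # .imshow() draws an image on the axis
    ax.axis("off")                      # hide ticks for a cleaner grid
```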