Extracting Data From HTML

OBJECTIVES

  • Use pd.read_html to extract data from website tables

  • Use bs4 to parse HTML returned with requests

Reading in Data from HTML Tables

Now we turn to one more way of accessing data. As we’ve seen, querying a data API typically returns JSON or CSV. Alternatively, you may receive HTML, where information is contained in tags. Below, we examine some basic HTML tags and their effects.

<h1>A Heading</h1>
<p>A first paragraph</p>
<p>A second paragraph</p>
<table>
  <tr>
    <th>Album</th>
    <th>Rating</th>
  </tr>
  <tr>
    <td>Pink Panther</td>
    <td>10</td>
  </tr>
</table>
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import requests
html = '''
<h1>A Heading</h1>
<p>A first paragraph</p>
<p>A second paragraph</p>
<table>
  <tr>
    <th>Album</th>
    <th>Rating</th>
  </tr>
  <tr>
    <td>Pink Panther</td>
    <td>10</td>
  </tr>
</table>
'''
from IPython.display import HTML
HTML(html)

A Heading

A first paragraph

A second paragraph

Album Rating
Pink Panther 10

Making a Request of a URL

Let’s begin with some basketball information from basketball-reference.com:

The tables on the page will be picked up (hopefully!) by the read_html function in pandas.

# URL of the page that contains the tables
url = 'https://www.basketball-reference.com/wnba'
# read every <table> on the page and assign the result to wnba
wnba = pd.read_html(url)
# what kind of object is wnba?
type(wnba)
list
#first element?
wnba[0]
    Team                  W   L   W/L%    GB
0   New York Liberty*    32   8  0.800
1   Minnesota Lynx*      30  10  0.750   2.0
2   Connecticut Sun*     28  12  0.700   4.0
3   Las Vegas Aces*      27  13  0.675   5.0
4   Seattle Storm*       25  15  0.625   7.0
5   Indiana Fever*       20  20  0.500  12.0
6   Phoenix Mercury*     19  21  0.475  13.0
7   Atlanta Dream*       15  25  0.375  17.0
8   Washington Mystics   14  26  0.350  18.0
9   Chicago Sky          13  27  0.325  19.0
10  Dallas Wings          9  31  0.225  23.0
11  Los Angeles Sparks    8  32  0.200  24.0
#examine information
wnba[0].info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12 entries, 0 to 11
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Team    12 non-null     object 
 1   W       12 non-null     int64  
 2   L       12 non-null     int64  
 3   W/L%    12 non-null     float64
 4   GB      12 non-null     object 
dtypes: float64(1), int64(2), object(2)
memory usage: 608.0+ bytes
#last dataframe?
wnba[-1]
    Unnamed: 0             PTS  TRB  AST  GmSc
0   A'ja Wilson (LVA)       24    7    4  20.5
1   Sabrina Ionescu (NYL)   24    9    5  19.7
2   Alyssa Thomas (CON)     18   10    7  19.3
3   DeWanna Bonner (CON)    17    6    3  16.0
4   Alanna Smith (MIN)      15    6    2  15.7
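
To see what read_html returns without depending on a live site, here is a minimal sketch that parses the small album table from the start of this lesson. It assumes an HTML parser backend for pandas is installed (lxml, or bs4 together with html5lib). Wrapping the string in io.StringIO is what recent pandas versions expect when the input is literal HTML rather than a URL or file.

```python
from io import StringIO

import pandas as pd

# The same small table used at the start of this lesson.
html = '''
<table>
  <tr><th>Album</th><th>Rating</th></tr>
  <tr><td>Pink Panther</td><td>10</td></tr>
</table>
'''

# read_html returns a list with one DataFrame per <table> it finds.
tables = pd.read_html(StringIO(html))
albums = tables[0]

print(len(tables))
print(albums)
```

On a page with many tables, the match argument of read_html can narrow the list to tables whose text contains a given string or regular expression.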

Example 2

List of best selling albums from Wikipedia.

url = 'https://en.wikipedia.org/wiki/List_of_best-selling_albums'
#read in the tables
#how many tables?
#look at the fourth table
#try to convert sales to float
#replace and coerce as float
# fourth_table['Claimed sales*'] = fourth_table['Claimed sales*'].replace({'20[disputed – discuss]': 20}).astype('float')
#alternative with string method
#fourth_table['Claimed sales*'].str.replace('[disputed – discuss]', '', regex = False)
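
The commented lines above strip a Wikipedia annotation from the sales column before converting it to float. A minimal sketch of that cleanup on a small made-up Series follows; the values are illustrative stand-ins, not figures taken from the Wikipedia table.

```python
import pandas as pd

# Hypothetical values standing in for the 'Claimed sales*' column.
sales = pd.Series(['70', '50', '20[disputed – discuss]', '45'],
                  name='Claimed sales*')

# Remove the literal annotation (regex=False treats the brackets as plain text).
cleaned = sales.str.replace('[disputed – discuss]', '', regex=False)

# errors='coerce' turns anything still non-numeric into NaN instead of raising.
sales_float = pd.to_numeric(cleaned, errors='coerce').astype('float')

print(sales_float)
```

Compared with replacing one exact cell value, the string method also handles the annotation when it is attached to other numbers.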

Scraping the Web for Data

Sometimes the data is not formatted as an HTML table, or pd.read_html simply doesn’t work. In these situations you can use the bs4 library and its BeautifulSoup object to parse HTML tags and extract information. First, make sure you have the library installed and can import it below.

# pip install -U bs4
from bs4 import BeautifulSoup
import requests
sample_html = '''
<h1>Music Reviews</h1>
<p>This album was awful. <strong>Score</strong>: <i class = "score">2</i></p>
<p class = "good">This album was great. <strong>Score</strong>: <i class = "score">8</i></p>
'''
# create a soup object
soup = BeautifulSoup(sample_html)
# examine the soup
soup
<html><body><h1>Music Reviews</h1>
<p>This album was awful. <strong>Score</strong>: <i class="score">2</i></p>
<p class="good">This album was great. <strong>Score</strong>: <i class="score">8</i></p>
</body></html>
# find the first <p> tag
soup.find('p')
<p>This album was awful. <strong>Score</strong>: <i class="score">2</i></p>
# find the first <i> tag
soup.find('i')
<i class="score">2</i>
# find all the i tags
soup.find_all('i')
[<i class="score">2</i>, <i class="score">8</i>]
# find the first paragraph with class "good"
soup.find('p', {'class': 'good'})
<p class="good">This album was great. <strong>Score</strong>: <i class="score">8</i></p>
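
Putting find, find_all, and class filters together, here is a short sketch that pulls each review's text and score out of sample_html. It passes 'html.parser' (Python's built-in parser) explicitly, which is an assumption of this sketch; the cells above let bs4 choose a parser.

```python
from bs4 import BeautifulSoup

sample_html = '''
<h1>Music Reviews</h1>
<p>This album was awful. <strong>Score</strong>: <i class = "score">2</i></p>
<p class = "good">This album was great. <strong>Score</strong>: <i class = "score">8</i></p>
'''

soup = BeautifulSoup(sample_html, 'html.parser')

# .text returns the text inside a tag with the markup stripped.
scores = [int(tag.text) for tag in soup.find_all('i', {'class': 'score'})]

# find_all returns every match in a list; find returns only the first one.
good = soup.find_all('p', {'class': 'good'})
reviews = [p.text for p in soup.find_all('p')]

print(scores)
print(len(good))
print(reviews)
```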

Extracting Data from a URL

  1. Make a request.

  2. Turn the response into soup!

url = 'https://pitchfork.com/reviews/albums/'
#make a request
r = requests.get(url)
#examine the text
r.text[:1000]
'<!DOCTYPE html><html lang="en-US"><head><title>New Albums &amp; Music Reviews | Pitchfork</title><meta charSet="utf-8"/><meta content="IE=edge" http-equiv="X-UA-Compatible"/><meta name="msapplication-tap-highlight" content="no"/><meta name="viewport" content="width=device-width, initial-scale=1"/><meta name="author" content="Condé Nast"/><meta name="copyright" content="Copyright (c) Condé Nast 2024"/><meta name="description" content="Daily reviews of every important album in music"/><meta name="id" content="65ce02a52126d093a5f585e1"/><meta name="keywords" content="web"/><meta name="news_keywords" content="web"/><meta name="robots" content="index, follow, max-image-preview:large"/><meta name="content-type" content="bundle"/><meta name="parsely-post-id" content="65ce02a52126d093a5f585e1"/><meta name="parsely-metadata" content="{&quot;description&quot;:&quot;Daily reviews of every important album in music&quot;,&quot;image-16-9&quot;:&quot;https://media.pitchfork.com/photos/5935a027a28a09'
#turn it into soup!
soup = BeautifulSoup(r.text)
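
The two steps can be wrapped in a small helper. This is a sketch rather than the lesson's own code: the function name fetch_soup, the timeout value, and the User-Agent string are all assumptions. Calling raise_for_status makes a failed request raise an error instead of silently parsing an error page.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url, timeout=10):
    """Request a page and return its HTML parsed as a BeautifulSoup object."""
    # Some sites reject the default requests User-Agent; this value is illustrative.
    headers = {'User-Agent': 'Mozilla/5.0 (data-lesson example)'}
    r = requests.get(url, headers=headers, timeout=timeout)
    # Raise an HTTPError for 4xx/5xx responses.
    r.raise_for_status()
    return BeautifulSoup(r.text, 'html.parser')


# Example usage (requires network access):
# soup = fetch_soup('https://pitchfork.com/reviews/albums/')
# soup.title
```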

Using Inspect

You can inspect an item's HTML by right-clicking on the item of interest and selecting Inspect. There, you will see the HTML tags that surround the object of interest.

For example, when this lesson was written, a recent album review on Pitchfork was Mustafa: Dunya. Right-clicking on the image of the album cover and choosing Inspect showed an img tag whose alt attribute is the album title, which we can use to find it:

#find the img tag
dunya = soup.find('img', {'alt': 'Dunya'})
dunya.attrs['src']
'https://media.pitchfork.com/photos/668fec739c03086dcec412d6/1:1/w_1600%2Cc_limit/Mustafa-Dunya.jpg'
#find all img tags
images = soup.find_all('img')
#explore attributes
images[0].attrs
{'alt': 'Pitchfork',
 'class': ['ResponsiveImageContainer-eybHBd',
  'fptoWY',
  'responsive-image__image'],
 'src': '/verso/static/pitchfork/assets/logo-inverted.svg',
 'srcset': '',
 'sizes': '100vw'}
#extract source of image url
[img.attrs['src'] for img in images]
['/verso/static/pitchfork/assets/logo-inverted.svg',
 '/verso/static/pitchfork/assets/logo-header.svg',
 'https://media.pitchfork.com/photos/66a3cf7aeca3501f5dc9b121/1:1/w_1600%2Cc_limit/Being%2520Dead-%2520EELS.jpg',
 'https://media.pitchfork.com/photos/66fc0c553dcae43f31bfd01c/1:1/w_1600%2Cc_limit/2300%2520-%2520Bully%2520Tape.jpeg',
 'https://media.pitchfork.com/photos/66f2da330eece3c05910cb10/1:1/w_1600%2Cc_limit/Raphael%2520Raginski%2520-%2520Plays%2520John%2520Coltrane%2520and%2520Langston%2520Hughes.jpeg',
 'https://media.pitchfork.com/photos/668fec739c03086dcec412d6/1:1/w_1600%2Cc_limit/Mustafa-Dunya.jpg',
 'https://media.pitchfork.com/photos/66e07055506fec54a6686125/1:1/w_1600%2Cc_limit/Adeline-Hotel-Whodunnit.jpg',
 'https://media.pitchfork.com/photos/66ed8ef4a29561bba8d0bd0f/1:1/w_1600%2Cc_limit/Tommy%2520Richman%2520-%2520Coyote.jpg',
 'https://media.pitchfork.com/photos/66ed9384d74ab9c23d17f237/1:1/w_1600%2Cc_limit/Merce%2520Lemon%2520-%2520Watch%2520Me%2520Drive%2520Them%2520Dogs%2520Wild.jpg',
 'https://media.pitchfork.com/photos/6695907214ee05489f0a592f/1:1/w_1600%2Cc_limit/Alan-Sparhawk-White-Roses-My-God.jpg',
 'https://media.pitchfork.com/photos/66f2de67b80b64aad7496cb9/1:1/w_1600%2Cc_limit/Shinichi%2520Atobe%2520-%2520Peace%2520of%2520mind.jpeg',
 'https://media.pitchfork.com/photos/66ed9b9194b1bace5b046fb0/1:1/w_1600%2Cc_limit/Silver_Jews_-_The_Natural_Bridge-transformed.jpeg',
 'https://media.pitchfork.com/photos/66ed7776ebeaf9cbab4d800f/1:1/w_1600%2Cc_limit/Hopecore.jpg',
 'https://media.pitchfork.com/photos/6679af9454456585e9dbc087/1:1/w_1600%2Cc_limit/SOPHIE.jpg',
 'https://media.pitchfork.com/photos/66f2d1d7a2d52e3bd3b6bcc4/1:1/w_1600%2Cc_limit/Monaleo%2520-%2520Throwing%2520Bows.jpg',
 'https://media.pitchfork.com/photos/668c14ffefacdaa8de5824b9/1:1/w_1600%2Cc_limit/The-Voidz-Like-All-Before-You.jpg',
 'https://media.pitchfork.com/photos/66ed884649ec49f6714c8018/1:1/w_1600%2Cc_limit/Ulla%2520&%2520Perila%2520-%2520Jazz%2520Plates.jpg',
 'https://media.pitchfork.com/photos/66e1f20d68d4d8de4a52fd1d/1:1/w_1600%2Cc_limit/Future-Mixtape-Pluto.jpg',
 'https://media.pitchfork.com/photos/66e86464f23f66ab90554188/1:1/w_1600%2Cc_limit/che-Sayso%2520Says.jpg',
 'https://media.pitchfork.com/photos/668ec1aeb6c41f622a91ae69/1:1/w_1600%2Cc_limit/Katy-Perry-143.jpg',
 'https://media.pitchfork.com/photos/66a27f95d2d4a2270b9cb5d7/1:1/w_1600%2Cc_limit/Tom%2520Verlaine-%2520Warm%2520and%2520Cool.jpg',
 'https://media.pitchfork.com/photos/666af7bc5d7882749b058940/1:1/w_1600%2Cc_limit/Nubya-Garcia-Odyssey.jpg',
 'https://media.pitchfork.com/photos/66ec250f53bc2780ce12046a/1:1/w_1600%2Cc_limit/Isik%2520Kural%2520-%2520Moon%2520in%2520Gemini.jpg',
 'https://media.pitchfork.com/photos/66ec27cea85b31e0659f5ccb/1:1/w_1600%2Cc_limit/Johnny%2520Foreigner%2520-%2520How%2520to%2520Be%2520Hopeful.jpg',
 'https://media.pitchfork.com/photos/66e87948f23f66ab90554195/1:1/w_1600%2Cc_limit/Garbage%2520-%2520Version%25202.0.jpg',
 'https://media.pitchfork.com/photos/66e2039c9cce584bc959aafa/1:1/w_1600%2Cc_limit/Dame-Area.jpg',
 'https://media.pitchfork.com/photos/665e130c1a9f5a511da29b3d/1:1/w_1600%2Cc_limit/Jamie-xx-In-Waves.jpg',
 'https://media.pitchfork.com/photos/66d76772f78476688e2139da/1:1/w_1600%2Cc_limit/Laila-Gap-Year.jpg',
 'https://media.pitchfork.com/photos/66e2040b9cce584bc959aafc/1:1/w_1600%2Cc_limit/Estradas.jpg',
 'https://media.pitchfork.com/photos/66ec7de26d243f5bdfbec9ec/1:1/w_1600%2Cc_limit/The-War-on-Drugs-Live-Drugs-Again.jpg',
 'https://media.pitchfork.com/photos/66aa7ece3a12b5a1a765e9f5/1:1/w_1600%2Cc_limit/Wendy%2520Eisenberg-%2520Viewfinder.jpg',
 'https://media.pitchfork.com/photos/6667328fd69e5b51a794a637/1:1/w_1600%2Cc_limit/Nilufer%2520Yanya%2520-%2520My%2520Method%2520Actor.jpg',
 'https://media.pitchfork.com/photos/66436f08f556da1fd706739d/1:1/w_1600%2Cc_limit/Porches-Shirt.jpg',
 'https://media.pitchfork.com/photos/66e871d6f23f66ab90554192/1:1/w_1600%2Cc_limit/Callahan%2520&%2520Witshcer%2520-%2520Think%2520Differently.jpg',
 'https://media.pitchfork.com/photos/66ab7bbc3ac8c632c50feb19/1:1/w_1600%2Cc_limit/Foxing-2024.jpg',
 'https://media.pitchfork.com/photos/66e832700bae50181a2f5715/1:1/w_1600%2Cc_limit/BASIC-This%2520is%2520BASIC.jpg',
 'https://media.pitchfork.com/photos/66d9ab99af0da7107ffd402b/1:1/w_1600%2Cc_limit/Hayden-Pedigo-Live-in-Amarillo-Texas.jpg',
 'https://media.pitchfork.com/photos/66e200fd920f76a9ad47c09f/1:1/w_1600%2Cc_limit/Julie.jpg',
 'https://media.pitchfork.com/photos/66e2026d3edec531b52b09e4/1:1/w_1600%2Cc_limit/Phiik-Lungs-Carrot-Season.jpg',
 'https://media.pitchfork.com/photos/66d9aafa568054f0dfb3dd50/1:1/w_1600%2Cc_limit/Chow-Lee.jpg',
 'https://media.pitchfork.com/photos/66e312377fdd09a871ccf75c/1:1/w_1600%2Cc_limit/Basic-Channel-BCD.jpg',
 'https://media.pitchfork.com/photos/66d77820e12524877f7deb7b/1:1/w_1600%2Cc_limit/Travis-Scott-Days-Before-Rodeo.jpg',
 'https://media.pitchfork.com/photos/668c4b3d2b6494be34f5e76f/1:1/w_1600%2Cc_limit/Floating-Points-Cascade.jpg',
 'https://media.pitchfork.com/photos/66aa80271f0846163b1acc2d/1:1/w_1600%2Cc_limit/Allegra%2520Krieger-%2520Art%2520of%2520the%2520Unseen%2520Infinity%2520Machine.jpg',
 'https://media.pitchfork.com/photos/66d76ca40149389e6ca7bb1b/1:1/w_1600%2Cc_limit/Migratory.jpg',
 'https://media.pitchfork.com/photos/665746837cfbcbc000644803/1:1/w_1600%2Cc_limit/Max-Richter-In-a-Landscape.jpg',
 'https://media.pitchfork.com/photos/66db6a17e721049ebffb5689/1:1/w_1600%2Cc_limit/Fat_Dog_WOOF_art.png',
 'https://media.pitchfork.com/photos/6696dabe17254c02f526cff9/1:1/w_1600%2Cc_limit/Nala-Sinephro-Endlessness.jpg',
 'https://media.pitchfork.com/photos/66d768dce12524877f7deb77/1:1/w_1600%2Cc_limit/Dummy-Free-Energy.jpg',
 'https://media.pitchfork.com/photos/6669aac2753b6dae8bc68f32/1:1/w_1600%2Cc_limit/Toro-y-Moi-Hole-Erth.jpg',
 'https://media.pitchfork.com/photos/66a92c0c773c5da2e0d5d881/1:1/w_1600%2Cc_limit/Okay%2520Kaya-%2520Oh%2520My%2520God%2520-%2520That%25E2%2580%2599s%2520So%2520Me.jpg',
 'https://media.pitchfork.com/photos/66db63a91ddc3876928762cb/1:1/w_1600%2Cc_limit/GAS_GAS_self-titled_art.jpg',
 'https://media.pitchfork.com/photos/665e0847ea2d6c6ab24a412a/1:1/w_1600%2Cc_limit/Mercury-Rev-Born-Horses.jpg',
 'https://media.pitchfork.com/photos/66d76862bf3587b9878e8a02/1:1/w_1600%2Cc_limit/The-Dare.jpg',
 'https://media.pitchfork.com/photos/66d76d06d11a415202939982/1:1/w_1600%2Cc_limit/Molchat-Doma-Belaya-Polosa.jpg',
 'https://media.pitchfork.com/photos/66cf5b812501b9b60e2b1ffd/1:1/w_1600%2Cc_limit/Cold-Gawd.jpg',
 'https://media.pitchfork.com/photos/66db1ccf925c8f190079e12d/1:1/w_1600%2Cc_limit/The%2520Pogues_Rum_Sodomy_and_the_Lash_high_res_art.jpg',
 'https://media.pitchfork.com/photos/66d08caf3d7b314e51c074ae/1:1/w_1600%2Cc_limit/Duster-In-Dreams.jpg',
 'https://media.pitchfork.com/photos/6678bbeb08c7a1384158e8c0/1:1/w_1600%2Cc_limit/MJ%2520Lenderman%2520-%2520Manning%2520Fireworks%2520Album%2520Art.jpg',
 'https://media.pitchfork.com/photos/66d9e5b8aa62241a46d71d0f/1:1/w_1600%2Cc_limit/Fcukers_Baggy$$_EP.png',
 'https://media.pitchfork.com/photos/66d1c2eafd9e0a487cca0d48/1:1/w_1600%2Cc_limit/Destroy-Lonely.jpg',
 'https://media.pitchfork.com/photos/66d070068a7ef80640569e23/1:1/w_1600%2Cc_limit/Doechii.jpg',
 'https://media.pitchfork.com/photos/66cf5bfff3807d38ff64519f/1:1/w_1600%2Cc_limit/Why-Bonnie.jpg',
 'https://media.pitchfork.com/photos/65e8a8030b535fdbb08147cd/1:1/w_1600%2Cc_limit/Nick-Cave-Wild-God.jpg',
 'https://media.pitchfork.com/photos/66708140dc443478fe78e69f/1:1/w_1600%2Cc_limit/Laurie-Anderson-Amelia.jpg',
 'https://media.pitchfork.com/photos/66d7230ff78476688e21398e/1:1/w_1600%2Cc_limit/Peel%2520Dream%2520Magazine%2520-%2520Rose%2520Main%2520Reading%2520Room.jpg',
 'https://media.pitchfork.com/photos/662fb40945ecfc72f1b01caf/1:1/w_1600%2Cc_limit/jonhopkins_RITUAL_3000x3000.jpg',
 'https://media.pitchfork.com/photos/66cf5a815008c27a59d6ebb8/1:1/w_1600%2Cc_limit/Lia-Kohl-Normal-Sounds.jpg',
 'https://media.pitchfork.com/photos/66d0e144cdf1dcd025458e3e/1:1/w_1600%2Cc_limit/1tbsp_megacity1000.jpg',
 'https://media.pitchfork.com/photos/666094ca71020266169e6400/1:1/w_1600%2Cc_limit/Ween-Chocolate-and-Cheese.jpg',
 'https://media.pitchfork.com/photos/66d08d963d7b314e51c074b0/1:1/w_1600%2Cc_limit/Dorothy_Carter_Troubadour_artwork.png',
 'https://media.pitchfork.com/photos/66a00d6327bd862f0b38ccd1/1:1/w_1600%2Cc_limit/Seefeel-Everything-Squared.jpg',
 'https://media.pitchfork.com/photos/66cf585a22a3cca5e27fe0c9/1:1/w_1600%2Cc_limit/Coco-and-Clair-Clair-Girl.jpg',
 'https://media.pitchfork.com/photos/66a29b2b15251dbe80c29be2/1:1/w_1600%2Cc_limit/Paris%2520Paloma-%2520Cacophony.png',
 'https://media.pitchfork.com/photos/66732cd8370ad66cfda70cd6/1:1/w_1600%2Cc_limit/The-Softies.jpg',
 'https://media.pitchfork.com/photos/66c490c0539f3919f01513cd/1:1/w_1600%2Cc_limit/Jaeychino-Watch-the-Throne.jpg',
 'https://media.pitchfork.com/photos/66c49e112646d3896f0b8315/1:1/w_1600%2Cc_limit/Ka.jpg',
 'https://media.pitchfork.com/photos/66699e711496d5ecbb675938/1:1/w_1600%2Cc_limit/Spirit-of-the-Beehive-Youll-Have-to-Lose-Something.jpg',
 'https://media.pitchfork.com/photos/66605f2a74fae9996708edd2/1:1/w_1600%2Cc_limit/Illuminati-Hotties-Power.jpg',
 'https://media.pitchfork.com/photos/661d440bfcf3e483157e6e29/1:1/w_1600%2Cc_limit/Body-Meat-Starchris.jpg',
 'https://media.pitchfork.com/photos/66ccd64436fd3358189ce973/1:1/w_1600%2Cc_limit/Etelin_Patio_User_Manual_artwork.jpg',
 'https://media.pitchfork.com/photos/66c49e1f076eb340c39090c8/1:1/w_1600%2Cc_limit/J-Mamana.jpg',
 'https://media.pitchfork.com/photos/6662e6122ce1ce711b18e944/1:1/w_1600%2Cc_limit/Sabrina-Carpenter-Short-n-Sweet.jpg',
 'https://media.pitchfork.com/photos/668ee334d1a4b2217f3f7bd1/1:1/w_1600%2Cc_limit/The-Get-Up-Kids.jpg',
 'https://media.pitchfork.com/photos/667afd592d8a77096d033bb4/1:1/w_1600%2Cc_limit/Heems-Veena-LP.jpg',
 'https://media.pitchfork.com/photos/66c8ff29c841f10b10b5d408/1:1/w_1600%2Cc_limit/soundbombing-II.png',
 'https://media.pitchfork.com/photos/66b147c6bce7f1d9f43d407e/1:1/w_1600%2Cc_limit/Throbbing%2520Gristle-%2520TGCD1.jpg',
 'https://media.pitchfork.com/photos/668ed2fafefe1050fc3c42d2/1:1/w_1600%2Cc_limit/Magdalena-Bay-Imaginal-Disk.jpg',
 'https://media.pitchfork.com/photos/66bca3a094b449df62a0f729/1:1/w_1600%2Cc_limit/Play-Cash-Cobain.jpg',
 'https://media.pitchfork.com/photos/6674467756eeca15d8870072/1:1/w_1600%2Cc_limit/Fake%2520Fruit%2520-%2520Mucho%2520Mistrust.jpeg',
 'https://media.pitchfork.com/photos/661fd84fb341975f564d5886/1:1/w_1600%2Cc_limit/FontainesDC_Romance_4000x40002.jpg',
 'https://media.pitchfork.com/photos/66994b3db9519fc29d58ac1e/1:1/w_1600%2Cc_limit/Gillian-Welch-David-Rawlings-Woodland.jpg',
 'https://media.pitchfork.com/photos/6630765760179481fc71a591/1:1/w_1600%2Cc_limit/Charly%2520Bliss%2520-%2520FOREVER%2520_%2520Album%2520Art.jpg',
 'https://media.pitchfork.com/photos/66c4bf799c2fa1203ad17aee/1:1/w_1600%2Cc_limit/Rosie_Lowe_Lover_Other_artwork.jpg',
 'https://media.pitchfork.com/photos/66bfa5c86cc8a418447da691/1:1/w_1600%2Cc_limit/Charley-Crockett-10-Cowboy.jpg',
 'https://media.pitchfork.com/photos/6674701fd1dae617f0a0d477/1:1/w_1600%2Cc_limit/Post-Malone-F-1-Trillion.jpg',
 'https://media.pitchfork.com/photos/66b144965298828208bc7a28/1:1/w_1600%2Cc_limit/Chuck%2520Johnson-%2520Sun%2520Glories.jpg',
 'https://media.pitchfork.com/photos/66a92786ec557b4af1433e1b/1:1/w_1600%2Cc_limit/Delicate%2520Steve-%2520Delicate%2520Steve%2520Sings.png',
 '/verso/static/pitchfork/assets/logo-reverse.svg']
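
Two details in the list above are worth handling in code: some src values are relative paths (the logo files), and img.attrs['src'] raises a KeyError for any img tag that has no src attribute. Here is a hedged sketch using urllib.parse.urljoin and the tag's get method. The HTML snippet and the media.example.com address are made up for illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

base_url = 'https://pitchfork.com/reviews/albums/'

# Made-up snippet standing in for the real page.
page = '''
<img alt="Pitchfork" src="/verso/static/pitchfork/assets/logo-inverted.svg">
<img alt="Dunya" src="https://media.example.com/Mustafa-Dunya.jpg">
<img alt="no source here">
'''
soup = BeautifulSoup(page, 'html.parser')

# .get returns None when the attribute is missing, so those tags are skipped.
sources = [img.get('src') for img in soup.find_all('img') if img.get('src')]

# urljoin leaves absolute URLs alone and resolves relative ones against base_url.
full_urls = [urljoin(base_url, src) for src in sources]

print(full_urls)
```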
# extract the genre tags
# extract the text from the genres

PROBLEM

Use the URL below for the NPR book review site. Make a request, turn the response into a soup object, and use the inspect tool to locate the title of each article on the page.

url = 'https://www.npr.org/sections/book-reviews/'

Summary

There are many ways you may get data: a file that somebody shares with you, data obtained through an API, data obtained through scraping and crawling websites, or a database that you connect to. Now that you’ve got some basics of accessing, cleaning, munging, and visualizing data, it’s time to explore a dataset and ask your own questions.