Tutorial: https://wrds-www.wharton.upenn.edu/documents/1443/wrds_connection.html
0. Setup WRDS Python API#
To be done before running this notebook
run “pip install wrds”
Mac: in terminal
PC: in Anaconda prompt
Spyder or Jupyter
########################### In Colab #####################
!pip install wrds
1. Import pre installed WRDS package#
import wrds
2. Establish connection with WRDS server#
log in using your WRDS username and password
set up a pgpass file to store the info
conn = wrds.Connection() #### 1st time login
conn.close()
conn = wrds.Connection(wrds_username='luojx')
3. List all libraries#
“Library” refers to databases on WRDS: e.g. CRSP, Compustat
list_libraries() function to explore all subscribed databases
conn.list_libraries().sort() #SORT LIBRARIES A-Z
#conn.list_libraries() # LIST ALL LIBRARIES
#conn.list_libraries()
print("There are "+ str(len(conn.list_libraries()))+" libraries in the list")
#conn.list_libraries()[:100] #LIST THE FIRST 40 LIBRARIES
There are 317 libraries in the list
4. List all datasets within a given library#
databases contain many sub datasets
list_tables() function to list all datasets
specify which “library/database”
Webpage counterpart: https://wrds-www.wharton.upenn.edu/data-dictionary/comp_na_monthly_all/#
#conn.list_tables(library='comp_na_daily_all')
5. Query Data from WRDS Server#
get_table() method
straightforward if getting data from one single dataset
specify which library/database and table/dataset to “get”
can slice data by:
number of rows
column names
https://wrds-www.wharton.upenn.edu/pages/about/data-vendors/sp-global-market-intelligence/
## Extract first 5 obs from comp.company
company = conn.get_table(library='comp_na_daily_all', table='company')
company.shape
company.columns
company.head()
# Narrow down the specific columns to extract
company_narrow = conn.get_table(library='comp_na_daily_all', table='company',
columns = ['conm', 'gvkey', 'cik'], obs=5)
company_narrow.shape
(5, 3)
company_narrow
6. Subsetting Dataset#
raw_sql() method
when “conditioning” is needed
familiar SQL syntax
can pre-specify date column format
# Select one stock's monthly price
# from 2019 onwards
# AAPL:14593
# WRDS Linking Table: https://guides.nyu.edu/wrds/linking-suite
apple = conn.raw_sql("""select permno, date, prc, ret, shrout
from crsp.msf
where permno = 14593
and date>='01/01/2019'""",
date_cols=['date'])
apple
apple.dtypes
permno Int64
date datetime64[ns]
prc Float64
ret Float64
shrout Float64
dtype: object
7. Join Multiple Datasets#
again raw_sql() method
synatx similar to “proc sql” in SAS
handle conditioning statement
apple_fund = conn.raw_sql("""select a.gvkey, a.iid, a.datadate, a.tic, a.conm,
a.at, b.prccm, b.cshoq
from comp.funda a
inner join comp.secm b
on a.gvkey = b.gvkey
and a.iid = b.iid
and a.datadate = b.datadate
where a.tic = 'AAPL'
and a.datadate>='01/01/2010'
and a.datafmt = 'STD'
and a.consol = 'C'
and a.indfmt = 'INDL'
""", date_cols=['datadate'])
apple_fund.shape
(16, 8)
apple_fund
8. Saving Your Output#
Pandas support flexible output format
pickle for further python work
csv or excel
# export the dataframe to csv format
# apple_fund.to_csv('/your local directory/apple_fund.csv')
# export the dataframe to xlsx format
#apple_fund.to_excel('/your local directory/apple_fund.xlsx')
# export the dataframe to dta format for STATA
# apple_fund.to_stata('/your local directory/apple_fund.dta')
#apple_fund.to_csv('/Users/jl14614/Desktop/apple_fund.csv', index=False)
apple_fund.to_csv('apple_fund.csv', index=False)