0. Setup WRDS Python API

0. Setup WRDS Python API#

To be done before running this notebook
https://pypi.org/project/wrds/
run “pip install wrds”
Mac: in terminal
PC: in Anaconda prompt
Spyder or Jupyter

########################### In Colab #####################
!pip install wrds

1. Import pre installed WRDS package#

import wrds

2. Establish connection with WRDS server#

log in using your WRDS username and password
set up a pgpass file to store the info

conn = wrds.Connection()  #### 1st time login

conn.close()
conn = wrds.Connection(wrds_username='luojx')

3. List all libraries#

“Library” refers to databases on WRDS: e.g. CRSP, Compustat
list_libraries() function to explore all subscribed databases

conn.list_libraries().sort()  #SORT LIBRARIES A-Z
#conn.list_libraries() # LIST ALL LIBRARIES
#conn.list_libraries()
print("There are "+ str(len(conn.list_libraries()))+" libraries in the list")
#conn.list_libraries()[:100]  #LIST THE FIRST 40 LIBRARIES

There are 317 libraries in the list

4. List all datasets within a given library#

databases contain many sub datasets
list_tables() function to list all datasets
specify which “library/database”

Webpage counterpart: https://wrds-www.wharton.upenn.edu/data-dictionary/comp_na_monthly_all/#

#conn.list_tables(library='comp_na_daily_all')

5. Query Data from WRDS Server#

get_table() method
straightforward if getting data from one single dataset
specify which library/database and table/dataset to “get”
can slice data by:
- number of rows
- column names
https://wrds-www.wharton.upenn.edu/pages/about/data-vendors/sp-global-market-intelligence/

## Extract first 5 obs from comp.company
company = conn.get_table(library='comp_na_daily_all', table='company')
company.shape

company.columns

company.head()

# Narrow down the specific columns to extract

company_narrow = conn.get_table(library='comp_na_daily_all', table='company',
                                columns = ['conm', 'gvkey', 'cik'], obs=5)
company_narrow.shape

(5, 3)

company_narrow

6. Subsetting Dataset#

raw_sql() method
when “conditioning” is needed
familiar SQL syntax
can pre-specify date column format

# Select one stock's monthly price
# from 2019 onwards
# AAPL:14593
# WRDS Linking Table: https://guides.nyu.edu/wrds/linking-suite

apple = conn.raw_sql("""select permno, date, prc, ret, shrout
                        from crsp.msf
                        where permno = 14593
                        and date>='01/01/2019'""",
                     date_cols=['date'])

apple

apple.dtypes

permno             Int64
date      datetime64[ns]
prc              Float64
ret              Float64
shrout           Float64
dtype: object

7. Join Multiple Datasets#

again raw_sql() method
synatx similar to “proc sql” in SAS
handle conditioning statement

apple_fund = conn.raw_sql("""select a.gvkey, a.iid, a.datadate, a.tic, a.conm,
                            a.at, b.prccm, b.cshoq

                            from comp.funda a
                            inner join comp.secm b

                            on a.gvkey = b.gvkey
                            and a.iid = b.iid
                            and a.datadate = b.datadate

                            where a.tic = 'AAPL'
                            and a.datadate>='01/01/2010'
                            and a.datafmt = 'STD'
                            and a.consol = 'C'
                            and a.indfmt = 'INDL'
                            """, date_cols=['datadate'])

apple_fund.shape

(16, 8)

apple_fund

8. Saving Your Output#

Pandas support flexible output format
pickle for further python work
csv or excel

# export the dataframe to csv format
# apple_fund.to_csv('/your local directory/apple_fund.csv')

# export the dataframe to xlsx format
#apple_fund.to_excel('/your local directory/apple_fund.xlsx')

# export the dataframe to dta format for STATA
# apple_fund.to_stata('/your local directory/apple_fund.dta')

#apple_fund.to_csv('/Users/jl14614/Desktop/apple_fund.csv', index=False)
apple_fund.to_csv('apple_fund.csv', index=False)

0. Setup WRDS Python API

Contents

0. Setup WRDS Python API#

1. Import pre installed WRDS package#

2. Establish connection with WRDS server#

3. List all libraries#

4. List all datasets within a given library#

Webpage counterpart: https://wrds-www.wharton.upenn.edu/data-dictionary/comp_na_monthly_all/#

5. Query Data from WRDS Server#

6. Subsetting Dataset#

7. Join Multiple Datasets#

8. Saving Your Output#

The End#