Tutorial: https://wrds-www.wharton.upenn.edu/documents/1443/wrds_connection.html

0. Setup WRDS Python API#

  • To be done before running this notebook

  • https://pypi.org/project/wrds/

  • run “pip install wrds”

  • Mac: in terminal

  • PC: in Anaconda prompt

  • Spyder or Jupyter

########################### In Colab #####################
!pip install wrds

1. Import pre installed WRDS package#

import wrds

2. Establish connection with WRDS server#

  • log in using your WRDS username and password

  • set up a pgpass file to store the info

conn = wrds.Connection()  #### 1st time login
conn.close()
conn = wrds.Connection(wrds_username='luojx')

3. List all libraries#

  • “Library” refers to databases on WRDS: e.g. CRSP, Compustat

  • list_libraries() function to explore all subscribed databases

conn.list_libraries().sort()  #SORT LIBRARIES A-Z
#conn.list_libraries() # LIST ALL LIBRARIES
#conn.list_libraries()
print("There are "+ str(len(conn.list_libraries()))+" libraries in the list")
#conn.list_libraries()[:100]  #LIST THE FIRST 40 LIBRARIES
There are 317 libraries in the list

4. List all datasets within a given library#

  • databases contain many sub datasets

  • list_tables() function to list all datasets

  • specify which “library/database”

Webpage counterpart: https://wrds-www.wharton.upenn.edu/data-dictionary/comp_na_monthly_all/#

#conn.list_tables(library='comp_na_daily_all')

5. Query Data from WRDS Server#

## Extract first 5 obs from comp.company
company = conn.get_table(library='comp_na_daily_all', table='company')
company.shape
company.columns
company.head()
# Narrow down the specific columns to extract

company_narrow = conn.get_table(library='comp_na_daily_all', table='company',
                                columns = ['conm', 'gvkey', 'cik'], obs=5)
company_narrow.shape
(5, 3)
company_narrow

6. Subsetting Dataset#

  • raw_sql() method

  • when “conditioning” is needed

  • familiar SQL syntax

  • can pre-specify date column format

# Select one stock's monthly price
# from 2019 onwards
# AAPL:14593
# WRDS Linking Table: https://guides.nyu.edu/wrds/linking-suite

apple = conn.raw_sql("""select permno, date, prc, ret, shrout
                        from crsp.msf
                        where permno = 14593
                        and date>='01/01/2019'""",
                     date_cols=['date'])
apple
apple.dtypes
permno             Int64
date      datetime64[ns]
prc              Float64
ret              Float64
shrout           Float64
dtype: object

7. Join Multiple Datasets#

  • again raw_sql() method

  • synatx similar to “proc sql” in SAS

  • handle conditioning statement

apple_fund = conn.raw_sql("""select a.gvkey, a.iid, a.datadate, a.tic, a.conm,
                            a.at, b.prccm, b.cshoq

                            from comp.funda a
                            inner join comp.secm b

                            on a.gvkey = b.gvkey
                            and a.iid = b.iid
                            and a.datadate = b.datadate

                            where a.tic = 'AAPL'
                            and a.datadate>='01/01/2010'
                            and a.datafmt = 'STD'
                            and a.consol = 'C'
                            and a.indfmt = 'INDL'
                            """, date_cols=['datadate'])

apple_fund.shape
(16, 8)
apple_fund

8. Saving Your Output#

  • Pandas support flexible output format

  • pickle for further python work

  • csv or excel

# export the dataframe to csv format
# apple_fund.to_csv('/your local directory/apple_fund.csv')

# export the dataframe to xlsx format
#apple_fund.to_excel('/your local directory/apple_fund.xlsx')

# export the dataframe to dta format for STATA
# apple_fund.to_stata('/your local directory/apple_fund.dta')

#apple_fund.to_csv('/Users/jl14614/Desktop/apple_fund.csv', index=False)
apple_fund.to_csv('apple_fund.csv', index=False)

The End#