
Commit

Merge pull request #10 from junder873/remove-functions-moved-to-AbnormalReturns.jl

Remove methods and components that are moved to AbnormalReturns.jl
junder873 committed Mar 29, 2022
2 parents f6e5ad8 + 326e7fd commit 4087aea
Showing 7 changed files with 2 additions and 1,010 deletions.
2 changes: 1 addition & 1 deletion Project.toml
@@ -1,7 +1,7 @@
name = "WRDSMerger"
uuid = "59d27aa3-834e-4232-9046-52ef43e86786"
authors = ["junder873 <junder873@gmail.com>"]
-version = "0.3.6"
+version = "0.4.0"

[deps]
AbstractTrees = "1520ce14-60c1-5f80-bbc7-55ef81b5835c"
144 changes: 1 addition & 143 deletions README.md
@@ -113,149 +113,7 @@ This will check that the Cusip is valid (at least according to the checksum, not

## Calculating Abnormal Returns and Other Return Statistics

Another common task is calculating abnormal returns around a firm event, such as an earnings announcement or when a firm enters the S&P 500. This package provides a variety of functions to calculate those.

### Caching Firm and Market Data

To make the calculations as fast as possible, this package relies on cached return data. It provides functions to save the data in types similar to [BusinessDays.jl's](https://github.com/JuliaFinance/BusinessDays.jl) cached data, which makes accessing a range of dates very fast. Using a `GroupedDataFrame` and filtering, a large number (175,000) of regressions took more than 10 minutes; using the cached data, the same regressions took less than 3 seconds.

It is recommended to load market return data first so that the return dates for the firm data can be checked as valid dates. To load the market data into the cache, run:
```julia
MarketData(df_market_data)
```
Or, if you want to load the data from WRDS:
```julia
MarketData(ff_data(conn, Date(2000), today()))
```
By default, these functions add a column to the market data as the intercept column (a column of ones).

Second, load the firm data. This stores the firm data in a dictionary where the identifier (typically Permno, so an integer) is the key and the related data is stored for quick access to a range of dates. In terms of total size, I find that the dictionary is typically smaller than daily return data since the dictionary only stores the identifier once. To load the firm data, run:
```julia
FirmData(
    df_firm_data;
    valuecols="ret"
)
```
Depending on the amount of firm data, this might take some time. For example, on a Ryzen 3600, loading 36 million rows took about a minute. This is by far the slowest step, so limiting the data to what you need will make it faster.

### Accessing Cached Data

This package provides three functions for accessing the cached data. Most of the functions discussed later call these automatically, but if you want to build your own functions, they provide the basis for accessing the data quickly. Data for firms is stored in vectors; data for the market is stored in a matrix.

- `get_firm_data(id, date_start, date_end, col)` fetches a specific firm's data (based on `id`) between two dates for one of the columns.
- `get_market_data(date_start, date_end, cols_market...)` fetches data for the market. It fetches all columns if no values for `cols_market` are passed, or a selection of columns based on `cols_market`.
- `get_market_data(id, date_start, date_end, cols_market...)` also fetches data for the market, but makes sure that the rows of the fetched matrix match the dates that exist for the firm `id` provided. Since firms can be missing data relative to the market, this makes sure you are comparing equivalent data.
- `get_firm_market_data(id, date_start, date_end; cols_market::Union{Nothing, Vector{String}, String}=nothing, col_firm::String="ret")` is a combination of the previous functions. It returns a tuple, with the first element being a vector of the requested firm data and the second being a matrix of market data. As with the previous function, the number of rows in the market matrix is the same as the length of the firm vector.

### Calculation Functions

#### Regression Estimate

It is often necessary to estimate a regression model for a specific firm based on market data. The method is similar to the `get_firm_market_data` function previously described:
```julia
cache_reg(
    id::Int,
    est_min::Date,
    est_max::Date;
    cols_market::Union{Nothing, Vector{String}}=nothing,
    col_firm::String="ret",
    minobs::Real=.8,
    calendar="USNYSE"
)
```
This fetches the data and runs the regression. If `cols_market` is `nothing`, then all columns are used. Be careful when leaving this as `nothing` with Fama-French data, since that data includes the risk-free rate of return, which is often constant over short periods and would then be collinear with the intercept.

If `minobs` is less than 1, the function treats it as a ratio: the number of observations available for the firm over the period, relative to the number of market observations over the same period, that is required to run the regression. If there is not enough data, this function returns `missing`.
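
As a rough illustration of the `minobs` rule described above, the following sketch checks whether a firm has enough observations. The helper `enough_obs` is hypothetical and not part of this package, and treating values of 1 or more as an absolute count is an assumption that the text above does not state.

```julia
# Hypothetical helper, not part of WRDSMerger: decide whether a firm has
# enough observations in the estimation window.
function enough_obs(n_firm::Integer, n_market::Integer, minobs::Real)
    # Values below 1 are read as a required share of the market observations.
    # Assumption: values of 1 or more are read as an absolute count.
    required = minobs < 1 ? minobs * n_market : minobs
    return n_firm >= required
end

enough_obs(150, 200, 0.8)  # false: 150 is below 80% of 200
enough_obs(170, 200, 0.8)  # true
enough_obs(150, 200, 100)  # true under the absolute-count assumption
```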

This function returns a `BasicReg` model, which is a subtype of the `RegressionModel` from StatsBase.jl. Most of the later functions will work if you use a different package to calculate the regression, as long as that package provides the necessary items under the StatsBase.jl API.

The `BasicReg` model is intentionally minimalistic, which makes it easy to save and useful when running a large number of regressions.

#### Variance and Standard Deviation

There are two common methods of calculating variance over a period for a firm. The first is to subtract the market return from the firm return and take the variance, given a period.
```julia
var[std](id, date_start, date_end; cols_market="vwretd", col_firm="ret")
```
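
The first method can be sketched on plain vectors using only the `Statistics` standard library. The return values below are made up for illustration, and this does not go through the cached data.

```julia
using Statistics

# Made-up daily returns for one firm and the market over the same dates
firm_ret = [0.010, -0.020, 0.015, 0.003]
mkt_ret  = [0.008, -0.010, 0.012, 0.001]

# Market-adjusted return: firm return minus market return, day by day
adj = firm_ret .- mkt_ret

v = var(adj)  # sample variance of the market-adjusted returns
s = std(adj)  # sample standard deviation, the square root of v
```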

The second method is to calculate the variance after estimating a regression. This uses the error in the regression, `var[std](rr)` where `rr` is a `RegressionModel`.

#### Buy and Hold Returns

Buy and hold returns are also known as geometric returns. The API for these functions is similar to `get_firm_data` and `get_market_data`:

```julia
bh_return([id], date_start, date_end, col)
```

If `id` is omitted, the return is calculated for the market.
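
For reference, a buy and hold (geometric) return compounds the period returns. The sketch below uses a hypothetical helper `bh_sketch` on a plain vector; it is not the package's `bh_return` and does not touch the cache.

```julia
# Hypothetical helper: compound a vector of simple returns into one
# buy and hold return.
bh_sketch(r) = prod(1 .+ r) - 1

r = [0.10, -0.05, 0.02]
bh_sketch(r)  # 1.10 * 0.95 * 1.02 - 1 ≈ 0.0659
```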

#### Abnormal Returns

There are two types of common abnormal returns, buy and hold (bhar) and cumulative (car). These all work by subtracting some expected return from the firm's actual return. The expected return is either the market average or estimated based on a firm-specific regression model.

To calculate abnormal returns relative to a market index (typically `"vwretd"`, `"ewretd"`, or `"mkt"`), run:
```julia
bhar[car](id, date_start, date_end; cols_market="vwretd", col_firm="ret")
```

To calculate abnormal returns relative to a regression, run:
```julia
bhar[car](id, date_start, date_end, rr)
```
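
The difference between the two measures can be sketched on plain vectors. The definitions below are the common textbook ones (an assumption here, not taken from this package's source), with the market return as the expected return; `car_sketch` and `bhar_sketch` are hypothetical names.

```julia
# Cumulative abnormal return: sum of the daily differences
car_sketch(firm, mkt) = sum(firm .- mkt)

# Buy and hold abnormal return: difference of the compounded returns
bhar_sketch(firm, mkt) = (prod(1 .+ firm) - 1) - (prod(1 .+ mkt) - 1)

# Made-up returns over a three-day event window
firm = [0.02, -0.01, 0.03]
mkt  = [0.01,  0.00, 0.01]

car_sketch(firm, mkt)   # 0.01 - 0.01 + 0.02 ≈ 0.02
bhar_sketch(firm, mkt)  # 1.040094 - 1.0201 ≈ 0.019994
```

Over short windows the two values are usually close; they diverge as compounding matters more.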

### Example

Assume you have a DataFrame of firm-events:
```julia
df = DataFrame(
    permno = [61516, 76185, 87445, 14763, 15291, 51369, 82515],
    event_date = [Date(2020, 6, 22), Date(2020, 6, 22), Date(2020, 6, 22), Date(2020, 9, 21), Date(2020, 9, 21), Date(2020, 9, 21), Date(2020, 10, 7)]
)

# Row │ permno event_date
# │ Int64 Date
# ─────┼────────────────────
# 1 │ 61516 2020-06-22
# 2 │ 76185 2020-06-22
# 3 │ 87445 2020-06-22
# 4 │ 14763 2020-09-21
# 5 │ 15291 2020-09-21
# 6 │ 51369 2020-09-21
# 7 │ 82515 2020-10-07

# initialize cached data, adding a column for total market return
df_market = ff_data(conn, Date(2018), today())
df_market[!, :mkt] = df_market.mktrf .+ df_market.rf
MarketData(df_market)
FirmData(crsp_data(conn, df.permno, Date(2018), today(); cols=["ret"]))

# run the Fama French 3 factor model over an estimation period
df[!, :reg] = cache_reg.(df.permno, df.event_date - Day(300), df.event_date - Day(50); cols_market=["intercept", "mktrf", "smb", "hml"])

# calculate the standard deviation during the estimation period
df[!, :std] = std.(df.reg)

# calculate the buy and hold abnormal returns over the event period
df[!, :bhar] = bhar.(df.permno, df.event_date - Day(3), df.event_date + Day(3), df.reg)

# compare bhar for Fama French vs bhar relative to the market
df[!, :bhar_market] = bhar.(df.permno, df.event_date - Day(3), df.event_date + Day(3); cols_market="mkt")

# remove reg column to clean up and make dataframe sortable later
select!(df, Not(:reg))
# Row │ permno event_date std bhar bhar_market
# │ Int64 Date Float64 Float64 Float64
# ─────┼─────────────────────────────────────────────────────────
# 1 │ 61516 2020-06-22 0.0179959 -0.0463641 -0.0287815
# 2 │ 76185 2020-06-22 0.013956 -0.0252314 0.00200653
# 3 │ 87445 2020-06-22 0.0205844 -0.0531 -0.0503591
# 4 │ 14763 2020-09-21 0.020595 -0.00303077 -0.0091872
# 5 │ 15291 2020-09-21 0.0360563 0.0321485 0.0723931
# 6 │ 51369 2020-09-21 0.02035 0.0175293 0.0143943
# 7 │ 82515 2020-10-07 0.020721 0.017211 0.0310354

```
This functionality is now part of the package [AbnormalReturns.jl](https://github.com/junder873/AbnormalReturns.jl)

## Disclaimer

16 changes: 0 additions & 16 deletions src/WRDSMerger.jl
@@ -33,23 +33,10 @@ export link_identifiers, Permno, Cusip, NCusip,
export comp_data, crsp_data, crsp_market, crsp_stocknames,
crsp_adjust, crsp_delist, list_libraries, list_tables,
describe_table, get_table, raw_sql, ff_data

# types and functions for fast CAR calculations
export TimelineData, FirmData, car, alpha, beta,
MarketData, get_firm_data, get_market_data,
get_firm_market_data, BasicReg, cache_reg,
bh_return, bhar, clear_firm_cached_data!,
firm_in_cache

# extra utilities
export range_join, BDay, Conditions

# From Statistics
export var, std

# From StatsBase
export coef, coefnames, responsename, nobs, dof_residual,
r2, adjr2, islinear, deviance, rss, predict

##############################################################################
##
@@ -62,11 +49,8 @@ include(joinpath("utils", "dateFunctions.jl"))
include(joinpath("utils", "identifierTypes.jl"))
include(joinpath("utils", "linkTree.jl"))
include(joinpath("utils", "utils.jl"))
include(joinpath("utils", "timelineDataCache.jl"))
include(joinpath("utils", "fastRegression.jl"))

include("crspFunctions.jl")
include("calcFunctions.jl")
include("compFunctions.jl")
include("mergeFunctions.jl")
include("exploreDB.jl")
150 changes: 0 additions & 150 deletions src/calcFunctions.jl

This file was deleted.


2 comments on commit 4087aea

@junder873
Owner Author


@JuliaRegistrator register()

@JuliaRegistrator


Registration pull request created: JuliaRegistries/General/57567

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:

git tag -a v0.4.0 -m "<description of version>" 4087aeab1946dd77801fdcf96f6c5fe75828ac67
git push origin v0.4.0
