Correlation between stock performance and financial ratios for S&P500 companies

2022-02-26

// A brief summary of my bachelors thesis i wrote ~ 10 months ago

// Full results and code can be found here -> github.com/tompston/SP500_correlation

There are some places that write these "This financial ratio is a great predictor of stock growth" type of articles. This always seemed like clickbait to me. So i wanted to test if there is some actual validity to these claims.

So how can we test that claim? Scrape financial statement data, combine it and run it through correlation calculations.

Data gathereing

I decided to scrape the companies that were included in the S&P500 index. Why? Because there is a high enough probability that, if you're an investor, you would be interested in a company that is listed there. Thus, figuring out if there's any correlation between stock increase and a financial ratio for this group would be beneficial for some investors.

For data gathering, i used the Financial Modeling Prep API. During research for an API to use, it seemed like one of the best options (not sponsored). In total 8 endpoints were scraped for each stock ticker included in the index. These include:

income-statement
balance-sheet-statement
cash-flow-statement
ratios
financial-growth
income-statement-growth
key-metrics
enterprise-values

What's cool about Financial Modeling Prep was that they provided endpoints that hold most of the financial ratios already calculated. This means that we don't need to worry about doing that ourselves.

The free version of the API provides data for the last 4 years. This means that if we scrape 500 companies x 4 years, we would have 2000 years as a reference for calculations.

The API accepts the ticker as an url param, so all we need to do to gather the data is:

Get the list of tickers inluded in the index ( scraped from Wikipedia )
Convert that scraped list into an array
Loop over the array, passing the ticker as an url param and save all of the fetced data as a single csv file for each endpoint

Once that is done, we have 8 seperate csv files that hold the scraped data. How do we combine it? We can create a unique ID for the row that then could be joined together. What can we use to create that unique ID? All of the scraped csv files always have 2 columns -> ticker name + date. Combine these two columns together and voila, we have a unique ID.

Stock increase percentage

Luckily for me, the enterprise-value endpoint also had info about the stock price on the date when the financial reports were published. This means that we can use it to calculate the annual stock increase percentage easily.

Results

After combining all of the scraped data together we have a csv file of 2484 rows (years) and 327 columns (financial variables). Note that there are columns with repeated information, so in reality we have a lot less columns that will be used in correlation calculations. For example, the Pearsons correlation results have 275 rows of financial variables and a lot of the rows hold duplicate financial variables.

Correlation results

Statistics for the calculated annual stock incrase percentage

enterprise_value.annual_stock_increase

count 1983

mean 0.137970869

std 0.287600395

min -0.724216618

25% -0.051685853

50% 0.116157779

75% 0.291180917

max 2.098626046

	enterprise_value.annual_stock_increase
count	1983
mean	0.137970869
std	0.287600395
min	-0.724216618
25%	-0.051685853
50%	0.116157779
75%	0.291180917
max	2.098626046

Pearsons correlation

	enterprise_value.annual_stock_increase
enterprise_value.annual_stock_increase	1
key_metrics.stockBasedCompensationToRevenue	0.189647428
financial_growth.revenueGrowth	0.157458114
enterprise_value.stockPrice	0.155575331
income_statement_growht.growthRevenue	0.146234174

Spearmans correlation

	enterprise_value.annual_stock_increase
enterprise_value.annual_stock_increase	1
key_metrics.enterpriseValueOverEBITDA	0.360546333
ratio.enterpriseValueMultiple	0.360402587
key_metrics.pocfratio	0.351598389
ratio.priceCashFlowRatio	0.351425033

Ordinary Least Sqaures regression

dependent_variable	coef	std err	t	[0.025	0.975]	R_SQ	F_STAT	N
ratio.priceToOperatingCashFlowsRatio	10.7486	0.722	14.892	9.333	12.164	0.123	221.8	1589
ratio.priceCashFlowRatio	10.7486	0.722	14.892	9.333	12.164	0.123	221.8	1589
key_metrics.pocfratio	10.7517	0.722	14.892	9.336	12.168	0.123	221.8	1588
ratio.dividendYield	-0.0191	0.001	-12.774	-0.022	-0.016	0.108	163.2	1351
key_metrics.dividendYield	-0.0191	0.001	-12.766	-0.022	-0.016	0.108	163	1350

Plots for financial variables with the highes OLS R Squared values