There are some places that write these "This financial ratio is a great predictor of stock growth" type of articles. This always seemed like clickbait to me. So i wanted to test if there is some actual validity to these claims.
So how can we test that claim? Scrape financial statement data, combine it and run it through correlation calculations.
I decided to scrape the companies that were included in the S&P500 index. Why? Because there is a high enough probability that, if you're an investor, you would be interested in a company that is listed there. Thus, figuring out if there's any correlation between stock increase and a financial ratio for this group would be beneficial for some investors.
For data gathering, i used the Financial Modeling Prep API. During research for an API to use, it seemed like one of the best options (not sponsored). In total 8 endpoints were scraped for each stock ticker included in the index. These include:
What's cool about Financial Modeling Prep was that they provided endpoints that hold most of the financial ratios already calculated. This means that we don't need to worry about doing that ourselves.
The free version of the API provides data for the last 4 years. This means that if we scrape 500 companies x 4 years, we would have 2000 years as a reference for calculations.
The API accepts the ticker as an url param, so all we need to do to gather the data is:
Once that is done, we have 8 seperate csv files that hold the scraped data. How do we combine it? We can create a unique ID for the row that then could be joined together. What can we use to create that unique ID? All of the scraped csv files always have 2 columns -> ticker name + date. Combine these two columns together and voila, we have a unique ID.
Luckily for me, the enterprise-value endpoint also had info about the stock price on the date when the financial reports were published. This means that we can use it to calculate the annual stock increase percentage easily.
After combining all of the scraped data together we have a csv file of 2484 rows (years) and 327 columns (financial variables). Note that there are columns with repeated information, so in reality we have a lot less columns that will be used in correlation calculations. For example, the Pearsons correlation results have 275 rows of financial variables and a lot of the rows hold duplicate financial variables.
Statistics for the calculated annual stock incrase percentage
| enterprise_value.annual_stock_increase | |
|---|---|
| count | 1983 |
| mean | 0.137970869 |
| std | 0.287600395 |
| min | -0.724216618 |
| 25% | -0.051685853 |
| 50% | 0.116157779 |
| 75% | 0.291180917 |
| max | 2.098626046 |
Pearsons correlation
| enterprise_value.annual_stock_increase | |
|---|---|
| enterprise_value.annual_stock_increase | 1 |
| key_metrics.stockBasedCompensationToRevenue | 0.189647428 |
| financial_growth.revenueGrowth | 0.157458114 |
| enterprise_value.stockPrice | 0.155575331 |
| income_statement_growht.growthRevenue | 0.146234174 |
| enterprise_value.annual_stock_increase | |
|---|---|
| enterprise_value.annual_stock_increase | 1 |
| key_metrics.enterpriseValueOverEBITDA | 0.360546333 |
| ratio.enterpriseValueMultiple | 0.360402587 |
| key_metrics.pocfratio | 0.351598389 |
| ratio.priceCashFlowRatio | 0.351425033 |
Ordinary Least Sqaures regression
| dependent_variable | coef | std err | t | P>|t| | [0.025 | 0.975] | R_SQ | F_STAT | N |
|---|---|---|---|---|---|---|---|---|---|
| ratio.priceToOperatingCashFlowsRatio | 10.7486 | 0.722 | 14.892 | 0 | 9.333 | 12.164 | 0.123 | 221.8 | 1589 |
| ratio.priceCashFlowRatio | 10.7486 | 0.722 | 14.892 | 0 | 9.333 | 12.164 | 0.123 | 221.8 | 1589 |
| key_metrics.pocfratio | 10.7517 | 0.722 | 14.892 | 0 | 9.336 | 12.168 | 0.123 | 221.8 | 1588 |
| ratio.dividendYield | -0.0191 | 0.001 | -12.774 | 0 | -0.022 | -0.016 | 0.108 | 163.2 | 1351 |
| key_metrics.dividendYield | -0.0191 | 0.001 | -12.766 | 0 | -0.022 | -0.016 | 0.108 | 163 | 1350 |
Plots for financial variables with the highes OLS R Squared values