Gain access to exclusive tools that Wall Street's Elite don't want you to have. Don't miss the next issue...
Join 11,500+ Quant Scientists learning one article at a time
When Matt and I started working together, he was really concerned about correlation.
He got burned during the financial crisis and was worried about all his assets going down at the same time.
I showed him a great way to diversify his holdings with machine learning.
It's called KMeans clustering, and it's simple to do with Python.
In today's issue of the QS Newsletter, we are going to use a simple but powerful technique called clustering to get an idea of how concentrated our portfolio is.
To do it, we'll use the scikit-learn (sklearn) package.
What You’ll Learn:
Download historical stock price data and compute the annualized mean and standard deviation of returns
Use Scikit-learn to preprocess and analyze the clusters
Plot the clusters to visualize where there is concentration
BONUS: Get the Python Code for EVERYTHING you see in this post
Disclaimer:
The information and educational material provided by Quant Science, LLC are for educational purposes only and should not be considered as financial advice or recommendations to purchase, hold, or sell any securities or other financial instruments. Before you proceed, please review our full disclaimer here.
Want exclusive access to our FULL codebase for this Quant Science tutorial plus dozens more?
Join thousands of aspiring Python quants here 👉
Since you're here, you probably want to learn how to get started developing (profitable) algorithmic trading strategies and reinvest those profits.
Here are the steps:
Find edge
Analyze risk
Backtest trading strategies
Execute trades automatically
Easy, right? Well, not exactly... Avoid the 5 biggest mistakes beginners make with our free, 5-day email course:
Click here to join our free 5-Day Algorithmic Trading Course 👉
Now on to the show...
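KMeans clustering is an unsupervised machine learning algorithm that partitions a dataset into k clusters. It assigns each data point to its nearest cluster center (centroid), recomputes each centroid as the mean of its assigned points, and repeats until the assignments stop changing.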
The standard algorithm was proposed by Stuart Lloyd at Bell Labs in 1957 (though not published until 1982), and James MacQueen coined the term "k-means" in 1967.
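As a rough illustration of the assign-and-update loop described above, here is a minimal NumPy sketch. The names nearest_centroid and lloyd_kmeans, their parameters, and the toy points are all hypothetical. It is a teaching aid, not a replacement for scikit-learn's KMeans, which adds smarter initialization and multiple restarts.

```python
import numpy as np


def nearest_centroid(points, centroids):
    """Index of the closest centroid (Euclidean distance) for every point."""
    diffs = points[:, None, :] - centroids[None, :, :]
    return np.linalg.norm(diffs, axis=2).argmin(axis=1)


def lloyd_kmeans(points, k, n_iter=100, seed=0):
    """Bare-bones k-means: alternate assignment and centroid updates."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: attach every point to its nearest centroid.
        labels = nearest_centroid(points, centroids)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster simply keeps its previous centroid).
        updated = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break  # centroids stopped moving, so assignments are stable
        centroids = updated
    return nearest_centroid(points, centroids), centroids


# Toy data (made up): two tight groups of three points each.
toy_points = np.array(
    [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float
)
toy_labels, toy_centroids = lloyd_kmeans(toy_points, k=2)
```

On this toy data the first three points end up sharing one label and the last three share the other. Which group is numbered 0 depends on the random starting centroids.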
KMeans clustering can be used to identify stocks that are similar in terms of their performance and risk profile.
By clustering stocks, investors can create more diversified portfolios, remove correlated assets, and identify candidates for pairs trading strategies. KMeans clustering can also be used to identify stocks that are undervalued or overvalued relative to their peers.
Let's see how it works.
First, make sure to sign up for our Newsletter to get all of the code you see today.
Begin with the imports. You need pandas for manipulating data, scikit-learn to fit the KMeans model, Matplotlib for plotting, and yfinance to get market data.
Now, use pandas to read an HTML table from Wikipedia. The table has a list of the Dow Jones stocks which we’ll use for the analysis.
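A sketch of how the ticker list and price history might be loaded is below. Several details are assumptions rather than anything stated in this article: the Wikipedia URL, the "Symbol" column name, the use of adjusted closing prices, and the helper names themselves. Wikipedia's layout can change, pd.read_html needs an HTML parser such as lxml installed, and yfinance is imported inside the function so the other helpers work without it.

```python
import pandas as pd

# Assumed page; the table layout and column names may change over time.
DJIA_URL = "https://en.wikipedia.org/wiki/Dow_Jones_Industrial_Average"


def extract_symbols(table: pd.DataFrame, column: str = "Symbol") -> list[str]:
    """Return cleaned, de-duplicated ticker symbols from a constituents table."""
    symbols = table[column].astype(str).str.strip()
    return symbols.drop_duplicates().tolist()


def load_dow_symbols(url: str = DJIA_URL) -> list[str]:
    """Read the first table on the page whose text contains 'Symbol'."""
    tables = pd.read_html(url, match="Symbol")
    return extract_symbols(tables[0])


def download_prices(symbols: list[str], start: str, end: str) -> pd.DataFrame:
    """Download adjusted closing prices, one column per ticker."""
    import yfinance as yf  # imported lazily; only needed for the download

    data = yf.download(symbols, start=start, end=end, auto_adjust=True)
    return data["Close"]
```

A typical call would be symbols = load_dow_symbols() followed by prices = download_prices(symbols, start, end), with a date range of your choosing. Both steps need network access.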
We will use KMeans to cluster stocks together based on their returns and volatility.
The summary statistics can be computed in one compact, chained pandas statement. First, compute the percent change to get the daily returns. Then use the pandas describe method to get a DataFrame of summary statistics, keep the mean and standard deviation, and annualize them.
You end up with a DataFrame of Dow Jones stocks with each one's annualized mean return and standard deviation.
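The chained statement described above could look something like this sketch. It assumes a prices DataFrame with one column per ticker and 252 trading days per year. The function name, the "returns" and "vol" column names, and the demo prices are illustrative choices.

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 252  # assumed number of trading days in a year


def annualized_moments(prices: pd.DataFrame) -> pd.DataFrame:
    """Annualized mean return and volatility, one row per ticker."""
    return (
        prices
        .pct_change()            # daily simple returns
        .describe()              # summary statistics, one column per ticker
        .T[["mean", "std"]]      # transpose, then keep mean and std
        .rename(columns={"mean": "returns", "std": "vol"})
        .assign(
            returns=lambda d: d["returns"] * TRADING_DAYS,
            vol=lambda d: d["vol"] * np.sqrt(TRADING_DAYS),
        )
    )


# Made-up prices: AAA rises exactly 1% a day, BBB moves around.
demo_prices = pd.DataFrame({
    "AAA": [100.0, 101.0, 102.01, 103.0301],
    "BBB": [50.0, 51.0, 49.5, 50.5],
})
demo_moments = annualized_moments(demo_prices)
```

The mean scales with the number of days and the standard deviation with its square root, which is why the two columns use different multipliers.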
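The first step is to measure inertia for a range of cluster counts. Inertia measures how well a dataset was clustered by KMeans. It's calculated by measuring the distance between each data point and its assigned centroid, squaring this distance, and summing these squares across all clusters. Lower inertia means tighter clusters, but it tends to keep falling as clusters are added, so the goal is to find where the improvement levels off.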
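To make the definition concrete, the sketch below recomputes inertia by hand and compares it with the inertia_ attribute that scikit-learn reports. The data is synthetic (two made-up blobs), and manual_inertia is a hypothetical helper name.

```python
import numpy as np
from sklearn.cluster import KMeans


def manual_inertia(X, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    X = np.asarray(X, dtype=float)
    return float(((X - centroids[labels]) ** 2).sum())


# Synthetic data: two blobs of 20 points each.
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(0.0, 0.5, size=(20, 2)),
    rng.normal(5.0, 0.5, size=(20, 2)),
])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
by_hand = manual_inertia(X, model.labels_, model.cluster_centers_)
```

The hand-computed value should agree with model.inertia_ up to floating-point rounding.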
Plotting inertia against the number of clusters gives a smooth, downward-sloping chart, often called an elbow curve. You can estimate the point where adding another cluster no longer reduces the inertia significantly. Here it looks like it's around five or six.
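One way to produce a chart like the one described above is sketched here. The range of 2 to 14 clusters, the fixed random seed, and the function names are assumptions for illustration.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans


def inertia_by_k(X, k_values, seed=0):
    """Fit KMeans once per k and collect the resulting inertia values."""
    return [
        KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X).inertia_
        for k in k_values
    ]


def plot_elbow(X, k_values=range(2, 15), seed=0):
    """Plot inertia against the number of clusters and return the Axes."""
    k_values = list(k_values)
    sse = inertia_by_k(X, k_values, seed)
    fig, ax = plt.subplots()
    ax.plot(k_values, sse, marker="o")
    ax.set_xlabel("Number of clusters (k)")
    ax.set_ylabel("Inertia")
    ax.set_title("Elbow Curve")
    return ax
```

With a DataFrame of annualized returns and volatility (one row per stock), calling plot_elbow on it and then plt.show() would draw the chart. The largest k must not exceed the number of stocks.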
Next, build and plot the clusters.
First, fit the model to the data using five clusters. Then plot the points and annotate each one with the ticker symbol and its cluster.
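A minimal sketch of the fit-and-plot step follows. It assumes a DataFrame indexed by ticker with columns named "returns" and "vol". Those column names, the function name, and the styling choices are illustrative.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans


def plot_clusters(moments: pd.DataFrame, n_clusters: int = 5, seed: int = 0):
    """Cluster stocks on (returns, vol) and draw an annotated scatter plot."""
    features = moments[["returns", "vol"]]
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(features)
    labels = pd.Series(model.labels_, index=moments.index, name="cluster")

    fig, ax = plt.subplots()
    ax.scatter(moments["returns"], moments["vol"], c=labels, cmap="rainbow")
    # Label every point with its ticker and cluster number.
    for ticker, row in moments.iterrows():
        ax.annotate(
            f"{ticker} ({labels.loc[ticker]})",
            (row["returns"], row["vol"]),
            xytext=(4, 4),
            textcoords="offset points",
            fontsize=8,
        )
    ax.set_xlabel("Annualized return")
    ax.set_ylabel("Annualized volatility")
    return labels, ax
```

The function returns the cluster label for each ticker alongside the Axes, so the labels can be reused after plotting. Cluster numbers are arbitrary, so only the groupings carry meaning.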
It's easy to see how stocks are grouped together. You can use this analysis to diversify stock portfolios by reducing exposure to stocks in the same cluster. KMeans is also a useful first screen for pairs trading candidates, because it surfaces stocks with similar return and risk profiles that you can then check for an economic link.
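To tie the clusters back to portfolio concentration, a small sketch like the one below could total portfolio weight by cluster. The tickers, weights, and cluster labels are made up, and cluster_exposure is a hypothetical helper.

```python
import pandas as pd


def cluster_exposure(weights: pd.Series, labels: pd.Series) -> pd.Series:
    """Total portfolio weight held in each cluster, largest first."""
    # Align cluster labels to the holdings; tickers without a label are dropped.
    aligned = labels.reindex(weights.index)
    return weights.groupby(aligned).sum().sort_values(ascending=False)


# Made-up holdings and cluster assignments.
weights = pd.Series({"AAA": 0.5, "BBB": 0.3, "CCC": 0.2})
labels = pd.Series({"AAA": 0, "BBB": 0, "CCC": 1})
exposure = cluster_exposure(weights, labels)
```

In this example 80% of the portfolio sits in cluster 0, which is the kind of concentration the analysis is meant to reveal.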
You just learned how to use machine learning to see where a stock portfolio is concentrated and how to diversify it.
But, there's more to learn in algorithmic trading:
Backtesting your portfolio construction algorithm to make sure the strategy will work in the future
Executing the trades automatically
Monthly rebalancing
Tracking your actual Profit and Loss
Incorporating Trading Fees
Are you interested in learning algorithmic trading strategies that maximize returns responsibly, help you manage risk, and grow your investments?
We implement 3 core trading strategies including portfolio, momentum, and spread trades that have worked in our favor in the past and continue to produce results for our students.
Join 400+ of us who are learning to apply Python to algorithmic trading to grow investments.
Leo was up 11.5% in just 13 trading days.
Alex was waiting 9 years for a course like this:
There's nothing worse than going at this alone--
❌ Learning Python is tough.
❌ Learning Trading is tough.
❌ Learning Math & Stats is tough.
It's no wonder it's easy to feel lost, make bad decisions, and lose money.
Want help?
👉 Join 10,700+ future Quant Scientists on our Python for Algorithmic Trading Course Waitlist: https://learn.quantscience.io/python-algorithmic-trading-course-waitlist