Join the Quant Scientist Newsletter

Gain access to exclusive tools that Wall Street's Elite don't want you to have. Don't miss the next issue...

Join 11,500+ Quant Scientists learning one article at a time

Using kmeans for portfolio construction

Algorithmic Trading

October 19, 2024 • 4 min read

When Matt and I started working together, he was really concerned about correlation.

He got burned during the financial crisis and was worried about all his assets going down at the same time.

I showed him a great way to diversify his holdings with machine learning.

It's called kmeans and it's simple to do with Python.

In today's issue of the QS Newsletter, we are going to use a simple but powerful technique called clustering to get an idea of how concentrated our portfolio is.

To do it, we'll use the scikit-learn (sklearn) package.

What You’ll Learn:

Download historical stock price data and compute the mean and standard deviation of returns
Use Scikit-learn to preprocess and analyze the clusters
Plot the clusters to visualize where there is concentration

BONUS: Get the Python Code for EVERYTHING you see in this post

Disclaimer:

The information and educational material provided by Quant Science, LLC are for educational purposes only and should not be considered as financial advice or recommendations to purchase, hold, or sell any securities or other financial instruments. Before you proceed, please review our full disclaimer here.

Now on to the show...

Using kmeans for portfolio construction

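KMeans clustering is an unsupervised machine learning algorithm that partitions a dataset into a chosen number of clusters (k). It assigns each data point to the cluster with the nearest center (centroid), moves each centroid to the mean of its assigned points, and repeats until the assignments stop changing.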
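To make the assign-and-update loop concrete, here is a minimal plain-Python sketch. It is for illustration only and is not the scikit-learn implementation used later in this post. The function name, the seed, and the sample (return, volatility) pairs are all made up.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Plain-Python k-means: assign, then update, until labels are stable."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k of the data points
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Hypothetical (return, volatility) pairs: two calm assets, two volatile ones.
points = [(0.05, 0.10), (0.06, 0.12), (0.20, 0.40), (0.22, 0.38)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

The first two points end up sharing one label and the last two share the other. Which group gets label 0 depends on the random starting centroids.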

The standard algorithm was proposed by Stuart Lloyd at Bell Labs in 1957, and James MacQueen coined the term "k-means" in 1967.

KMeans clustering can be used to identify stocks that are similar in terms of their performance and risk profile.

By clustering stocks, investors can create more diversified portfolios, remove correlated assets, and identify candidates for pairs trading strategies. KMeans clustering can also be used to identify stocks that are undervalued or overvalued relative to their peers.

Let's see how it works.

First, make sure to sign up for our Newsletter to get all of the code you see today.

Imports and set up

First, start with the imports. You need pandas for manipulating data, scikit-learn to fit the KMeans model, Matplotlib for plotting, and yfinance to get market data.

Now, use pandas to read an HTML table from Wikipedia. The table lists the Dow Jones stocks, which we’ll use for the analysis. Then use yfinance to download historical prices for those symbols.
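Here is a rough sketch of what this setup could look like. It is not the newsletter's code, and it rests on several assumptions: the Wikipedia URL, the "Symbol" column name, and the date range in `main` are guesses, and the page layout may change. The yfinance import sits inside the function so the rest of the module loads even if yfinance is not installed.

```python
import pandas as pd

# Assumed source page; the post only says "an HTML table from Wikipedia".
DJIA_URL = "https://en.wikipedia.org/wiki/Dow_Jones_Industrial_Average"


def load_dow_symbols(url=DJIA_URL):
    """Return tickers from the first table that has a 'Symbol' column."""
    # read_html needs an HTML parser such as lxml installed.
    tables = pd.read_html(url, match="Symbol")
    for table in tables:
        if "Symbol" in table.columns:
            return table["Symbol"].tolist()
    raise ValueError("no table with a 'Symbol' column found")


def download_prices(symbols, start, end):
    """Download adjusted daily closing prices, one column per ticker."""
    import yfinance as yf  # imported here so the module loads without it

    data = yf.download(symbols, start=start, end=end, auto_adjust=True)
    return data["Close"]


def main():
    # Arbitrary example dates; pick the window you care about.
    symbols = load_dow_symbols()
    prices = download_prices(symbols, start="2020-01-01", end="2022-12-31")
    print(prices.tail())
```

Calling `main()` needs network access. If Wikipedia rejects the request, you may have to fetch the page yourself and pass the HTML to `pd.read_html`.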

Data preprocessing

We will use KMeans to cluster stocks together based on their returns and volatility.

The preprocessing fits in one compact pandas statement that uses chaining. First, compute the percent change of the prices to get the daily returns. Then use the pandas describe method to get a DataFrame of summary statistics, keep the mean and standard deviation, and annualize them.

You end up with a table of Dow Jones stocks with their annualized mean return and standard deviation (volatility).
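A sketch of that chain might look like the following. It runs on synthetic prices with made-up tickers so it works offline. The 252 trading days per year and the column names `returns` and `vol` are assumptions, not something the post specifies.

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 252  # assumed annualization factor


def annualized_moments(prices: pd.DataFrame) -> pd.DataFrame:
    """Annualized mean return and volatility, one row per ticker."""
    # pct_change -> daily returns; describe -> summary stats per ticker.
    stats = prices.pct_change().describe().T[["mean", "std"]]
    return pd.DataFrame(
        {
            "returns": stats["mean"] * TRADING_DAYS,
            "vol": stats["std"] * np.sqrt(TRADING_DAYS),
        }
    )


# Synthetic prices for three hypothetical tickers (not real market data).
rng = np.random.default_rng(0)
daily = rng.normal(0.0005, 0.01, size=(500, 3))
prices = pd.DataFrame(100 * np.cumprod(1 + daily, axis=0), columns=["AAA", "BBB", "CCC"])

moments = annualized_moments(prices)
print(moments)
```

The mean scales with the number of days and the standard deviation with its square root, which is why the two columns use different factors.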

Do KMeans clustering

The first step is to measure inertia. Inertia measures how tightly the data points are grouped around their cluster centers, so for a given number of clusters, lower is better. It’s calculated by measuring the distance between each data point and its assigned centroid, squaring this distance, and summing these squares across all data points.
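As a small worked example, here is the calculation by hand on four made-up points in two clusters. The sum runs over every point in every cluster, which matches what scikit-learn reports in a fitted model's `inertia_` attribute.

```python
def inertia(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    total = 0.0
    for point, label in zip(points, labels):
        center = centroids[label]
        total += sum((x - c) ** 2 for x, c in zip(point, center))
    return total


# Made-up data: two points per cluster, centroids at the cluster means.
points = [(0, 0), (2, 0), (10, 10), (10, 14)]
labels = [0, 0, 1, 1]
centroids = [(1, 0), (10, 12)]

# Cluster 0 contributes 1 + 1 = 2, cluster 1 contributes 4 + 4 = 8.
print(inertia(points, labels, centroids))  # 10.0
```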

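Fit the model for a range of cluster counts and plot the inertia for each one. The result is a smooth, downward-sloping chart, often called an elbow curve. Look for the point where adding another cluster no longer significantly reduces the inertia. Here, it looks like it’s around five or six.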
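One way to collect the inertia values for such a chart is sketched below. The data is a synthetic stand-in for the returns and volatility table. The helper name, the range of k, and the `n_init` and `random_state` settings are assumptions chosen to make the example repeatable.

```python
import numpy as np
from sklearn.cluster import KMeans


def inertia_by_k(features, k_values, seed=0):
    """Fit KMeans once per k and collect each fitted model's inertia_."""
    inertias = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(features)
        inertias.append(model.inertia_)
    return inertias


# Synthetic (returns, vol) pairs: three loose groups of ten points each.
rng = np.random.default_rng(0)
centers = np.array([[0.05, 0.15], [0.15, 0.25], [0.30, 0.45]])
features = np.vstack([c + rng.normal(scale=0.01, size=(10, 2)) for c in centers])

k_values = list(range(1, 9))
inertias = inertia_by_k(features, k_values)
for k, value in zip(k_values, inertias):
    print(k, round(value, 4))
```

Passing `k_values` and `inertias` to Matplotlib's `plt.plot` draws the curve. On this synthetic data, the drop flattens after three clusters because that is how the data was generated.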

Next, build and plot the clusters.

First, fit the model to the data using five clusters. Then plot the points and annotate each one with the ticker symbol and its cluster.
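A sketch of the fit-and-plot step follows, again on synthetic data with made-up tickers. It assumes the table has columns named `returns` and `vol`. The color map and axis labels are arbitrary choices.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans


def plot_clusters(moments, n_clusters=5, seed=0):
    """Fit KMeans on (returns, vol) and draw a labelled scatter plot."""
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(moments)
    _, ax = plt.subplots()
    ax.scatter(moments["returns"], moments["vol"], c=model.labels_, cmap="rainbow")
    for ticker, label in zip(moments.index, model.labels_):
        row = moments.loc[ticker]
        # Annotate each point as "TICKER (cluster)".
        ax.annotate(f"{ticker} ({label})", (row["returns"], row["vol"]))
    ax.set_xlabel("Annualized return")
    ax.set_ylabel("Annualized volatility")
    return model.labels_, ax


# Synthetic stand-in data with hypothetical tickers T00..T29.
rng = np.random.default_rng(1)
moments = pd.DataFrame(
    {"returns": rng.normal(0.10, 0.08, 30), "vol": rng.normal(0.30, 0.07, 30)},
    index=[f"T{i:02d}" for i in range(30)],
)
labels, ax = plot_clusters(moments)
```

Call `plt.show()` afterwards to display the chart. The cluster numbers themselves are arbitrary; only the grouping matters.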

It’s easy to see how stocks are grouped together. You can use this analysis to diversify stock portfolios by reducing exposure to stocks in the same cluster. KMeans can also help you shortlist potential pairs trading candidates by identifying stocks with similar return and volatility profiles, which you can then check for an economic link.
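To turn the cluster labels into a quick concentration check, you could add up portfolio weights per cluster. This is a minimal sketch. The tickers, weights, and labels are invented for illustration, and the helper name is hypothetical.

```python
from collections import defaultdict


def weight_by_cluster(weights, labels):
    """Sum portfolio weights per cluster label."""
    totals = defaultdict(float)
    for ticker, weight in weights.items():
        totals[labels[ticker]] += weight
    return dict(totals)


# Made-up tickers, weights, and cluster labels (not real results).
weights = {"AAA": 0.40, "BBB": 0.30, "CCC": 0.10, "DDD": 0.10, "EEE": 0.10}
labels = {"AAA": 0, "BBB": 0, "CCC": 1, "DDD": 1, "EEE": 2}

exposure = weight_by_cluster(weights, labels)
for cluster, total in sorted(exposure.items()):
    print(cluster, round(total, 2))
```

In this made-up portfolio, about 70% of the weight sits in cluster 0, which is the kind of concentration the analysis is meant to surface.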

Congratulations!

You just learned how to use machine learning to see where a stock portfolio is concentrated and how to diversify it.

But, there's more to learn in algorithmic trading:

Backtesting your portfolio construction algorithm to see how the strategy would have performed historically
Executing the trades automatically
Monthly rebalancing
Tracking your actual Profit and Loss
Incorporating Trading Fees

Are you interested in learning algorithmic trading strategies that maximize returns responsibly, help you manage risk, and grow your investments?

We implement 3 core trading strategies including portfolio, momentum, and spread trades that have worked in our favor in the past and continue to produce results for our students.

Join 400+ of us who are learning to apply Python to algorithmic trading to grow investments.

Leo was up 11.5% in just 13 trading days.

Alex was waiting 9 years for a course like this:

Ready to make Algorithmic Trading Strategies that actually work?

There's nothing worse than going at this alone:

❌ Learning Python is tough.

❌ Learning Trading is tough.

❌ Learning Math & Stats is tough.

It's no wonder it's easy to feel lost, make bad decisions, and lose money.

Want help?

👉 Join 10,700+ future Quant Scientists on our Python for Algorithmic Trading Course Waitlist: https://learn.quantscience.io/python-algorithmic-trading-course-waitlist

investing, stocks, python, algorithmic trading, software, ffn

Matt Dancho

Matt is a Data Science expert with over 18 years working in business and 10+ years as a Data Scientist, Consultant, and Trainer. Matt has built Business Science, a successful educational platform with similar goals to Quant Science, but focused on developing Data Scientists in business, marketing, and finance disciplines.
