Join the Quant Scientist Newsletter

Gain access to exclusive tools that Wall Street's Elite don't want you to have. Don't miss the next issue...

Join 11,500+ Quant Scientists learning one article at a time

Using kmeans for portfolio construction

October 19, 2024 · 4 min read

When Matt and I started working together, he was really concerned about correlation.

He got burned during the financial crisis and was worried about all his assets going down at the same time.

I showed him a great way to diversify his holdings with machine learning.

It's called kmeans and it's simple to do with Python.

In today's issue of the QS Newsletter (get the code), we are going to use a simple but powerful technique called clustering to get an idea of how concentrated our portfolio is.

To do it, we'll use the scikit-learn (sklearn) package.

What You’ll Learn:

  1. Download historical stock price data and compute the annualized mean and standard deviation of returns

  2. Use Scikit-learn to preprocess and analyze the clusters

  3. Plot the clusters to visualize where there is concentration

BONUS: Get the Python Code for EVERYTHING you see in this post

Disclaimer:

The information and educational material provided by Quant Science, LLC are for educational purposes only and should not be considered as financial advice or recommendations to purchase, hold, or sell any securities or other financial instruments. Before you proceed, please review our full disclaimer here.

Join the Quant Scientist Newsletter (and Get the Code)

Want exclusive access to our FULL codebase for this Quant Science tutorial plus dozens more?

Join thousands of aspiring Python quants here 👉

NEW: Free 5-Day Algorithmic Trading Course

Since you're here, you probably want to learn how to get started developing (profitable) algorithmic trading strategies and reinvest those profits.

Here are the steps:

  1. Find edge

  2. Analyze risk

  3. Backtest trading strategies

  4. Execute trades automatically

Easy right? Well, not exactly... Avoid the 5 biggest mistakes beginners make with our free, 5-day email course:

Click here to join our free 5-Day Algorithmic Trading Course 👉

Now on to the show...

Using kmeans for portfolio construction

KMeans clustering is an unsupervised machine learning algorithm that partitions a dataset into k clusters by assigning each data point to the cluster with the nearest centroid (the mean of that cluster's points).
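As a minimal sketch of that nearest-centroid idea, the toy example below clusters six made-up 2-D points with scikit-learn. The points and the parameter choices (`n_init=10`, `random_state=0`) are illustrative assumptions, not values taken from the original code.

```python
import numpy as np
from sklearn.cluster import KMeans

# Six made-up points forming two obvious groups (illustrative data only).
points = np.array([
    [0.0, 0.0], [0.1, 0.2], [0.2, 0.1],   # group near the origin
    [5.0, 5.0], [5.1, 4.9], [4.9, 5.2],   # group near (5, 5)
])

# n_init and random_state are assumptions chosen for reproducibility.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)

labels = model.labels_              # cluster index assigned to each point
centroids = model.cluster_centers_  # mean position of each cluster

# A new point is assigned to whichever centroid is closest.
new_label = model.predict(np.array([[4.8, 5.1]]))[0]
```

The label numbers themselves are arbitrary. What matters is which points share a label.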

The standard algorithm was proposed by Stuart Lloyd at Bell Labs in 1957, and James MacQueen coined the name "k-means" in 1967.

KMeans clustering can be used to identify stocks that are similar in terms of their performance and risk profile.

By clustering stocks, investors can create more diversified portfolios, remove correlated assets, and identify candidates for pairs trading strategies. KMeans clustering can also be used to identify stocks that are undervalued or overvalued relative to their peers.

Let's see how it works.

First, make sure to sign up for our Newsletter to get all of the code you see today.

Imports and set up

First, start with the imports. You need pandas for manipulating data, scikit-learn to fit the KMeans model, Matplotlib for plotting, and yfinance to get market data.

Now, use pandas to read an HTML table from Wikipedia. The table has a list of the Dow Jones stocks which we’ll use for the analysis.
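The original code for this step is not reproduced here, so the following is only a sketch under stated assumptions: the Wikipedia URL, the `Symbol` column name, the start date, and the helper names `extract_tickers` and `download_prices` are all hypothetical choices. `pd.read_html` and `yf.download` are real calls, but the layout of the Wikipedia page can change and the site may reject some automated requests.

```python
import pandas as pd

# Assumed URL of the page holding the Dow Jones components table.
WIKI_URL = "https://en.wikipedia.org/wiki/Dow_Jones_Industrial_Average"


def extract_tickers(tables, column="Symbol"):
    """Return the tickers from the first table that has the given column.

    The column name "Symbol" is an assumption about the page layout.
    """
    for table in tables:
        if column in table.columns:
            return table[column].astype(str).str.strip().tolist()
    raise ValueError(f"No table with a '{column}' column was found")


def download_prices(tickers, start="2020-01-01"):
    """Download adjusted closing prices, one column per ticker."""
    import yfinance as yf  # imported here so the helper above works without it

    data = yf.download(tickers, start=start, auto_adjust=True)
    return data["Close"]


# Typical use (requires network access, so it is left commented out):
# tables = pd.read_html(WIKI_URL)
# symbols = extract_tickers(tables)
# prices = download_prices(symbols)
```

Searching for the column by name, rather than picking a table by position, is a design choice that makes the sketch a little less sensitive to tables being added to or removed from the page.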

Data preprocessing

We will use KMeans to cluster stocks together based on their returns and volatility.

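The calculation is a single chained pandas statement. First, compute the percent change to get the daily returns. Then use the pandas describe method to get a DataFrame of summary statistics and keep the mean and standard deviation. Finally, scale the daily figures to annual ones (conventionally by 252 trading days for the mean and by the square root of 252 for the standard deviation).

You end up with a DataFrame of Dow Jones stocks with their annualized mean return and standard deviation.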
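A sketch of how that chained statement might look is below. The 252-day convention, the function name `annualized_moments`, the column names `returns` and `vol`, and the randomly generated stand-in prices are all assumptions for illustration.

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 252  # assumed number of trading days per year


def annualized_moments(prices: pd.DataFrame) -> pd.DataFrame:
    """Return one row per ticker with annualized mean return and volatility."""
    daily_stats = (
        prices
        .pct_change()            # daily returns
        .describe()              # summary statistics, one column per ticker
        .T[["mean", "std"]]      # transpose so tickers become rows
        .rename(columns={"mean": "returns", "std": "vol"})
    )
    # Scale the mean by 252 and the standard deviation by sqrt(252).
    return daily_stats * [TRADING_DAYS, np.sqrt(TRADING_DAYS)]


# Stand-in prices built from random daily returns (not real market data).
rng = np.random.default_rng(0)
daily = rng.normal(0.0005, 0.01, size=(500, 3))
prices = pd.DataFrame(100 * np.cumprod(1 + daily, axis=0),
                      columns=["AAA", "BBB", "CCC"])

moments = annualized_moments(prices)
```

With real data, `prices` would be the DataFrame of closing prices downloaded earlier, with one column per ticker.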

Do KMeans clustering

The first step is to measure inertia for a range of cluster counts. Inertia measures how tightly a dataset was clustered by KMeans, and lower is tighter. It's calculated by measuring the distance between each data point and its centroid, squaring this distance, and summing these squares across all data points in all clusters.
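The loop below is a sketch of that measurement: it fits one model per candidate cluster count and records scikit-learn's `inertia_` attribute. The range of k values, the random stand-in features, and the helper names `inertia_by_k` and `plot_elbow` are assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans


def inertia_by_k(features, k_values):
    """Fit one KMeans model per k and return a {k: inertia} dictionary."""
    inertias = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)
        inertias[k] = model.inertia_  # sum of squared distances to centroids
    return inertias


def plot_elbow(inertias):
    """Plot inertia against the number of clusters and return the Axes."""
    fig, ax = plt.subplots()
    ax.plot(list(inertias), list(inertias.values()), marker="o")
    ax.set_xlabel("Number of clusters (k)")
    ax.set_ylabel("Inertia")
    ax.set_title("Elbow curve")
    return ax


# Stand-in (returns, vol) pairs for 30 hypothetical stocks.
rng = np.random.default_rng(42)
features = np.column_stack([
    rng.normal(0.10, 0.08, 30),
    rng.normal(0.25, 0.06, 30),
])

inertias = inertia_by_k(features, range(2, 10))
ax = plot_elbow(inertias)
```

In an interactive session, call `plt.show()` to display the chart. With real data, `features` would be the table of annualized returns and volatility.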

The result is a smooth, downward sloping chart. You can estimate where adding another cluster doesn’t significantly reduce the inertia. It looks like it’s around five or six.

[Figure: elbow curve of inertia against the number of clusters]

Next, build and plot the clusters.

First, fit the model to the data using five clusters. Then plot the points and annotate each one with the ticker symbol and its cluster.
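Those two steps can be sketched as follows. The column names `returns` and `vol`, the helper name `cluster_and_plot`, the colour map, and the randomly generated stand-in table are assumptions, not details confirmed by the original code.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans


def cluster_and_plot(moments: pd.DataFrame, n_clusters: int = 5):
    """Fit KMeans on (returns, vol) and draw an annotated scatter plot."""
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    model.fit(moments[["returns", "vol"]])
    labels = pd.Series(model.labels_, index=moments.index, name="cluster")

    fig, ax = plt.subplots()
    ax.scatter(moments["returns"], moments["vol"], c=labels, cmap="rainbow")
    # Label every point with its ticker and cluster number.
    for ticker, row in moments.iterrows():
        ax.annotate(f"{ticker} ({labels[ticker]})",
                    (row["returns"], row["vol"]),
                    xytext=(5, 5), textcoords="offset points")
    ax.set_xlabel("Annualized return")
    ax.set_ylabel("Annualized volatility")
    ax.set_title(f"Stocks by return and volatility (K={n_clusters})")
    return labels, ax


# Stand-in table for 12 hypothetical tickers (not real market data).
rng = np.random.default_rng(7)
moments = pd.DataFrame(
    {"returns": rng.normal(0.10, 0.08, 12), "vol": rng.normal(0.25, 0.06, 12)},
    index=[f"T{i:02d}" for i in range(12)],
)

labels, ax = cluster_and_plot(moments, n_clusters=5)
```

Returning the labels as a Series indexed by ticker makes it easy to look up which stocks share a cluster. Call `plt.show()` to display the figure.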

[Figure: scatter plot of the stocks by return and volatility, annotated with ticker and cluster]

It's clear to see how stocks are grouped together. You can use this analysis to diversify stock portfolios by reducing exposure to stocks in the same cluster. KMeans is also a useful way to shortlist potential pairs trading candidates: stocks in the same cluster have similar return and volatility profiles, which you can then check for an economic link.
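One way to turn the cluster labels into a concentration check is to add up portfolio weights per cluster, as in the sketch below. The tickers, weights, cluster labels, and the helper name `weight_by_cluster` are made up for illustration.

```python
import pandas as pd


def weight_by_cluster(weights: pd.Series, clusters: pd.Series) -> pd.Series:
    """Sum portfolio weights within each cluster, largest exposure first."""
    aligned = clusters.reindex(weights.index)  # match labels to holdings
    return weights.groupby(aligned).sum().sort_values(ascending=False)


# Hypothetical cluster labels and portfolio weights.
clusters = pd.Series({"AAA": 0, "BBB": 0, "CCC": 1, "DDD": 2})
weights = pd.Series({"AAA": 0.4, "BBB": 0.3, "CCC": 0.2, "DDD": 0.1})

exposure = weight_by_cluster(weights, clusters)
```

Here most of the weight sits in cluster 0, which suggests the two largest holdings behave alike and the portfolio is less diversified than its four names imply.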

Congratulations!

You just learned how to use machine learning to see how concentrated a stock portfolio is and where to diversify it.

But, there's more to learn in algorithmic trading:

  • Backtesting your portfolio construction algorithm to check how the strategy would have performed historically

  • Executing the trades automatically

  • Monthly rebalancing

  • Tracking your actual Profit and Loss

  • Incorporating Trading Fees

Are you interested in learning algorithmic trading strategies that maximize returns responsibly, help you manage risk, and grow your investments?

We implement 3 core trading strategies including portfolio, momentum, and spread trades that have worked in our favor in the past and continue to produce results for our students.

Join 400+ of us who are learning to apply Python to algorithmic trading to grow investments.

Leo was up 11.5% in just 13 trading days.

Alex was waiting 9 years for a course like this:

Ready to make Algorithmic Trading Strategies that actually work?

There's nothing worse than going at this alone--

❌ Learning Python is tough.

❌ Learning Trading is tough.

❌ Learning Math & Stats is tough.

It's no wonder why it's easy to feel lost, make bad decisions, and lose money.

Want help?

Python for Algorithmic Trading Course

👉 Join 10,700+ future Quant Scientists on our Python for Algorithmic Trading Course Waitlist: https://learn.quantscience.io/python-algorithmic-trading-course-waitlist

Tags: investing, stocks, python, algorithmic trading, software, ffn

Matt Dancho

Matt is a Data Science expert with over 18 years working in business and 10+ years as a Data Scientist, Consultant, and Trainer. Matt has built Business Science, a successful educational platform with similar goals to Quant Science, but focused on developing Data Scientists in business, marketing, and finance disciplines.

© 2024 Quant Science - All Rights Reserved

Next Cohort Launch: Wednesday, January 15th at 10AM EST