Join the Quant Scientist Newsletter

Gain access to exclusive tools that Wall Street's Elite don't want you to have. Don't miss the next issue...

Join 11,500+ Quant Scientists learning one article at a time

Autoencoders for trading

Algorithmic Trading

November 03, 2024 • 5 min read

Embeddings are used in neural networks to transform large, sparse data into manageable, dense formats.

What?

Well, our goal is to build profitable algorithmic trading strategies, and embeddings help by simplifying complex data so it's easier to analyze.

Matt's working on a killer new course that demystifies how machine learning is really used in trading.

We thought we'd give you a little sneak peek.

In today's issue of the QS Newsletter, you'll learn how to train an autoencoder to build embeddings for stock factors.

(Today's newsletter is a little longer than usual, but we're making something hedge funds use simple!)

What You’ll Learn:

Build and train an autoencoder using PyTorch
Extract the embeddings to create clusters
Use PCA to reduce the dimensions and visualize the results


Disclaimer:

The information and educational material provided by Quant Science, LLC are for educational purposes only and should not be considered as financial advice or recommendations to purchase, hold, or sell any securities or other financial instruments. Before you proceed, please review our full disclaimer here.


NEW: Free 5-Day Algorithmic Trading Course

Since you're here, you probably want to learn how to get started developing (profitable) algorithmic trading strategies and reinvest those profits.

Here are the steps:

Find edge
Analyze risk
Backtest trading strategies
Execute trades automatically

Easy, right? Well, not exactly... Avoid the 5 biggest mistakes beginners make with our free 5-day email course:

Click here to join our free 5-Day Algorithmic Trading Course 👉

Now on to the show...

How to use autoencoders to create feature embeddings

Embeddings are compact, dense representations of the original high-dimensional stock data, transformed into a lower-dimensional space.

They can be created with methods like autoencoders, which are trained to retain as much of the information in the input features (such as volatility or technical indicators) as the smaller space allows. These embeddings are used for clustering, anomaly detection, and predictive modeling.

Embeddings compress stock features into lower-dimensional vectors that capture key patterns.

This makes them well suited to K-means clustering, which groups similar stocks based on their underlying characteristics.

Imports and setup

We’ll use some pretty powerful libraries in this issue, including PyTorch and scikit-learn.

Next, we’ll download stock price data to construct our mock portfolio.

We’ll use the stock price data to create a few features.

Features are patterns in the data that we think drive returns. In this example, we’re using log returns, a simple moving average, and volatility.
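Here is a minimal sketch of those three features. It assumes a 21-day rolling window and defines volatility as the rolling standard deviation of log returns; both are assumptions, not taken from the original. Synthetic random-walk prices for three made-up tickers stand in for the downloaded data so the block runs offline, and each column is z-scored at the end because the next step works with normalized features.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for downloaded closing prices: a synthetic random walk
# for three made-up tickers, so the sketch runs without a data provider.
rng = np.random.default_rng(42)
dates = pd.bdate_range("2023-01-02", periods=300)
tickers = ["AAA", "BBB", "CCC"]
prices = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=(len(dates), len(tickers))), axis=0)),
    index=dates,
    columns=tickers,
)

WINDOW = 21  # assumed look-back of roughly one trading month

# Daily log returns: ln(P_t / P_{t-1})
log_returns = np.log(prices / prices.shift(1))
# Simple moving average of price over the look-back window
sma = prices.rolling(WINDOW).mean()
# Assumed volatility definition: rolling std of log returns over the same window
volatility = log_returns.rolling(WINDOW).std()

# One row per trading day, one column per (feature, ticker) pair; drop warm-up rows
features = pd.concat(
    {"log_return": log_returns, "sma": sma, "volatility": volatility}, axis=1
).dropna()

# Z-score each column so features on very different scales become comparable
features_normalized = (features - features.mean()) / features.std()
print(features_normalized.shape)
```

In this layout each row is one trading day; how rows map to days or stocks is a choice you make when arranging the matrix.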

Build an autoencoder with PyTorch

Let’s convert the normalized feature data into PyTorch tensors and DataLoader objects.

This code converts our features data into a PyTorch tensor, wraps it in a TensorDataset for batch handling, and creates a DataLoader.

The DataLoader is used to iterate over the dataset in batches of 32 while shuffling the data to randomize the input during training.
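A minimal sketch of that conversion step. The random `features_array` is a stand-in for the normalized feature matrix (an assumption so the block runs on its own); the batch size of 32 and the shuffling follow the description above.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for the normalized feature matrix (rows = observations, columns = features);
# in practice this would be features_normalized.values from the previous step.
features_array = np.random.default_rng(0).standard_normal((279, 9))

# nn.Linear layers use float32 weights by default, so cast explicitly
features_tensor = torch.tensor(features_array, dtype=torch.float32)

# TensorDataset indexes the tensor row by row; each item is a 1-tuple (row,)
dataset = TensorDataset(features_tensor)

# Yield mini-batches of 32 rows, reshuffled at the start of every epoch
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

first_batch, = next(iter(dataloader))
print(first_batch.shape)  # torch.Size([32, 9])
```

Unpacking with `first_batch, = ...` works because every item of a single-tensor `TensorDataset` is a 1-tuple, so each batch arrives as a one-element list.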

In the encoder, data is compressed through a series of linear layers: from the original feature dimension to 64, then 32, and finally to a 10-dimensional space.

ReLU activation functions are applied after each linear transformation to introduce non-linearity, which helps the model capture more complex patterns in the data.

The decoder reconstructs the input data from the 10-dimensional space by gradually expanding the dimensions through linear layers from 10 to 32, then 64, and finally back to the original feature size.

The forward method of the autoencoder sequentially passes an input tensor through the encoder and decoder to produce a reconstructed version of the input.
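The encoder and decoder described above could be written as a single `nn.Module` along these lines. The layer sizes (64, 32, 10) come from the text; the class name `Autoencoder` and the `embedding_dim` parameter are illustrative, not the original code.

```python
import torch
import torch.nn as nn


class Autoencoder(nn.Module):
    """Compresses each input row to a 10-dim embedding and reconstructs it."""

    def __init__(self, input_dim: int, embedding_dim: int = 10):
        super().__init__()
        # Encoder: input_dim -> 64 -> 32 -> embedding_dim, ReLU after the hidden layers
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, embedding_dim),
        )
        # Decoder mirrors the encoder: embedding_dim -> 32 -> 64 -> input_dim
        self.decoder = nn.Sequential(
            nn.Linear(embedding_dim, 32),
            nn.ReLU(),
            nn.Linear(32, 64),
            nn.ReLU(),
            nn.Linear(64, input_dim),  # linear output so negative z-scores can be reproduced
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Encode to the bottleneck, then decode back to the original feature size
        return self.decoder(self.encoder(x))


model = Autoencoder(input_dim=9)
x = torch.randn(4, 9)
print(model(x).shape)          # torch.Size([4, 9])
print(model.encoder(x).shape)  # torch.Size([4, 10])
```

This sketch keeps the final decoder layer linear because z-scored inputs can be negative and a ReLU there could never output them. It also leaves the 10-dimensional bottleneck linear, which is a design choice of the sketch; putting a ReLU after that layer too is a reasonable variant.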

Now we can train it.

This function manages the training of the autoencoder by iteratively adjusting its weights to minimize the loss between its reconstructions and the original inputs.

The training loop makes multiple passes (epochs) over the entire dataset. Within each epoch it processes the data in batches, and each batch serves as both the input and the target, because an autoencoder learns to reproduce its own input.
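A minimal training-loop sketch. Several details are assumptions, not taken from the original: mean-squared-error loss, the Adam optimizer with a learning rate of 1e-3, 50 epochs, the hypothetical function name `train_autoencoder`, and a small `nn.Sequential` stand-in model with random data so the block runs on its own.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Stand-in data and model so the sketch runs on its own; swap in the real
# DataLoader and Autoencoder from the earlier steps.
data = torch.randn(256, 9)
dataloader = DataLoader(TensorDataset(data), batch_size=32, shuffle=True)
model = nn.Sequential(
    nn.Linear(9, 10), nn.ReLU(),  # encoder
    nn.Linear(10, 9),             # decoder
)


def train_autoencoder(model, dataloader, num_epochs=50, lr=1e-3):
    """Minimize reconstruction MSE; each batch is both the input and the target."""
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    history = []
    for epoch in range(num_epochs):  # one epoch = one full pass over the data
        epoch_loss = 0.0
        for (batch,) in dataloader:
            reconstruction = model(batch)
            loss = criterion(reconstruction, batch)  # target is the input itself
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item() * batch.size(0)
        # Average loss per row for this epoch
        history.append(epoch_loss / len(dataloader.dataset))
    return history


losses = train_autoencoder(model, dataloader)
print(f"first epoch loss {losses[0]:.3f}, last epoch loss {losses[-1]:.3f}")
```

Returning the per-epoch loss history makes it easy to plot a learning curve and spot when more epochs stop helping.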

Finally, we can extract the embeddings and use them to create clusters.

The function runs the data through the trained encoder, stacks the resulting embeddings into a single tensor, and then uses K-means to group them into five clusters.
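A sketch of the extraction and clustering step. It assumes a hypothetical helper `extract_embeddings` and scikit-learn's `KMeans` with `n_init=10` and `random_state=0` (assumed settings). An untrained stand-in encoder and random data keep it self-contained; in the real pipeline you would pass the trained model's encoder and a DataLoader with `shuffle=False`, so the embeddings stay aligned with the original rows.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.cluster import KMeans
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Stand-ins: in practice use the trained model.encoder and the real feature tensor
data = torch.randn(279, 9)
encoder = nn.Sequential(nn.Linear(9, 32), nn.ReLU(), nn.Linear(32, 10))
# shuffle=False keeps the embeddings in the same order as the input rows
loader = DataLoader(TensorDataset(data), batch_size=32, shuffle=False)


def extract_embeddings(encoder, loader):
    """Run every batch through the encoder and join the outputs into one tensor."""
    encoder.eval()
    chunks = []
    with torch.no_grad():  # inference only, so skip gradient tracking
        for (batch,) in loader:
            chunks.append(encoder(batch))
    return torch.cat(chunks, dim=0)  # shape: (n_rows, embedding_dim)


embeddings = extract_embeddings(encoder, loader).numpy()

# Group the embeddings into five clusters
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(embeddings)
print(embeddings.shape, np.bincount(cluster_labels, minlength=5))
```

The batches are joined with `torch.cat` along the first dimension, which yields the same (n_rows, 10) matrix as stacking the individual embedding vectors.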

Reduce the dimensions and analyze the results

Principal Component Analysis (PCA) reduces the dimensionality of the embeddings by projecting them onto a small number of principal components. These components capture the directions of maximum variance in the data.
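In scikit-learn this is a short step with `PCA(n_components=2)`. The random `embeddings` array below is a stand-in for the 10-dimensional embeddings from the previous step.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for the (n_rows, 10) embedding matrix from the previous step
embeddings = np.random.default_rng(0).standard_normal((279, 10))

pca = PCA(n_components=2)
embeddings_2d = pca.fit_transform(embeddings)  # project onto the top-2 variance directions

# Share of the total variance captured by each retained component
print(embeddings_2d.shape, pca.explained_variance_ratio_.round(3))
```

`explained_variance_ratio_` is worth checking: if the first two components explain only a small share of the variance, the 2-D picture can hide structure that K-means found in the full embedding space.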

The resulting chart shows the PCA-reduced embeddings in two dimensions. Each point represents a stock positioned according to its values on the first two principal components, and the colors represent the different clusters.
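One way to draw that chart with matplotlib, as a sketch: random stand-ins replace the PCA output and cluster labels, and the `viridis` colormap, marker size, and titles are arbitrary choices rather than details from the original.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Stand-ins for the PCA output and K-means labels from the previous steps
rng = np.random.default_rng(0)
embeddings_2d = rng.standard_normal((279, 2))
cluster_labels = rng.integers(0, 5, size=279)

fig, ax = plt.subplots(figsize=(8, 6))
# Color each point by its cluster label
scatter = ax.scatter(
    embeddings_2d[:, 0], embeddings_2d[:, 1], c=cluster_labels, cmap="viridis", s=15
)
ax.set_xlabel("Principal component 1")
ax.set_ylabel("Principal component 2")
ax.set_title("K-means clusters of autoencoder embeddings (PCA projection)")
fig.colorbar(scatter, ax=ax, label="Cluster")
# Call fig.savefig("clusters.png") or plt.show() to actually view the chart
```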

Congratulations!

You just took the first step in using machine learning in trading like the hedge funds!

But, there's more to learn in algorithmic trading:

Backtesting your portfolio construction algorithm to see how the strategy would have performed historically before you trust it with real money
Executing the trades automatically
Monthly rebalancing
Tracking your actual profit and loss
Incorporating trading fees

Are you interested in learning algorithmic trading strategies that maximize returns responsibly, help you manage risk, and grow your investments?

We implement 3 core trading strategies (portfolio, momentum, and spread trades) that have worked in our favor in the past and continue to produce results for our students.

Join 400+ of us who are learning to apply Python to algorithmic trading to grow their investments.

Leo was up 11.5% in just 13 trading days.

Alex was waiting 9 years for a course like this.

Ready to make Algorithmic Trading Strategies that actually work?

There's nothing worse than going at this alone.

❌ Learning Python is tough.

❌ Learning Trading is tough.

❌ Learning Math & Stats is tough.

It's no wonder it's easy to feel lost, make bad decisions, and lose money.

Want help?

👉 Join 10,700+ future Quant Scientists on our Python for Algorithmic Trading Course Waitlist: https://learn.quantscience.io/python-algorithmic-trading-course-waitlist


Matt Dancho

Matt is a Data Science expert with over 18 years working in business and 10+ years as a Data Scientist, Consultant, and Trainer. Matt has built Business Science, a successful educational platform with similar goals to Quant Science, but focused on developing Data Scientists in business, marketing, and finance disciplines.
