Danny Groves
Dec 10, 2023 - 29 tweets
Machine learning for finance can be tricky!

So let's explore with another step-by-step case study 🧵

This time, let's use ML to find time series patterns across multiple instruments.

The aim - to find like patterns in the market and build a scanner.

Enjoy!
OK, what ML tools are we using here?

This will be a clustering approach = unsupervised.

Unsupervised means we give the algorithm no guidance on what to look for.

I'm interested to see what patterns IT can find for me.

Yes, I'm lazy.
Algos like k-means clustering don't really have a concept of a time series.

They're great at other things, of course, but I want to find time series patterns.

So let's consider a special kind of clustering.

Time-series clustering, using tslearn.
1. Download and Feature Engineer

For now let's keep it simple:
• 3 tickers - just for exploring the idea
• yfinance - free daily data
• 3 features - Close prices, range (%), open to close (%)

You can experiment with other features & instruments later - a rough sketch of this step follows below.
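Here's a minimal sketch of that step, assuming yfinance and pandas - the tickers and lookback are placeholders, since the thread doesn't name the exact three used:

```python
import pandas as pd
import yfinance as yf

TICKERS = ["AAPL", "MSFT", "GOOGL"]   # placeholders - swap in your own instruments

frames = []
for ticker in TICKERS:
    px = yf.Ticker(ticker).history(period="10y", interval="1d")
    feats = pd.DataFrame(index=px.index)
    feats["close"] = px["Close"]                                              # close prices
    feats["range_pct"] = (px["High"] - px["Low"]) / px["Low"] * 100           # daily range (%)
    feats["open_close_pct"] = (px["Close"] - px["Open"]) / px["Open"] * 100   # open to close (%)
    feats["ticker"] = ticker
    frames.append(feats)

data = pd.concat(frames)
```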
2. Feature Reshaping

Time series algos require data in the shape (number of examples, series length, number of features).

So first, let's create new cols in our df, one per feature & position in the time series

Since I want a 50-step time series over 3 features, that's 150 new cols.
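A rough sketch of that step, building on the data frame above and done per ticker so a window never spans two instruments (the column names and helper function are mine, not the thread's):

```python
import pandas as pd

WINDOW = 50
FEATURES = ["close", "range_pct", "open_close_pct"]

def add_lag_columns(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    for feat in FEATURES:
        for lag in range(WINDOW):
            # lag 0 = today, lag 49 = 49 trading days ago
            out[f"{feat}_{lag}"] = out[feat].shift(lag)
    return out.dropna()   # drop rows without a full 50-step history

lagged = pd.concat(add_lag_columns(g) for _, g in data.groupby("ticker"))
```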
2a. Feature Reshaping

Also, I assume that adjacent examples shifted by just one day will be very similar - so let's undersample by selecting every 5th row.

This should save us some training time while still retaining most of the variance across examples.
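That undersampling is one line per ticker - a sketch, again keeping instruments separate:

```python
# Keep every 5th window per ticker so near-duplicate neighbours don't dominate
lagged = pd.concat(g.iloc[::5] for _, g in lagged.groupby("ticker"))
```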
2b. Feature Reshaping

With all these new cols, we now need to reshape into a 3D array.

This is achieved with a reshape - a rough sketch follows below.

I will say, this whole reshaping nonsense is tricky to wrap your head around! So don't worry if it's not immediately obvious 🙂
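Here's a rough reconstruction of that reshape (the thread's own code lives in the linked repo) - it assumes the lag columns were built feature-by-feature as in the sketch above:

```python
import numpy as np

lag_cols = [f"{feat}_{lag}" for feat in FEATURES for lag in range(WINDOW)]

X = lagged[lag_cols].to_numpy()                     # (n_examples, 150)
X = X.reshape(len(lagged), len(FEATURES), WINDOW)   # (n_examples, 3, 50)
X = X.transpose(0, 2, 1)                            # (n_examples, 50, 3) - what tslearn expects
X = X[:, ::-1, :]                                   # flip so index 0 is the oldest bar
```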
3. Scaling

Two of our features are % changes -> already scaled.

However, closing prices are not scaled!

Why do we need to scale?

We're aiming to say "this time series is similar to that one" - that's tricky to judge if they're on completely different price scales.
3a. Scaling

Once prices are on the same scale, seeing if one curve looks like another is easier

Since our goal is to find time-series similarity, scaling is a good idea

How do we scale?

Let's use standard scaling - each time series ends up with a mean of 0 and a standard deviation of 1.
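tslearn ships a scaler for exactly this. A minimal sketch - strictly the thread only needs it for the close-price channel (the % features are already comparable), but scaling every channel per series is a simple option:

```python
from tslearn.preprocessing import TimeSeriesScalerMeanVariance

# Each series (and each channel within it) is rescaled to mean 0, std 1
scaler = TimeSeriesScalerMeanVariance(mu=0.0, std=1.0)
X_scaled = scaler.fit_transform(X)   # still shape (n_examples, 50, 3)
```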
3b. Scaling

Why this approach?

It tends to be a little less sensitive to outliers.

But at the end of the day, it's worth trying other ways too - because they could be better for your particular problem.
4. Clustering

Clustering is basically a way to sort data into groups.

However, we decide how many groups we want before we start

Therefore, the big question is - how many groups do we need?
4a. Clustering

Enter the elbow method

Basically, we fit repeatedly, increasing K by 1 each time.

Each fit gives a value (the inertia) which measures how compact the groupings are.

Lower value = better fit.

Or does it?
4b. Clustering

If we push it too far, every point is its own cluster, which isn't very useful.

More generally, too many clusters means overfitting, and we want our model to generalise.

The kneed package sorts this out for us, choosing the K at the elbow.
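A sketch of that loop, assuming X_scaled has already been split into X_train / X_test (the split itself isn't shown here) and with a guessed K search range:

```python
from tslearn.clustering import TimeSeriesKMeans
from kneed import KneeLocator

ks = list(range(2, 16))          # search range is an assumption
inertias = []
for k in ks:
    model = TimeSeriesKMeans(n_clusters=k, metric="euclidean", random_state=42)
    model.fit(X_train)
    inertias.append(model.inertia_)   # lower = more compact clusters

# Inertia vs K is decreasing and convex, so kneed looks for the elbow there
knee = KneeLocator(ks, inertias, curve="convex", direction="decreasing")
best_k = knee.elbow              # the thread lands on K = 5
```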
4c. Clustering

The whole fitting process took me 153 seconds - not too bad!

The optimal k was 5.

We can also see that cluster 1 is the most popular in both train and test.

However, right now, this tells us nothing - let's view some charts!
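For reference, the "most popular cluster" check is just a value count over the predicted labels - again assuming the earlier train/test split, and best_k from the elbow sketch:

```python
import pandas as pd

model = TimeSeriesKMeans(n_clusters=best_k, metric="euclidean", random_state=42)
train_labels = model.fit_predict(X_train)
test_labels = model.predict(X_test)

print(pd.Series(train_labels).value_counts())   # which cluster dominates in train
print(pd.Series(test_labels).value_counts())    # ...and in test
```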
5. Analyse the Clusters

Let's pick 4 random charts from cluster 0 - the highlighted point on each chart is the one that was clustered.

Clearly, it's picking up some similarity - looks to me as if they're all in strong downtrends.

However, are they strikingly similar?

IMO - not really. Let's improve it.


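A rough way to do this eyeballing yourself - the thread's charts highlight the clustered point on the full price history, whereas this sketch just plots the scaled 50-bar close channel (using the train_labels from the fit above), which is enough to judge similarity:

```python
import numpy as np
import matplotlib.pyplot as plt

cluster_id = 0
members = np.where(train_labels == cluster_id)[0]
picks = np.random.choice(members, size=4, replace=False)

fig, axes = plt.subplots(2, 2, figsize=(10, 6))
for ax, idx in zip(axes.ravel(), picks):
    ax.plot(X_train[idx, :, 0])                    # channel 0 = scaled close
    ax.set_title(f"cluster {cluster_id}, example {idx}")
plt.tight_layout()
plt.show()
```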
6. Improvement

My hypothesis is that there are simply more than 5 ways the time series can form in the market.

So perhaps the elbow method isn't too helpful here.

Instead - let's up the granularity and find many more clusters.
6a. Improvement

I chose 50 - you may ask why. I have no intelligent response.

The main reason was to just be more granular, and see if it helps!

After all, it's all exploration for now.
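The refit itself is the same call as before, just with the cluster count forced up (a sketch, not the thread's exact code):

```python
model_50 = TimeSeriesKMeans(n_clusters=50, metric="euclidean", random_state=42)
labels_50 = model_50.fit_predict(X_train)
```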

So, did it help?

Let's explore with some charts.
6b. Improvement

Since cluster 10 was the most popular, I decided to take 4 random charts from it.

The results are much better now.

There's definitely more similarity - all in uptrends with a pause in momentum.

Can we do better again?


7. Dynamic Time Warping

Enter dynamic time warping (DTW).

DTW is a way of comparing two time series a little more robustly than other distance measures.

Let's break it down and explain why this is helpful.
7a. DTW

With pointwise matching, we'd get a distance measure by summing the distances between each pair of points at the same time index.

However, if one series is shifted in time relative to the other, those y-axis differences can get large.

It's essentially the same series, but the cost would be high.
7b. DTW

In DTW the points are matched up differently.

They're matched so that the y-distance between matched pairs stays small - so two series with the same shape are still compared sensibly even though one is shifted.

I won't go fully into the details here, because that's easily another 🧵
7c. DTW

By the way, this also works on time series of different lengths, which is a nice perk (as not all time-series patterns are created equal).

The outcome of the method is a single number, the warping cost.

Smaller cost = more similar time series.
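In tslearn that single number is one function call - and, per the point above, it doesn't mind if the two series have different lengths (the indices here are arbitrary examples):

```python
from tslearn.metrics import dtw

cost = dtw(X_train[0], X_train[1])                # warping cost: smaller = more similar
cost_partial = dtw(X_train[0][:40], X_train[1])   # series of different lengths work too
```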
7d. DTW

Why am I explaining all this?

tslearn allows you to use dtw in the metric keyword argument.

With this, we should expect a more robust time series comparison!
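A minimal sketch of that switch - the only real change from the Euclidean fit is the metric (n_jobs is optional and just spreads the DTW distance computations across cores):

```python
model_dtw = TimeSeriesKMeans(
    n_clusters=50,        # keeping the granular cluster count from step 6
    metric="dtw",
    n_jobs=-1,            # DTW is expensive, so parallelise where possible
    random_state=42,
)
labels_dtw = model_dtw.fit_predict(X_train)
```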
8. Cluster Analysis - DTW

Cluster 30 is the most popular here, let's check out some charts.

These look even more similar to me - strong uptrend followed by a momentum pause.

IMO - it's super neat that an algo can find this, with absolutely no direction!


8a. Cluster Analysis - DTW

However - this increased performance comes at a cost.

DTW is much more expensive to compute than a plain Euclidean distance, so even with some parallelisation it's really slow.

In fact this case took me 3-5 business days to run.

I joke - it was actually 1971 seconds (about 33 minutes...)
9. Conclusions

So what did we learn here?

tslearn can be used to group similar patterns in the market, with no supervision, working directly on time-series data.

Euclidean distance does a good job, DTW a better one - but at the cost of being really slow.
9a. Conclusions

It's better to ask for more granularity - the elbow method is not particularly helpful for us here.

However, exactly how granular remains an open question!

A topic for further research.
9b. Conclusions

Remember, this was on 3 listed stocks, 3 simple features, and to be fair, they're all similar ones at that (i.e. large tech stocks).

It's not the most "robust" approach, but illustrates the idea and should be easy to extend.
Anyway, that's enough rambling for today.

The code has been committed to my git repo - the link is in my bio.

And if you are interested in applying ML/data science to finance, then follow me @DrDanobi for more threads like these!
