Danny Groves
Machine learning & data in trading | Pythonista | Math PhD & Data Scientist | Sharing my discoveries and mistakes | No tweets are financial advice.

Dec 10, 2023, 29 tweets

Machine learning for finance can be tricky!

So let's explore with another step-by-step case study 🧵

This time, let's use ML to find time series patterns across multiple instruments.

The aim: find similar patterns in the market and build a scanner.

Enjoy!

OK, what ML tools are we using here?

This will be a clustering approach = unsupervised.

Unsupervised means we give the algorithm no guidance on what to look for.

I'm interested to see what patterns IT can find for me.

Yes, I'm lazy.

Algos like K-means clustering don't really have a concept of a time series.

They're great at other things, of course, but I want to find time series patterns.

So let's consider a special kind of clustering.

Time-series clustering, using tslearn.

1. Download and Feature Engineer

For now let's keep it simple:
• 3 tickers - just for exploring the idea
• yfinance - free daily data
• 3 features - Close prices, range (%), open to close (%)

You can experiment with other features & instruments later
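The original code images aren't attached here, so here's a minimal sketch of the download-and-engineer step. The function and column names (`engineer_features`, `range_pct`, `oc_pct`) are my own placeholders, and I've swapped the `yf.download` call for a synthetic OHLC frame so the snippet runs offline — in practice you'd feed it real yfinance output.

```python
import numpy as np
import pandas as pd

def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Build the three features from an OHLC frame:
    close price, daily range (%), and open-to-close change (%)."""
    out = pd.DataFrame(index=df.index)
    out["close"] = df["Close"]
    out["range_pct"] = (df["High"] - df["Low"]) / df["Low"] * 100
    out["oc_pct"] = (df["Close"] - df["Open"]) / df["Open"] * 100
    return out

# In practice: df = yf.download("AAPL", period="5y")  (needs yfinance + network).
# Here, a tiny synthetic random-walk OHLC frame so the sketch is self-contained:
rng = np.random.default_rng(0)
close = 100 + np.cumsum(rng.normal(0, 1, 300))
df = pd.DataFrame({
    "Open": close + rng.normal(0, 0.5, 300),
    "High": close + 1.0,
    "Low": close - 1.0,
    "Close": close,
})
feats = engineer_features(df)
print(feats.shape)  # (300, 3)
```

You'd run this once per ticker and keep the three feature columns for the reshaping step.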

2. Feature Reshaping

Time-series algos require data with shape = (number of examples, series length, number of features)

So first, let's create new cols in our df, one per feature & position in the time series

Since I want a 50-step time series over 3 features, I'll have 150 new cols

2a. Feature Reshaping

Also, I assume that adjacent examples, only one day apart, will be very similar - so let's downsample by keeping every 5th row.

This may save us some training time and still retain most of the variance in the examples.
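One way to build those 150 columns (my sketch, not the author's exact code — column names like `close_0` are placeholders) is a shifted copy per feature and window position, then `.iloc[::5]` for the downsampling:

```python
import numpy as np
import pandas as pd

WINDOW = 50  # time-series length per example

# Stand-in for the engineered feature frame (300 days x 3 features):
rng = np.random.default_rng(1)
feats = pd.DataFrame(rng.normal(size=(300, 3)),
                     columns=["close", "range_pct", "oc_pct"])

# One column per feature and per position in the window: close_0 .. close_49, etc.
cols = {}
for feat in feats.columns:
    for lag in range(WINDOW):
        cols[f"{feat}_{lag}"] = feats[feat].shift(WINDOW - 1 - lag)
wide = pd.DataFrame(cols).dropna()  # 150 cols, one row per complete 50-day window

# Adjacent rows are windows one day apart (near-duplicates), so keep every 5th:
wide = wide.iloc[::5]
print(wide.shape)  # (51, 150)
```

Each surviving row is now one 50-day example, laid out flat.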

2b. Feature Reshaping

With all these new cols, we now need to reshape into a 3D array

This is achieved with the attached code

I will say, this whole reshaping nonsense is tricky to wrap your head around! So don't worry if it's not immediately obvious 🙂
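Since the attached code isn't in this scrape, here's a sketch of the 3D reshape under my assumption that the 150 columns are laid out feature-by-feature (all 50 close columns, then all 50 range columns, and so on):

```python
import numpy as np

N_FEATURES, WINDOW = 3, 50

# Stand-in for the 150-column frame after .to_numpy():
rng = np.random.default_rng(2)
wide = rng.normal(size=(51, N_FEATURES * WINDOW))

# Reshape to (examples, features, window) first — matching the column layout —
# then swap the last two axes to get (n_examples, series_length, n_features),
# which is the shape tslearn expects.
X = wide.reshape(len(wide), N_FEATURES, WINDOW).transpose(0, 2, 1)
print(X.shape)  # (51, 50, 3)
```

If your columns are instead interleaved (close_0, range_0, oc_0, close_1, …), the reshape order changes — which is exactly why this step is easy to get wrong.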

3. Scaling

Two of our features are % changes -> already scaled.

However, closing prices are not scaled!

Why do we need to scale?

We're aiming to say "this time series is similar to that one" - it could be tricky to judge this if they're on completely different price scales.

3a. Scaling

Once prices are on the same scale, seeing if one curve looks like another is easier

Since our goal is to find time-series similarity, scaling is a good idea

How do we scale?

Let's use standard scaling - each time series ends up with a mean of 0 and a standard deviation of 1
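Per-series standard scaling is a couple of lines of numpy. In this sketch I scale only the close channel (channel 0 is my assumed layout), since the two % features are already on a comparable scale; tslearn's `TimeSeriesScalerMeanVariance` does the same job if you want to scale every channel.

```python
import numpy as np

# Stand-in for the (examples, window, features) array, with raw prices in channel 0:
rng = np.random.default_rng(3)
X = rng.normal(loc=150, scale=20, size=(51, 50, 3))

# Standard-scale each example's close series independently:
close = X[:, :, 0]
X[:, :, 0] = (close - close.mean(axis=1, keepdims=True)) \
             / close.std(axis=1, keepdims=True)

print(X[:, :, 0].mean(), X[:, :, 0].std())  # each series now has mean ~0, std ~1
```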

3b. Scaling

Why this approach?

It tends to be a little less sensitive to outliers than something like min-max scaling.

But at the end of the day, it's worth trying other ways too - because they could be better for your particular problem.

4. Clustering

Clustering is basically a way to sort data into groups.

However, we decide how many groups we want before we start

Therefore, the big question is - how many groups do we need?

4a. Clustering

Enter the elbow method

Basically, we fit repeatedly, increasing K by 1 each time

Each fit gives a value (the inertia) which measures how compact the groupings are.

Lower the value = better fit.

Or does it?

4b. Clustering

If we push it too far, every point is its own cluster, which isn't very useful.

More generally, too many clusters means overfitting, and we want our model to be more general.

The kneed package sorts this out for us, choosing the K at the elbow
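Here's the elbow idea as a runnable sketch. I use plain sklearn `KMeans` on synthetic blobs so it's fast and self-contained (the thread itself uses tslearn's `TimeSeriesKMeans`), and I hand-roll the knee pick — kneed's `KneeLocator(ks, inertias, curve="convex", direction="decreasing")` automates the same idea.

```python
import numpy as np
from sklearn.cluster import KMeans

# Three well-separated blobs, so the "right" answer is k = 3.
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(c, 0.3, size=(60, 2)) for c in (0.0, 5.0, 10.0)])

ks = np.arange(1, 9)
inertias = np.array([
    KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks
])

# The elbow is (roughly) the point furthest below the straight line joining
# the first and last (k, inertia) pairs, computed here on normalised axes:
k_n = (ks - ks.min()) / (ks.max() - ks.min())
y_n = (inertias - inertias.min()) / (inertias.max() - inertias.min())
best_k = int(ks[np.argmax(1 - k_n - y_n)])
print(best_k)  # 3
```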

4c. Clustering

The whole fitting process took me 153 seconds - not too bad!

The optimal k was 5.

We can also see that cluster 1 is the most popular in both train and test.

However, right now, this tells us nothing - let's view some charts!

5. Analyse the Clusters

Let's pick 4 random charts from cluster 0 - the highlight is the clustered point.
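Pulling 4 random members of a cluster and plotting them looks something like this sketch — the data and labels here are stand-ins (in practice you'd use your scaled array and the fitted model's `labels_`):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
X = rng.normal(size=(51, 50, 3))       # stand-in for the scaled examples
labels = np.arange(len(X)) % 5         # stand-in for model.labels_

members = np.flatnonzero(labels == 0)  # indices of examples in cluster 0
picks = rng.choice(members, size=4, replace=False)

fig, axes = plt.subplots(2, 2, figsize=(8, 6))
for ax, i in zip(axes.ravel(), picks):
    ax.plot(X[i, :, 0])                # scaled close series for this example
    ax.set_title(f"example {i}, cluster 0")
fig.tight_layout()
fig.savefig("cluster0_samples.png")
```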

Clearly, it's picking some similarity - looks to me as if they're all in strong downtrends.

However, are they strikingly similar?

IMO - not really. Let's improve it.


6. Improvement

My hypothesis is that there are simply more than 5 ways the time series can form in the market.

So perhaps the elbow method isn't too helpful here.

Instead - let's up the granularity and find many more clusters.

6a. Improvement

I chose 50 - you may ask why. I have no intelligent response.

The main reason was to just be more granular, and see if it helps!

After all, it's all exploration for now.

So, did it help?

Let's explore with some charts.

6b. Improvement

Since 10 was the most popular cluster, I decided to take 4 random charts from that.

The results are much better now.

There's definitely more similarity - all in uptrends with a pause in momentum.

Can we do better again?


7. Dynamic Time Warping

Enter dynamic time warping (DTW).

DTW is a way of comparing two time series a little more robustly than other distance measures.

Let's break it down and explain why this is helpful.

7a. DTW

In pointwise matching, we would aim to get a distance measure by summing all the distances between each point.

However, if we shift the time series, the diffs on the y-axis may get large.

It's the same series, but the cost would be high.

7b. DTW

In DTW the points are matched up differently.

They're paired so that the y-distances stay small - two otherwise-identical series still compare as similar even when one is shifted in time.

I won't go fully into the details here, because that's easily another 🧵

7c. DTW

By the way, this also works on time series of different lengths, which is a nice perk (as not all time-series patterns are created equal).

The outcome of the method is a single number, the warping cost.

Smaller cost = more similar time series.
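To make the idea concrete, here's the classic dynamic-programming DTW in plain numpy (my own sketch — tslearn ships an optimised version as `tslearn.metrics.dtw`). A shifted sine gets a much smaller warping cost than its pointwise distance:

```python
import numpy as np

def dtw_cost(a: np.ndarray, b: np.ndarray) -> float:
    """O(len(a)*len(b)) DTW with absolute-difference local cost;
    works for series of different lengths."""
    n, m = len(a), len(b)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # each point may match its neighbour's, the previous, or the diagonal
            acc[i, j] = d + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[n, m])

t = np.linspace(0, 2 * np.pi, 60)
a = np.sin(t)
b = np.sin(t - 0.5)                 # same shape, shifted in time

pointwise = float(np.abs(a - b).sum())
print(dtw_cost(a, b), pointwise)    # DTW cost is much smaller than pointwise cost
```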

7d. DTW

Why am I explaining all this?

tslearn allows you to pass metric="dtw" as a keyword argument when clustering.

With this, we should expect a more robust time series comparison!

8. Cluster Analysis - DTW

Cluster 30 is the most popular here, let's check out some charts.

These look even more similar to me - strong uptrend followed by a momentum pause.

IMO - it's super neat that an algo can find this, with absolutely no direction!


8a. Cluster Analysis - DTW

However - this increased performance comes at a cost.

DTW is not a particularly parallelisable algo, which makes it really slow.

In fact this case took me 3-5 business days to run.

I joke, it was actually 1971 seconds (about 33 minutes...)

9. Conclusions

So what did we learn here?

tslearn can be used to group similar patterns in the market, with no guidance, working directly on time-series data.

Euclidean distance does a good job, DTW a better one - but at the cost of being really slow.

9a. Conclusions

It's better to ask for more granularity; the elbow method isn't particularly helpful for us here.

However, how granular still remains a question!

A topic for further research.

9b. Conclusions

Remember, this was on just 3 listed stocks and 3 simple features - and, to be fair, similar stocks at that (i.e. large tech stocks).

It's not the most "robust" approach, but illustrates the idea and should be easy to extend.

Anyway, that's enough rambling for today.

The code has been committed to my git repo, the link is in my bio.

And, if you're interested in applying ML/data science to finance, then follow me @DrDanobi for more threads like these!
