Mark Tenenholtz
Jun 12 · 15 tweets · 4 min read
Time-series forecasting is becoming a lost art.

It’s the engine behind the $5,000,000,000,000 retail industry in the US.

And yet, research progress is slow and the best tricks are locked away.

But why does it remain particularly tough, even amid the current wave of AI breakthroughs?

Here’s why 🧵
Forecasting is a seemingly straightforward task.

In fact, given all the success we've had with LLMs, it's weird that we can't use the exact same tricks with time-series data.

So why don't 175B parameter models work, and why can't we do the classic LM + transfer learning recipe?
Well, for starters, we kinda do the same thing.

Given a sequence of historical data points, we train the model to predict the next one (or maybe the next 10, or maybe the single value 7 observations from now).
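That framing can be sketched in a few lines of NumPy: slide a window over the history and pair each window with the value that follows it. (The function name and the toy series here are illustrative, not from the thread.)

```python
import numpy as np

def make_supervised(series, window=7, horizon=1):
    """Turn a 1-D series into (X, y) pairs: each row of X holds `window`
    consecutive values, and y is the value `horizon` steps after the window."""
    X, y = [], []
    for i in range(len(series) - window - horizon + 1):
        X.append(series[i : i + window])
        y.append(series[i + window + horizon - 1])
    return np.array(X), np.array(y)

series = np.arange(20, dtype=float)  # toy "history"
X, y = make_supervised(series, window=7, horizon=1)
print(X.shape, y.shape)  # (13, 7) (13,)
print(X[0], y[0])        # first window is 0..6, its target is 7.0
```

Set `horizon=7` and you get the "single value 7 observations from now" variant from above.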

So is it a data problem?
Well, probably not from a dataset size perspective.

While there's no equivalent of "the entire internet's worth of text" for time-series data, there are still enormous datasets.

Any large retailer will have datasets with potentially hundreds of billions of data points.
Despite that, there haven't been any convincing papers on transfer learning.

It certainly could be that not enough of these datasets are publicly available.

But, from my experience, even the largest datasets don't necessitate anything beyond a pretty shallow transformer.
On top of that, XGBoost is crazy effective in a way that it just isn't for NLP problems.

(I have some great material on this. Link at the end.)

My best guess is that it has to do with the complexity of the underlying representation.

In simple English:
Words are *really* difficult to represent.

Depending on the context, they can mean entirely different things. Wouldn't it be nice if you could represent them in a static, fixed way?

You just can't. But, in time-series problems, you kinda can.
In fact, methods like STUMPY exist for this.

Unlike the deep, 768/1536-dimensional embedding representations you need in NLP, "matrix profile" methods like STUMPY are pretty low-dimensional.

In other words: simpler models for simpler representations.
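To make "matrix profile" concrete: for each length-m subsequence, you record the distance to its nearest non-trivial neighbor elsewhere in the series. STUMPY's `stump()` computes this efficiently; the brute-force NumPy sketch below is purely illustrative (O(n²), my own simplification, not STUMPY's algorithm):

```python
import numpy as np

def naive_matrix_profile(T, m):
    """Brute-force matrix profile sketch: for each length-m subsequence,
    the z-normalized Euclidean distance to its nearest non-trivial match."""
    n = len(T) - m + 1
    subs = np.array([T[i : i + m] for i in range(n)])
    # z-normalize each subsequence so shape matters, not scale/offset
    subs = (subs - subs.mean(axis=1, keepdims=True)) / (
        subs.std(axis=1, keepdims=True) + 1e-9
    )
    profile = np.full(n, np.inf)
    excl = m // 2  # exclusion zone: skip trivially overlapping neighbors
    for i in range(n):
        for j in range(n):
            if abs(i - j) > excl:
                d = np.linalg.norm(subs[i] - subs[j])
                if d < profile[i]:
                    profile[i] = d
    return profile

# A repeating signal yields low profile values (motifs); anomalies spike.
T = np.sin(np.linspace(0, 8 * np.pi, 200))
mp = naive_matrix_profile(T, m=25)
```

The point of the comparison: each position gets a single scalar, not a 768/1536-dimensional vector.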

Now, the practicalities:
XGBoost is amazing for this because it's much more capable of learning these types of representations.

99% of the feature engineering you do in NLP nowadays is just tokenization.

In contrast, time-series models benefit a lot from feature engineering.

Here's how:
Time series data has a couple types of patterns. I usually think of them in 3 buckets:

1. Seasonality

These are short-term, repeating patterns. Think day-of-week effects, month-of-the-year effects, etc.

2. Trends (cont)
Your data is either "trending" up, down, or not at all. Simple as that.

The below image is a commonly-used toy dataset of airline passengers over the years, and it exhibits both seasonality and an upward trend.

If you'll notice, the peaks are consistently ~12 months apart.
And finally,

3. Cycles

Cycles are longer-range seasons, e.g. years long.

Technically, you could lump cycles in with seasonality. But, IMO, it's helpful to separate them.

My rule of thumb is that 2 year or greater length is a cycle.
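All three patterns can be checked numerically, not just eyeballed. On a toy stand-in for the airline data (linear trend + 12-month seasonality — illustrative numbers, not the real dataset), differencing away the trend makes the seasonal lag jump out in the autocorrelation:

```python
import numpy as np
import pandas as pd

# Toy stand-in: upward trend + yearly seasonality, monthly observations.
t = np.arange(144)
idx = pd.date_range("1949-01-01", periods=144, freq="MS")
passengers = pd.Series(
    100 + 2.0 * t + 40 * np.sin(2 * np.pi * t / 12), index=idx
)

# Difference away the trend, then probe autocorrelation at candidate lags.
diffed = passengers.diff().dropna()
print(round(diffed.autocorr(lag=12), 3))  # near +1: strong yearly season
print(round(diffed.autocorr(lag=6), 3))   # near -1: half-period opposition
```

The same probe at multi-year lags is one way to spot cycles as distinct from seasonality.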

So, how do we build models for this?
The short answer is, a lot of features. Most of these features will end up being "autoregressive" features, i.e. features based on the previous values in the time-series.

Then, you combine those features with external variables to your problem, like price.
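A minimal pandas sketch of what that looks like (column names and the toy sales data are my own, not from the thread). The one rule that matters: always `shift()` before computing autoregressive features, so a row only ever sees the past — otherwise you leak the target into its own features.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "date": pd.date_range("2023-01-01", periods=60, freq="D"),
    "sales": np.random.default_rng(0).poisson(100, 60).astype(float),
    "price": 9.99,  # stand-in external variable
})

# Autoregressive features: shift() keeps each row blind to its own target.
df["lag_1"] = df["sales"].shift(1)
df["lag_7"] = df["sales"].shift(7)
df["roll_mean_7"] = df["sales"].shift(1).rolling(7).mean()

# Calendar features capture seasonality.
df["dow"] = df["date"].dt.dayofweek
df["month"] = df["date"].dt.month

# External variables like price just ride along as columns.
features = ["lag_1", "lag_7", "roll_mean_7", "dow", "month", "price"]
train = df.dropna()  # early rows lack enough history for the lags
```

From here, `train[features]` and `train["sales"]` feed straight into XGBoost or LightGBM.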
If you're curious what features end up looking like, I tweeted a bunch the other day.

These features are hard to implement w/o bugs, and it takes a lot of practice to figure out when to use them.

Now, to takeaways:
There's so much more to it. In addition to the above, mastery takes:

• Clever feature engineering
• Great EDA (e.g. Pandas) skills
• Great model evaluation skills, specific to time-series

Here's a great resource I put together to learn it all: corise.com/go/forcasting-…

