Mark Tenenholtz
Head of AI @PredeloHQ. XGBoost peddler, transformer purveyor.
Mar 3 6 tweets 2 min read
I just told ChatGPT to write a lot of code and then walked away to pet my dogs, only to come back to the correct result. Yeah, we're living in the future.

It really doesn't matter if AI automates most of the code-writing. The most important thing has always been figuring out what code should be written! It doesn't write better code than me, but who cares? It's usually faster to refactor its output a bit than to write everything from scratch, unless I already know exactly what I want.
Aug 7, 2023 5 tweets 1 min read
LLM sizes, and when to use them:

100M-500M param, encoder-only: you have a straightforward classification/regression task, or you need local embeddings for reasonably sized datasets (see the sketch after this list).

1-3B: you want to fine-tune a model under resource constraints, usually for a very specific task.

(cont) 7B: sweet spot for small models. You have a pretty specific task, but you need a little more zero-shot/in-context learning than 1-3B param models provide. Still not great reasoning capabilities, though.

13B: a bit unclear to me so far, tbh. Need more Llama 2 results.
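
To make that first bucket concrete, here's a minimal sketch of local embeddings with a small encoder-only model. The checkpoint name is just one common choice, not from the thread:

```python
from sentence_transformers import SentenceTransformer

# "all-mpnet-base-v2" (~110M params) is one common encoder-only choice.
model = SentenceTransformer("all-mpnet-base-v2")
texts = ["refund request", "shipping delay complaint", "password reset"]
embeddings = model.encode(texts)  # (3, 768) array, ready for kNN or a classifier
print(embeddings.shape)
```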
Jul 31, 2023 4 tweets 1 min read
Best AI advice I can give is to learn how to fine-tune models.

2 years ago, training a model with >1B parameters was a pipe dream for most.

Now, you can:

• Train a 65B model on a single GPU
• Generate w/ 30B+ param models on the CPU

But best of all… (cont) Now there are much smaller models that beat GPT-3, and some models that arguably (ARGUABLY) come close to GPT-3.5-turbo.

The architecture of GPT-4 tells me that, despite scaling laws, we are starting to max out what we can practically deploy.

So what does that mean for you?
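
For context on the "65B on a single GPU" claim: the trick is QLoRA-style training, where the frozen base model is quantized to 4 bits and only small LoRA adapters are trained. A minimal sketch with the transformers/peft stack (checkpoint name and hyperparameters are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Base weights load in 4-bit; gradients only flow through the LoRA adapters.
bnb = BitsAndBytesConfig(load_in_4bit=True,
                         bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-65b",      # illustrative checkpoint
    quantization_config=bnb,
    device_map="auto",
)
lora = LoraConfig(r=16, lora_alpha=32,
                  target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()   # only a fraction of a percent is trainable
```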
Jul 31, 2023 9 tweets 2 min read
Python is removing the GIL.

The GIL (Global Interpreter Lock) only lets one thread execute Python bytecode at a time, so multi-threaded pure-Python code can't actually run in parallel.

That makes ML code, in particular, really hard to write in pure Python.

Here's what it takes to remove the GIL: Removing the GIL has been discussed for a long time now.

And now, finally, PEP 703 (a proposal motivated in large part by ML workloads) has been accepted. Meta has already committed a lot of engineer-hours to it.

First of all, if you want to see the actual proposal, look here:
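
To see why the GIL matters for CPU-bound work, here's a tiny self-contained demonstration: adding a second thread to a pure-Python loop buys roughly nothing today.

```python
import threading
import time

def busy(n: int) -> None:
    while n:           # pure-Python work; the running thread holds the GIL
        n -= 1

N = 50_000_000

start = time.perf_counter()
busy(N)
print(f"1 thread:  {time.perf_counter() - start:.2f}s")

start = time.perf_counter()
threads = [threading.Thread(target=busy, args=(N // 2,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# With the GIL, this takes about as long as the single-threaded run.
print(f"2 threads: {time.perf_counter() - start:.2f}s")
```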
Jul 30, 2023 5 tweets 1 min read
Misconceptions about XGBoost:

1. It's hard to tune the hyperparameters

Disagree: there are only ~5-6 to tune, and 3-5 of them can be tuned very quickly. Learning rate is probably the one that requires the most care.

2. Inability to extrapolate is a problem

(cont) First of all, ML models, whether XGBoost or an LLM, generally can't extrapolate well (see the gzip paper).

Second, the problem can usually be reframed to avoid this issue (e.g., predicting the difference in targets rather than the raw value for an increasing time series).
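
A quick sketch of that reframing (data here is made up): instead of asking the model for a level it has never seen, ask it for the step.

```python
import pandas as pd

# A steadily increasing series: raw targets keep leaving the training range,
# but the differences stay in-distribution.
y = pd.Series([100, 104, 109, 115, 122], name="sales")
steps = y.diff().dropna()      # train the model to predict these

# At inference, add the predicted step back onto the last observed level:
predicted_step = steps.mean()  # stand-in for model.predict(...)
forecast = y.iloc[-1] + predicted_step
print(forecast)
```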
Jul 20, 2023 9 tweets 2 min read
Data Validation in ML Pipelines

It’s so hard to get good advice on how to do this.

But @weights_biases just released a free course taught by @sh_reya on exactly that.

Here’s a breakdown: Right off the bat, Shreya identifies something I've been beating the drum about for a while.

"Data drift" is not just one thing, it's many concepts rolled up into one term.

This is a diagram I use in my course, but I appreciate how Shreya tackled it.

Of course, she takes this further:
Jun 13, 2023 11 tweets 3 min read
XGBoost and LightGBM are the most versatile time-series models.

But, if you want excellent performance from them, you need to tune their hyperparameters well.

A guide to tuning XGBoost/LightGBM for forecasting: Two quick things:

1. Most of hyperparameter tuning comes down to a good evaluation setup, and time-series models are *hard* to evaluate.

2. XGBoost/LightGBM/CatBoost aren’t as hard to tune as you may think (but we’ll get to that)

First, setting up solid cross-validation:
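
One reasonable setup (a sketch with synthetic data; LightGBM shown, but the same holds for XGBoost): expanding-window splits, so validation data always comes after its training data.

```python
import numpy as np
import pandas as pd
import lightgbm as lgb
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import TimeSeriesSplit

# Synthetic stand-ins; swap in your real feature frame and target.
rng = np.random.default_rng(0)
X = pd.DataFrame({"lag_1": rng.normal(size=500),
                  "dow": rng.integers(0, 7, size=500)})
y = 2 * X["lag_1"] + rng.normal(size=500)

tscv = TimeSeriesSplit(n_splits=5)   # each fold trains on the past only
scores = []
for train_idx, val_idx in tscv.split(X):
    model = lgb.LGBMRegressor(n_estimators=300, learning_rate=0.05)
    model.fit(X.iloc[train_idx], y.iloc[train_idx])
    preds = model.predict(X.iloc[val_idx])
    scores.append(mean_absolute_error(y.iloc[val_idx], preds))
print(f"MAE per fold: {np.round(scores, 3)}")
```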
Jun 12, 2023 15 tweets 4 min read
Time-series forecasting is becoming a lost art.

It’s the engine behind the $5,000,000,000,000 retail industry in the US.

And yet, research progress is slow and the best tricks are locked away.

But why does it remain so tough, even amid the current AI breakthroughs?

Here’s why 🧵 Forecasting is a seemingly straightforward task.

In fact, given all the success we've had with LLMs, it's weird that we can't use the exact same tricks with time-series data.

So why don't 175B parameter models work, and why can't we do the classic LM + transfer learning recipe?
Jun 3, 2023 9 tweets 2 min read
ChatGPT sucks at time-series problems.

Why? I’ve found that it is:

• Bad at EDA
• Bad at writing efficient Pandas code
• Bad at turning data insights into features

Unfortunately, it’s just not much help here.

Here’s how you can bridge the gap: 1. Using Pandas like a pro

Pandas is essentially built for time-series data, but very few people use it to even 10% of its potential.

Use it more effectively, and you’ll often find it takes 50% less code to accomplish the same task.
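
A few of the built-ins that do most of the heavy lifting (illustrative data):

```python
import pandas as pd

df = pd.DataFrame(
    {"sales": range(90)},
    index=pd.date_range("2023-01-01", periods=90, freq="D"),
)

weekly = df["sales"].resample("W").sum()       # calendar-aware aggregation
df["roll_7"] = df["sales"].rolling(7).mean()   # trailing 7-day average
df["lag_7"] = df["sales"].shift(7)             # same value one week earlier
df["pct_chg"] = df["sales"].pct_change()       # period-over-period change
```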
Jun 1, 2023 8 tweets 3 min read
Everyone should learn to fine-tune LLMs.

You can try (and fail) to force GPT-4 onto every problem, or you can reduce your costs and improve your accuracy by fine-tuning a task-specific LLM.

Here's why, and some tips for getting started: 1. GPT-4 is not a cure-all

GPT-4 is great at a lot of things.

But for very specific (especially non-generative) tasks, it makes little sense to use.

For instance, why are we using generative models for classification problems that have training data?
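
For that classification case, a minimal sketch of the alternative: fine-tune a small encoder on your labeled data. Model name and hyperparameters are illustrative, and train_ds/eval_ds are assumed pre-tokenized datasets with labels:

```python
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=3)

# train_ds / eval_ds: assumed tokenized datasets with a "labels" column.
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="clf",
                           num_train_epochs=3,
                           per_device_train_batch_size=32),
    train_dataset=train_ds,
    eval_dataset=eval_ds,
)
trainer.train()   # cheap to serve and deterministic, unlike prompting GPT-4
```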
May 24, 2023 19 tweets 3 min read
Time-series forecasting is becoming a lost art.

It’s the engine behind:

• Retail
• Insurance
• Energy pricing
• Fitness tracking
(and more)

Retail alone is a $5 trillion industry in the US!

And yet, the tricks of the trade are locked away.

Here’s what you should know: 1. EDA

The best way to start any ML problem.

Here are some ways to analyze time series data:

• Seasonality (monthly, daily, etc.)
• Trends (big for proper evaluation)
• Autocorrelation
• Diff. between older and newer series
• Tons of raw samples
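
A couple of those checks in code (synthetic daily series with a trend and weekly seasonality, just for illustration):

```python
import numpy as np
import pandas as pd
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.tsa.seasonal import seasonal_decompose

idx = pd.date_range("2022-01-01", periods=365, freq="D")
s = pd.Series(0.1 * np.arange(365)
              + 10 * np.sin(2 * np.pi * np.arange(365) / 7),
              index=idx)

decomp = seasonal_decompose(s, period=7)  # trend / seasonal / residual parts
decomp.plot()
plot_acf(s, lags=30)                      # spikes at lags 7, 14, ... = weekly
```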
May 11, 2023 8 tweets 4 min read
How many people actually understand how LLM agents work?

It's really hard to keep up. Even for experienced practitioners:

• Too much AutoGPT hype
• Langchain is overwhelming at first
• A new paper coming out every week

Here's what you can do to dig in: Langchain has a bunch of agent implementations, but one of the most popular is called "ReAct."

It's a relatively simple idea (and even simpler to use in Langchain).

In fact, this is all the code you need to build an agent that can search Wikipedia:
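
The screenshot didn't survive, but the code in question looked roughly like this with the 2023-era Langchain API (my reconstruction, not the original image):

```python
# Requires: pip install langchain openai wikipedia
from langchain.llms import OpenAI
from langchain.agents import load_tools, initialize_agent, AgentType

llm = OpenAI(temperature=0)
tools = load_tools(["wikipedia"], llm=llm)

# ReAct loop: the LLM alternates between reasoning steps and tool calls.
agent = initialize_agent(tools, llm,
                         agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
                         verbose=True)
agent.run("What year was the author of 'Dune' born?")
```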
May 5, 2023 4 tweets 2 min read
MPT-7B: 64k+ context length, open-source LLM 🔥

Comes w/ 4 checkpoints, including chat+instruction tuned.

• Optimized with FlashAttention
• Beats LLaMA on most benchmarks
• Commercial-use license (sans Chat)
• Can handle up to 84k tokens (thanks to ALiBi)

Thanks @MosaicML! Announcement here: mosaicml.com/blog/mpt-7b
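
If you want to try it, loading goes through transformers with trust_remote_code, since MPT ships custom model code (a sketch; generation settings omitted):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# MPT defines its own architecture, hence trust_remote_code=True.
tok = AutoTokenizer.from_pretrained("mosaicml/mpt-7b")
model = AutoModelForCausalLM.from_pretrained("mosaicml/mpt-7b",
                                             trust_remote_code=True)
```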
May 1, 2023 12 tweets 4 min read
How many people truly know how computer memory works?

The next five years will be bonkers. But it might be tough to follow if you don't know what's going on under the hood.

A great place to start is learning how memory works, from the ground up.

Here's how you can start: You probably already know that we somehow took a bunch of 1's and 0's and used them to program the computer/phone you're reading this on.

Accomplishing that requires a lot of abstraction and complexity.

Whenever I'm learning a topic like that, I prefer to keep it visual.
Apr 27, 2023 23 tweets 6 min read
I built a ChatGPT app that lets you chat with any codebase!

99% of projects just copy/paste Langchain tutorials. This goes well beyond that.

Here's how I built it: I built it to work with the Twitter codebase, but it's effortless to swap in any other repository.

1. Embedding the code
2. Query + prompt the model
3. Pull in relevant context
4. Repeat steps 2+3

That's all created with a Pinecone vector DB and a FastAPI backend.

Let's start!
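
A compressed sketch of steps 1-3 with the 2023-era SDKs (index name, environment, and the embedding model are illustrative):

```python
import openai
import pinecone

pinecone.init(api_key="...", environment="us-west1-gcp")
index = pinecone.Index("codebase")

def embed(text: str) -> list:
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return resp["data"][0]["embedding"]

# Step 1 (done once): index.upsert([(chunk_id, embed(chunk), {"path": path})])

# Steps 2+3: embed the question, pull the closest code chunks into the prompt.
question = "Where is the rate limiter implemented?"
matches = index.query(vector=embed(question), top_k=5, include_metadata=True)
context = "\n\n".join(m["metadata"]["path"] for m in matches["matches"])
```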
Apr 26, 2023 9 tweets 3 min read
Hard truth:

Simple LLM search methods get crushed by traditional lexical search.

LLMs crash and burn on out-of-domain data, which is why the current state-of-the-art is a hybrid method.

The fundamentals of lexical search are still crucial to understand.

Here's where to start: The BEIR benchmark is the gold standard for evaluating search methods.

One of the most commonly used lexical search methods (BM25) beats every transformer-based retriever they benchmarked, except ColBERT.

But, by far the best method was BM25 plus a cross-encoder LM.

It works like this:
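
Here's a minimal sketch of that two-stage pattern (toy corpus; the cross-encoder checkpoint is one common choice, not necessarily the one BEIR used):

```python
from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder

docs = ["the cat sat on the mat",
        "dogs are loyal and friendly pets",
        "transformers changed natural language processing"]
bm25 = BM25Okapi([d.split() for d in docs])

query = "which animals make good pets"

# Stage 1: cheap lexical retrieval picks candidates.
scores = bm25.get_scores(query.split())
top = sorted(range(len(docs)), key=lambda i: -scores[i])[:2]
candidates = [docs[i] for i in top]

# Stage 2: a cross-encoder reranks query-document pairs.
ce = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
pair_scores = ce.predict([(query, d) for d in candidates])
reranked = [d for _, d in sorted(zip(pair_scores, candidates), reverse=True)]
print(reranked[0])
```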
Apr 25, 2023 21 tweets 5 min read
I am at least 3-5x more productive using ChatGPT to code.

Not only am I faster writing code I'm familiar with, but I've even shipped apps in tech stacks I'd never used before.

Here's my process, the prompts I use, and why it all works: 1. Start in ChatGPT

ChatGPT is really good at the basics and getting setup out of the way.

These steps are the most abundant in its training data, since they're pretty common across projects.

My prompt is usually something like:
Apr 24, 2023 4 tweets 2 min read
TiDE: Time-series Dense Encoder

Claims to be 5-10x faster (and 6% more accurate) than Transformers (PatchTST) with much longer history due to linear scaling.

It’s an MLP-based encoder-decoder architecture that leverages a custom distribution loss. 👇 The most common loss function in time-series models is MSE.

But we figured out a while ago that directly optimizing the distribution of your data using MLE is better.

Zero-inflated losses like Tweedie can be great for tasks like item count forecasting.

TiDE is a bit fancier.
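
In XGBoost terms, switching to a zero-inflated-friendly objective is basically one line (parameter values illustrative):

```python
import xgboost as xgb

# variance_power in (1, 2) interpolates between Poisson and Gamma;
# handy when many item-day sales are exactly zero.
model = xgb.XGBRegressor(
    objective="reg:tweedie",
    tweedie_variance_power=1.3,
    n_estimators=500,
    learning_rate=0.05,
)
```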
Apr 22, 2023 4 tweets 1 min read
"Kaggle doesn't translate to real life" says the data scientist fitting models with an AutoML tool designed by a team full of Kaggle Grandmasters. "Kaggle doesn't translate to real life" says the data scientist who learned 3 days ago that SVMs on embeddings work really well.
Apr 21, 2023 7 tweets 2 min read
For my next open-source app, I was inspired by @karpathy to do a simple movie recommender using OpenAI's embeddings.

A couple quick observations: Right now, it's just matching based on their plot summaries.

(Caveat: I'm testing on a small sample size of a few hundred movies)

1. Sometimes, it clusters based on undesirable attributes.
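
The core matching logic is just embeddings plus cosine similarity (a sketch with the 2023-era openai SDK; titles and plots are placeholders):

```python
import numpy as np
import openai

plots = {
    "Movie A": "A crew aboard a deep-space ship is hunted by a creature.",
    "Movie B": "A small-town sheriff hunts a giant shark.",
}

def embed(text: str) -> np.ndarray:
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

vecs = {title: embed(plot) for title, plot in plots.items()}

def recommend(title: str) -> str:
    q = vecs[title]
    sims = {t: float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v)))
            for t, v in vecs.items() if t != title}
    return max(sims, key=sims.get)   # nearest plot summary wins
```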
Apr 19, 2023 14 tweets 4 min read
I've seen 1,000,000 LLM demos and "AI influencer" posts.

But none of them showed me how to actually deploy a real app.

So, I built a project (open source) to figure it out, and it only uses 1 CPU and 1 GB RAM.

Here's how I built a simple ChatGPT streaming backend in FastAPI: This isn't a perfect backend. In fact, it can be significantly improved.

But, it allows me to use an extremely cheap VM to host it.

My traffic hasn't been much, but I also haven't even exceeded 1% CPU usage at any point.

Look closely and you'll see the utilization 😂
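
The heart of the backend is just a generator wrapped in a StreamingResponse (a simplified sketch with the 2023-era openai SDK, not the exact repo code):

```python
import openai
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.get("/chat")
def chat(q: str):
    def token_stream():
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": q}],
            stream=True,   # the API yields deltas as they're generated
        )
        for chunk in resp:
            delta = chunk["choices"][0]["delta"]
            if "content" in delta:
                yield delta["content"]

    return StreamingResponse(token_stream(), media_type="text/plain")
```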