Mark Tenenholtz
Head of AI @PredeloHQ. XGBoost peddler, transformer purveyor.
Mar 3 6 tweets 2 min read
I just told ChatGPT to write a lot of code and then walked away to pet my dogs, only to come back to the correct result. Yeah, we're living in the future.

It really doesn't matter if AI automates most of the code-writing. The most important thing has always been figuring out what code should be written! It doesn't write better code than me, but who cares? It's usually faster to refactor its output a bit than to write everything from scratch, unless I already know exactly what I want.
Aug 7, 2023 5 tweets 1 min read
LLM sizes, and when to use them:

100M-500M param, encoder-only: you have a straightforward classification/regression task, or you need local embeddings for reasonably sized datasets (see the sketch after this list).

1-3B: you want to fine-tune a model under resource constraints, usually for a very specific task.

(cont) 7B: sweet spot for small models. You have a pretty specific task, but you need a little more zero-shot/in-context learning than 1-3B param models provide. Still not great reasoning capabilities, though.

13B: a bit unclear to me so far, tbh. Need more Llama 2 results.
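
To make that first bucket concrete, here's a minimal sketch of local embeddings with a small encoder-only model. The checkpoint name is just one common choice, not from the thread:

```python
from sentence_transformers import SentenceTransformer

# "all-mpnet-base-v2" (~110M params) is one common encoder-only choice.
model = SentenceTransformer("all-mpnet-base-v2")
texts = ["refund request", "shipping delay complaint", "password reset"]
embeddings = model.encode(texts)  # (3, 768) array, ready for kNN or a classifier
print(embeddings.shape)
```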
Jul 31, 2023 4 tweets 1 min read
Best AI advice I can give is to learn how to fine-tune models.

2 years ago, training a model with >1B parameters was a pipe dream for most.

Now, you can:

• Train a 65B model on a single GPU
• Generate w/ 30B+ param models on the CPU

But best of all… (cont) Now there are much smaller models that beat GPT-3, and some models that arguably (ARGUABLY) come close to GPT-3.5-turbo.

The architecture of GPT-4 tells me that, despite scaling laws, we are starting to max out what we can practically deploy.

So what does that mean for you?
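
For context on the "65B on a single GPU" claim: the trick is QLoRA-style training, where the frozen base model is quantized to 4 bits and only small LoRA adapters are trained. A minimal sketch with the transformers/peft stack (checkpoint name and hyperparameters are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Base weights load in 4-bit; gradients only flow through the LoRA adapters.
bnb = BitsAndBytesConfig(load_in_4bit=True,
                         bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-65b",      # illustrative checkpoint
    quantization_config=bnb,
    device_map="auto",
)
lora = LoraConfig(r=16, lora_alpha=32,
                  target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()   # only a fraction of a percent is trainable
```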
Jul 31, 2023 9 tweets 2 min read
Python is removing the GIL.

The GIL (Global Interpreter Lock) only lets one thread execute Python bytecode at a time, so multi-threaded pure-Python code can't actually run in parallel.

That makes ML code, in particular, really hard to write in pure Python.

Here's what it takes to remove the GIL: Removing the GIL has been discussed for a long time now.

And now, finally, PEP 703 (a proposal motivated in large part by ML workloads) has been accepted. Meta has already committed a lot of engineer-hours to it.

First of all, if you want to see the actual proposal, look here:
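
To see why the GIL matters for CPU-bound work, here's a tiny self-contained demonstration: adding a second thread to a pure-Python loop buys roughly nothing today.

```python
import threading
import time

def busy(n: int) -> None:
    while n:           # pure-Python work; the running thread holds the GIL
        n -= 1

N = 50_000_000

start = time.perf_counter()
busy(N)
print(f"1 thread:  {time.perf_counter() - start:.2f}s")

start = time.perf_counter()
threads = [threading.Thread(target=busy, args=(N // 2,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# With the GIL, this takes about as long as the single-threaded run.
print(f"2 threads: {time.perf_counter() - start:.2f}s")
```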
Jul 30, 2023 5 tweets 1 min read
Misconceptions about XGBoost:

1. It's hard to tune the hyperparameters

Disagree: there are only ~5-6 to tune, and 3-5 of them can be tuned very quickly. Learning rate is probably the one that requires the most care.

2. Inability to extrapolate is a problem

(cont) First of all, ML models, whether XGBoost or an LLM, generally can't extrapolate well (see the gzip paper).

Second, the problem can usually be reframed to avoid this issue (e.g., predicting the difference in targets rather than the raw value for an increasing time series).
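
A quick sketch of that reframing (data here is made up): instead of asking the model for a level it has never seen, ask it for the step.

```python
import pandas as pd

# A steadily increasing series: raw targets keep leaving the training range,
# but the differences stay in-distribution.
y = pd.Series([100, 104, 109, 115, 122], name="sales")
steps = y.diff().dropna()      # train the model to predict these

# At inference, add the predicted step back onto the last observed level:
predicted_step = steps.mean()  # stand-in for model.predict(...)
forecast = y.iloc[-1] + predicted_step
print(forecast)
```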
Jul 20, 2023 9 tweets 2 min read
Data Validation in ML Pipelines

It’s so hard to get good advice on how to do this.

But @weights_biases just released a free course taught by @sh_reya on exactly that.

Here’s a breakdown: Right off the bat, Shreya identifies something I've been beating the drum about for a while.

"Data drift" is not just one thing, it's many concepts rolled up into one term.

This is a diagram I use in my course, but I appreciate how Shreya tackled it.

Of course, she takes this further:
Jun 13, 2023 11 tweets 3 min read
XGBoost and LightGBM are the most versatile time-series models.

But, if you want excellent performance from them, you need to tune their hyperparameters well.

A guide to tuning XGBoost/LightGBM for forecasting: Two quick things:

1. Most of hyperparameter tuning comes down to a good evaluation setup, and time-series models are *hard* to evaluate.

2. XGBoost/LightGBM/CatBoost aren’t as hard to tune as you may think (but we’ll get to that)

First, setting up solid cross-validation:
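
One reasonable setup (a sketch with synthetic data; LightGBM shown, but the same holds for XGBoost): expanding-window splits, so validation data always comes after its training data.

```python
import numpy as np
import pandas as pd
import lightgbm as lgb
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import TimeSeriesSplit

# Synthetic stand-ins; swap in your real feature frame and target.
rng = np.random.default_rng(0)
X = pd.DataFrame({"lag_1": rng.normal(size=500),
                  "dow": rng.integers(0, 7, size=500)})
y = 2 * X["lag_1"] + rng.normal(size=500)

tscv = TimeSeriesSplit(n_splits=5)   # each fold trains on the past only
scores = []
for train_idx, val_idx in tscv.split(X):
    model = lgb.LGBMRegressor(n_estimators=300, learning_rate=0.05)
    model.fit(X.iloc[train_idx], y.iloc[train_idx])
    preds = model.predict(X.iloc[val_idx])
    scores.append(mean_absolute_error(y.iloc[val_idx], preds))
print(f"MAE per fold: {np.round(scores, 3)}")
```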
Jun 12, 2023 15 tweets 4 min read
Time-series forecasting is becoming a lost art.

It’s the engine behind the $5,000,000,000,000 retail industry in the US.

And yet, research progress is slow and the best tricks are locked away.

But why does it remain so tough, even amid the current AI breakthroughs?

Here’s why 🧵 Forecasting is a seemingly straightforward task.

In fact, given all the success we've had with LLMs, it's weird that we can't use the exact same tricks with time-series data.

So why don't 175B parameter models work, and why can't we do the classic LM + transfer learning recipe?
Jun 3, 2023 9 tweets 2 min read
ChatGPT sucks at time-series problems.

Why? I’ve found that it is:

• Bad at EDA
• Bad at writing efficient Pandas code
• Bad at turning data insights into features

Unfortunately, it’s just not much help here.

Here’s how you can bridge the gap: 1. Using Pandas like a pro

Pandas is essentially built for time-series data, but very few people use it to even 10% of its potential.

Use it more effectively, and you’ll often find it takes 50% less code to accomplish the same task.
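
A few of the built-ins that do most of the heavy lifting (illustrative data):

```python
import pandas as pd

df = pd.DataFrame(
    {"sales": range(90)},
    index=pd.date_range("2023-01-01", periods=90, freq="D"),
)

weekly = df["sales"].resample("W").sum()       # calendar-aware aggregation
df["roll_7"] = df["sales"].rolling(7).mean()   # trailing 7-day average
df["lag_7"] = df["sales"].shift(7)             # same value one week earlier
df["pct_chg"] = df["sales"].pct_change()       # period-over-period change
```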
Jun 1, 2023 8 tweets 3 min read
Everyone should learn to fine-tune LLMs.

You can try (and fail) to force GPT-4 onto every problem, or you can reduce your costs and improve your accuracy by fine-tuning a task-specific LLM.

Here's why, and some tips for getting started: 1. GPT-4 is not a cure-all

GPT-4 is great at a lot of things.

But for very specific (especially non-generative) tasks, it makes little sense to use.

For instance, why are we using generative models for classification problems that have training data?
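
For that classification case, a minimal sketch of the alternative: fine-tune a small encoder on your labeled data. Model name and hyperparameters are illustrative, and train_ds/eval_ds are assumed pre-tokenized datasets with labels:

```python
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=3)

# train_ds / eval_ds: assumed tokenized datasets with a "labels" column.
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="clf",
                           num_train_epochs=3,
                           per_device_train_batch_size=32),
    train_dataset=train_ds,
    eval_dataset=eval_ds,
)
trainer.train()   # cheap to serve and deterministic, unlike prompting GPT-4
```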
May 24, 2023 19 tweets 3 min read
Time-series forecasting is becoming a lost art.

It’s the engine behind:

• Retail
• Insurance
• Energy pricing
• Fitness tracking
(and more)

Retail alone is a $5 trillion industry in the US!

And yet, the tricks of the trade are locked away.

Here’s what you should know: 1. EDA

The best way to start any ML problem.

Here are some ways to analyze time series data:

• Seasonality (monthly, daily, etc.)
• Trends (big for proper evaluation)
• Autocorrelation
• Diff. between older and newer series
• Tons of raw samples
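
A couple of those checks in code (synthetic daily series with a trend and weekly seasonality, just for illustration):

```python
import numpy as np
import pandas as pd
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.tsa.seasonal import seasonal_decompose

idx = pd.date_range("2022-01-01", periods=365, freq="D")
s = pd.Series(0.1 * np.arange(365)
              + 10 * np.sin(2 * np.pi * np.arange(365) / 7),
              index=idx)

decomp = seasonal_decompose(s, period=7)  # trend / seasonal / residual parts
decomp.plot()
plot_acf(s, lags=30)                      # spikes at lags 7, 14, ... = weekly
```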
May 11, 2023 8 tweets 4 min read
How many people actually understand how LLM agents work?

It's really hard to keep up. Even for experienced practitioners:

• Too much AutoGPT hype
• Langchain is overwhelming at first
• A new paper coming out every week

Here's what you can do to dig in: Langchain has a bunch of agent implementations, but one of the most popular is called "ReAct."

It's a relatively simple idea (and even simpler to use in Langchain).

In fact, this is all the code you need to build an agent that can search Wikipedia:
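
The screenshot didn't survive, but the code in question looked roughly like this with the 2023-era Langchain API (my reconstruction, not the original image):

```python
# Requires: pip install langchain openai wikipedia
from langchain.llms import OpenAI
from langchain.agents import load_tools, initialize_agent, AgentType

llm = OpenAI(temperature=0)
tools = load_tools(["wikipedia"], llm=llm)

# ReAct loop: the LLM alternates between reasoning steps and tool calls.
agent = initialize_agent(tools, llm,
                         agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
                         verbose=True)
agent.run("What year was the author of 'Dune' born?")
```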
May 5, 2023 4 tweets 2 min read
MPT-7B: 64k+ context length, open-source LLM 🔥

Comes w/ 4 checkpoints, including chat+instruction tuned.

• Optimized with FlashAttention
• Beats LLaMA on most benchmarks
• Commercial-use license (sans Chat)
• Can handle up to 84k tokens (thanks to ALiBi)

Thanks @MosaicML! Announcement here: mosaicml.com/blog/mpt-7b
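
If you want to try it, loading goes through transformers with trust_remote_code, since MPT ships custom model code (a sketch; generation settings omitted):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# MPT defines its own architecture, hence trust_remote_code=True.
tok = AutoTokenizer.from_pretrained("mosaicml/mpt-7b")
model = AutoModelForCausalLM.from_pretrained("mosaicml/mpt-7b",
                                             trust_remote_code=True)
```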
May 1, 2023 12 tweets 4 min read
How many people truly know how computer memory works?

The next five years will be bonkers. But it might be tough to follow if you don't know what's going on under the hood.

A great place to start is learning how memory works, from the ground up.

Here's how you can start: You probably already know that we somehow took a bunch of 1's and 0's and used them to program the computer/phone you're reading this on.

Accomplishing that requires a lot of abstraction and complexity.

Whenever I'm learning a topic like that, I prefer to keep it visual.
Apr 27, 2023 23 tweets 6 min read
I built a ChatGPT app that lets you chat with any codebase!

99% of projects just copy/paste Langchain tutorials. This goes well beyond that.

Here's how I built it: I built it to work with the Twitter codebase, but it's effortless to swap in any other repository.

1. Embedding the code
2. Query + prompt the model
3. Pull in relevant context
4. Repeat steps 2+3

That's all created with a Pinecone vector DB and a FastAPI backend.

Let's start!
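
A compressed sketch of steps 1-3 with the 2023-era SDKs (index name, environment, and the embedding model are illustrative):

```python
import openai
import pinecone

pinecone.init(api_key="...", environment="us-west1-gcp")
index = pinecone.Index("codebase")

def embed(text: str) -> list:
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return resp["data"][0]["embedding"]

# Step 1 (done once): index.upsert([(chunk_id, embed(chunk), {"path": path})])

# Steps 2+3: embed the question, pull the closest code chunks into the prompt.
question = "Where is the rate limiter implemented?"
matches = index.query(vector=embed(question), top_k=5, include_metadata=True)
context = "\n\n".join(m["metadata"]["path"] for m in matches["matches"])
```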
Apr 26, 2023 9 tweets 3 min read
Hard truth:

Simple LLM search methods get crushed by traditional lexical search.

LLMs crash and burn on out-of-domain data, which is why the current state-of-the-art is a hybrid method.

The fundamentals of lexical search are still crucial to understand.

Here's where to start: The BEIR benchmark is the gold standard for evaluating search methods.

One of the most commonly used lexical search methods (BM25) beats every transformer-based retriever they benchmarked, except ColBERT.

But, by far the best method was BM25 plus a cross-encoder LM.

It works like this:
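
Here's a minimal sketch of that two-stage pattern (toy corpus; the cross-encoder checkpoint is one common choice, not necessarily the one BEIR used):

```python
from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder

docs = ["the cat sat on the mat",
        "dogs are loyal and friendly pets",
        "transformers changed natural language processing"]
bm25 = BM25Okapi([d.split() for d in docs])

query = "which animals make good pets"

# Stage 1: cheap lexical retrieval picks candidates.
scores = bm25.get_scores(query.split())
top = sorted(range(len(docs)), key=lambda i: -scores[i])[:2]
candidates = [docs[i] for i in top]

# Stage 2: a cross-encoder reranks query-document pairs.
ce = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
pair_scores = ce.predict([(query, d) for d in candidates])
reranked = [d for _, d in sorted(zip(pair_scores, candidates), reverse=True)]
print(reranked[0])
```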
Apr 25, 2023 21 tweets 5 min read
I am at least 3-5x more productive using ChatGPT to code.

Not only am I faster writing code I'm familiar with, but I've even shipped apps in tech stacks I'd never used before.

Here's my process, the prompts I use, and why it all works: 1. Start in ChatGPT

ChatGPT is really good at the basics and getting setup out of the way.

These steps are the most abundant in its training data, since they're pretty common across projects.

My prompt is usually something like:
Apr 24, 2023 4 tweets 2 min read
TiDE: Time-series Dense Encoder

Claims to be 5-10x faster (and 6% more accurate) than Transformers (PatchTST) with much longer history due to linear scaling.

It’s an MLP-based encoder-decoder architecture that leverages a custom distribution loss. 👇 The most common loss function in time-series models is MSE.

But we figured out a while ago that directly optimizing the distribution of your data using MLE is better.

Zero-inflated losses like Tweedie can be great for tasks like item count forecasting.

TiDE is a bit fancier.
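
In XGBoost terms, switching to a zero-inflated-friendly objective is basically one line (parameter values illustrative):

```python
import xgboost as xgb

# variance_power in (1, 2) interpolates between Poisson and Gamma;
# handy when many item-day sales are exactly zero.
model = xgb.XGBRegressor(
    objective="reg:tweedie",
    tweedie_variance_power=1.3,
    n_estimators=500,
    learning_rate=0.05,
)
```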
Apr 22, 2023 4 tweets 1 min read
"Kaggle doesn't translate to real life" says the data scientist fitting models with an AutoML tool designed by a team full of Kaggle Grandmasters. "Kaggle doesn't translate to real life" says the data scientist who learned 3 days ago that SVMs on embeddings work really well.
Apr 21, 2023 7 tweets 2 min read
For my next open-source app, I was inspired by @karpathy to do a simple movie recommender using OpenAI's embeddings.

A couple quick observations: Right now, it's just matching based on their plot summaries.

(Caveat: I'm testing on a small sample size of a few hundred movies)

1. Sometimes, it clusters based on undesirable attributes.
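
The core matching logic is just embeddings plus cosine similarity (a sketch with the 2023-era openai SDK; titles and plots are placeholders):

```python
import numpy as np
import openai

plots = {
    "Movie A": "A crew aboard a deep-space ship is hunted by a creature.",
    "Movie B": "A small-town sheriff hunts a giant shark.",
}

def embed(text: str) -> np.ndarray:
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

vecs = {title: embed(plot) for title, plot in plots.items()}

def recommend(title: str) -> str:
    q = vecs[title]
    sims = {t: float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v)))
            for t, v in vecs.items() if t != title}
    return max(sims, key=sims.get)   # nearest plot summary wins
```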
Apr 19, 2023 14 tweets 4 min read
I've seen 1,000,000 LLM demos and "AI influencer" posts.

But none of them showed me how to actually deploy a real app.

So, I built a project (open source) to figure it out, and it only uses 1 CPU and 1 GB RAM.

Here's how I built a simple ChatGPT streaming backend in FastAPI: This isn't a perfect backend. In fact, it can be significantly improved.

But, it allows me to use an extremely cheap VM to host it.

My traffic hasn't been much, but I also haven't even exceeded 1% CPU usage at any point.

Look closely and you'll see the utilization 😂
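
The heart of the backend is just a generator wrapped in a StreamingResponse (a simplified sketch with the 2023-era openai SDK, not the exact repo code):

```python
import openai
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.get("/chat")
def chat(q: str):
    def token_stream():
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": q}],
            stream=True,   # the API yields deltas as they're generated
        )
        for chunk in resp:
            delta = chunk["choices"][0]["delta"]
            if "content" in delta:
                yield delta["content"]

    return StreamingResponse(token_stream(), media_type="text/plain")
```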