Matt Shumer
CEO @HyperWriteAI, @OthersideAI - I make AIs do the impossible.
Feb 3
Introducing `OpenDeepResearcher` 🌎

An open-source AI agent that does comprehensive research for you.

Just provide a topic, and the AI will go off, do research, and return a comprehensive report.

How it works: The approach is really simple.

Given a query, the AI:
- performs searches, views the result pages, and extracts important info
- if it wants to look deeper, it can repeat this process, with new queries
- once it's done, it uses the context to generate a report

That's it!
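
To make that loop concrete, here's a minimal sketch of the flow (not the repo's actual code): `ask` is a plain OpenAI-compatible chat call, the model name is a placeholder, and `search_fn` is whatever search wrapper you plug in.

```python
from openai import OpenAI

client = OpenAI()  # any OpenAI-compatible endpoint works here

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def deep_research(topic: str, search_fn, max_rounds: int = 3) -> str:
    # search_fn is hypothetical: query -> list of result-page texts
    notes, queries = [], [topic]
    for _ in range(max_rounds):
        # 1) Perform searches and extract the important info from each result page.
        for q in queries:
            for page_text in search_fn(q):
                notes.append(ask(f"Extract the facts relevant to '{topic}' from:\n{page_text[:8000]}"))
        # 2) Ask whether deeper digging is needed, and with which new queries.
        followups = ask(
            "Given these research notes, list any follow-up search queries still needed, "
            "one per line, or reply DONE:\n" + "\n".join(notes)
        )
        if followups.strip().upper() == "DONE":
            break
        queries = [q for q in followups.splitlines() if q.strip()]
    # 3) Use everything gathered as context for the final report.
    return ask(f"Write a comprehensive report on '{topic}' using these notes:\n" + "\n".join(notes))
```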
Feb 1
Some initial impressions of o3 mini:

- it’s clear that the benchmarks don’t fully capture how good this model is — it’s the best model I’ve used for code

- the Cursor team has not figured out how to get it to work well in Composer — ChatGPT gives far better results

I will be happily using this over o1/o1 pro.

So far, it has been much more accurate, more capable, and the speed is so nice.

I’ve already replaced my o1 pro bookmark with a link to o3 mini high mode.
Dec 17, 2024
Since I've been getting so many requests:

Here's a mega-thread with my most useful o1 / o1 pro prompting tips for coding.

If you get the hang of using these, you'll build much faster and come up with far more elegant solutions!

First — why o1? Compared to other models, it's:

- Capable of solving far more complex problems
- More likely to solve on the first shot, without back-and-forth
- Solutions tend to be more elegant and require fewer code changes
Nov 26, 2024
Introducing OpenReasoningEngine, an open-source test-time-compute engine that can be used with any OpenAI-compatible model.

Image input, function calling, basic continual learning, + more.

This is an early experiment — there are issues that will need to be ironed out.

Thread:

The engine guides the model to think step-by-step, and at each step allows it to use code interpreters, web search, etc. to iterate on solutions, test approaches, and gather info before responding.

So, when it finally responds, the answer is more likely to be accurate.
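
Here's an illustrative sketch of what one of those tool-using steps looks like with an OpenAI-compatible model. This is not the engine's actual code: the `run_python` tool definition and the `run_sandboxed` executor are assumptions for the example.

```python
import json
from openai import OpenAI

client = OpenAI()  # any OpenAI-compatible endpoint

# One illustrative tool; the engine exposes several (code interpreter, web search, etc.).
tools = [{
    "type": "function",
    "function": {
        "name": "run_python",
        "description": "Execute Python code and return stdout",
        "parameters": {
            "type": "object",
            "properties": {"code": {"type": "string"}},
            "required": ["code"],
        },
    },
}]

messages = [
    {"role": "system", "content": "Think step by step. Verify each step with tools before giving a final answer."},
    {"role": "user", "content": "What is the 40th Fibonacci number?"},
]

for _ in range(5):  # cap the number of reasoning steps
    resp = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
    msg = resp.choices[0].message
    if not msg.tool_calls:
        print(msg.content)  # final answer, produced only after the tool-checked steps
        break
    messages.append(msg)
    for call in msg.tool_calls:
        code = json.loads(call.function.arguments)["code"]
        result = run_sandboxed(code)  # hypothetical sandboxed code executor
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
```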
Sep 5, 2024
I'm excited to announce Reflection 70B, the world’s top open-source model.

Trained using Reflection-Tuning, a technique developed to enable LLMs to fix their own mistakes.

405B coming next week - we expect it to be the best model in the world.

Built w/ @GlaiveAI.

Read on ⬇️:

Reflection 70B holds its own against even the top closed-source models (Claude 3.5 Sonnet, GPT-4o).

It’s the top LLM in (at least) MMLU, MATH, IFEval, GSM8K.

Beats GPT-4o on every benchmark tested.

It clobbers Llama 3.1 405B. It’s not even close.
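
For a sense of what "fix their own mistakes" means in practice, here's a rough illustration of the output shape a reflection-style model is pushed toward: reason, self-correct, then answer. This is a simplified sketch for illustration, not the exact training format.

```python
import re

# Simplified illustration of a reflection-style output contract (assumed format).
SYSTEM = (
    "Reason inside <thinking> tags. If you catch a mistake, correct it inside "
    "<reflection> tags. Put only the final answer inside <output> tags."
)

def final_answer(model_response: str) -> str:
    """Strip the visible reasoning/reflection and keep only what the user should see."""
    match = re.search(r"<output>(.*?)</output>", model_response, re.DOTALL)
    return match.group(1).strip() if match else model_response
```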
Jul 26, 2024
Introducing `llama-405b-to-8b` ✍️

Get the quality of Llama 3.1 405B, at a fraction of the cost and latency.

Give one example of your task, and 405B will teach 8B (~30x cheaper!!) how to do the task perfectly.

And it's open-source: github.com/mshumer/gpt-pr…
This was made in partnership with @OctoAICloud — particularly Ben Hamm, who adapted my existing prompt optimization tools to take advantage of the new Llama 3.1 models.
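
A minimal sketch of that teacher/student flow, assuming both models sit behind an OpenAI-compatible endpoint (model names and the example task are placeholders; this is not the repo's actual code):

```python
from openai import OpenAI

client = OpenAI()  # point base_url at whichever provider hosts the Llama 3.1 models

def ask(model: str, prompt: str) -> str:
    resp = client.chat.completions.create(model=model, messages=[{"role": "user", "content": prompt}])
    return resp.choices[0].message.content

task = "Summarize a support ticket in one sentence."
example = "Ticket: 'App crashes on login since v2.3' -> Summary: 'v2.3 introduced a login crash.'"

# 1) The expensive 405B "teacher" expands your single example into several high-quality ones.
examples = ask(
    "llama-3.1-405b",  # placeholder model name
    f"Task: {task}\nHere is one example:\n{example}\nWrite 5 more examples in exactly the same format.",
)

# 2) The cheap 8B "student" then runs the task with those examples as few-shot context.
def run_task(new_input: str) -> str:
    return ask(
        "llama-3.1-8b",  # placeholder model name
        f"Task: {task}\n{examples}\nNow do the task for:\n{new_input}",
    )
```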
Jul 22, 2024
Introducing `claude-sonnet-to-gpt-4o-mini` ✍️

Get the quality of Claude 3.5 Sonnet, at a fraction of the cost and latency.

Give one example of your task, and Sonnet will teach 4o-mini (20x cheaper!!) how to do the task perfectly.

And it's open-source: shorturl.at/Cjjwt
This repo was inspired by this tweet that went viral months ago.

I discovered that if you prompt Haiku w/ Opus-generated examples, it can match Opus' quality.

Now, we have even better 'teacher' models than Opus, and cheaper 'student' models than Haiku.

Apr 10, 2024
Introducing `gemini-youtube-researcher` 📈

An open-source Gemini 1.5 Pro agent that LISTENS to videos and delivers topical reports.

Just provide a topic, and a chain of AIs with access to YouTube will analyze relevant videos and generate a comprehensive report for you.

This uses the new Gemini 1.5 Pro API that was released today.

It currently only supports listening to the audio content of videos. If anyone wants to add support for video frames as well, please feel free.
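
For reference, the "listen to a video" step looks roughly like this with the Gemini API, assuming the audio track has already been pulled down (e.g. with yt-dlp). This is a sketch, not the repo's exact code, and the topic string is just an example.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Gemini 1.5 Pro accepts uploaded audio files directly as part of the prompt.
audio = genai.upload_file("video_audio.mp3")
model = genai.GenerativeModel("gemini-1.5-pro-latest")

report = model.generate_content([
    audio,
    "Listen to this recording and summarize every point relevant to the topic: 'open-source LLM agents'.",
])
print(report.text)
```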
Apr 8, 2024
Open-sourcing `AI-Oracle`.

Generates better responses than Claude 3 Opus.

A very simple approach that combines the abilities of Claude 3, GPT-4, and Perplexity to provide better results than any could provide on their own.

Seriously -- it's dumb simple.

Notebook in thread:

How does it work?

The process is super simple. We query each model individually:
- Claude 3 Opus for reasoning + personality
- GPT-4 for reasoning
- PPLX for freshness/up-to-date info

Then, Claude combines the strengths of each and responds with a final, ideal output.
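
A minimal sketch of that ensemble (not the notebook's actual code; model names are placeholders, and Perplexity is reached through its OpenAI-compatible API):

```python
from anthropic import Anthropic
from openai import OpenAI

claude = Anthropic()
gpt = OpenAI()
pplx = OpenAI(base_url="https://api.perplexity.ai", api_key="PPLX_API_KEY")

def oracle(question: str) -> str:
    # Query each model individually.
    opus = claude.messages.create(
        model="claude-3-opus-20240229", max_tokens=1024,
        messages=[{"role": "user", "content": question}],
    ).content[0].text
    gpt4 = gpt.chat.completions.create(
        model="gpt-4", messages=[{"role": "user", "content": question}],
    ).choices[0].message.content
    fresh = pplx.chat.completions.create(
        model="sonar",  # placeholder name for Perplexity's online model
        messages=[{"role": "user", "content": question}],
    ).choices[0].message.content

    # Claude combines the strengths of all three drafts into one final answer.
    return claude.messages.create(
        model="claude-3-opus-20240229", max_tokens=1024,
        messages=[{"role": "user", "content": (
            f"Question: {question}\n\nDraft A (Claude):\n{opus}\n\n"
            f"Draft B (GPT-4):\n{gpt4}\n\nDraft C (fresh, web-connected):\n{fresh}\n\n"
            "Combine the strengths of each draft into one ideal final answer."
        )}],
    ).content[0].text
```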
Apr 5, 2024
Introducing `claude-researcher` 📈

A powerful Claude 3 research agent that delivers thorough reports in record time.

Just provide a topic, and a chain of AIs with **access to Google** will generate an incredibly comprehensive report for you.

And it's open-source!

`claude-researcher` is a constrained agent -- meaning its behavior is highly controlled, leading to better results than open-ended agents.

It chains together lots of Claude 3 calls (and Google access) that work together to create a detailed report on a topic of your choice.
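
Unlike an open-ended agent that decides its own next action, the pipeline here is a fixed sequence of calls. A rough sketch of that shape (not the repo's code; `google_search` is a hypothetical SERP wrapper and the model name is a placeholder):

```python
from anthropic import Anthropic

claude = Anthropic()

def ask_claude(prompt: str) -> str:
    resp = claude.messages.create(
        model="claude-3-opus-20240229", max_tokens=2000,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.content[0].text

def research_report(topic: str) -> str:
    # Fixed, predetermined steps: no open-ended "decide what to do next" loop.
    queries = ask_claude(f"Write 3 Google search queries for researching: {topic}").splitlines()
    findings = []
    for q in queries:
        results = google_search(q)  # hypothetical helper: returns snippets / page text
        findings.append(ask_claude(f"Summarize what these results say about '{topic}':\n{results}"))
    return ask_claude(f"Write a comprehensive report on '{topic}' from these findings:\n" + "\n\n".join(findings))
```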
Apr 3, 2024
Introducing `Claude-Author` 📕✍️

One prompt -> an entire novel!

Just describe the high-level details, and a chain of AI systems will write an entire book for you in minutes.

- complete w/ cover art
- packages your book as a real e-book

And it's open-source!

Previous AI book-writing systems produced mildly interesting books that were filled with errors and quite boring.

Claude-Author is the first AI system that actually produces readable books.

Still not perfect, but it's a leaps and bounds improvement over previous approaches.
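
The chain itself is conceptually simple: outline first, then write chapters one at a time, feeding back what came before. A rough sketch under those assumptions (cover art and e-book packaging omitted; the model name and premise are placeholders, and this is not Claude-Author's real code):

```python
from anthropic import Anthropic

claude = Anthropic()

def write(prompt: str, max_tokens: int = 4000) -> str:
    resp = claude.messages.create(
        model="claude-3-opus-20240229", max_tokens=max_tokens,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.content[0].text

premise = "A detective novel set on a Mars colony."
outline = write(f"Write a 10-chapter outline for a novel: {premise}")

chapters = []
for i in range(1, 11):
    # Each chapter gets the outline plus the previous chapter for continuity.
    prev = chapters[-1] if chapters else ""
    chapters.append(write(
        f"Outline:\n{outline}\n\nPrevious chapter:\n{prev}\n\nWrite chapter {i} in full."
    ))

# The repo then also generates cover art and packages everything as an e-book;
# that part is omitted in this sketch.
book_text = "\n\n".join(chapters)
```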
Mar 27, 2024
Introducing `claude-llm-trainer` ✍️

The world's simplest way to train a task-specific LLM.

Just write a sentence describing the model you want.

A chain of AI systems will generate a dataset and train a model for you.

And it's open-source.

How it works:

- The user describes the model they want
Ex: "A model that writes Python functions"

- claude-llm-trainer leverages a chain of Claude 3 calls to create a great dataset for your task.

- We process the dataset, and train a LLaMA model!
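
A rough sketch of the dataset-generation step (an assumed flow, not the repo's actual code; a real run would validate and retry on malformed JSON, and the model name is a placeholder):

```python
import json
from anthropic import Anthropic

claude = Anthropic()
description = "A model that writes Python functions"  # the user's one-sentence description

examples = []
for _ in range(100):
    resp = claude.messages.create(
        model="claude-3-opus-20240229", max_tokens=1000,
        messages=[{"role": "user", "content": (
            f"Task: {description}\nInvent one new training example as JSON with the keys "
            "'prompt' and 'response'. Make it different from typical examples."
        )}],
    )
    examples.append(json.loads(resp.content[0].text))  # real code would retry on bad JSON

# The resulting JSONL is then fed into a standard LLaMA fine-tuning run
# (the repo does this step in a Colab notebook).
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```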
Mar 25, 2024
Introducing `claude-journalist` ✍️

The first Claude 3 journalist agent.

Just provide a topic, and it will:
- Search the web for articles/real-time details
- Choose the best sources and read through them
- Write a fantastic, *factual* article + edit it

And it's open-source!

If you want to try it, you can head to the GitHub repo in the last tweet in this thread.

But if you don't want to bother with code, I've built an even better + FASTER version into HyperWrite -- try it here: app.hyperwriteai.com/personalassist…
Mar 22, 2024
Introducing `claude-investor` 📈

The first Claude 3 investment analyst agent.

Just provide an industry, and it will:
- Find financial data/news for key companies
- Analyze sentiment/trends for each
- Rank stocks by investment potential + price targets

And it's open-source!

`claude-investor` is a constrained agent -- meaning its behavior is highly controlled, leading to better results than open-ended agents.

It chains together lots of Claude 3 calls that work together to analyze the major stocks in a given category.
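
Roughly, the shape of that chain (a sketch, not the repo's code; `fetch_financials_and_news` is a hypothetical data helper and the model name is a placeholder):

```python
from anthropic import Anthropic

claude = Anthropic()

def ask(prompt: str) -> str:
    resp = claude.messages.create(
        model="claude-3-opus-20240229", max_tokens=2000,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.content[0].text

def analyze_industry(industry: str) -> str:
    tickers = ask(f"List the 5 most important public companies in {industry}, tickers only, one per line.").splitlines()
    analyses = []
    for t in tickers:
        data = fetch_financials_and_news(t)  # hypothetical: price history, headlines, filings
        analyses.append(ask(f"Analyze sentiment and trends for {t} given:\n{data}"))
    # Final call ranks everything and attaches price targets.
    return ask("Rank these companies by investment potential and give a price target for each:\n\n" + "\n\n".join(analyses))
```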
Mar 21, 2024
Introducing `claude-opus-to-haiku` ✍️

Get the quality of Claude 3 Opus, at a fraction of the cost and latency.

Give one example of your task, and Claude 3 Opus will teach Haiku (60x cheaper!!) how to do the task perfectly.

And it's open-source: github.com/mshumer/gpt-pr…

This repo was inspired by this tweet that went viral.

Claude 3 Haiku is *60x* cheaper than Opus, and 10x faster.

I discovered that if you prompt Haiku with a number of great examples, it can match Opus' quality.
Mar 20, 2024
Introducing `claude-prompt-engineer` ✍️

An agent that creates optimal Claude 3 prompts.

Just describe a task, and a chain of AIs will:
- Generate many possible prompts
- Test them in a ranked tournament
- Return the best one

And it's open-source: github.com/mshumer/gpt-pr…
`claude-prompt-engineer` is a constrained agent -- meaning its behavior is highly controlled, leading to better results than open-ended agents.

It chains together lots of Claude 3 calls that work together to find the best possible prompt.
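
A simplified sketch of the generate-then-tournament idea (the repo runs a ranked tournament; this toy version just counts pairwise wins, and the model name and task are placeholders):

```python
from itertools import combinations
from anthropic import Anthropic

claude = Anthropic()

def ask(prompt: str) -> str:
    resp = claude.messages.create(
        model="claude-3-opus-20240229", max_tokens=1000,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.content[0].text

task = "Classify a tweet's sentiment as positive, negative, or neutral."
test_input = "The new update is okay I guess, nothing special."

# 1) Generate several candidate prompts for the task.
candidates = [ask(f"Write one system prompt for this task: {task}") for _ in range(5)]

# 2) Run every candidate on a test case, then judge the outputs head-to-head.
outputs = [ask(f"{c}\n\nInput: {test_input}") for c in candidates]
wins = [0] * len(candidates)
for i, j in combinations(range(len(candidates)), 2):
    verdict = ask(
        f"Task: {task}\nInput: {test_input}\n\nResponse A:\n{outputs[i]}\n\n"
        f"Response B:\n{outputs[j]}\n\nWhich response is better? Answer 'A' or 'B' only."
    )
    wins[i if verdict.strip().upper().startswith("A") else j] += 1

best_prompt = candidates[wins.index(max(wins))]
```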
Oct 19, 2023
Introducing the world's most powerful AI Assistant.

Personal Assistant is NOT just another AI chatbot.

It can:
- Operate your browser to actually complete tasks
- Cite sources, so you can trust what it says
- And so much more.

You won't believe what Personal Assistant can do:
Personal Assistant combines everything we've built to create the single most capable Assistant on the planet — from researching, to carrying out tasks for you, and much more.

For example, here is the Assistant writing a well-researched marketing email AND sending it!
Sep 12, 2023
Here's a simple guide to set up your OpenAI Playground for day-to-day use, as a (better!) replacement for ChatGPT.

I've been getting so many questions about this, so hopefully this is helpful!

Read on:

First, why would you want to use the Playground over ChatGPT?

- Greater system prompt/behavior control
- Save multiple system prompts
- Temperature/creativity control
- Longer outputs for reasoning prompts/working with longer text
- Non-nerfed models :)
- Edit all messages

Etc.
Aug 23, 2023
This is the world's simplest way to fine-tune a task-specific GPT-3.5.

**Just write a sentence describing the model you want.**

A chain of AI systems will generate a dataset and train a model for you.

And it's open-source: github.com/mshumer/gpt-ll…
This is a new addition to the gpt-llm-trainer library.

gpt-llm-trainer is a constrained agent -- meaning its behavior is highly controlled, leading to better results than open-ended agents.

It chains together lots of GPT-4 calls that work together to create a great dataset for you.
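
Once the dataset exists, the fine-tune itself is the standard OpenAI flow. A minimal sketch (current SDK syntax; the file name is a placeholder, and the dataset is one chat-formatted JSON object per line):

```python
from openai import OpenAI

client = OpenAI()

# Upload the generated dataset, then kick off the fine-tuning job.
training_file = client.files.create(file=open("generated_dataset.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(training_file=training_file.id, model="gpt-3.5-turbo")
print(job.id)  # poll this job; the fine-tuned model id appears when it finishes
```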
Aug 16, 2023
Introducing `gpt-oracle-trainer` ✍️

The easiest way to create a chatbot that can answer questions about your product.

Just paste in your product's docs, and a chain of AI systems will generate a dataset and train a LLaMA 2 model for you.

And it's open-source: github.com/mshumer/gpt-or…
gpt-oracle-trainer is a constrained agent -- meaning its behavior is highly controlled, leading to better results than open-ended agents.

It chains together lots of GPT calls that work together to create a great dataset for you.
Aug 9, 2023
Introducing `gpt-llm-trainer` ✍️

The world's simplest way to train a task-specific LLM.

**Just write a sentence describing the model you want.**

A chain of AI systems will generate a dataset and train a model for you.

And it's open-source: github.com/mshumer/gpt-ll…
gpt-llm-trainer is a constrained agent -- meaning its behavior is highly controlled, leading to better results than open-ended agents.

It chains together lots of GPT-4 calls that work together to create a great dataset for you.