Post

How to get URL link on X (Twitter) App

On the Twitter thread, click on or icon on the bottom
Click again on or Share Via icon
Click on Copy Link to Tweet
Paste it above and click "Unroll Thread"!
More info at Twitter Help

Matt Shumer

@mattshumer_

Sep 5 • 9 tweets • 3 min read • Read on X

I'm excited to announce Reflection 70B, the world’s top open-source model.

Trained using Reflection-Tuning, a technique developed to enable LLMs to fix their own mistakes.

405B coming next week - we expect it to be the best model in the world.

Built w/ @GlaiveAI.

Read on ⬇️:

Reflection 70B holds its own against even the top closed-source models (Claude 3.5 Sonnet, GPT-4o).

It’s the top LLM in (at least) MMLU, MATH, IFEval, GSM8K.

Beats GPT-4o on every benchmark tested.

It clobbers Llama 3.1 405B. It’s not even close.

The technique that drives Reflection 70B is simple, but very powerful.

Current LLMs have a tendency to hallucinate, and can’t recognize when they do so.

Reflection-Tuning enables LLMs to recognize their mistakes, and then correct them before committing to an answer.

Additionally, we separate planning into a separate step, improving CoT potency and keeping the outputs simple and concise for end users.

Important to note: We have checked for decontamination against all benchmarks mentioned using @lmsysorg's LLM Decontaminator.

The weights of our 70B model are available today on @huggingface here:

@hyperbolic_labs API available later today.

Next week, we will release the weights of Reflection-405B, along with a short report going into more detail on our process and findings.huggingface.co/mattshumer/Ref…

Most importantly, a huge shoutout to @csahil28 and @GlaiveAI.

I’ve been noodling on this idea for months, and finally decided to pull the trigger a few weeks ago. I reached out to Sahil and the data was generated within hours.

If you’re training models, check Glaive out.

This model is quite fun to use and insanely powerful.

Please check it out — with the right prompting, it’s an absolute beast for many use-cases.

Demo here: …-playground-production.up.railway.app

405B is coming next week, and we expect it to outperform Sonnet and GPT-4o by a wide margin.

But this is just the start. I have a few more tricks up my sleeve.

I’ll continue to work with @csahil28 to release even better LLMs that make this one look like a toy.

Stay tuned.

• • •

Missing some Tweet in this thread? You can try to force a refresh

This Thread may be Removed Anytime!

Twitter may remove this content at anytime! Save it as PDF for later use!

More from @mattshumer_

Matt Shumer

@mattshumer_

Jul 26

Introducing `llama-405b-to-8b` ✍️

Get the quality of Llama 3.1 405B, at a fraction of the cost and latency.

Give one example of your task, and 405B will teach 8B (~30x cheaper!!) how to do the task perfectly.

And it's open-source: github.com/mshumer/gpt-pr…

This was made in partnership with @OctoAICloud — particularly Ben Hamm, who adapted my existing prompt optimization tools to take advantage of the new Llama 3.1 models.

https://x.com/mattshumer_/status/1770823530394833242

This approach was inspired by this tweet that went viral months ago.

I discovered that if you prompt Haiku w/ Opus-generated examples, it can match Opus' quality.

Now, we have even better 'teacher' models than Opus, and cheaper 'student' models than Haiku.

https://x.com/mattshumer_/status/1770823530394833242

Read 8 tweets

Matt Shumer

@mattshumer_

Jul 22

Introducing `claude-sonnet-to-gpt-4o-mini` ✍️

Get the quality of Claude 3.5 Sonnet, at a fraction of the cost and latency.

Give one example of your task, and Sonnet will teach 4o-mini (20x cheaper!!) how to do the task perfectly.

And it's open-source: shorturl.at/Cjjwt

https://x.com/mattshumer_/status/1770823530394833242

This repo was inspired by this tweet that went viral months ago.

I discovered that if you prompt Haiku w/ Opus-generated examples, it can match Opus' quality.

Now, we have even better 'teacher' models than Opus, and cheaper 'student' models than Haiku.

https://x.com/mattshumer_/status/1770823530394833242

In production, Claude 3.5 Sonnet-level AI quality at a low cost, with near-instant results, is a game changer.

This notebook makes it possible for anyone to implement this quickly.

So how does it work?

Read 7 tweets

Matt Shumer

@mattshumer_

Apr 10

Introducing `gemini-youtube-researcher` 📈

An open-source Gemini 1.5 Pro agent that LISTENS to videos and delivers topical reports.

Just provide a topic, and a chain of AIs with access to YouTube will analyze relevant videos and generate a comprehensive report for you.

This uses the new Gemini 1.5 Pro API that was released today.

It currently only supports listening to the audio content of videos. If anyone wants, please feel free to add support for video frames as well.

How it works, in a nutshell:
- User provides a topic
- SERPAPI gathers relevant YouTube links
- A separate Gemini 1.5 instance listens to + summarizes each video
- A final Gemini instance takes in all of the summaries, and generates a final, comprehensive report

Read 4 tweets

Matt Shumer

@mattshumer_

Apr 8

Open-sourcing `AI-Oracle`.

Generates better responses than Claude 3 Opus.

A very simple approach that combines the abilities of Claude 3, GPT-4, and Perplexity to provide better results than any could provide on their own.

Seriously -- it's dumb simple.

Notebook in thread:

How does it work?

The process is super simple. We simply query each model individually:
- Claude 3 Opus for reasoning + personality
- GPT-4 for reasoning
- PPLX for freshness/up-to-date info

Then, Claude combines the strengths of each and responds with a final, ideal output.

It's not perfect, but on average, it should improve results significantly compared to using models individually.

If anyone wants to improve it, there a lot of gains to be made by adding context about the strengths/weaknesses of each model in the final prompt.

Read 4 tweets

Matt Shumer

@mattshumer_

Apr 5

Introducing `claude-researcher` 📈

A powerful Claude 3 research agent that delivers thorough reports in record time.

Just provide an topic, and a chain of AIs with **access to Google** will generate an incredibly comprehensive report for you.

And it's open-source!

`claude-researcher` is a constrained agent -- meaning its behavior is highly-controlled, leading to better results than open-ended agents.

It chains together lots of Claude 3 calls (and Google access) that work together to create a detailed report on a topic of your choice.

How it works, in a nutshell:
- User provides a topic
- Claude breaks it into sub-topics
- An agent with access to Google builds a report for each sub-topic
- A final Claude instance takes in all of the sub-topic reports, and generates a final, comprehensive report

Read 6 tweets

Matt Shumer

@mattshumer_

Apr 3

Introducing `Claude-Author` 📕✍️

One prompt -> an entire novel!

Just describe the high-level details, and a chain of AI systems will write an entire book for you in minutes.

- complete w/ cover art
- packages your book as a real e-book

And it's open-source!

Previous AI book-writing systems produced mildly interesting books that were filled with errors and quite boring.

Claude-Author is the first AI system that actually produces readable books.

Still not perfect, but it's a leaps and bounds improvement over previous approaches.

Claude-Author is a constrained agent -- meaning its behavior is highly-controlled, leading to better results than open-ended agents.

It's a chain of many Claude 3 Haiku/Opus and Stable Diffusion API calls that work together to write a coherent novel.

Read 7 tweets

Support us! We are indie developers!

This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Don't want to be a Premium member but still want to support us?

Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal

Or Donate anonymously using crypto!

Ethereum

0xfe58350B80634f60Fa6Dc149a72b4DFbc17D341E copy

Bitcoin

3ATGMxNzCUFzxpMCHL5sWSt4DVtS8UqXpi copy

Thank you for your support!

Share this page!

Enter URL or ID to Unroll

Matt Shumer

Try unrolling a thread yourself!

More from @mattshumer_

Matt Shumer

Matt Shumer

Matt Shumer

Matt Shumer

Matt Shumer

Matt Shumer

Did Thread Reader help you today?

Don't want to be a Premium member but still want to support us?

Send Email!