Latest Twitter Threads by @alexandr_wang on Thread Reader App

Apr 10 • 5 tweets • 1 min read

I testified at today's @HouseCommerce hearing on AI.

The CCP has an AI master plan that’s working.

I told Congress that America must do the following to win on AI: Dominate, Unleash, Innovate, Promote.

🧵

1. Dominate:

We need to establish Data Dominance. We can do this by setting up a National AI Data Reserve, making all govt data AI-ready, and investing to position data dominance as a national priority.

Feb 16 • 5 tweets • 3 min read

On the heels of Humanity's Last Exam, @scale_AI & @ai_risks have released a new very-hard reasoning eval:

EnigmaEval: 1,184 multimodal puzzles so hard they take groups of humans many hours to days to solve.

All top models score 0 on the Hard set, and <10% on the Normal set

🧵

Here you can see some sample puzzles:

Dec 12, 2024 • 8 tweets • 2 min read

Since ChatGPT dropped in 2022, AI progress has been dramatic.

But it's also been predictable—new models, bigger chip clusters, more chatbots.

Not in 2025.

Here are the three big changes to watch for over the next 12 months 🧵

1/8 #1 Geopolitical Swing States.

The conversation is going to expand from “Who is leading - the US vs. China?” to “which country’s AI is most exportable worldwide?”

AI-curious countries around the world—“geopolitical swing states”—are going to decide which side they go with

2/8

Nov 5, 2024 • 7 tweets • 3 min read

Scale AI is proud to announce Defense Llama 🇺🇸: the LLM purpose-built for American national security.

This is the product of collaboration between @Meta, Scale, and defense experts, and is available now for integration into US defense systems.

Read more below👇

With the National Security Memorandum coming out of the White House recently, it is clear we need to move fast on AI in national security.

From the NSM:

"If the United States Government does not act with responsible speed and in partnership with industry, civil society, and academia to make use of AI capabilities in service of the national security mission — and to ensure the safety, security, and trustworthiness of American AI innovation writ large — it risks losing ground to strategic competitors."

Sep 16, 2024 • 4 tweets • 2 min read

As LLMs get smarter, evals need to get harder.
OpenAI’s o1 has already maxed out most major benchmarks.

Scale is partnering with CAIS to launch Humanity’s Last Exam: the toughest open-source benchmark for LLMs.

We're putting up $500K in prizes for the best questions.

(read on)

We need tough questions from human experts to push AI models to their limits. If you submit one of the best questions, we’ll give you co-authorship and a share of the prize pot.

The top 50 questions will earn $5,000 each, and the next 500 will earn $500 each. All selected questions grant optional co-authorship on the resulting paper.

We're seeking questions that go beyond undergraduate level and aren't easily answerable via quick online searches.
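As a quick check, the two prize tiers quoted above add up to the announced $500K pot. The sketch below only redoes that arithmetic; the names `PRIZE_TIERS` and `total_prize_pool` are illustrative and do not come from the thread.

```python
# Prize tiers as stated in the thread: (number of questions, payout per question in USD).
PRIZE_TIERS = [(50, 5_000), (500, 500)]


def total_prize_pool(tiers):
    """Sum count * payout over all tiers."""
    return sum(count * payout for count, payout in tiers)


total = total_prize_pool(PRIZE_TIERS)
print(f"${total:,}")  # $500,000
```

Each tier contributes $250,000, so the top 50 questions and the next 500 split the pot evenly.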

Aug 1, 2024 • 8 tweets • 2 min read

1/Gemini 1.5 Pro 0801 is the new best model (tops LMSYS, SEAL evals incoming)

Key considerations
1—OpenAI, Google, Anthropic, & Meta all right ON the frontier
2—Google has a long-term compute edge w/TPUs
3—Data & post-training becoming key competitive drivers in performance

🧵

2/We've seen 7 major models from top labs in the last 3mo:

May:
- GPT 4o
- Gemini 1.5 Pro

June:
- Claude 3.5 Sonnet

July:
- Llama 3.1
- Mistral Large 2
- GPT-4o Mini

August:
- Gemini 1.5 0801

Each of these models has been incredibly competitive—each world-class in some way.

Jul 25, 2024 • 8 tweets • 3 min read

1/ New paper in Nature shows model collapse as successive model generations are recursively trained on synthetic data.

This is an important result. While many researchers today view synthetic data as AI's philosopher’s stone, there is no free lunch.

Read more 👇

Training on pure synthetic data has no information gain, thus there is little reason the model *should* improve.

Oftentimes when evals go up from “self-distillation”, that might be from some more invisible tradeoff, i.e. mode collapse in exchange for individual eval improvement
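A toy sketch of the "no information gain" point, under loud assumptions: this is not the Nature paper's experimental setup, only a one-dimensional Gaussian that is repeatedly refit on samples drawn from its own previous fit. The function name, the sample size, and the number of generations are all hypothetical choices for illustration.

```python
import random
import statistics


def simulate_recursive_fitting(generations=50, sample_size=20, seed=0):
    """Toy model: fit a Gaussian to samples, draw the next generation's
    training data from that fit, and repeat.

    Returns the fitted standard deviation at each generation,
    starting with the true value at generation 0.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0: the "real" data distribution
    history = [sigma]
    for _ in range(generations):
        # Train only on synthetic samples drawn from the previous fit.
        samples = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples)  # maximum-likelihood estimate
        history.append(sigma)
    return history


if __name__ == "__main__":
    history = simulate_recursive_fitting()
    for gen in (0, 10, 25, 50):
        print(f"generation {gen:2d}: fitted std = {history[gen]:.3f}")
```

No generation ever sees fresh real data, so each fit can only re-estimate the previous one with added sampling noise. With small samples the fitted spread tends to drift and shrink over many generations, which is the flavor of collapse the thread describes.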

Jun 9, 2024 • 9 tweets • 3 min read

1/ one of the biggest questions in AI today is:

since GPT-4 was trained in fall 2022, we've collectively spent ~$100B on NVIDIA GPUs

will the next generation of AI models' capabilities live up to that aggregate investment level?

NVIDIA qtrly datacenter rev, by @Thomas_Woodside

2/ there are 2 schools of thought:

1) compute is the only real bottleneck to AI progress. the more we spend, the closer we get to AGI

2) we are hitting a data wall which will slow progress regardless of how much compute we have

https://x.com/EpochAIResearch/status/1798742418763981241

May 29, 2024 • 5 tweets • 3 min read

1/ We are launching SEAL Leaderboards—private, expert evaluations of leading frontier models.

Our design principles:
🔒Private + Unexploitable. No overfitting on evals!
🎓Domain Expert Evals
🏆Continuously Updated w/new Data and Models

Read more in 🧵

scale.com/leaderboard

2/ Evaluations are a critical component of the AI ecosystem.

Evals are incentives for researchers, and our evaluations set the goals for how we aim to improve our models.

Trusted 3rd party evals are a missing part of the whole ecosystem, which is why @scale_AI built these.

May 28, 2024 • 9 tweets • 2 min read

1/ Today is the 4th anniversary of the original GPT-3 paper—"Language Models are Few-Shot Learners"

Some reflections on how the last 4 years have played out, and thoughts about the next 4 years

2/ GPT-3 was when it first became clear what the potential of scaling language models was.

The efficacy of GPT-3 took the AI community by surprise for the most part—the capabilities were staggering compared to everything that came before in NLP.

May 16, 2024 • 10 tweets • 2 min read

1/ Some thoughts on the recent OpenAI and Google announcements, and what it indicates about what's next in AI.

Hint: post-training is REALLY important...

THREAD

2/ In many ways, Gemini 1.5 Flash was the gem of Google's announcements. A 1M-context small model with Flash performance is incredible.

OpenAI now has the best large model with GPT-4o, and Google has the best small model with Gemini 1.5 Flash.

The competition is on.

Jan 1, 2024 • 5 tweets • 1 min read

I'm posting some of my learnings from 2023, AI's biggest year yet.

🧵 for some highlights and link to post

LEARNING 1: The conceit of an expert is a trap. Strive for a beginner’s mind and the energy of a novice.

Experience can often be a curse—the past is only mildly predictive of the future, and every scenario requires new techniques and insight. In novel situations, the novice tends to be at an advantage—their vitality and beginner’s mind lend themselves to faster adaptation.

Jul 18, 2023 • 5 tweets • 3 min read

With @MetaAI's launch of Llama 2—@scale_ai will also be:

🌎 open-sourcing scale-llm-engine, our library for hosting and fine-tuning open-source LLMs
⚡️ releasing the fastest way to fine-tune Llama 2
💼 launching Scale Custom LLMs for enterprises

Read more in 🧵

We are open-sourcing scale-llm-engine, our library for hosting and fine-tuning open-source LLMs.

This can run on your own infra, as well as on Scale's cloud infrastructure.

Docs here:

Github link:
https://t.co/mrguGYVEAo
scaleapi.github.io/llm-engine/
github.com/scaleapi/llm-e…

Jan 30, 2023 • 22 tweets • 7 min read

Last week, @scale_AI hosted a Generative AI hackathon.

It was the purest expression of builder energy, and an omen that the dark days of tech stagnation are over.

Roughly 300 hackers converged @ our office in SF, and a day full of frenzy ensued

🧵 of cool things that happened

🥇Winning 1st place was a project—"GPT is all you need for backend"

Probably the most provocative POV on the future of software—that LLMs will replace backend code too!

https://twitter.com/karpathy/status/1618311660539904002

Jan 16, 2023 • 5 tweets • 4 min read

🧵 thread of some of my favorite AI-generated product images from @scale_AI Forge

AI-generated advertising only gets better as we keep improving our underlying models

It works really well in conveying the feeling of cosmetic products.

Dec 25, 2022 • 5 tweets • 1 min read

Heard someone say “I don’t want to waste brain space on learning Chinese”

PSA—that’s not how it works at all.

Consistently *retrieving* information both deepens connections with the rest of your knowledge and frees up resources & working memory for more abstract thought.

🧵

Memorizing actually allows for new conceptual understanding, it’s not just rote BS.

And while there is some “wetware” limit based on the number of synapses, that limit is roughly the memory size of the movie of your entire life. It’s why some people can have photographic memory

Nov 27, 2022 • 14 tweets • 4 min read

I'm publishing a call to action: The AI War and How to Win It.

AI for national security will define the future of our world. Either the USA wins, or our authoritarian adversaries do.

I walk through The AI War, The China Threat, and How to Win It.

🧵

alexw.substack.com/p/war?sd=pf

The Ukraine war demonstrates that the tech stack for war has changed.

The future is clear—AI-powered targeting and autonomous drones will define warfare.

Our legacy military platforms will be disrupted by cheaper autonomous drone fleets.

vox.com/2022/9/21/2335…

Nov 10, 2022 • 5 tweets • 2 min read

1/ We are launching a product we previewed 2 weeks ago—Scale Forge ⚒

We're enabling marketers to AI-generate UNLIMITED and INFINITELY CREATIVE product imagery for:
- brand campaigns
- ad creatives
- social media
- product images

See the product in the video!

Thread 🧵

2/ Scale Forge ⚒ is an AI-powered design studio that enables customers to create new product images that allow for high-fidelity brand preservation.

You can use one of our default products, or upload your own!

Oct 27, 2022 • 6 tweets • 5 min read

I wanted to preview one of the coolest products from the @scale_AI labs.

We're enabling marketers to AI-generate UNLIMITED and INFINITELY CREATIVE images of their products for:
- ad creatives
- brand campaigns
- social media

Every image in this thread is AI generated🧵

The most creative ads are the ones we remember the best—they're striking, memorable, and cool.

With the new breakthroughs in AI, we can enable brands to unlock their imagination, and grow their customer base.

What are the most inspirational settings for your product?

Oct 24, 2022 • 13 tweets • 5 min read

.@scale_AI had our TransformX conference last week.

As part of that we announced a number of ⚡️new products⚡️ to unlock and operationalize AI for everyone—from startups to researchers and Fortune 500 companies to the US government.

Thread🧵

We announced the ✨Scale Applied AI✨ Suite.

Scale is at the forefront in advancing foundation models, especially applying them to specific tasks + industries.

These are real examples of how Scale is partnering with customers across industries.

May 30, 2022 • 5 tweets • 3 min read

Posting a memo I sent to the @scale_AI team back in 2019.

The core idea is that most organizations fall prey to a slow death of optimism, causing a slow, excruciating halt.

Thread below 👇

alexw.substack.com/p/optimism-sha…

The scope, or how long we say something will take, influences how long something takes.

When we say things will take a long time, they will take a long time.

When we say things will take a short amount of time, they will take less time.
