How overfit are popular LLMs on public benchmarks?
New research out of @scale_ai's SEAL lab answers this:
- produced a new eval, GSM1k
- evaluated public LLMs for overfitting on GSM8k
VERDICT: Mistral & Phi show signs of overfitting to GSM8k, while GPT, Claude, Gemini, and Llama do not.
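Roughly, the overfitting signal is the accuracy drop from public GSM8k to the freshly written GSM1k. A minimal sketch (the accuracies and the 5-point threshold below are my illustrative assumptions, not the paper's numbers):

```python
# Hedged sketch: a model that memorized leaked GSM8k problems should score
# noticeably worse on the held-out GSM1k. Accuracies here are placeholders.
ILLUSTRATIVE_ACCURACIES = {
    # model      (gsm8k, gsm1k)
    "model_a": (0.88, 0.86),  # small gap: little evidence of overfitting
    "model_b": (0.82, 0.71),  # large gap: consistent with overfitting
}

for model, (gsm8k_acc, gsm1k_acc) in ILLUSTRATIVE_ACCURACIES.items():
    gap = gsm8k_acc - gsm1k_acc
    # 5-point cutoff is an arbitrary choice for illustration
    flag = "possible overfit" if gap > 0.05 else "looks clean"
    print(f"{model}: GSM8k={gsm8k_acc:.0%} GSM1k={gsm1k_acc:.0%} gap={gap:+.0%} -> {flag}")
```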
h/t to our incredible team for this research:
@hughbzhang @summeryue0, @_jeffda, Dean Lee, Vaughn Robinson, Catherine Wu, Will Song, Tiffany Zhao, Pranav Raja, Dylan Slack, Qin Lyu, @seanh, Russell Kaplan, @mikelunati
As LLMs get smarter, evals need to get harder.
OpenAI’s o1 has already maxed out most major benchmarks.
Scale is partnering with CAIS to launch Humanity’s Last Exam: the toughest open-source benchmark for LLMs.
We're putting up $500K in prizes for the best questions.
(read on)
We need tough questions from human experts to push AI models to their limits. If you submit one of the best questions, we’ll give you co-authorship and a share of the prize pot.
The top 50 questions will earn $5,000 each, and the next 500 will earn $500 each. All selected questions grant optional co-authorship on the resulting paper.
We're seeking questions that go beyond undergraduate level and aren't easily answerable via quick online searches.
If you have 5+ years of experience in a technical field or hold/are pursuing a PhD, we want your insights! Submit questions that would truly impress you if an AI could solve them. Help us evaluate how close we are to achieving expert-level AI across diverse domains.
1/Gemini 1.5 Pro 0801 is the new best model (tops LMSYS, SEAL evals incoming)
Key considerations
1—OpenAI, Google, Anthropic, & Meta all right ON the frontier
2—Google has a long-term compute edge w/TPUs
3—Data & post-training becoming key competitive drivers in performance
🧵
2/We've seen 7 major models from top labs in the last 3mo:
May:
- GPT-4o
- Gemini 1.5 Pro
June:
- Claude 3.5 Sonnet
July:
- Llama 3.1
- Mistral Large 2
- GPT-4o Mini
August:
- Gemini 1.5 0801
Each of these models has been incredibly competitive—each world-class in some way.
3/The reason these are all so close together timing-wise is that every lab got their H100s at roughly the same time.
Each lab worked through early H100 issues last fall, and the big H100 clusters all started training this spring.
1/ New paper in Nature shows model collapse as successive model generations are recursively trained on synthetic data.
This is an important result. While many researchers today view synthetic data as the AI philosopher's stone, there is no free lunch.
Read more 👇
2/ Training on pure synthetic data adds no new information, so there is little reason the model *should* improve.
Oftentimes when evals go up from "self-distillation", it comes from a less visible tradeoff, e.g. mode collapse in exchange for gains on individual evals.
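A toy sketch of the collapse mechanism (my illustration, not the paper's LLM setup): fit a Gaussian to samples drawn from the previous generation's fit, then repeat. With no fresh data entering the loop, the fitted variance drifts toward zero and the distribution's tails vanish:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.0, 1.0   # generation 0: the "real" data distribution
n = 20                 # small sample size makes the downward drift visible

for gen in range(1, 101):
    synthetic = rng.normal(mu, sigma, n)           # sample from current model
    mu, sigma = synthetic.mean(), synthetic.std()  # refit on synthetic data only
    if gen % 10 == 0:
        print(f"gen {gen:3d}: mu={mu:+.3f}  sigma={sigma:.4f}")
```

Each refit loses a little variance in expectation, and the losses compound across generations: exactly the quiet degradation that eval numbers alone can hide.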
3/ This core idea deserves close attention:
Synthetic data can create a short-term boost in eval results, but you will pay for it later with model collapse!
You accumulate debt by mangling the model, debt that starts out invisible and is very hard to repay.
4/ The original scaling laws require scaling data alongside compute. You can still improve loss with more compute alone, but it is much less efficient than scaling data as well.
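To make that concrete, a rough sketch using the Chinchilla-style parametric loss from Hoffmann et al. 2022, L(N, D) = E + A/N^a + B/D^b, with their fitted constants (the 10x/10x split is roughly compute-optimal under that fit; compute C ~ 6ND):

```python
# Hedged sketch: compare spending 100x more compute on parameters alone
# vs. splitting it across parameters and data. Constants from the
# Chinchilla paper; the starting point is a Chinchilla-scale model.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(N, D):
    return E + A / N**alpha + B / D**beta

N0, D0 = 70e9, 1.4e12              # 70B params, 1.4T tokens
print(f"baseline        : {loss(N0, D0):.3f}")
print(f"100x N, fixed D : {loss(100 * N0, D0):.3f}")        # compute into params only
print(f"10x N and 10x D : {loss(10 * N0, 10 * D0):.3f}")    # same compute, balanced
```

Under this fit, the balanced split reaches a lower loss than pouring all 100x of the extra compute into parameters with data held fixed.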
1/ Today is the 4th anniversary of the original GPT-3 paper—"Language Models are Few-Shot Learners"
Some reflections on how the last 4 years have played out, and thoughts about the next 4 years
2/ GPT-3 was the moment it first became clear just how much potential there was in scaling language models.
The efficacy of GPT-3 largely took the AI community by surprise: the capabilities were staggering compared to everything that came before in NLP.
3/ @scale_AI had started working on language models the year before, running the very first RLHF experiments on GPT-2. But GPT-2 still felt very much like a toy at that time.
GPT-3 was the first moment when it was obvious this would be the major theme of AI.