elvis
Dec 11, 2019 · 5 tweets · 2 min read
Talk on Social Intelligence by Blaise is getting started now at West Hall C. #NeurIPS2019
Let’s start the conversation around energy and natural resources
The importance of federated learning in ML
Better loss functions?
Grand challenges

More from @omarsar0

Oct 16
I am not going to lie.

I see a lot of potential in the Skills feature that Anthropic just dropped!

Just tested with Claude Code. It leads to sharper, more precise outputs.

It's structured context engineering that powers CC with specialized capabilities, leveraging the filesystem.
I think it might be one of the best ways to really tap into the full potential of Claude Code.

Tune instructions, output formats, use of scripts, tools (MCP or otherwise), and more.

For specialized tasks, CC outputs dumb stuff at times; the idea here is to scope CC on demand.
An easy way to try Skills in Claude Code is by asking it to help you build one. I am surprised by how aware it is of Skills and how to build comprehensive ones.
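For context, a Skill is essentially a folder with a SKILL.md file that Claude Code loads on demand. The sketch below is my hedged reading of that format (frontmatter fields and the example task are illustrative, not an official template):

```markdown
---
name: release-notes
description: Drafts release notes from a git log. Use when the user asks to summarize changes between versions.
---

# Release Notes Skill

1. Run `git log --oneline <last-tag>..HEAD` to collect changes.
2. Group commits by type (feat, fix, chore).
3. Output a markdown changelog; keep entries to one line each.
```

The `description` is what lets Claude Code decide when to pull the skill into context, which is why scoping it tightly matters.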
Oct 16
Banger paper from Meta and collaborators.

This paper is one of the best deep dives yet on how reinforcement learning (RL) actually scales for LLMs.

The team ran over 400,000 GPU hours of experiments to find a predictable scaling pattern and a stable recipe (ScaleRL) that consistently works as you scale up compute.

Think of it as a practical guide for anyone trying to train reasoning or alignment models with RL.

More on why this is a big deal:
1. The big insight: RL progress follows a predictable curve.

When you plot model performance vs compute, the growth isn’t random; it follows a sigmoid (S-shaped) curve.

The curve has three simple knobs:
A = the best performance you’ll ever reach,
B = how efficiently you reach it,
C_mid = how much compute it takes to hit the halfway point.

The amazing part: you can fit this curve using just small runs and accurately predict how a 100k-hour run will behave.

So you no longer need to guess; you can forecast where your RL setup will top out before burning compute.
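The three-knob curve above can be sketched as a tiny function. The exact functional form here is an assumption (a generic saturating sigmoid in compute); see the paper for the precise equation:

```python
def rl_scaling_curve(compute, A, B, C_mid):
    """Sigmoid-style scaling curve (assumed form, not the paper's exact one).

    A     -- asymptotic best performance you'll ever reach
    B     -- efficiency: how sharply performance rises with compute
    C_mid -- compute at which you hit half of A
    """
    return A / (1.0 + (C_mid / compute) ** B)

# Fit A, B, C_mid on small runs, then evaluate at large compute to
# forecast where the run will top out before spending the GPU hours.
```

By construction, performance is exactly A/2 at `compute == C_mid` and approaches A as compute grows, which is the "forecast before burning compute" trick in miniature.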
2. The ScaleRL recipe that just works.

The authors tested dozens of RL variations and found one that scales cleanly to 100k GPU hours without blowing up:

- PipelineRL (8 pipelines) with CISPO loss (a stabilized REINFORCE variant).

- Prompt-level averaging and batch-level normalization to reduce variance.

- FP32 logits for better stability and higher final accuracy.

- No-Positive-Resampling curriculum to avoid reward hacking.

- Forced interruptions (stopping long thoughts) instead of punishing long completions.

This combo, called ScaleRL, hit the best trade-off between stability, sample efficiency, and asymptotic performance.
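To make one ingredient concrete: CISPO clips the importance-sampling weight and treats it as a constant, so every token keeps a REINFORCE gradient (unlike PPO-style clipping, which zeroes some tokens out). This is a rough numpy sketch under assumed clip ranges, not the paper's exact objective:

```python
import numpy as np

def cispo_loss(logp_new, logp_old, advantages, eps_low=0.2, eps_high=0.2):
    """CISPO-style loss sketch.

    The importance ratio is clipped and used as a fixed weight (in a real
    framework it would sit behind a stop-gradient), so no token's gradient
    is dropped outright -- only its magnitude is bounded.
    """
    ratio = np.exp(logp_new - logp_old)
    weight = np.clip(ratio, 1.0 - eps_low, 1.0 + eps_high)  # stop-grad in practice
    # Per-token REINFORCE term, averaged over the batch.
    return -(weight * advantages * logp_new).mean()
```

Even when the policy has drifted far from the sampling policy (ratio of 10, say), the token still contributes, just with its weight capped at 1.2.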
Sep 30
We are living in the most insane timeline.

I just asked Claude Code (with Claude Sonnet 4.5) to develop an MCP Server (end-to-end) that allows me to programmatically create n8n workflows from within Claude Code itself.

Took about 10 mins!
You can now create n8n workflows with pure natural language from Claude Code.

This is one of the top requests in our academy: how to automate the creation of n8n workflows.

It turns out that this is a great use case for MCP.
I've already created a huge repository of n8n agentic workflows, which I can now feed directly to Claude Code to help scale the creation of workflows.

It can even create/optimize prompts and all that good stuff. Automating context engineering is next, which Claude Code is really good at, too.
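For a sense of what "create n8n workflows programmatically" means, this is roughly the shape of a workflow payload such a server would emit. Node names and fields below follow n8n's workflow JSON export format as I understand it; treat the exact schema as an assumption and validate against your n8n version:

```python
# Illustrative n8n workflow JSON: a webhook trigger wired to one node.
workflow = {
    "name": "Demo: webhook to no-op",
    "nodes": [
        {
            "name": "Webhook",
            "type": "n8n-nodes-base.webhook",
            "position": [250, 300],
            "parameters": {"path": "demo"},
        },
        {
            "name": "NoOp",
            "type": "n8n-nodes-base.noOp",
            "position": [450, 300],
            "parameters": {},
        },
    ],
    # Edges are keyed by source node name; each entry names the target node.
    "connections": {
        "Webhook": {"main": [[{"node": "NoOp", "type": "main", "index": 0}]]}
    },
}
```

Because the whole workflow is plain JSON like this, an LLM that can read your repository of existing workflows can pattern-match and emit new ones.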
Sep 28
Great work showing prompt synthesis as a new scaling axis for reasoning.

Good training data is scarce.

This work showcases a framework that might make it possible to construct high-quality training problems for reasoning-focused LLMs.

Technical details below:
This work shows that we can scale reasoning ability in LLMs by automatically generating hard, high-quality prompts instead of relying only on human-written datasets.
Core idea: Treat explanations (“rationales”) as hidden variables. The system learns to generate concept → explanation → problem using an EM loop. A strong model provides initial seed problems, then the loop keeps improving quality.
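The concept → explanation → problem loop can be sketched as below. All three model calls are stubbed out (the function names and the quality-filter threshold are my own, purely illustrative); in the actual framework each step queries an LLM:

```python
def explain(concept):
    # E-step stand-in: draft a rationale (the hidden variable) for the concept.
    return f"rationale for {concept}"

def pose_problem(concept, rationale):
    # M-step stand-in: turn concept + rationale into a candidate training problem.
    return f"problem testing {concept} via {rationale}"

def quality(problem):
    # Stand-in filter: score candidates, keep only hard, clean problems.
    return 1.0

def synthesize(concepts, threshold=0.5):
    kept = []
    for c in concepts:
        r = explain(c)            # concept -> explanation
        p = pose_problem(c, r)    # explanation -> problem
        if quality(p) >= threshold:
            kept.append(p)        # survivors seed the next EM round
    return kept
```

The point of the EM framing: because the rationale is latent, the loop can keep re-estimating it and re-generating problems, improving quality round over round instead of being capped by the initial seed set.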
Sep 25
Language Models that Think and Chat Better

Proposes a simple RL recipe to improve small open models (e.g., 8B) so they rival GPT-4o and Claude 3.7 Sonnet (thinking).

Pay attention to this one, AI devs!

Here are my notes:
TL;DR

A simple recipe, RL with Model-rewarded Thinking (RLMT), makes small open models “plan first, answer second” on regular chat prompts and trains them with online RL against a preference reward.

They find that long, explicit thinking paired with a strong preference reward generalizes beyond verifiable domains.
What’s new

Instead of rule-verifiable rewards (math, code), RLMT uses long chain-of-thought on diverse real-world prompts plus a reward model (Skywork) to score outputs, trained with online RL (GRPO, PPO, DPO).
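The "plan first, answer second" rollout scoring looks roughly like this. Both model calls are stubs, and the `<think>` tag format is an assumption about how the hidden plan is delimited, not necessarily the paper's exact format:

```python
def generate(prompt):
    # Policy stand-in: emits a hidden plan, then the user-visible answer.
    return "<think>outline the reply</think>final answer"

def reward_model(prompt, answer):
    # Preference-reward stand-in (the paper uses Skywork); here, longer = better.
    return len(answer)

def score_rollout(prompt):
    out = generate(prompt)
    # Strip the thinking span so only the visible answer is rewarded.
    answer = out.split("</think>", 1)[-1]
    return reward_model(prompt, answer)
```

The key design choice: the reward model never sees the plan, so the policy is free to think at length without being penalized, and online RL (GRPO/PPO/DPO) optimizes against these scalar scores.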
Sep 22
Very cool work from Meta Superintelligence Lab.

They are open-sourcing Meta Agents Research Environments (ARE), the platform they use to create and scale agent environments.

Great resource to stress-test agents in environments closer to real apps.

Read on for more:
TL;DR

ARE + Gaia2: a research platform and benchmark for building and stress-testing agent systems in realistic, time-driven environments.

The paper introduces a modular simulator (ARE) and a mobile-style benchmark (Gaia2) that emphasize asynchronous events, verification of write actions, and multi-agent coordination in noisy, dynamic settings.
ARE: the simulator

• Everything is modeled as apps, events, notifications, and scenarios.

• Time keeps flowing even while the agent is thinking, so slow models miss deadlines.

• Agents use tools, get async notifications, and operate under rules defined by directed acyclic graphs.
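The "time keeps flowing while the agent thinks" mechanic can be sketched with a priority queue of timed events. This is assumed mechanics in miniature, not the real ARE API:

```python
import heapq

def run(events, agent_latency):
    """Minimal async-time sketch: events are (time, name) pairs, and the
    agent's thinking latency advances the simulated clock. Any event whose
    time has passed by the time the agent acts has already fired -- so a
    slow model can miss a deadline it was supposed to handle."""
    clock = 0.0
    queue = list(events)
    heapq.heapify(queue)
    fired_before_action = []
    clock += agent_latency  # thinking consumes simulated time
    while queue and queue[0][0] <= clock:
        fired_before_action.append(heapq.heappop(queue)[1])
    return fired_before_action

# A deadline at t=5 is missed by an agent that thinks for 10 time units.
```

This is what separates ARE-style environments from turn-based benchmarks, where the world politely waits for the model to finish reasoning.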