elvis Profile picture
Dec 5, 2018 9 tweets 3 min read Read on X
A simple method for fair comparison? #NeurIPS2018 Image
Considerations: Image
Reproducibility checklist: Image
There is room for variability, especially when using different distributed systems: Image
Complexity of the world is discarded... We need to tackle RL in the natural world through more complex simulations. Image
Embedding natural background? Image
Set the bar higher for the naturalism of the environment: Image
You learn a lot by considering this idea of stepping out in the real world: Image
Reproducibility test: Image

• • •

Missing some Tweet in this thread? You can try to force a refresh
 

Keep Current with elvis

elvis Profile picture

Stay in touch and get notified when new unrolls are available from this author!

Read all threads

This Thread may be Removed Anytime!

PDF

Twitter may remove this content at anytime! Save it as PDF for later use!

Try unrolling a thread yourself!

how to unroll video
  1. Follow @ThreadReaderApp to mention us!

  2. From a Twitter thread mention us with a keyword "unroll"
@threadreaderapp unroll

Practice here first or read more on our help page!

More from @omarsar0

Jan 23
OpenAI Introduces Operator & Agents!

Here is everything you need to know: Image
Operator is a system that can use a web browser to accomplish tasks.

Operator can look at a webpage and interact with it by typing, clicking, and scrolling.

It's available as a research preview. Available in the US for Pro users. Available to Plus users later.
Operator can perform a wide variety of repetitive browser tasks such as filling out forms, ordering groceries, and even creating memes.
Read 16 tweets
Jan 21
Goodbye web scrapers!

Say hello to /extract by @firecrawl_dev

Just write a prompt and get the web data you need!

It doesn’t get any simpler than this.
The /extract endpoint is simple to use. Provide a prompt and a schema and retrieve any data you need from a website.

I’ve added the /* to the URL to find and extract information across the entire website.

The endpoint can return up to thousands of data points at once.
What companies are already using /extract for:

- Enrich CRM data
- Streamline KYB processes
- Monitor competitors
- Supercharge onboarding experiences
- Build targeted prospecting lists

Examples here:
Read 4 tweets
Jan 20
The DeepSeek-R1 paper is a gem!

Highly encourage everyone to read it.

It's clear that LLM reasoning capabilities can be learned in different ways.

RL, if applied correctly and at scale, can lead to some really powerful and interesting scaling and emergent properties.

There is more to RL than meets the eye!

Here is my breakdown of the paper along with a few tests: youtu.be/3GlFd3doO3U?si…

The multi-state training might not make sense initially but they provide clues on optimizations that we can continue to tap into.

Data quality is still very important for enhancing the usability of the LLM.

Unlike other reasoning LLMs, DeepSeek-R1's training recipe and weights are open so we can build on top of it. This opens up exciting research opportunities.

About the attached clip: the previous preview model wasn't able to solve this task. DeepSeek-R1 can solve this and many other tasks that o1 can solve. It's a very good model for coding and math.
When DeepSeek said "on par with OpenAI-o1" I thought they were just hyping. But based on my tests, it's clearly not so.

Wanted to add that DeepSeek-R1 got all of the hard tasks from the OpenAI LLM reasoning blog post correct for me. This is wild and totally unexpected! The only task where it failed (i.e., crossword puzzle) o1 also fails.Image
multi-state training -> multi-stage training

It means a couple of rounds of RL and fine-tuning. This leads to a model that is not only good at complex reasoning but is also aligned and usable in a real-world setting.

If you used their preview model, it definitely felt like it lacked the human preference alignment part which they somehow figured out in this release through the "RL for all scenarios" step explained in the paper.
Read 4 tweets
Jan 8
Agents Overview

Great write-up on Agents by Chip.

Here are my takeaways: Image
🤖 Agents Overview

An AI agent is made up of both the environment it operates in (e.g., a game, the internet, or computer system) and the set of actions it can perform through its available tools. This dual definition is fundamental to understanding how agents work.
👨‍💻 Agent Example

The figure shows an example of an agent built on top of GPT-4. The environment is the computer which has access to a terminal and filesystem. The set of action include navigate, searching files, viewing files, etc. Image
Read 14 tweets
Jan 6
Google recently published this great whitepaper on Agents.

2025 is going to be a huge year for AI Agents.

Here's what's included:

- Introduction to AI Agents
- The role of tools in Agents
- Enhancing model performance with targeted learning
- Quick start to Agents with LangChain
- Production applications with Vertex AI Agents

Great place to start learning about AI Agents.Image
If you want to take it a step further, check out my new course on building AI Agents for different use cases: dair-ai.thinkific.com/courses/introd…
Read 13 tweets
Dec 17, 2024
Summary of today's OpenAI announcement (Day 9):

- o1 is launching out of preview in the API
- support for function calling, structured output, and developer messages
- reasoning_effort parameter to tell the model how much effort to spend on thinking
- vision inputs in the API is here too
Visual inputs with developer message (this is a new spin to system message for better steering the model) inside of the OpenAI Playground Image
Cool to see support for function calling and response format for o1 Image
Read 14 tweets

Did Thread Reader help you today?

Support us! We are indie developers!


This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Don't want to be a Premium member but still want to support us?

Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal

Or Donate anonymously using crypto!

Ethereum

0xfe58350B80634f60Fa6Dc149a72b4DFbc17D341E copy

Bitcoin

3ATGMxNzCUFzxpMCHL5sWSt4DVtS8UqXpi copy

Thank you for your support!

Follow Us!

:(