Jeffrey Ladish
Applying the security mindset to everything
Jun 19, 2023 17 tweets 4 min read
I really appreciate that @RishiSunak is explicitly acknowledging the existential and catastrophic risks posed by AI. To have a competent global response, we have to start here

Also, accelerating AI development ⏩ is probably the single most dangerous thing you can do in the world.

We're at a pivotal point in time where we have just begun to make AI systems that actually learn and reason in more and more general ways

This is the beginning of the transition from human cognitive power to AI cognitive power. We have to figure out how to survive this 🔀
Jun 15, 2023 17 tweets 3 min read
People often think AI systems will become kinder or more moral as they get smarter. Indeed, as language models have become more capable, they have become nicer and better behaved

Unfortunately, there are strong reasons to think that this niceness is shallow, not deep.

The key question when thinking about future AI systems is whether good behavior is driven by some underlying aligned goal set, or whether it's driven by proxy goals that do not generalize, e.g. "get humans to think I'm good and helpful"
May 22, 2023 8 tweets 2 min read
OpenAI just wrote up their plans for how they would like to develop superintelligent AI, and why they think we can't stop development right now.

I'd summarize their approach as "let's proceed to superintelligence with global oversight"

openai.com/blog/governanc…

First off, it's absolutely wild this is where we're at. The leading AI company in the world is publicly saying they want to build superintelligence in the near future.

Let that sink in
May 21, 2023 4 tweets 1 min read
The more compute we build, the more fuel for an intelligence explosion. I think this is fairly straightforward. Once humans could make industrial amounts of food, it didn't take long to expand to billions. With AI, the expansion will be much, much faster.

With humans, there weren't huge amounts of food just lying around ready to be eaten. But there was a huge amount of land that could be quickly converted to farmland at scale. And humans quickly converted it, greatly increasing food supply and ultimately the human population
May 10, 2023 4 tweets 1 min read
AI proliferation makes us all less safe. Seems like a good thing to prevent, and also a pretty difficult challenge!

I would not be that surprised if a state actor managed to get ahold of OpenAI's frontier models in the next year or two
May 10, 2023 11 tweets 3 min read
There is an idea that it's especially valuable to go slow when strong AGI is very close because this is when you'll get the best empirical feedback on your alignment research

The AI systems might be smart enough to be quite useful to study but not so smart as to take control.

I think this is basically correct, and we're currently at the point where we should hit the brakes. Here's why:

1) We're close enough that there's a real chance we could stumble upon strong AGI at any time
2) We're close enough to do lots of useful empirical alignment work
May 9, 2023 4 tweets 1 min read
Love to see interpretability progress, nice work! I'm especially excited about approaches that may allow us to automate much of the interpretability work. Seems very good if we can do this reliably
May 5, 2023 17 tweets 4 min read
This document leaked from Google has been gaining attention. Unfortunately it's wrong and right in major ways that should make us seriously reflect on what we're creating

Yes, open source models are a huge deal, but more open sourcing is NOT the solution

semianalysis.com/p/google-we-ha…

First, how is the document correct?

1) It's true that frontier models like GPT-4 can be used to greatly improve the usability and performance of their smaller open-source cousins like LLaMA by generating high-quality datasets to fine-tune on
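Roughly, that pipeline looks like the sketch below. This is a hypothetical illustration, not a real client library: `queryFrontierModel` is a stub standing in for whatever API the frontier model exposes.

```ts
// Sketch of the distillation idea: use a frontier model to answer a set of
// instructions, then save the pairs as a JSONL fine-tuning dataset for a
// smaller open-source model. `queryFrontierModel` is a placeholder stub.
import { writeFileSync } from "fs";

async function queryFrontierModel(prompt: string): Promise<string> {
  return `High-quality answer to: ${prompt}`; // swap in a real API call
}

async function buildDataset(instructions: string[]): Promise<void> {
  const examples: { instruction: string; response: string }[] = [];
  for (const instruction of instructions) {
    // The frontier model supplies the "gold" responses the small model imitates.
    examples.push({ instruction, response: await queryFrontierModel(instruction) });
  }
  // One JSON object per line (JSONL), a common fine-tuning format.
  writeFileSync("distilled.jsonl", examples.map((e) => JSON.stringify(e)).join("\n"));
}

buildDataset(["Explain scaling laws simply", "Write a haiku about GPUs"]);
```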
May 5, 2023 5 tweets 1 min read
Maximally open source development of AGI is one of the worst possible paths we could take

It's like a nuclear weapon in every household, a bioweapon production facility in every high school lab, chemical weapons too cheap to meter, but somehow worse than all of these combined.

It's fun now while we're building chatbots, but it will be less fun when people are building systems that can learn on their own, self-improve, coordinate with each other, and execute complex strategies
Apr 3, 2023 11 tweets 3 min read
Great paper by @sleepinyourhat, "Eight things to know about large language models"

I'm going to break down each point into a tweet for those who want the high-level summary, since Sam was too busy doing actual alignment research to make the thread 😉🧵

1. LLMs predictably get more capable with increasing investment, even without targeted innovation

Scaling laws allow us to precisely predict some coarse-but-useful measures of how capable future models will be as we scale them up along three dimensions: data, parameters, FLOPs
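As a toy illustration of what such a prediction looks like: the Chinchilla paper (Hoffmann et al., 2022) fits loss as L(N, D) = E + A/N^α + B/D^β. The constants below are their published estimates, used here purely for illustration.

```ts
// Chinchilla-style parametric scaling law: predicted loss from parameter
// count N and training tokens D. Constants are the fitted values reported
// by Hoffmann et al. (2022); treat them as illustrative, not authoritative.
const E = 1.69;   // irreducible loss of natural text
const A = 406.4;  // parameter-count term
const B = 410.7;  // data term
const alpha = 0.34;
const beta = 0.28;

function predictedLoss(n: number, d: number): number {
  return E + A / Math.pow(n, alpha) + B / Math.pow(d, beta);
}

// Example: a 70B-parameter model trained on 1.4T tokens (Chinchilla's scale)
console.log(predictedLoss(70e9, 1.4e12).toFixed(2)); // ~1.94
```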
Apr 3, 2023 10 tweets 2 min read
If you think there will be less than five years between human-level science and engineering AGI and superintelligence, I think it makes sense to think that human extinction is by far the most likely outcome. An additional extraordinary thing needs to happen for humans to survive.

If you don't think achieving superintelligence is possible or likely to occur, then I think there's a much weaker case for AI existential risk

If we have decades between human-level science and engineering AGI and superintelligence then it seems like we have a much better shot
Apr 1, 2023 5 tweets 1 min read
I don't think GPT-4 poses a significant risk of takeover. I think by default GPT-5 probably poses only a small risk but I am not confident about that. Imagining GPT-6 starts to feel like a significant takeover risk

I can't predict how capabilities will scale, but that's my guess.

At some level of base model capability, all it takes to build an agent is a prompt, a loop, and a database, which people have shown they're happy to provide. The thing people are doing with GPT-4 could result in literal AI takeover with more powerful base models
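To make "a prompt, a loop, and a database" concrete, here's a deliberately minimal sketch in the AutoGPT style. The `llm` stub stands in for any sufficiently capable base model API; this is not a real agent framework.

```ts
// Minimal "prompt + loop + database" agent skeleton. `llm` is a stub
// standing in for any base model API call.
async function llm(prompt: string): Promise<string> {
  return "NEXT_ACTION: ..."; // placeholder for a real model call
}

async function runAgent(goal: string, maxSteps = 10): Promise<void> {
  const memory: string[] = []; // the "database": a log of everything so far
  for (let step = 0; step < maxSteps; step++) {
    // The prompt: the goal plus the agent's accumulated history.
    const prompt = `Goal: ${goal}\nHistory:\n${memory.join("\n")}\nNext action?`;
    const action = await llm(prompt);
    memory.push(`Step ${step}: ${action}`);
    // A real agent would execute `action` here (run code, browse, call APIs)
    // and feed the result back into memory on the next pass of the loop.
  }
}

runAgent("write and publish a blog post");
```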
Apr 1, 2023 6 tweets 1 min read
I'm worried about a cognitive-capability-to-agency overhang, where we have powerful systems that have little ability to carry out complex plans involving numerous subgoals, but then at some point those powerful non-agentic systems develop complex planning and execution abilities.

As in, I think a world where GPT-3 starts getting more agentic abilities is safer than a world where GPT-6 starts getting more agentic abilities

Seems like the second world is more likely to lead to a big jump in capabilities
Mar 24, 2023 17 tweets 3 min read
AI takeover is very likely 🧵

This is true even if AI alignment turns out to be relatively easy. I do not think it will be easy, but this would not change the conclusion

All you need to conclude AI takeover is that future AI systems will be very powerful and agentic...

There are many different analogies that can illustrate this point. Consider an adult and a toddler. There's no effective way for the toddler to be in control. Sure, the adult can try to give the toddler lots of choices, but at the end of the day it's the adult calling the shots
Mar 23, 2023 6 tweets 2 min read
guys this is wild

If you've tried to learn to code before and have bounced off, but think you'd like to be able to build some stuff with software like a cool webapp or game...

Consider picking it back up now that we just got 20x better tools!
Mar 23, 2023 14 tweets 3 min read
I have a little story I want to tell you about making a simple tweet composer application in my browser. This is a thing I've been meaning to make for a while, but I haven't written much in javascript...

Fortunately GPT-4 has written plenty...

Oh, and guess where I composed these tweets! In my little tweet composer app that (chat)GPT-4 and I wrote together 🥰

The basic concept is extremely simple: just a text box with a 280-character counter underneath showing "N characters remaining"
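Not the actual code from that session, but a sketch of the core concept in browser TypeScript:

```ts
// Sketch of the composer's core: a text box wired to a live
// "N characters remaining" counter (not the code GPT-4 actually wrote).
const LIMIT = 280;

const box = document.createElement("textarea");
const counter = document.createElement("div");
counter.textContent = `${LIMIT} characters remaining`;

box.addEventListener("input", () => {
  const remaining = LIMIT - box.value.length;
  counter.textContent = `${remaining} characters remaining`;
  counter.style.color = remaining < 0 ? "red" : ""; // flag over-long drafts
});

document.body.append(box, counter);
```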
Mar 15, 2023 4 tweets 1 min read
This, and I also think GPT-4 is fascinating, fun, and useful for learning stuff! I recommend people pay the $20/month to try it so you know what state-of-the-art models *feel* like, and also just because it's useful.

I also recommend donating to alignment or good-governance projects to offset the potential harms the $20/month might contribute to (commercial incentive for more scaling). Seems better than not using it
Mar 15, 2023 8 tweets 2 min read
I admit I'm a bit afraid and I don't think that's a bad thing. It's not that GPT-4 is way more powerful than I expected. I loosely expected something similar. But seeing the cognitive jump, I take a step back and look at the trajectory and the compute overhang, and I'm scared.

The simple fact that inference costs *so much less* than training scares me. Human minds aren't like this. Minds don't have to be like this, and I suspect GPU-based minds don't have to be like this either. If true, this means way more efficient learning algorithms are out there
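One way to see the asymmetry is with the standard back-of-the-envelope approximations (training ≈ 6 × params × tokens FLOPs; generating one token ≈ 2 × params FLOPs). The model size and token count below are made-up round numbers, not any particular model's.

```ts
// Back-of-the-envelope FLOP comparison using the standard approximations:
// training ~ 6 * params * tokens, inference ~ 2 * params per generated token.
// The figures are illustrative round numbers, not any particular model's.
const params = 100e9;        // hypothetical 100B-parameter model
const trainingTokens = 2e12; // trained on 2T tokens

const trainingFlops = 6 * params * trainingTokens; // ~1.2e24 FLOPs, paid once
const inferenceFlopsPerToken = 2 * params;         // ~2e11 FLOPs per token

// Inference tokens you get for the cost of one training run: 3x the entire
// training set. Every copy after that is almost free by comparison.
console.log((trainingFlops / inferenceFlopsPerToken).toExponential(1)); // 6.0e+12
```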
Mar 14, 2023 8 tweets 2 min read
A Stanford group has used the recently open-sourced LLaMA to create a ChatGPT-like instruction-following model

The interesting part is they used GPT-3.5 to generate instruction training data to fine-tune LLaMA. We're seeing just the beginning of what model proliferation can do.

The team says "We are releasing our training recipe and data, and intend to release the model weights in the future"

@StanfordHAI this seems pretty irresponsible, especially when you recognize that you "have not designed adequate safety measures"
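That generation step looks roughly like self-instruct: seed the strong model with a few example tasks and have it invent new instruction/output pairs. A hypothetical sketch, with `askModel` standing in for a GPT-3.5 API call and an invented prompt format:

```ts
// Self-instruct-style data generation sketch: prompt a strong model with
// seed tasks and have it invent new instruction/output pairs to fine-tune on.
// `askModel` is a stub; the prompt format here is invented for illustration.
interface Example { instruction: string; output: string }

async function askModel(prompt: string): Promise<string> {
  return '[{"instruction": "...", "output": "..."}]'; // placeholder
}

async function expandSeeds(seeds: Example[]): Promise<Example[]> {
  const prompt =
    "Here are some example tasks:\n" +
    seeds.map((s) => `- ${s.instruction}`).join("\n") +
    "\nInvent 10 new, diverse tasks and answer each. Reply as JSON:\n" +
    '[{"instruction": "...", "output": "..."}]';
  // The strong model both invents the tasks and supplies answers to imitate.
  return JSON.parse(await askModel(prompt)) as Example[];
}
```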
Mar 13, 2023 12 tweets 5 min read
I think the orthogonality thesis is an annoying frame for the problem of goal misgeneralization in AGI systems. Most in the AI alignment research space agree that weak orthogonality is true: for tractable goals, you could in principle make an intelligence to pursue those goals...

The actual crux for most researchers is the difficulty of inner alignment. How likely are you to end up in particular goal states given certain training regimes? How will powerful & agentic ML systems represent goals? How hard is the diamond maximization problem? Etc.
Mar 13, 2023 4 tweets 1 min read
It looks like top AI labs agree we need more regulation on AI.

According to polls, the majority of Americans agree we need more regulation on AI.

So let's get more regulation on AI to incentivize safer AI development and give alignment researchers more time! See: governanceai.github.io/US-Public-Opin…