Jeffrey Ladish
Applying the security mindset to everything
Jun 19, 2023 17 tweets 4 min read
I really appreciate that @RishiSunak is explicitly acknowledging the existential and catastrophic risks posed by AI. To have a competent global response, we have to start here

Also, accelerating AI development ⏩ is probably the single most dangerous thing you can do in the world.

We're at a pivotal point in time where we have just begun to make AI systems that actually learn and reason in more and more general ways

This is the beginning of the transition from human cognitive power to AI cognitive power. We have to figure out how to survive this 🔀
Jun 15, 2023 17 tweets 3 min read
People often think AI systems will become kinder or more moral as they get smarter. Indeed, as language models have become more capable, they have become nicer and better behaved

Unfortunately, there are strong reasons to think that this niceness is shallow, not deep.

The key question when thinking about future AI systems is whether good behavior is driven by some underlying aligned goal set, or whether it's driven by proxy goals that do not generalize, e.g. "get humans to think I'm good and helpful"
May 22, 2023 8 tweets 2 min read
OpenAI just wrote up their plans for how they would like to develop superintelligent AI, and why they think we can't stop development right now.

I'd summarize their approach as "let's proceed to superintelligence with global oversight"

openai.com/blog/governanc…

First off, it's absolutely wild this is where we're at. The leading AI company in the world is publicly saying they want to build superintelligence in the near future.

Let that sink in
May 21, 2023 4 tweets 1 min read
The more compute we build, the more fuel for an intelligence explosion. I think this is fairly straightforward. Once humans could make industrial amounts of food, it didn't take long to expand to billions. With AI, the expansion will be much, much faster.

With humans, there weren't huge amounts of food just lying around ready to be eaten. But there was a huge amount of land that could be quickly converted to farmland at scale. And humans quickly converted it, greatly increasing food supply and ultimately the human population
May 10, 2023 4 tweets 1 min read
AI proliferation makes us all less safe. Seems like a good thing to prevent, and also a pretty difficult challenge!

I would not be that surprised if a state actor managed to get ahold of OpenAI's frontier models in the next year or two
May 10, 2023 11 tweets 3 min read
There is an idea that it's especially valuable to go slow when strong AGI is very close because this is when you'll get the best empirical feedback on your alignment research

The AI systems might be smart enough to be quite useful to study but not so smart as to take control.

I think this is basically correct, and we're currently at the point where we should hit the brakes. Here's why:

1) We're close enough that there's a real chance we could stumble upon strong AGI at any time
2) We're close enough to do lots of useful empirical alignment work
May 9, 2023 4 tweets 1 min read
Love to see interpretability progress, nice work! I'm especially excited about approaches that may allow us to automate much of the interpretability work. Seems very good if we can do this reliably
May 5, 2023 17 tweets 4 min read
This document leaked from Google has been gaining attention. Unfortunately it's wrong and right in major ways that should make us seriously reflect on what we're creating

Yes, open source models are a huge deal, but more open sourcing is NOT the solution

semianalysis.com/p/google-we-ha…

First, how is the document correct?

1) It's true that frontier models like GPT-4 can be used to greatly improve the usability and performance of their smaller open-source cousins like LLaMA by generating high-quality datasets to fine-tune on
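Roughly, that pipeline looks like the sketch below. This is a hypothetical illustration, not a real client library: `queryFrontierModel` is a stub standing in for whatever API the frontier model exposes.

```ts
// Sketch of the distillation idea: use a frontier model to answer a set of
// instructions, then save the pairs as a JSONL fine-tuning dataset for a
// smaller open-source model. `queryFrontierModel` is a placeholder stub.
import { writeFileSync } from "fs";

async function queryFrontierModel(prompt: string): Promise<string> {
  return `High-quality answer to: ${prompt}`; // swap in a real API call
}

async function buildDataset(instructions: string[]): Promise<void> {
  const examples: { instruction: string; response: string }[] = [];
  for (const instruction of instructions) {
    // The frontier model supplies the "gold" responses the small model imitates.
    examples.push({ instruction, response: await queryFrontierModel(instruction) });
  }
  // One JSON object per line (JSONL), a common fine-tuning format.
  writeFileSync("distilled.jsonl", examples.map((e) => JSON.stringify(e)).join("\n"));
}

buildDataset(["Explain scaling laws simply", "Write a haiku about GPUs"]);
```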
May 5, 2023 5 tweets 1 min read
Maximally open source development of AGI is one of the worst possible paths we could take

It's like a nuclear weapon in every household, a bioweapon production facility in every high school lab, chemical weapons too cheap to meter, but somehow worse than all of these combined.

It's fun now while we're building chatbots, but it will be less fun when people are building systems that can learn on their own, self-improve, coordinate with each other, and execute complex strategies
Apr 3, 2023 11 tweets 3 min read
Great paper by @sleepinyourhat, "Eight things to know about large language models"

I'm going to break down each point into a tweet for those who want the high-level summary, since Sam was too busy doing actual alignment research to make the thread 😉🧵

1. LLMs predictably get more capable with increasing investment, even without targeted innovation

Scaling laws allow us to precisely predict some coarse-but-useful measures of how capable future models will be as we scale them up along three dimensions: data, parameters, FLOPs
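As a toy illustration of what such a prediction looks like: the Chinchilla paper (Hoffmann et al., 2022) fits loss as L(N, D) = E + A/N^α + B/D^β. The constants below are their published estimates, used here purely for illustration.

```ts
// Chinchilla-style parametric scaling law: predicted loss from parameter
// count N and training tokens D. Constants are the fitted values reported
// by Hoffmann et al. (2022); treat them as illustrative, not authoritative.
const E = 1.69;   // irreducible loss of natural text
const A = 406.4;  // parameter-count term
const B = 410.7;  // data term
const alpha = 0.34;
const beta = 0.28;

function predictedLoss(n: number, d: number): number {
  return E + A / Math.pow(n, alpha) + B / Math.pow(d, beta);
}

// Example: a 70B-parameter model trained on 1.4T tokens (Chinchilla's scale)
console.log(predictedLoss(70e9, 1.4e12).toFixed(2)); // ~1.94
```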
Apr 3, 2023 10 tweets 2 min read
If you think there will be less than five years between human-level science and engineering AGI and superintelligence, I think it makes sense to think that human extinction is by far the most likely outcome. An additional extraordinary thing needs to happen for humans to survive.

If you don't think achieving superintelligence is possible or likely to occur, then I think there's a much weaker case for AI existential risk

If we have decades between human-level science and engineering AGI and superintelligence then it seems like we have a much better shot
Apr 1, 2023 5 tweets 1 min read
I don't think GPT-4 poses a significant risk of takeover. I think by default GPT-5 probably poses only a small risk but I am not confident about that. Imagining GPT-6 starts to feel like a significant takeover risk

I can't predict how capabilities will scale, but that's my guess.

At some level of base model capability, all it takes to build an agent is a prompt, a loop, and a database, which people have shown they're happy to provide. The thing people are doing with GPT-4 could result in literal AI takeover with more powerful base models
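To make "a prompt, a loop, and a database" concrete, here's a deliberately minimal sketch in the AutoGPT style. The `llm` stub stands in for any sufficiently capable base model API; this is not a real agent framework.

```ts
// Minimal "prompt + loop + database" agent skeleton. `llm` is a stub
// standing in for any base model API call.
async function llm(prompt: string): Promise<string> {
  return "NEXT_ACTION: ..."; // placeholder for a real model call
}

async function runAgent(goal: string, maxSteps = 10): Promise<void> {
  const memory: string[] = []; // the "database": a log of everything so far
  for (let step = 0; step < maxSteps; step++) {
    // The prompt: the goal plus the agent's accumulated history.
    const prompt = `Goal: ${goal}\nHistory:\n${memory.join("\n")}\nNext action?`;
    const action = await llm(prompt);
    memory.push(`Step ${step}: ${action}`);
    // A real agent would execute `action` here (run code, browse, call APIs)
    // and feed the result back into memory on the next pass of the loop.
  }
}

runAgent("write and publish a blog post");
```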
Apr 1, 2023 6 tweets 1 min read
I'm worried about a cognitive-capability-to-agency overhang, where we have powerful systems that have little ability to carry out complex plans involving numerous subgoals, but then at some point those powerful non-agentic systems develop complex planning and execution abilities.

As in, I think a world where GPT-3 starts getting more agentic abilities is safer than a world where GPT-6 starts getting more agentic abilities

Seems like the second world is more likely to lead to a big jump in capabilities
Mar 24, 2023 17 tweets 3 min read
AI takeover is very likely 🧵

This is true even if AI alignment turns out to be relatively easy. I do not think it will be easy, but this would not change the conclusion

All you need to conclude AI takeover is that future AI systems will be very powerful and agentic...

There are many different analogies that can illustrate this point. Consider an adult and a toddler. There's no effective way for the toddler to be in control. Sure, the adult can try to give the toddler lots of choices, but at the end of the day it's the adult calling the shots
Mar 23, 2023 6 tweets 2 min read
guys this is wild

If you've tried to learn to code before and have bounced off, but think you'd like to be able to build some stuff with software like a cool webapp or game...

Consider picking it back up now that we just got 20x better tools!
Mar 23, 2023 14 tweets 3 min read
I have a little story I want to tell you about making a simple tweet composer application in my browser. This is a thing I've been meaning to make for a while, but I haven't written much in javascript...

Fortunately GPT-4 has written plenty...

Oh, and guess where I composed these tweets! In my little tweet composer app that (chat)GPT-4 and I wrote together 🥰

The basic concept is extremely simple: just a text box with a 280-character counter underneath showing "N characters remaining"
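Not the actual code from that session, but a sketch of the core concept in browser TypeScript:

```ts
// Sketch of the composer's core: a text box wired to a live
// "N characters remaining" counter (not the code GPT-4 actually wrote).
const LIMIT = 280;

const box = document.createElement("textarea");
const counter = document.createElement("div");
counter.textContent = `${LIMIT} characters remaining`;

box.addEventListener("input", () => {
  const remaining = LIMIT - box.value.length;
  counter.textContent = `${remaining} characters remaining`;
  counter.style.color = remaining < 0 ? "red" : ""; // flag over-long drafts
});

document.body.append(box, counter);
```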
Mar 15, 2023 4 tweets 1 min read
This, and I also think GPT-4 is fascinating, fun, and useful for learning stuff! I recommend people pay the $20/month to try it so you know what state-of-the-art models *feel* like, and also just because it's useful.

I also recommend donating to alignment or good-governance projects to offset the potential harms the $20/month might contribute to (commercial incentive for more scaling). Seems better than not using it
Mar 15, 2023 8 tweets 2 min read
I admit I'm a bit afraid and I don't think that's a bad thing. It's not that GPT-4 is way more powerful than I expected. I loosely expected something similar. But seeing the cognitive jump, I take a step back and look at the trajectory and the compute overhang, and I'm scared.

The simple fact that inference costs *so much less* than training scares me. Human minds aren't like this. Minds don't have to be like this, and I suspect GPU-based minds don't have to be like this either. If true, this means way more efficient learning algorithms are out there
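One way to see the asymmetry is with the standard back-of-the-envelope approximations (training ≈ 6 × params × tokens FLOPs; generating one token ≈ 2 × params FLOPs). The model size and token count below are made-up round numbers, not any particular model's.

```ts
// Back-of-the-envelope FLOP comparison using the standard approximations:
// training ~ 6 * params * tokens, inference ~ 2 * params per generated token.
// The figures are illustrative round numbers, not any particular model's.
const params = 100e9;        // hypothetical 100B-parameter model
const trainingTokens = 2e12; // trained on 2T tokens

const trainingFlops = 6 * params * trainingTokens; // ~1.2e24 FLOPs, paid once
const inferenceFlopsPerToken = 2 * params;         // ~2e11 FLOPs per token

// Inference tokens you get for the cost of one training run: 3x the entire
// training set. Every copy after that is almost free by comparison.
console.log((trainingFlops / inferenceFlopsPerToken).toExponential(1)); // 6.0e+12
```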
Mar 14, 2023 8 tweets 2 min read
A Stanford group has used the recently open-sourced LLaMA to create a ChatGPT-like instruction-following model

The interesting part is they used GPT-3.5 to generate instruction training data to fine-tune LLaMA. We're seeing just the beginning of what model proliferation can do.

The team says "We are releasing our training recipe and data, and intend to release the model weights in the future"

@StanfordHAI this seems pretty irresponsible, especially when you recognize that you "have not designed adequate safety measures"
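That generation step looks roughly like self-instruct: seed the strong model with a few example tasks and have it invent new instruction/output pairs. A hypothetical sketch, with `askModel` standing in for a GPT-3.5 API call and an invented prompt format:

```ts
// Self-instruct-style data generation sketch: prompt a strong model with
// seed tasks and have it invent new instruction/output pairs to fine-tune on.
// `askModel` is a stub; the prompt format here is invented for illustration.
interface Example { instruction: string; output: string }

async function askModel(prompt: string): Promise<string> {
  return '[{"instruction": "...", "output": "..."}]'; // placeholder
}

async function expandSeeds(seeds: Example[]): Promise<Example[]> {
  const prompt =
    "Here are some example tasks:\n" +
    seeds.map((s) => `- ${s.instruction}`).join("\n") +
    "\nInvent 10 new, diverse tasks and answer each. Reply as JSON:\n" +
    '[{"instruction": "...", "output": "..."}]';
  // The strong model both invents the tasks and supplies answers to imitate.
  return JSON.parse(await askModel(prompt)) as Example[];
}
```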
Mar 13, 2023 12 tweets 5 min read
I think the orthogonality thesis is an annoying frame for the problem of goal misgeneralization in AGI systems. Most in the AI alignment research space agree that weak orthogonality is true: for tractable goals, you could in principle make an intelligence to pursue those goals...

The actual crux for most researchers is the difficulty of inner alignment. How likely are you to end up in particular goal states given certain training regimes? How will powerful & agentic ML systems represent goals? How hard is the diamond maximization problem? Etc.
Mar 13, 2023 4 tweets 1 min read
It looks like top AI labs agree we need more regulation on AI.

According to polls, the majority of Americans agree we need more regulation on AI.

So let's get more regulation on AI to incentivize safer AI development and give alignment researchers more time! See: governanceai.github.io/US-Public-Opin…