Hrosso
Aug 11, 2022 · 17 tweets · 2 min read
I've been looking into the AI alignment problem over the last couple of days and came up with the following summary of what the problems are and why they arise. Also, I'd prefer the umbrella name of "human alignment problem", as AI alignment is just a subset of it.
The problem is that we don't know what we want.
And even if we individually knew, we couldn't agree with others. (opinion aggregation)
Even if we agreed with others on what we want, it would be hard to implement. (coordination)
Maybe we can create something smarter than us that solves these problems.
But we don't know how to create something smarter than us.
Maybe we can create something that will start out dumber, but can learn and will eventually become smarter.
We are afraid that something like this could become very powerful very quickly, and it’s likely to kill us - either as a mere side-effect or because of conflicting goals. (AI alignment problem)
But we don't know how to describe what it should learn. (outer alignment)
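The outer alignment point can be made concrete with a toy sketch (a hypothetical example, not from the thread): we specify a proxy objective because the real one is hard to write down, and an optimizer then maximizes the proxy rather than what we meant.

```python
import random

random.seed(0)
target = "banana"  # what we actually want (hypothetical stand-in goal)

def proxy_reward(s):
    # Misspecified objective we wrote down: just count 'a' characters
    return s.count("a")

def intended_reward(s):
    # What we really wanted: per-position match with the target word
    return sum(c1 == c2 for c1, c2 in zip(s, target))

# A dumb random-search "optimizer" over 6-letter strings of 'a'/'b'
best = "bbbbbb"
for _ in range(5000):
    cand = "".join(random.choice("ab") for _ in range(6))
    if proxy_reward(cand) > proxy_reward(best):
        best = cand

print(best)                   # "aaaaaa": the proxy is maximized...
print(intended_reward(best))  # 3 — only half of what we actually wanted
```

The optimizer is doing exactly what it was told, and the gap between `proxy_reward` and `intended_reward` is the misspecification.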
So maybe we can just give examples of what we know we want it to learn. (ML training)
But it's impossible to describe all the cases, so in practice the ML model will face situations quite different from our examples. (distributional shift)
And if what we want the ML model to learn is very specific and complicated, it's quite likely that the model will behave very differently outside our examples from how we'd want it to. (inner alignment)
It will also be hard to distinguish the cases where it does and where it doesn't do what we want. (eliciting latent knowledge)
Generally, sufficiently capable ML models are hard to understand. (interpretability)
Especially if the model knows it can pursue more of what it learned in training when we are not looking. (deceptive mesa-optimizers)
Also, if we realize it's doing something other than what we wanted, it might be hard to change, because we'd be interfering with its learned goals. (corrigibility)
This is just a summary of my current understanding of the problem landscape. I don't subscribe to all the stated motivations and conclusions, but more on that some other time.