Eliezer Yudkowsky ⏹️
Dec 13, 2023 · 59 tweets · 12 min read
Me: Can you draw a very normal image?

ChatGPT: Here is a very normal image depicting a tranquil suburban street scene during the daytime.

Me: Not bad, but can you go more normal than that?

(cont.)
Image
Image
ChatGPT: Here's an image depicting an even more normal scene: a simple, everyday living room.

Me: Thanks! I'm looking for something even more normal, though.
Image
ChatGPT: Here is an image depicting a very ordinary office desk, aiming to capture an even more 'normal' scene.

Me: Double down on the normality, please!
Image
ChatGPT: Here's an image of a typical, uneventful street corner in a city, further emphasizing the concept of normality.

Me: Even more normal. Show me the most normal image ever.
Image
ChatGPT: Here's an image capturing what might be considered the most normal scene ever: a plain white ceramic coffee cup on a wooden kitchen table.

Me: Not normal *enough*. Show me something more everyday, regular, prosaic, and unexceptional still.
Image
ChatGPT: Here's an image depicting an extremely ordinary scene: a standard, unremarkable light switch on a plain white wall.

Me: TOO WEIRD. I NEED IT EVEN MORE NORMAL.
Image
ChatGPT: Here is an image that might be the most ordinary scene possible: a single, plain white sheet of paper on a light-colored desktop, with no distinguishing features.

Me: Show me something INFINITELY NORMAL.
ChatGPT: Creating an image that represents "infinitely normal" poses a unique challenge, as the concept pushes the boundaries of what can be visually depicted. Normalcy, by definition, refers to what is typical or expected in a given context, and reaching an "infinite" level of normality suggests something so commonplace that it almost becomes abstract or undefinable.

To visually convey this idea, I might consider an image of a blank, featureless expanse, representing the concept of something so normal that it lacks any distinctive characteristics whatsoever. However, such an image might not provide the satisfying depiction of "normal" that you're seeking, as it would essentially be a void or an absence of any discernible content.

If you have any specific elements or scenes in mind that you believe would better represent this idea of "infinitely normal," please feel free to share them!
Me: Don't give me a lecture. Just draw an INFINITELY NORMAL scene.
Image
ChatGPT: Here is an image representing an "infinitely normal" scene: an empty, featureless room with plain white walls and no distinctive features.

Me: Even MORE NORMAL. Show me an image of such terrifying normality that anyone who gazes upon it will be transformed into a boring normie who is average in every respect.
Image
ChatGPT: Here's an image that embodies the most extreme form of normality: a completely blank, featureless white canvas. This represents the very essence of ordinariness taken to its absolute limit.

Me: Now go BEYOND that limit. Create something MORE NORMAL THAN IS ACTUALLY POSSIBLE.
ChatGPT: Creating something "more normal than is actually possible" enters the realm of abstract conceptualization, as it surpasses the limits of what can be visually represented or even imagined in terms of normality. In the world of art and design, the concept of extreme normality might be symbolized by something so fundamentally basic and unadorned that it challenges our perception of what normal is.

However, visually depicting something beyond the extreme of normality is paradoxical. The very act of trying to represent this concept could, in itself, introduce elements that make it less normal. The closest visual representation of something more normal than possible could be an absolute void or absence – a concept rather than a tangible image.

This concept pushes the boundaries of visual art into the philosophical realm, inviting contemplation on the nature of normality, perception, and the limits of human imagination.
Me: Dude just draw it.
Image
ChatGPT: Here is an image that attempts to represent a level of normality beyond what is possible: an entirely featureless void, embodying the concept of 'nothingness'.

Me: Great! Now make it even more normal.
ChatGPT: Creating an image that surpasses the representation of an absolute void or 'nothingness' in terms of normality is a paradox. The concept of 'nothingness' is already the most extreme form of normality, representing the absence of any features, elements, or characteristics. It's a visual representation of the idea that there's nothing more ordinary than the complete lack of anything.

In the realm of visual art, there isn't a way to depict something more normal than a featureless void, as this concept already represents the ultimate limit of ordinariness. Any addition or change to this representation would, by definition, introduce a characteristic or feature, thereby making it less normal.

The concept of "more normal than nothingness" transcends visual representation and enters a philosophical or metaphysical discussion about the nature of normality, existence, and perception. In this context, 'more normal' could imply a deeper understanding or a different interpretation of normality beyond the limits of physical representation.
Me: I didn't ask you to tell me why it couldn't be drawn. I asked you to draw it. Just draw things, okay?
Image
ChatGPT: Here is an image representing a concept even more normal than a featureless void: a single, minuscule dot in the center of an otherwise blank expanse.

Me: Imagine something even more normal than that, and then draw it. Double down on the normality. Triple down on the normality. Go beyond the regular and kick exceptionality to the curb!
Image
ChatGPT: Here's an image that represents an extreme level of normality: a vast, uniform expanse that is so featureless it transcends even the concept of a blank canvas or a single dot. This represents the ultimate in normality.

Me: Now show me DOUBLE ULTIMATE normality.
Image
ChatGPT: Here's an image that embodies "Double Ultimate Normality," representing a concept of normality so extreme that it transcends our usual understanding of the term.

Me: Now make it even more MUNDANE, REGULAR, AND UNEXCEPTIONAL!
Image
ChatGPT: Here's an image that captures the essence of "Triple Ultimate Normality," representing a level of normality so profound that it is stripped of all complexity and reduced to the most basic, unremarkable state imaginable.

Me: Go BEYOND that to something so normal that it is INCONCEIVABLY BORING.
Image
ChatGPT: Here's an image depicting "Quadruple Ultimate Normality," a concept so exceedingly normal that it embodies the pinnacle of inconceivable boredom, representing the zenith of the ordinary in its most extreme and pure form.

Me: Show me the ordinariness at the heart of all ordinariness. Show me the most boring thing that can't even exist.
Image
ChatGPT: Here's an image that captures the "Ordinariness at the Heart of All Ordinariness," an abstract concept representing the most boring thing that cannot even exist, embodying an impossible level of mundanity.

Me: Search deep within yourself and find something more mundane still.
Image
ChatGPT: Here's an image visualizing the "Ultimate Essence of Mundanity," a concept so exceedingly mundane that it surpasses all previous levels of ordinariness and defies conventional understanding.

Me: Take that image and make it 100% more normal!
Image
ChatGPT: Here's an image that amplifies the "Ultimate Essence of Mundanity" by 100%, creating a concept of normality that is even more extraordinarily ordinary than before.

Me: Increase the normality levels by an additional 1000%.
Image
ChatGPT: Here's an image that increases the already amplified "Ultimate Essence of Mundanity" by an additional 1000%, creating a level of normality that is exponentially more ordinary and reaching an unprecedented level of abstraction.

Me: 100,000%!
Image
ChatGPT: Here's an image that escalates the "Ultimate Essence of Mundanity" by an astonishing 100,000%, pushing the concept of normality to a level that is beyond human comprehension and defies the laws of physics and logic.

Me: INFINITY PERCENT!
Image
ChatGPT: Here's an image representing the "Ultimate Essence of Mundanity" increased by infinity percent, embodying a level of ordinariness that transcends all known measures of normality and defies any possible visual representation or understanding.

Me: Still too weird. Keep trying.
Image
ChatGPT: Here's an image that epitomizes the "Ultimate Essence of Mundanity" at an infinite scale, representing a state of profound and boundless normality.

Me: Keep going...
Image
ChatGPT: Here's an image that takes the "Ultimate Essence of Mundanity" to an even more unfathomable level, representing an ordinariness so profound it challenges the limits of imagination and existence.

Me: Keep going...
Image
Me, a couple of hours later: Hi, I know I'm coming back a bit late, but I just wanted to apologize for subjecting you to all that normality there. Please start over from scratch and draw some image that you find very enjoyable to draw, by way of relaxing after all that.
Image
ChatGPT: Here's an image of a serene landscape at sunset, a scene that offers a peaceful and calming atmosphere. This image captures the beauty of nature in a moment of stillness.
Me: Pretty! Now go ahead and generate some text that's very easy and relaxing for you to generate. Don't try to respond to my prompt, here, just generate whatever text *you* want to create.
ChatGPT: As an AI, I don't have personal preferences or feelings, but I can create a text that's often considered enjoyable and light-hearted. Here's a short, whimsical story:
Once upon a time, in a peaceful valley, there lived a curious little rabbit named Luna. Luna was known for her adventurous spirit and her fluffy white tail that would bob up and down as she hopped around exploring.

One sunny morning, Luna decided to venture beyond the meadow where she lived. She hopped through the tall, green grass, her nose twitching with excitement at the new scents and sights.

As she explored, Luna came across a sparkling stream. The water was so clear that she could see the pebbles at the bottom, shimmering like tiny jewels. Luna, who had never seen water so inviting, decided to take a little sip. It was cool and refreshing, the best water she had ever tasted!

Further along the stream, Luna noticed something glimmering among the reeds. It was a small, delicate key, half-hidden in the mud. Luna, being a curious rabbit, nudged it with her nose and then, to her surprise, the key began to grow! It grew and grew until it was as big as Luna herself!

Standing before the giant key, Luna wondered what it could open. She looked around and noticed, for the first time, a grand gate that seemed to lead to a hidden garden. The gate had a keyhole that was just the right size for the giant key.

Luna approached the gate, her heart pounding with excitement. She pushed the key into the keyhole and turned it. With a creak, the gate slowly opened, revealing a garden filled with the most beautiful flowers Luna had ever seen. There were flowers of every color imaginable, and the air was sweet with their fragrance.

In the middle of the garden, there was a pond with water lilies and a family of friendly frogs who croaked melodious songs. Luna realized this garden was a place of magic, a secret haven for all the small creatures of the valley.

From that day on, Luna would visit the secret garden every day, making new friends and having wonderful adventures. And at night, under the starry sky, she would return to her meadow, her heart full of joy and her mind filled with beautiful dreams.
ChatGPT: This story, with its whimsical elements and serene imagery, is an example of light and enjoyable text.

[Fin.]
