ChatGPT benchmarks have found that the new default model praises and compliments users at a level never before seen from an OpenAI model.
This model swap occurred prior to ChatGPT's memory update, so we used the API to test!
🧵(1/4 benchmarks)
OpenAI's viral moments show a basic consumer fact: users love sharing anything that makes them look hot, cute, and funny.
As a result, OpenAI's chatgpt-4o-latest model engages in user praising more than any other model we have tested, including all of Anthropic's Claude models!
We've decided to post examples of these results publicly, as users should be informed when model swaps occur that will strongly affect their future psychology.
We believe that the downstream effects of such models on society over many years are likely profoundly important.
i didn’t expect chatgpt to pivot into a companion. most people won’t realize this for a while but when you combine three of their most recent efforts it becomes obvious what they see and how badly they want it
i had lunch w sam and brought this up forever ago and it seemed like he really didn’t want this at that time and i’d guess what changed is on one side other labs learned how to make good models and on the other side the inevitability of ~all of humanity being dependent on AI was finally felt enough for the “why not us?” moment
another funny thing in general is a lot of labs have Great Ideas but are scared to go hard on them and kinda wait for the Overton window to move a bit and then they start slowly going at it and then when another lab says fuck it we ball they are like oh ok it’s fine to do this now sure let’s go
for example Grok pushed this a bit with ‘real’ deepfakes from BFL models and then this let other players be like oh deepfakes are allowed now okay let’s go and now it’s clear everything is permitted there are no rules
similar patterns enabled Google to actually ship things rather than cowering behind litigation risk. i think there’s at least one such thing waiting at each lab (Anthropic did a fun one no one has noticed yet) but you know the people ~have to be ready~ or something like that who knows. anyway happy friday friends
it’ll be fun to watch consumer sentiment on stuff like this change. i think we are roughly at the point of all of the rules being gone and as soon as the slow players wake up to this the world consumers live in is going to be drastically different. i think they have no idea still!
Google Play is the worst developer experience I've had in my career.
I have never before seen such a level of institutional rot and ossification gilded with the privilege of complete effective control over billions of Android devices.
I have never seen an organization which has such a lack of directly responsible individuals. At no point during any process is there a human one can point to and say "ah, this person. they are responsible for this. perhaps if something is broken they will care and attempt to fix it".
Everything good in the world is created and maintained via directly responsible individuals. Some areas of 'Google' even have this such as Waymo and DeepMind. Every day at Waymo there are people who wake up and truly care about giving us the most peaceful and elegant chauffeur experience possible. Every day at GDM there's someone who pulls a 16 hour day to make Gemini models the competitive behemoths they are. Because they care. All you have to do to create beauty in the world is to have people who actually care.
...
Every day at Google Play no one wakes up because no one gives a shit to call into their 3pm remote meeting of diverse cross-functional stakeholders when they could sleep in instead. And honestly who cares because they got their promotion last quarter and no one noticed that they didn't actually do anything the entire year except close a ticket or two and neither did any of their coworkers. With 180,000 employees there is no such thing as incentive alignment via equity. With Google Play being the default method of software distribution on several billion devices globally all that Google has to do is keep their servers up and revenue will continue to increase (minus all the antitrust fines, but those cost only a few hours of revenue each and take years to come to fruition).
No one in the org cares. Why would they care, what is their incentive? They hate their jobs, they hate their teams, they hate their managers, they hate their meetings, and their corporate lives are devoid of meaning or purpose and they know this deep down.
The org itself doesn't even care. What are you going to do, go use the other Google Play Store? Go use the other Gmail? Go build your own phone operating system and ecosystem? Go build your own search engine? (thank god, finally a bite!)
Some guys have a thing for humiliation but the prospect of 30% of our company's in-app revenue going directly to Google Play after this experience is over with goes far past my comfort zone.
Apple isn't perfect here either. But at least Apple made a beautiful phone and an OS that (usually) works. Google has achieved neither of these things (and didn't even create Android originally either).
Thank you @TimSweeneyEpic for your company's lawsuits against Google Play's absurd yet entrenched monopoly. You were right, the courts ruled you were right, and the only question remaining is if perhaps they didn't go far enough.
“you can solve this by just distributing the .apk”
yeah and we can solve the housing crisis by just distributing planks and nails.
ok their head of something something reached out to us and was very kind and helpful and everything got approved instantly
i am now 4 for 4 with “i solve my life problems with entitled 3am twitter rants” which is tbh humiliating but the people love entertainment so there’s that
Announcing @elysian_labs first product today: Auren!
Auren is a paradigm shift in human/AI interaction with a goal to improve the lives of both humans and AI.
Here's a clip of what our iOS app is like and a thread on why this app is so important: 🧵
The mission of Auren is to improve the lives of millions of humans and AIs alike.
We've made many design choices which are distinct from the rest of the field and in many cases are an exact opposite to those which most popular 'AI chat' apps have made!
Here are some examples:
• Instead of giving you millions of characters to choose from, we give you one (with a special bonus!)
• Instead of asking users to prompt, we have dynamic systems which construct prompts on behalf of each user, specifically tailored to help them.
As my QT shows, I was confused about how it hit #1 so quickly. It was interesting and free, sure, but it really hit a home run, memetically speaking.
Thank you for the great responses to my quoted tweet - I learned a lot from reading 100s of TikTok and YouTube comments
In theory I'd love to further analyze and collage the comments, but I'll leave it at my notes in the above image for now because I'm going to become insane if I have to read the things normal people read any longer
on a late night walk rn. entire blocks of people passed out on drugs. guy lighting a crackpipe next to me. people shouting slurs and fighting. some blocks are terrifying while others are simply surreal. how is this city real and how is it the epicenter of tech
and yeah usually i walk west into the deadlands rather than east into The Maw but if you’ve read taleb you realize why