Carlos DP
May 13, 2024 · 7 tweets · 3 min read
I think people are just not reading the blog post, so I'll help OpenAI out a bit and just post the coolest demos from it here.

TL;DR: GPT-4o is fully multimodal, as in input *and* output.

One of these outputs is audio (not voice, *audio*, which is why it can sing)

For now, the API only exposes audio/video to "select partners", but these are some of the demos they show in the blog post:
Consistent image generation for a narrative.

This is *not* the model calling DALL-E the way ChatGPT does today; these images come directly from the model.
Which is why it can do things like this, where it manipulates an existing image with ease.

No IP-Adapters, ControlNets, etc. needed!
It can take styles from images and do things like mix them into a new font.
It can synthesize an image of text that looks like it was written on paper.
It can transcribe meeting notes (nothing new).

But it can also do speaker diarization, infer the speakers' identities from context, and register emotional cues in their voices.
Seriously, check out the actual blog post. This is a huge deal; they severely undersold it in the presentation (and even the presentation was impressive!): openai.com/index/hello-gp…
