Madni Aghadi
Apr 7 • 8 tweets • 3 min read
🚨 Breaking news:

Google just introduced ScreenAI, and it's wild.

This is going to transform the future of UX forever

Here's everything you need to stay ahead of the curve: 🧵 👇
ScreenAI is a Vision-Language Model (VLM) developed by Google AI that can comprehend both user interfaces (UIs) and infographics.

It's wild: it can handle tasks like graphical question answering, element annotation, summarization, navigation, and UI-specific QA.
How it works: Like a superpowered UI interpreter

ScreenAI uses two stages:

- Pre-training: Applies self-supervised learning to automatically generate data labels
- Fine-tuning: Uses data labeled manually by human raters

Here are some of its features:
1. Question answering

The model answers questions about the content of a screenshot.
2. Screen navigation

The model converts a natural language utterance into an executable action on a screen.

e.g., “Click the search button.”
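
To make "utterance in, executable action out" concrete, here is a toy sketch. It is not how ScreenAI works: the real model reads the screenshot itself, while this example assumes the UI elements are already listed and uses simple keyword matching. `UIElement`, `Action`, and `utterance_to_action` are hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class UIElement:
    """Hypothetical on-screen element: a label, a kind, and a bounding box."""
    label: str
    kind: str  # e.g. "button"
    box: Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


@dataclass
class Action:
    """Hypothetical executable action: what to do and where on the screen."""
    verb: str
    x: int
    y: int


def utterance_to_action(utterance: str, elements: List[UIElement]) -> Optional[Action]:
    """Toy keyword matcher (an assumption, not ScreenAI's method)."""
    words = utterance.lower().strip(".!? ").split()
    # Only "click"/"tap" commands are understood in this sketch.
    if not words or words[0] not in {"click", "tap"}:
        return None
    for element in elements:
        if element.label.lower() in words:
            x_min, y_min, x_max, y_max = element.box
            # Click the center of the matched element's bounding box.
            return Action("click", (x_min + x_max) // 2, (y_min + y_max) // 2)
    return None


# A made-up screen with two buttons.
screen = [
    UIElement("search", "button", (300, 20, 340, 60)),
    UIElement("menu", "button", (10, 20, 50, 60)),
]
action = utterance_to_action("Click the search button.", screen)
print(action)  # Action(verb='click', x=320, y=40)
```

The point is only the shape of the output: a plain sentence becomes a structured action (a verb plus coordinates) that an automation layer could execute.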
3. Screen summarization

The model summarizes the screen content in one or two sentences.
The future of UI interaction is bright (and AI-powered)!

Is it available now?

Not yet - it's still a research project.

But stay tuned! Google's onto something revolutionary here.

I'll keep you updated!
That's all! You've now learned about ScreenAI by Google.

If you enjoyed this thread:

- Like and Retweet
- Follow @hey_madni for more content like this
