Thread by @MarioNawfal on Thread Reader App

Another HUGE week in AI

OpenAI, Anthropic, and Google dropped major updates, voice AI is becoming more human-like, and a new coding assistant is shaking up the dev world.

From Photoshop on your phone to AI-powered market research, here’s everything you need to know. 👇

1. OpenAI has launched GPT-4.5, its most advanced AI model so far.

It scales up unsupervised learning to improve intuition and accuracy, and it hallucinates less than earlier models.
Early testers say it feels more natural and human-like in conversations.

Currently available to ChatGPT Pro users.

2. Anthropic has introduced Claude 3.7 Sonnet, its most advanced AI model yet.

It features hybrid reasoning, offering fast responses or in-depth, step-by-step thinking.

The upgrade brings stronger performance in math, physics, coding, and instruction following.

API users can now set a token budget that controls how long the model thinks before responding.

3. Perplexity has introduced a new voice mode.
Ask any question and get real-time spoken answers.

Update your iOS app to try it now. Coming soon to Android and Mac.

4. Adobe has launched a Photoshop app for iPhone, with an Android version coming later this year.

The app offers free and paid features, with tools like layering and masking optimized for mobile use.

It also integrates with Photoshop on the web, though that requires a subscription.

5. Hume has introduced Octave, which it bills as the first LLM designed for text-to-speech.

- Create custom voices from a prompt.

- Control emotion and delivery with acting instructions (sarcasm, whispering, etc.).

- Generate long-form content using Creator Studio.

Unlike traditional TTS, Octave understands the meaning of the text, which lets it produce more natural, emotional speech.

6. ElevenLabs, the AI voice cloning startup co-founded by a Palantir alumnus, has launched Scribe v1, a new speech-to-text model.

ElevenLabs claims it is the most accurate model across multiple languages, outperforming Google’s Gemini 2.0 Flash, OpenAI’s Whisper v3, and Deepgram Nova-3.

According to the company’s benchmarks, Scribe v1 achieves record-low error rates in converting speech to text.

7. Hugging Face, the $4.5 billion AI startup, has launched FastRTC, an open-source Python library for real-time audio and video AI applications.

FastRTC simplifies WebRTC integration, which enables direct browser-to-browser communication without plugins.

This makes it easier for ML engineers to build voice assistants and video tools without needing specialized WebRTC expertise.

8. Amazon has launched Alexa+, a next-gen AI assistant that dynamically switches between models such as Amazon’s Nova and Anthropic’s Claude.

It handles complex tasks like bookings and purchases, going beyond basic commands.

Alexa+ can now suggest new books, alert users about concerts, and even buy tickets.

It also reasons through instructions, using expert knowledge to answer questions and make decisions, such as finding a restaurant your coworkers will enjoy and booking a table.

9. You.com has introduced the Advanced Research & Insights agent (ARI), an AI tool that processes 400+ sources at once.

ARI is designed to speed up market research, generating detailed reports in minutes instead of weeks.

Aimed at the $250 billion consulting industry, it automates tasks that usually require teams of analysts, making research faster and more efficient.

10. Google has launched a free version of Gemini Code Assist, its AI-powered coding tool, for individual developers.

It offers far higher usage limits than GitHub Copilot’s free tier, including:

- 180K code completions per month

- Support for all public programming languages

- 128K-token context window for better understanding

Designed to enhance coding efficiency at scale.
