China is on 🔥 ByteDance drops another banger AI paper!
OmniHuman-1 can generate realistic human videos at any aspect ratio and body proportion from just a single image and an audio clip. This is the best I have seen so far.
10 incredible examples and the research paper link 👇
10. Jensen Huang rapping is not something you see often
If you like this post, follow me for AI news @ai_for_success and join my newsletter "AI Compass" for free to get all the latest AI news in your inbox. aicompass.beehiiv.com
Most sites you actually want to build on don't have APIs.
Mino turns any website into structured data. Send a URL and a goal, get JSON back.
I've been testing this for weeks. It's genuinely powerful - handles logins, dynamic content, multi-step flows. Works on sites that'll never build developer tools.
Easy to use via API. Built-in stealth mode bypasses anti-bot protections.
Production infrastructure - processes millions of operations monthly for pricing intelligence and competitive research.
More use cases below 👇
1/6
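The "URL + goal in, JSON out" flow above can be sketched roughly like this. The endpoint, field names, and auth header are assumptions for illustration, not Mino's documented API; check Mino's own docs for the real interface.

```python
# Hypothetical sketch of the "send a URL and a goal, get JSON back" pattern.
# The endpoint, payload fields, and auth scheme below are assumptions.
import json
import urllib.request

MINO_ENDPOINT = "https://api.example.com/v1/extract"  # placeholder URL

def build_request(url: str, goal: str, api_key: str) -> urllib.request.Request:
    """Package a target URL and a natural-language goal as a JSON POST."""
    payload = json.dumps({"url": url, "goal": goal}).encode()
    return urllib.request.Request(
        MINO_ENDPOINT,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = build_request(
    "https://example.com/pricing",
    "Extract every plan name and its monthly price",
    "sk-demo",
)
print(json.loads(req.data)["goal"])  # prints the goal that was packaged
```

The response would come back as structured JSON matching the stated goal; parsing it is the same `json.loads` call in reverse.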
2/6 Most browser agents make a model call for every action. Look at screenshot → reason → click → look again. Expensive and slow.
Mino learns workflows once with AI, then executes deterministically.
First run: AI figures out the site
Next runs: Milliseconds, code-level precision
85-95% accuracy, 10-30 seconds per task, pennies per run.
The same infrastructure Google, DoorDash, and many others use in production.
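The learn-once, replay-deterministically idea can be sketched as a simple cache around an expensive planning step. `plan_with_ai` below is a stand-in for the model-driven first run, not Mino's actual internals.

```python
# Minimal sketch of "learn workflows once with AI, then execute
# deterministically": the first run pays for AI planning, later runs
# replay the cached steps with no model call. All names are illustrative.
workflow_cache: dict[str, list[str]] = {}

def plan_with_ai(site: str) -> list[str]:
    # Placeholder for the slow, model-in-the-loop exploration of a site.
    return [f"open {site}", "click #search", "read .results"]

def run_workflow(site: str) -> list[str]:
    if site not in workflow_cache:       # first run: AI figures out the site
        workflow_cache[site] = plan_with_ai(site)
    return workflow_cache[site]          # later runs: cached, millisecond replay

run_workflow("example.com")   # slow path, populates the cache
run_workflow("example.com")   # fast path, no model call
```

The cached plan is what gives the "code-level precision" on repeat runs: the same steps execute every time, with the model only consulted when a site is new.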
3/6
Built a tool that checks 10 urgent care clinics simultaneously in 30 seconds.
Your kid has a fever at 8 PM. Instead of 15 phone calls:
→ Pass clinic URLs + ZIP code
→ Navigates different booking systems in parallel
→ Returns first available slots with booking links
Every clinic has different flows. Mino handles all of it, returns clean JSON.
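The fan-out pattern described above (query every clinic at once, keep whoever has a slot) looks roughly like this. `check_clinic` is a stub standing in for a real Mino call, and the clinic URLs and times are invented.

```python
# Sketch of checking many clinics in parallel. The stub below fakes the
# "navigate this clinic's booking flow, return clean JSON" step; a real
# version would call the scraping API instead.
from concurrent.futures import ThreadPoolExecutor

DEMO_SLOTS = {
    "https://clinic-a.example": "8:40 PM",
    "https://clinic-b.example": None,        # no openings tonight
    "https://clinic-c.example": "9:15 PM",
}

def check_clinic(url: str) -> dict:
    # Stand-in for one clinic's booking flow; returns clean JSON-like data.
    return {"clinic": url, "first_slot": DEMO_SLOTS.get(url)}

# Fan out: each clinic is checked concurrently, results keep input order.
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(check_clinic, DEMO_SLOTS))

available = [r for r in results if r["first_slot"]]
```

Because every clinic check runs in its own worker, total wall time is roughly the slowest single clinic, not the sum of all of them.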
Official: Google DeepMind has launched Nano Banana Pro, aka Gemini 3 Pro Image. I was one of the early testers. Thank you, Google DeepMind team.
Some major improvements:
- 1K, 2K, and 4K image generation
- More accurate, legible text in multiple languages
- Better consistency across concepts, characters, and styles
- Precise localized editing for any part of an image
I asked it to explain the Transformer architecture; this is incredible stuff.
Some prompts and image examples from my early testing 👇
2/7 World knowledge and reasoning
One of the biggest upgrades is world knowledge and reasoning.
Nano Banana Pro can turn
- text into context rich infographics and diagrams
- real-world data via Google Search into visual snapshots like recipes, weather, sports, and more
Great for educational content, dashboards, explainers, and product mockups.
I asked it to research online and create an image with all the details of the Gemini 3.0 Pro launch.
Prompt: Use live online search to gather the latest accurate information about the launch of Google DeepMind’s Gemini 3.0 Pro from official Google/DeepMind sources and major tech news sites, then synthesize the confirmed facts (launch date, key features, capabilities, improvements vs previous versions, availability, and main use cases) into a single clean, modern infographic image with a clear title like “Gemini 3.0 Pro – Launch Overview,” short readable text (no long paragraphs), simple icons, and 3–6 sections or panels that visually highlight the main points; keep the design professional and balanced, make all text sharp and legible, and avoid adding any details that are not supported by your search results.
3/7 Text and multilingual capabilities
Text inside images is much better now.
Nano Banana Pro supports
- sharper, more readable text in posters, UI mocks, and ads
- a wider range of fonts, textures, and calligraphy
- multilingual text generation and localization so you can scale visuals across languages and markets
Create a product advertisement showcasing the same wireless headphones in 4 different language versions displayed as a 2x2 grid layout; top-left in English with headline "Sound Without Limits," top-right in Spanish "Sonido Sin Límites," bottom-left in Japanese "限界のないサウンド," and bottom-right in Arabic "صوت بلا حدود"; each version should have identical layout and design (sleek black headphones on gradient background, product name "AuraSound Pro," price, and "Buy Now" button in respective language), use appropriate native fonts for each language (Latin, Japanese characters, Arabic script), ensure all text is crystal clear and professionally typeset with correct grammar and cultural adaptation, maintain consistent branding and visual hierarchy across all four versions, and deliver at 4K resolution to demonstrate true multilingual localization capabilities.
ElevenLabs just launched Scribe v2 Realtime - their next-gen Speech to Text model.
Current STT models force you to choose: fast but inaccurate, accurate but slow, or both but expensive at scale.
Scribe v2 Realtime breaks this tradeoff:
> Ultra-low latency – median latency of 150ms with partial transcriptions
> High accuracy – 93.5% across 30 EU & Asian languages with robust accent handling
> Low cost – optimized for production workloads
Built for live agents, meetings, and conversational AI that needs to work in the real world.
More details 👇
1/5
2/5
Scribe v2 Realtime delivers ultra-low latency STT with 150ms median latency and partial transcriptions in milliseconds.
- Streaming support - send audio chunks, get real-time transcripts
- Voice Activity Detection - automatic segmentation based on silence
- Manual commit control - you decide when to finalize transcript segments
- Multiple audio formats - PCM (8kHz–48kHz) and µ-law encoding
- Speaker diarization - available via manual commit
- Enterprise compliance - SOC 2, PCI, HIPAA, EU data residency ready
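The chunked-streaming pattern behind those features can be sketched generically as below. The transcriber here is a stub, not the ElevenLabs SDK; see their docs for the real client and message format.

```python
# Generic sketch of streaming STT: send fixed-size PCM chunks, collect a
# partial transcript per chunk, then "commit" the last hypothesis as final.
# Chunk size and sample rate are illustrative choices, not SDK defaults.

CHUNK_MS = 100          # send audio in 100 ms chunks
SAMPLE_RATE = 16_000    # 16 kHz mono, within the 8kHz-48kHz PCM range above
BYTES_PER_CHUNK = SAMPLE_RATE * 2 * CHUNK_MS // 1000   # 16-bit samples

def stream_transcribe(audio: bytes, transcribe_chunk) -> list[str]:
    """Feed audio chunk by chunk, collecting partial transcripts."""
    partials = []
    for start in range(0, len(audio), BYTES_PER_CHUNK):
        chunk = audio[start:start + BYTES_PER_CHUNK]
        partials.append(transcribe_chunk(chunk))   # partial hypothesis
    return partials

# Stub transcriber: pretend each chunk refines the running hypothesis.
fake_words = iter(["hello", "hello world"])
partials = stream_transcribe(
    b"\x00" * (2 * BYTES_PER_CHUNK), lambda c: next(fake_words)
)
final = partials[-1]   # manual commit: finalize the last hypothesis
```

The "manual commit control" feature maps onto the last line: instead of the service deciding when a segment is done, the client chooses when to lock in a final transcript.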
3/5
Traditional STT models break conversational flow with delays, misinterpret accents/noisy audio, and cost too much for real-time use.
Scribe v2 Realtime dominates the competition:
- 93.5% accuracy across 30 EU & Asian languages
- Beats Gemini 2.5 Flash (91.4%) with much better latency
- Outperforms GPT-4o mini Transcribe (90.7%) with superior speed
- Crushes Deepgram Nova 3 (85%) - 73% fewer errors than Nova 2, 55% fewer than Nova 3
- Even beats Deepgram Enhanced - their most expensive model with fewer language options
Nova 2 produces 104% more errors, Nova 3 produces 129% more errors than Scribe v2 Realtime.
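"X% more errors" and "Y% fewer errors" are the same ratio viewed from opposite sides: if a competitor makes m% more errors, the other model makes 1 − 1/(1 + m/100) fewer. A quick check of the Nova 3 figures quoted above:

```python
# Convert "m% more errors" into the equivalent "fewer errors" fraction:
# if the competitor's error count is (1 + m/100) times ours, ours is
# 1 - 1/(1 + m/100) of theirs, i.e. that fraction fewer.
def fewer_from_more(more_pct: float) -> float:
    """Equivalent 'fewer errors' fraction for 'more_pct% more errors'."""
    return 1 - 1 / (1 + more_pct / 100)

nova3_fewer = fewer_from_more(129)   # Nova 3: "129% more errors"
print(f"{nova3_fewer:.0%}")          # prints 56%, matching "55% fewer"
```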
Google's new video AI model Veo 3 has native audio generation.
You can now generate videos with sound effects, background noise, and even dialogue with just one prompt
🚨 SkyReels just launched! The world’s first open-source video generation platform supporting unlimited duration 🔥
All-in-one creator toolkit:
- Consistent high-quality video (LoRA ready)
- Fast gen, amazing output.
- Amazing facial expressions.
Plus a text-to-film agent handles everything: script, characters, storyboards, full AV generation, auto-edit. It's wild!
Step by step tutorial 👇
1/7 Scripting
Just type your idea or upload a story; it converts everything into a professional screenplay, then automatically designs characters and storyboards based on your narrative.
2/7 Character Design & Character Editing
The AI analyzes your story's tone and emotion to create matching characters. You can tweak their designs or upload your own, plus it generates unique voices for each character.
Every aspect of a character is customizable, from physical appearance and clothing to hairstyles.