
https://twitter.com/arena/status/2056400044862111757

Qwen

@Alibaba_Qwen

Jun 16

📣 Introducing the Qwen-Robot Suite: Qwen-RobotNav, Qwen-RobotManip, and Qwen-RobotWorld, three foundation models that together form a full stack for embodied intelligence.

🧭 Qwen-RobotNav — the gateway to mobility.
• Unifies 5 navigation tasks in one model: instruction following, point-goal, object-goal, target tracking, autonomous driving
• Controllable observation protocol
• Tool interface for agentic systems

🤖 Qwen-RobotManip — the foundation of interaction.
• Unified state-action space across heterogeneous robots
• Camera-frame delta poses for coherent cross-embodiment training
• Pretrained on a 38,100+ hour open-source corpus
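To make "camera-frame delta poses" concrete, here is a minimal sketch under stated assumptions. Poses are rigid transforms (rotation matrix R, translation p) with each robot's base as the parent frame, and the action uses an assumed decoupled convention: the translation delta is the difference of end-effector positions in camera coordinates, and the rotation delta is R_next · R_tᵀ. The function names are hypothetical; the Qwen-RobotManip report may parameterise actions differently.

```python
from typing import List, Tuple

Mat3 = List[List[float]]
Vec3 = List[float]
Pose = Tuple[Mat3, Vec3]  # (R, p): maps child-frame points into the parent frame as R @ x + p


def matmul(a: Mat3, b: Mat3) -> Mat3:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def matvec(a: Mat3, v: Vec3) -> Vec3:
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def transpose(a: Mat3) -> Mat3:
    return [[a[j][i] for j in range(3)] for i in range(3)]


def to_camera_frame(base_T_cam: Pose, base_T_ee: Pose) -> Pose:
    """Re-express an end-effector pose given in the base frame in the camera frame.

    Computes inv(base_T_cam) @ base_T_ee for rigid transforms.
    """
    r_cam, p_cam = base_T_cam
    r_ee, p_ee = base_T_ee
    r_cam_t = transpose(r_cam)  # the inverse of a rotation matrix is its transpose
    rot = matmul(r_cam_t, r_ee)
    pos = matvec(r_cam_t, [p_ee[i] - p_cam[i] for i in range(3)])
    return rot, pos


def camera_frame_delta(base_T_cam: Pose, ee_t: Pose, ee_next: Pose) -> Tuple[Mat3, Vec3]:
    """Hypothetical action label: (rotation delta, translation delta) in camera coordinates."""
    r0, p0 = to_camera_frame(base_T_cam, ee_t)
    r1, p1 = to_camera_frame(base_T_cam, ee_next)
    d_rot = matmul(r1, transpose(r0))  # R_next @ R_t^T
    d_pos = [p1[i] - p0[i] for i in range(3)]
    return d_rot, d_pos


# Toy example: camera yawed 90 degrees relative to the base; gripper moves +0.1 m along base x.
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
RZ90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
camera = (RZ90, [0.5, 0.0, 1.0])
d_rot, d_pos = camera_frame_delta(camera, (I3, [0.3, 0.0, 0.2]), (I3, [0.4, 0.0, 0.2]))
print(d_pos)  # approximately [0.0, -0.1, 0.0]: the same motion, seen along the camera's -y axis
```

Because both poses are re-expressed relative to the camera, the resulting action does not depend on where each robot's base frame happens to be defined, which is one plausible reason such a convention helps cross-embodiment training.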

🌍 Qwen-RobotWorld — infinite worlds for physical agents.
• Single world model, 20+ embodiments
• Natural-language action interface
• Predicts physically grounded futures across manipulation, driving, and navigation

Each model is independently useful, and the three can be composed as physical-world tools. Together, they form the low-level toolkit for general-purpose agentic systems that don't just see the world, but act in it.

📷 Blog:
qwen.ai/blog?id=qwen-r…
📖 Report:
Qwen-RobotNav: …anwen-res.oss-accelerate.aliyuncs.com/qwenrobot/pape…
Qwen-RobotManip: …anwen-res.oss-accelerate.aliyuncs.com/qwenrobot/pape…
Qwen-RobotWorld: …anwen-res.oss-accelerate.aliyuncs.com/qwenrobot/pape…

Qwen-RobotNav: a scalable navigation model built on Qwen3-VL. It exposes a parameterised interface with two complementary dimensions: task modes that select the navigation behaviour, and controllable observation parameters (token budget, temporal decay, per-camera weights) that govern how visual history is encoded.
Trained on 15.6 million samples with training-time randomization over all parameters, Qwen-RobotNav generalizes to any inference-time configuration without architectural modification, unifying five task families under a single set of weights and serving as a natural building block for agentic systems.
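The thread does not say how the three observation parameters interact. As one hypothetical illustration, the sketch below assumes each (camera, frame-age) slot receives a share of the token budget proportional to camera_weight × decay^age, rounded with the largest-remainder method so the shares sum exactly to the budget, and it mimics training-time randomization by drawing a fresh configuration per sample. All function names, parameter ranges, and camera names are invented for illustration.

```python
import random
from typing import Dict, Tuple


def allocate_visual_tokens(
    budget: int, num_frames: int, camera_weights: Dict[str, float], decay: float
) -> Dict[Tuple[str, int], int]:
    """Split `budget` tokens over (camera, age) slots; age 0 is the newest frame.

    Hypothetical scheme: each slot's share is proportional to
    camera_weight * decay**age, rounded so the total equals the budget exactly.
    """
    scores = {
        (cam, age): w * decay**age
        for cam, w in camera_weights.items()
        for age in range(num_frames)
    }
    total = sum(scores.values())
    exact = {k: budget * s / total for k, s in scores.items()}
    alloc = {k: int(v) for k, v in exact.items()}  # floor of each exact share
    leftover = budget - sum(alloc.values())
    # Hand the remaining tokens to the slots with the largest fractional parts.
    for k in sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)[:leftover]:
        alloc[k] += 1
    return alloc


def sample_observation_config(rng: random.Random) -> dict:
    """Training-time randomization: a fresh config per sample (ranges are made up)."""
    return {
        "budget": rng.choice([256, 512, 1024]),
        "num_frames": rng.randint(1, 8),
        "decay": rng.uniform(0.5, 1.0),
        "camera_weights": {"front": 1.0, "left": rng.uniform(0.0, 1.0), "right": rng.uniform(0.0, 1.0)},
    }


alloc = allocate_visual_tokens(100, 2, {"front": 1.0, "wrist": 1.0}, decay=0.5)
print(alloc)  # {('front', 0): 33, ('front', 1): 17, ('wrist', 0): 33, ('wrist', 1): 17}
```

With decay 0.5, the newest frame from each camera gets roughly twice the tokens of the previous one; raising a camera's weight shifts budget toward that view without changing the total.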
Read more about Qwen-RobotNav on the blog: qwen.ai/blog?id=qwen-r…

Qwen-RobotManip is a generalizable Vision-Language-Action (VLA) foundation model built upon Qwen-VL. It introduces a unified alignment framework across the representation, motion, and behavioral dimensions of manipulation, making large-scale multi-source training coherent rather than conflicting.
Using only open-source robotic manipulation datasets and human demonstration videos, without any proprietary data collection, Qwen-RobotManip constructs a ~38,100-hour pretraining corpus and already exhibits emergent generalization capabilities.

Qwen-RobotManip Blog: qwen.ai/blog?id=qwen-r…


Qwen

@Alibaba_Qwen

Feb 16

🚀 Qwen3.5-397B-A17B is here: The first open-weight model in the Qwen3.5 series.

🖼️ Native multimodal. Trained for real-world agents.
✨ Powered by hybrid linear attention + sparse MoE and large-scale RL environment scaling.
⚡ 8.6x–19.0x higher decoding throughput than Qwen3-Max
🌍 201 languages & dialects
📜 Apache 2.0 licensed

🔗 Dive in:
GitHub: github.com/QwenLM/Qwen3.5
Chat: chat.qwen.ai
API: modelstudio.console.alibabacloud.com/ap-southeast-1…
Qwen Code: github.com/QwenLM/qwen-co…
Hugging Face: huggingface.co/collections/Qw…
ModelScope: modelscope.cn/collections/Qw…
Blog: qwen.ai/blog?id=qwen3.5

[Figure: Average Ranking vs. Environment Scaling]

[Figure: Efficiency]


Qwen

@Alibaba_Qwen

Dec 31, 2025

🎁 A New Year gift from Qwen — Qwen-Image-2512 is here.

🚀 Our December upgrade to Qwen-Image, just in time for the New Year.

✨ What’s new:
• More realistic humans — dramatically reduced “AI look,” richer facial details
• Finer natural textures — sharper landscapes, water, fur, and materials
• Stronger text rendering — better layout, higher accuracy in text–image composition

🏆 Tested in 10,000+ blind rounds on AI Arena, Qwen-Image-2512 ranks as the strongest open-source image model, while staying competitive with closed-source systems.

👉 Try it now in Qwen Chat: chat.qwen.ai/?inputFeature=…
🤗 Hugging Face: huggingface.co/Qwen/Qwen-Imag…
📦 ModelScope: modelscope.ai/models/Qwen/Qw…
💻 GitHub: github.com/QwenLM/Qwen-Im…
📝 Blog: qwen.ai/blog?id=qwen-i…
🤗 Hugging Face Demo: huggingface.co/spaces/Qwen/Qw…
📦 ModelScope Demo: modelscope.cn/aigc/imageGene…
✨API: modelstudio.console.alibabacloud.com/?tab=doc#/doc/…

🎆 Start the New Year with better images.


Qwen

@Alibaba_Qwen

Oct 4, 2025

🚀 Qwen3-VL-30B-A3B-Instruct & Thinking are here!
Smaller size, same powerhouse performance 💪—packed with all the capabilities of Qwen3-VL!

🔧 With just 3B active params, it’s rivaling GPT-5-Mini & Claude4-Sonnet — and often beating them across STEM, VQA, OCR, Video, Agent tasks, and more.

And that’s not all: we’re also releasing an FP8 version, plus the FP8 of the massive Qwen3-VL-235B-A22B!

Try it out and make your multimodal AI applications run faster!🧠🖼️

Qwen Chat: chat.qwen.ai/?models=qwen3-…
GitHub & Cookbooks: github.com/QwenLM/Qwen3-V…
API: alibabacloud.com/help/en/model-…
Blog: qwen.ai/blog?id=99f033…
ModelScope: modelscope.cn/collections/Qw…
Hugging Face: huggingface.co/collections/Qw…

[Figure: Performance of Qwen3-VL-30B-A3B-Thinking]

[Figure: Pure text performance]


Qwen

@Alibaba_Qwen

Sep 22, 2025

🚀 Introducing Qwen3-Omni — the first natively end-to-end omni-modal AI unifying text, image, audio & video in one model — no modality trade-offs!

🏆 SOTA on 22/36 audio & AV benchmarks
🌍 119 text languages / 19 speech-input languages / 10 speech-output languages
⚡ 211 ms latency | 🎧 30-min audio understanding
🎨 Fully customizable via system prompts
🔗 Built-in tool calling
🎤 Open-source Captioner model (low-hallucination!)

🌟 What’s Open-Sourced?
We’ve open-sourced Qwen3-Omni-30B-A3B-Instruct, Qwen3-Omni-30B-A3B-Thinking, and Qwen3-Omni-30B-A3B-Captioner, to empower developers to explore a variety of applications from instruction-following to creative tasks.

Try it now 👇
💬 Qwen Chat: chat.qwen.ai/?models=qwen3-…
💻 GitHub: github.com/QwenLM/Qwen3-O…
🤗 HF Models: huggingface.co/collections/Qw…
🤖 ModelScope Models: modelscope.cn/collections/Qw…
🎬 Demo: huggingface.co/spaces/Qwen/Qw…

Use the voice chat and video chat features on Qwen Chat to experience the Qwen3-Omni model.

[Figure: Performance]


Qwen

@Alibaba_Qwen

Sep 11, 2025

🚀 Introducing Qwen3-Next-80B-A3B — the FUTURE of efficient LLMs is here!

🔹 80B params, but only 3B activated per token → 10x cheaper training and 10x faster inference than Qwen3-32B (especially at 32K+ context!)
🔹 Hybrid Architecture: Gated DeltaNet + Gated Attention → best of speed & recall
🔹 Ultra-sparse MoE: 512 experts, 10 routed + 1 shared
🔹 Multi-Token Prediction → turbo-charged speculative decoding
🔹 Beats Qwen3-32B in perf, rivals Qwen3-235B in reasoning & long-context
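The "512 experts, 10 routed + 1 shared" layout can be illustrated with a generic top-k mixture-of-experts layer. This is a minimal sketch, not Qwen3-Next's implementation: it assumes gate weights come from a softmax over only the selected experts' router logits and that the shared expert's output is added ungated; the post specifies neither detail. Experts here are plain Python callables standing in for feed-forward blocks.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def top_k_indices(logits: Sequence[float], k: int) -> List[int]:
    """Indices of the k largest router logits, highest first."""
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]


def moe_forward(
    x: Vector,
    router_logits: Sequence[float],
    routed_experts: Sequence[Expert],
    shared_expert: Expert,
    k: int = 10,
) -> Tuple[Vector, List[int]]:
    """Send one token through a top-k MoE layer plus an always-on shared expert."""
    chosen = top_k_indices(router_logits, k)
    # Softmax restricted to the chosen experts (subtract the max for numerical stability).
    m = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - m) for i in chosen]
    z = sum(exps)
    out = shared_expert(x)  # the shared expert runs for every token
    for e, i in zip(exps, chosen):
        y = routed_experts[i](x)  # only k of the routed experts are evaluated
        out = [o + (e / z) * yi for o, yi in zip(out, y)]
    return out, chosen


# Toy setup: 512 identity experts and router logits that increase with the expert index.
NUM_EXPERTS = 512
experts: List[Expert] = [lambda v: list(v) for _ in range(NUM_EXPERTS)]
logits = [i / NUM_EXPERTS for i in range(NUM_EXPERTS)]
out, chosen = moe_forward([1.0, 2.0], logits, experts, shared_expert=lambda v: list(v))
print(chosen)  # [511, 510, 509, 508, 507, 506, 505, 504, 503, 502]
```

Per token, only 11 of the 513 expert blocks (10 routed plus the shared one) are evaluated. Attention layers and embeddings also count toward the roughly 3B active parameters, so the expert ratio alone does not account for the full 3B/80B figure.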

🧠 Qwen3-Next-80B-A3B-Instruct approaches our 235B flagship.
🧠 Qwen3-Next-80B-A3B-Thinking outperforms Gemini-2.5-Flash-Thinking.

Try it now: chat.qwen.ai
Blog: qwen.ai/blog?id=4074cc…
Huggingface: huggingface.co/collections/Qw…
ModelScope: modelscope.cn/collections/Qw…
Kaggle: kaggle.com/models/qwen-lm…
Alibaba Cloud API: alibabacloud.com/help/en/model-…

[Figure: Pretraining Efficiency & Inference Speed]

Prefill Stage: At 4K context length, throughput is nearly 7x higher than Qwen3-32B. Beyond 32K, it’s over 10x faster.

