Taylor Ogan
CEO at @snowbullcapital / Moved to China for investment opportunities / Boston to Shenzhen / Grammar nut

Dec 4, 2025, 7 tweets

Another DeepSeek moment. This is the world’s first actual smart phone. It’s an engineering prototype of ZTE’s Nubia M153 running ByteDance’s Doubao AI agent fused into Android at the OS level. It has complete control over the phone. It can see the UI, choose/download apps, tap/type, call, and run multi-step task chains.

Here I just say (in English) “find someone to wait in line for me” (something you can do in China), and it picks which app to open, configures the job, and hands me one confirm screen. I wouldn’t otherwise know how to do this, and here the phone just did it in a matter of seconds.

This isn’t a chat overlay, it’s a true multimodal agent. It has the brand-new Snapdragon 8 Elite Gen 5 with 16GB RAM, so it can push a lot of the agentic workload on-device. Here I take a picture of a NIO battery swap station and ask, “What is this thing?” It’s running ByteDance’s Doubao model (>175M users in China): a massive, sparse MoE model with full text+vision support. It recognizes the infrastructure from the photo, grounds it to NIO’s network, and explains what it does.

Here you see the cloud + on-device split very cleanly. Doubao handles the semantics: from a single hotel entrance photo it figures out which hotel this is, that I want to book a room for tonight, and that it will need to check the hotel’s pet policy.

Then ZTE’s 7B Nebula-GUI model (a vision model trained to understand screens) running locally on the Snapdragon 8 Elite drives the UI like a human: it picks the Ctrip app, opens it, fills in the dates, finds the cheapest rate, reads the hotel’s pet policy, and informs me I can bring a dog.
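For anyone who wants that split in more concrete terms, here is a rough Python sketch of a planner/executor hand-off. It is purely illustrative: `Plan`, `cloud_plan`, and `OnDeviceGuiAgent` are hypothetical names, the plan is hard-coded, and nothing here reflects how Doubao or Nebula-GUI are actually built.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    """Output of a hypothetical cloud planner: target app plus ordered steps."""
    app: str
    steps: list[str]


def cloud_plan(intent: str, photo_label: str) -> Plan:
    """Stand-in for the cloud model: turns intent + recognized photo into a plan.

    Hard-coded for illustration; a real planner would be a model call.
    """
    return Plan(
        app="Ctrip",
        steps=[
            f"search hotel: {photo_label}",
            "set dates: tonight",
            "sort by: cheapest rate",
            "read policy: pets",
        ],
    )


@dataclass
class OnDeviceGuiAgent:
    """Stand-in for a local GUI model: replays plan steps as UI actions."""
    log: list[str] = field(default_factory=list)

    def run(self, plan: Plan) -> list[str]:
        # Open the chosen app first, then perform each step in order.
        self.log.append(f"open app: {plan.app}")
        for step in plan.steps:
            self.log.append(f"ui action -> {step}")
        return self.log


# "Example Hotel" is a placeholder label, not the hotel in the video.
plan = cloud_plan("book this hotel tonight, I have a dog", "Example Hotel")
actions = OnDeviceGuiAgent().run(plan)
for line in actions:
    print(line)
```

The point of the sketch is only the boundary: the planner decides what should happen, and the on-device agent decides how to tap it out.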

I ask, “book a robotaxi from here to Talent Park.” The agent knows my GPS, knows which robotaxi operators actually serve this area and my destination, and plans the route at the Doubao layer. Then Nebula-GUI takes over, chooses Baidu Apollo, taps through the app, asks which part of the park I want, and books the ride using the closest pickup point.
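The operator-selection step can be pictured as a simple coverage check: keep only the operators whose service area includes both the pickup and the dropoff. A minimal sketch, assuming service areas are modeled as sets of zone names; the operator names, zones, and `operators_for_trip` are all made up for illustration and say nothing about how the agent really does it.

```python
# Hypothetical service areas: operator name -> set of zones it covers.
SERVICE_ZONES = {
    "operator_a": {"zone_1", "zone_2"},
    "operator_b": {"zone_2", "zone_3"},
}


def operators_for_trip(pickup_zone: str, dropoff_zone: str,
                       service_zones: dict[str, set[str]]) -> list[str]:
    """Return operators that cover both ends of the trip, sorted by name."""
    return sorted(
        name
        for name, zones in service_zones.items()
        if pickup_zone in zones and dropoff_zone in zones
    )


candidates = operators_for_trip("zone_2", "zone_3", SERVICE_ZONES)
print(candidates)
```

If no operator covers both ends, the list comes back empty and the agent would have to fall back to something else, such as a different pickup point.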

This is a GUI agent that has been trained specifically on mobile app flows in China, now wired into a live phone. I give it intent in language, and it handles operator selection, app choice, and all of the tapping.

This is where it stops feeling like “voice commands” and starts feeling like a real assistant. I don’t remember which phone number I used to log into the Baidu Apollo robotaxi app. Doubao digs into the app’s settings and tells me the last four digits of this account’s phone number so I can authenticate at the robotaxi door.

Mid-ride I ask it to change the dropoff: it knows there’s an active Apollo trip, focuses that app, edits the destination via Nebula-GUI, and both the car and my phone confirm the new route. When I get out, I ask it to order something refreshing by drone delivery, and it just starts orchestrating that chain, too.

I tell it to order me two of the drinks in front of me. It reuses the cart, updates quantity, pays, and a Meituan drone flies the order to a nearby locker. When Meituan’s automated phone system calls to say the delivery arrived, Doubao auto-answers and talks to their bot on my behalf. You’re literally watching ByteDance’s agent, ZTE’s GUI, Meituan’s AI, and the AV/drone stack negotiate around a human who’s basically idle.

When I’m walking around, I’m just using it as a background intelligence layer. I have ADD, so this is dangerous/awesome. I take a photo of a new store and ask if it’s a Shenzhen brand; it checks business/trademark data and says yes. I take a picture of a guy in an NYPD jacket and ask if he’s actually a cop; it understands we’re in Shenzhen, recognizes the jacket as a civilian style, and says he isn’t.

On that same photo I ask it to put him in a Chinese police uniform, then an FBI raid jacket; ByteDance’s image model rewrites only the clothing and keeps the scene intact. Then I point it at a Brompton shop and ask why these bikes suddenly blew up here, and it gives a genuinely good answer.
