Francesco
@trycua YC X25 - prev. Gaming AI Xbox, Windows Agent Arena
Apr 19 10 tweets 2 min read
We’ve been building quietly. Today, we launch loudly. Meet our startup: Cua AI
Our mission? To commoditize Computer-Use Agents - AI agents that can reason, plan, and act over computer interfaces. Not just research demos, but a practical OSS framework built for AI engineers.
Feb 25, 2024 7 tweets 2 min read
Imagine if language models could tap into the app ecosystem of your iPhone. Would plugins and assistants become obsolete if we simply allowed a model to orchestrate our existing user interfaces, which have been robust for many years?

This demonstrates how well GPT-4V performs as a Generalist Mobile AI Agent, without any fine-tuning or grounding, merely by pairing it with a text model that has JSON mode enabled. I suggest watching this demo for a (maybe) wow factor and the results on iOS 17 using NavAIGuide, a mobile and web navigational agent framework for LLMs: github.com/francedot/NavA…

Over the last few months, I've been experimenting with vision models not just in one area, but across web, desktop, and mobile platforms. It's become clear to me that there's a lot of untapped potential in these technologies. The closer we bring them to our everyday devices, the better we can make use of what they have to offer. This shift could make our connection with AI feel more intuitive and seamless, moving away from ChatGPT-esque interaction with AI assistants.
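
To make the "vision model plus a JSON-mode text model" idea above a bit more concrete, here is a minimal sketch of what such a loop could look like. It is not NavAIGuide's actual API: the action set, the `Action` fields, and the four callables (`take_screenshot`, `describe_screen`, `choose_action`, `perform`) are all hypothetical names invented for illustration, and the two model calls are left as plain functions you would back with real models.

```python
import json
from dataclasses import dataclass

# Hypothetical action vocabulary; a real framework would define its own.
ALLOWED_ACTIONS = {"tap", "type", "scroll", "done"}


@dataclass
class Action:
    kind: str
    target: str = ""
    text: str = ""


def parse_action(raw: str) -> Action:
    """Parse and validate the JSON object returned by the text model."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    kind = data.get("action")
    if kind not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {kind!r}")
    return Action(kind=kind, target=data.get("target", ""), text=data.get("text", ""))


def run_agent(goal, take_screenshot, describe_screen, choose_action, perform, max_steps=10):
    """Loop: screenshot -> vision description -> JSON action -> UI step.

    All four callables are assumptions standing in for real components:
      take_screenshot()                  -> image bytes
      describe_screen(image)             -> text from a vision model
      choose_action(goal, text, history) -> JSON string from a text model
      perform(action)                    -> drives the device UI
    """
    history = []
    for _ in range(max_steps):
        description = describe_screen(take_screenshot())
        action = parse_action(choose_action(goal, description, history))
        history.append(action)
        if action.kind == "done":
            break
        perform(action)
    return history
```

Because the text model is constrained to emit JSON, the only glue needed between "what the model wants" and "what the device does" is a small validation step like `parse_action`; everything else is the existing user interface.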