More from @windsurf_ai

Windsurf

@windsurf_ai

May 16

To train SWE-1, we had to create a data model and training recipe that took all of the complex states, tasks, and surfaces into consideration.

We then ran evals and experiments to evaluate performance against open and foundation models.

Here's what we did ↴

First, we evaluated how well the model could handle a user query mid-session.

Seamless collaboration with users on partially completed tasks is a crucial benchmark for model usefulness.

SWE-1 achieves near-parity with frontier models in helpfulness, accuracy, and edit quality.

We then measured the ability of the model to independently solve a problem end to end.

From a new conversation, how well does Cascade address input intent by passing a set of tests?

SWE-1 competes with frontier models, and surpasses mid-sized and open-weight alternatives.

Windsurf

@windsurf_ai

Apr 30

We asked our devs at Windsurf to share their thoughts on their favorite models and what they actually use them for.

Read their answers in the thread ↓

Claude 3.7 Sonnet:

It’s proactive and confident but can do too much at times. Regardless, it is generally seen as the most capable.

“3.7 is just super agentic and eager to use tools and do things. I prefer stopping an over-eager model vs. coaxing an under-eager one.”

Gemini 2.5 Pro:

Preferred for tasks that require clean, structured responses.

It’s less proactive than Claude 3.7, but more consistent and less likely to introduce unrelated or duplicate code.

“Its code quality is similar to Sonnet 3.7, but it’s more consistent.”

Windsurf

@windsurf_ai

Apr 8

Here are some of our favorite tips and tricks from the @windsurf_ai community!

Bookmark this and thank yourself later ↓

Slow Vibe Coding: Think, Plan, Prompt, Review, Validate and Start Again

Keep your prompts clear and focused. Start a fresh chat as you start a new task.

Windsurf

@windsurf_ai

Mar 7

alright, MCP megathread 🧵

you should probably bookmark this ↓

https://x.com/windsurf_ai/status/1897824365120794905

https://x.com/windsurf_ai/status/1891664001941037123

Windsurf

@windsurf_ai

Feb 23

Let's discuss how Large Language Models (LLMs) handle codebase structure and parsing, and what makes Windsurf particularly cracked in this area.

While most AI code tools treat code as unstructured text, Windsurf leverages Abstract Syntax Trees (ASTs) to comprehend code at the syntactic level.

Here's why this results in faster, more accurate suggestions: 🧵👇

Unlike other tools that rely on embedding indexes—a one-size-fits-all retrieval method that doesn't scale well for large repos—Windsurf's agent employs strategies akin to human developers to locate necessary context:

- Grep and file search
- File relation traversal (e.g., AST parsing)
- Web search and online documentation
- Parallel LLM-based searches

This approach ensures efficient and scalable context retrieval.
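As a rough illustration of the first strategy in the list (grep and file search), here is a minimal Python sketch. It is not Windsurf's implementation; the names `grep_text` and `read_tree` are hypothetical helpers invented for this example, and a real agent would add ranking, ignore rules, and size limits.

```python
import re
from pathlib import Path
from typing import Iterable, Iterator


def grep_text(
    files: Iterable[tuple[str, str]], pattern: str
) -> Iterator[tuple[str, int, str]]:
    """Yield (path, line_number, line) for every line matching the regex."""
    regex = re.compile(pattern)
    for path, text in files:
        for lineno, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                yield path, lineno, line.strip()


def read_tree(root: str, glob: str = "*.py") -> Iterator[tuple[str, str]]:
    """Hypothetical helper: read every file under root that matches glob."""
    for path in Path(root).rglob(glob):
        if path.is_file():
            yield str(path), path.read_text(encoding="utf-8", errors="ignore")
```

Calling `list(grep_text(read_tree("."), r"\bfoo\("))` would list every call or definition of `foo` in the Python files under the current directory, which is the kind of targeted lookup a developer does by hand.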

What is an AST?

During compilation, code is parsed into an Abstract Syntax Tree—a hierarchical representation of the code's syntax.

This structure allows extraction of scopes, variable bindings, and function definitions—elements that text-based models might overlook.
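To make this concrete, here is a small sketch using Python's standard `ast` module. It only illustrates what an AST exposes and says nothing about how Windsurf parses code; the `SOURCE` snippet and the `summarize` function are made up for this example.

```python
import ast

SOURCE = '''
import os

def load(path):
    data = open(path).read()
    return data

class Repo:
    def size(self):
        total = 0
        return total
'''


def summarize(source: str) -> dict:
    """Collect function names and the variables each function assigns."""
    tree = ast.parse(source)
    functions = []
    bindings = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            functions.append(node.name)
            # Names bound by simple assignment anywhere inside this function
            # (nested functions included, which is fine for a sketch).
            bindings[node.name] = [
                target.id
                for stmt in ast.walk(node)
                if isinstance(stmt, ast.Assign)
                for target in stmt.targets
                if isinstance(target, ast.Name)
            ]
    return {"functions": sorted(functions), "bindings": bindings}
```

Running `summarize(SOURCE)` reports the functions `load` and `size` along with the local names each one binds (`data` and `total`), details that a plain text search cannot reliably attribute to the right scope.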

Windsurf

@windsurf_ai

Nov 17, 2024

Copilots + Agents = Flows

Cascade feels like magic because it combines the collaborative nature of copilots with the independent power of agents.

Both Copilots and Agents are valuable, but not as much as Flows.

Let's break this down 🧵

Before the year 2022, humans and keyboards worked in unison, and code development was done completely manually. Every single line of code was a direct result of human input.

In 2022, LLMs took the world by storm and Copilots were introduced. If you started typing out a line, it would suggest a completion. Or if you asked a question, you would receive an answer.

But they only worked on narrowly scoped tasks, because each interaction was a single LLM call.