Maxime Rivest 🧙‍♂️🦙🐧 · Aug 4
One of the first things I looked for when I got into DSPy was a way to combine it with offline vLLM batch inference.

But the whole DSPy stack is built on single calls, with support for retries, asynchronicity, etc.

Still, I wanted to be able to use DSPy easily with performant, locally hosted models.

After much fiddling and tinkering here and there, I found the special incantation to make vLLM and DSPy work together. It was a bit too long to share as a code snippet, so I wrapped it up into a library. It's a single-file library (~500 LOC in one __init__.py), so it should not be too hard for me (us :D) to maintain. And it is quite powerful!

In my performance test, vLLM directly ran my task in 65 seconds; through DSPy, it took 68 seconds.

So here it is:
uv venv
source .venv/bin/activate
uv python install 3.12 --default
uv pip install ovllm

PS: vLLM 0.10.0 has a dependency that does not work with Python 3.13 for now.

More from @MaximeRivest

Jul 31
Today, I woke up and I thought:

Wouldn't it be nice if I could use an LLM for autocomplete instead of small, dumb Copilot/Cursor-type models?

The speed + quality of Kimi-K2 on Groq makes it possible!

So, in 1 hour, I vibe coded a VS Code extension just before starting my day at work. Here is how 🧵

PS: it only took ~5 prompts.
1. For the first prompt, I opened a fresh conversation with all 4: o3-pro, Gemini 2.5 Pro, Opus 4 (extended thinking), and Grok 4 Heavy.

the prompt was:

I want to be able to press a keyboarb shortcut in vscode and the whole content of my current code (plus the other opened tab), plus instructions to a open ai end point would be sent to them and the ai would automcomplete and add code for the 'section'. the ai will decide what section its confident to predict how should we do it?
2. For the second prompt, I took all 4 outputs of the first prompt, put them together in one text file, and copy-pasted all of that into a fresh chat with all four, with this prompt:

I asked 4 llm: I want to be able to press a keyboarb shortcut in vscode and the whole content of my current code (plus the other opened tab), plus instructions to a open ai end point would be sent to them and the ai would automcomplete and add code for the 'section'. the ai will decide what section its confident to predict how should we do it? below are the responses, considering plus your own judgement. Help me make that work. I want to use groq this is a snippet they recommend:

```javascript
import { Groq } from 'groq-sdk';

const groq = new Groq();
const chatCompletion = await groq.chat.completions.create({
  "messages": [{ "role": "user", "content": "" }],
  "model": "moonshotai/kimi-k2-instruct",
  "temperature": 0.6,
  "max_completion_tokens": 4096,
  "top_p": 1,
  "stream": true,
  "stop": null
});

for await (const chunk of chatCompletion) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
```

I will be fiddling with the right prompts, just focus on the mechanics. My main goal is to use llms (because they are now fast enough) as autocomplete instead of cursor tab or github copilot type of things.
Jul 14
I just released the first version of Attachments CLI 🖇️!

$ uv tool install attachments

$ attachments ~/my_repo/ --clipboard --glob '**/*.py'

Attachments is a Python library (and CLI!) with a simple mission: to be your universal LLM funnel.

path -> clipboard

in 1 line
The line below gets me all the .ts files (and their content) found in the attjsplay directory, AND a tree of the directory.

att ~/Projects/maximumplay/src/attjsplay/ --clipboard --glob '*.ts' --mode structure --files
For more details on the whole library, see:

maximerivest.github.io/attachments/
Jul 13
I needed to understand exactly what DSPy was sending to the LLMs on my behalf before I could trust it.

If you are like me, just run adapter.format yourself (the default adapter is ChatAdapter) and you will see exactly what is happening. If you do not like the result, implement your own adapter; DSPy is modular and fully supports that.

See below for how it renders.
So the ChatAdapter renders it one way: [image]
And the JSONAdapter another: [image]
