Simon Willison
Creator @datasetteproj, co-creator Django. PSF board. @nichemuseums. Hangs out with @natbat + @cleopaws. He/Him. Mastodon: https://t.co/t0MrmnJW0K
Nov 12 5 tweets 2 min read
Wrote up some notes on the new Qwen2.5-Coder-32B model, which is the first model I've run on my own Mac (64GB M2) that appears to be highly competent at writing code
simonwillison.net/2024/Nov/12/qw…

So far I've run Qwen2.5-Coder-32B successfully in two different ways: once via Ollama (and the llm-ollama plugin) and once using Apple's MLX framework and mlx-lm - details on how I ran both of those are in my article.
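For reference, a minimal sketch of driving that same model through LLM's Python API instead of the CLI - this assumes you've run "llm install llm-ollama" and pulled the model with Ollama, and that the Ollama tag is qwen2.5-coder:32b (the exact tag depends on which quantization you fetched):

import llm

# Assumes: llm install llm-ollama, plus `ollama pull qwen2.5-coder:32b`.
# The model ID here is the Ollama tag, so it may differ for other quantizations.
model = llm.get_model("qwen2.5-coder:32b")
response = model.prompt("Write a Python function that merges two sorted lists")
print(response.text())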
Nov 4 4 tweets 2 min read
I deleted my earlier tweet about this because I misunderstood it - this is an interesting new feature for speeding up prompt inference at the expense of paying for additional tokens.

Here's my experiment showing that it costs more to use this feature - you're trading cost for improved performance.
Nov 4 5 tweets 2 min read
Claude 3.5 Haiku is out - two surprises:

1. It's priced differently from Claude 3 Haiku. 3.5 Sonnet had the same price as 3 Sonnet, but 3.5 Haiku costs ~4x more than 3 Haiku did
2. No image input support yet

3.5 Haiku beats 3 Opus though, and Opus cost 15x the new Haiku price!

I released a new version of llm-claude-3 adding support for the new model (and fixing an attachments bug):

llm install --upgrade llm-claude-3
llm keys set claude
# paste API key here
llm -m claude-3.5-haiku 'impress me with your wit'

github.com/simonw/llm-cla…
Nov 3 4 tweets 2 min read
I was having a conversation with Claude about unconventional things to do in the SF Bay Area and I got a bit suspicious so I prompted "Are you sure all of those are real? I think you made some of those up."

Claude's reply (from the screenshot):

"You're right to be skeptical! I should carefully review these. Let me separate them into what I'm confident exists and what I need to retract:

Definitely Real:
- Musée Mécanique (100% real and awesome)
- Winchester Mystery House
- Wave Organ
- Church of 8 wheels (roller disco in a church)
- Tonga Room & Hurricane Bar
- Albany Bulb art park
- The Flintstone House on I-280
- Marrakech Magic Theater
- Audium sound theater
- Pacific Pinball Museum

Need to Retract:
- Museum of International Propaganda (I may have confused this with something else)
- "Sunday Acid Church Dance" - I apo..."

(I've actually been to the Gregangelo Museum and can confirm it definitely does exist: niche-museums.com/14)
Nov 2 4 tweets 1 min read
I've been trying this one out via LLM and the llm-gguf plugin and it seems VERY impressive for its size

Here's how to chat with the tiniest quantized version (just a 95MB file)

llm install llm-gguf
llm gguf download-model 'https://huggingface.co/lmstudio-community/SmolLM2-135M-Instruct-GGUF/resolve/main/SmolLM2-135M-Instruct-Q8_0.gguf' -a smolsmol
llm chat -m smolsmol

And for the largest 1.7B one - a 1.7GB download, again using a quantized GGUF from lmstudio-community:

llm gguf download-model 'https://huggingface.co/lmstudio-community/SmolLM2-1.7B-Instruct-GGUF/resolve/main/SmolLM2-1.7B-Instruct-Q8_0.gguf' -a smol17
llm chat -m smol17
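If you'd rather drive this from Python than the CLI, here's a rough sketch using LLM's Python API - the "smolsmol" alias is the one registered by the download-model command above:

import llm

model = llm.get_model("smolsmol")

# model.conversation() keeps prior exchanges in context, like `llm chat` does
conversation = model.conversation()
print(conversation.prompt("Tell me a joke about pelicans").text())
print(conversation.prompt("Now explain why it's funny").text())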
Oct 29 4 tweets 2 min read
I added multi-modal (image, audio, video) support to my LLM command-line tool and Python library, so now you can use it to run all sorts of content through LLMs such as GPT-4o, Claude and Google Gemini

simonwillison.net/2024/Oct/29/ll…

Stuff like this works now:

llm 'transcript' \
-a 'https://static.simonwillison.net/static/2024/video-scraping-pelicans.mp3' \
-m gemini-1.5-flash-8b-latest

Cost to transcribe 7m of audio with Gemini 1.5 Flash 8B? 1/10th of a cent.

But let's do something a bit more interesting. I shared a 7m40s MP3 of a NotebookLM podcast a few weeks ago. Let's use Flash-8B - the cheapest Gemini model - to try and obtain a transcript.

llm 'transcript' \
  -a https://static.simonwillison.net/static/2024/video-scraping-pelicans.mp3 \
  -m gemini-1.5-flash-8b-latest

It worked!

"Hey everyone, welcome back. You ever find yourself wading through mountains of data, trying to pluck out the juicy bits? It's like hunting for a single shrimp in a whole kelp forest, am I right? Oh, tell me about it. I swear, sometimes I feel like I'm gonna go c..."
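The same attachment mechanism is exposed in the Python library - a rough sketch, assuming the llm-gemini plugin is installed and an API key is configured:

import llm

model = llm.get_model("gemini-1.5-flash-8b-latest")
response = model.prompt(
    "transcript",
    attachments=[
        # Attachments can take a url= or a local path=
        llm.Attachment(
            url="https://static.simonwillison.net/static/2024/video-scraping-pelicans.mp3"
        )
    ],
)
print(response.text())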
Oct 21 4 tweets 2 min read
I built 14 interactive mini-apps with Claude Artifacts just in the past seven days - here's a post describing them all

simonwillison.net/2024/Oct/21/cl…

Here's the full list of artifacts I built for myself in the past week:

- URL to Markdown with Jina Reader
- SQLite in WASM demo
- Extract URLs
- Clipboard viewer
- Pyodide REPL
- Photo Camera Settings Simulator
- LLM pricing calculator
- YAML to JSON converter
- OpenAI Audio
- QR Code Decoder
- Image converter and downloader
- HTML Entity Escaper
- text-wrap-balance-nav
- ARES Phonetic Alphabet Converter
Sep 25 4 tweets 2 min read
Llama 3.2 is out, and it's a much more substantial release than the 3.1 to 3.2 version bump might indicate

Four new models, including Meta's first two vision models (11B and 90B) and two new text-only small models (1B and 3B)

My notes so far: simonwillison.net/2024/Sep/25/ll…

You can try out the vision models on lmarena.ai by navigating to "Direct Chat" and then selecting those models from the dropdown menu
[Screenshot: llama-3.2-vision-90b-instruct, given a photo of two capybaras and the prompt "Describe this photo in detail"]

"The image depicts two capybaras standing in a body of water, with one positioned on the edge of a metal grate. The capybara on the right is partially submerged in the water, while the other is standing on the grate.

Key Features:
- Capybaras: Two capybaras are visible in the image.
- Water: The capybaras are standing in a body of water, which appears to be murky or dirty.
- Metal Grate: A metal grate is visible at the bottom right of the image, with one capybara standing on it.
- Backgr..."
Sep 4 5 tweets 2 min read
OK, the Apache 2 licensed Qwen2-VL-7B Instruct from Alibaba looks VERY impressive

Anyone figured out a recipe to get it working on a Mac yet?

qwenlm.github.io/blog/qwen2-vl/
I tried the Hugging Face demo and got a very impressive result for handwriting recognition
Aug 5 4 tweets 2 min read
Not surprising to see NVIDIA doing this - practically the industry standard right now - but interesting to see details of what they're collecting and why:

"Movies are actually a good source of data to get gaming-like 3D consistency and fictional content but much higher quality" More on my blog:

My intuition is the backlash against scraped video data will be even more intense than for static images in image models. Video is generally more expensive to create, and video creators (such as MKBHD) have a lot of influence.

simonwillison.net/2024/Aug/5/nvi…
Jul 8 5 tweets 2 min read
I needed a box shadow the other day, so I got Claude to build me a custom one-off interface for fiddling with box shadow properties

Pretty wild how we can build tiny custom tools like this in about the same time it would take to search for and select a tool that already exists!

Claude prompt: "CSS for a slight box shadow, build me a tool that helps me twiddle settings and preview them and copy and paste out the CSS"

Response (from the screenshot):

"Certainly! I'll create a React component that allows you to adjust box shadow settings and preview the result in real-time. This tool will also provide the CSS code for you to copy and paste.

I've created a React component that allows you to adjust various box shadow settings and see the results in real-time. Here's a breakdown of the features:

Preview: A box at the top shows you how the shadow looks with your current settings.

Sliders for..."

Related: Maggie Appleton's excellent recent talk about home-cooked software
Jul 2 5 tweets 2 min read
Anyone figured out how to run Gemini Nano in Google Chrome Canary?

I turned on the feature flag for it but it doesn't seem to have downloaded the model file - the "await window['ai'].createTextSession();" API returns an error "InvalidStateError: The session cannot be created"

[Screenshot of chrome://flags showing the "Prompt API for Gemini Nano" experiment enabled]

Here's the documentation I was missing
Apr 25 9 tweets 3 min read
"Do stuff and then blog about it" remains one of the most underrated pieces of career advice My personal policy is that the cost I have to pay for being distracted by a fun new project is that I have to write about it

Never take on a project without also writing about it: so much value is lost if you don't give the world a fighting chance of understanding what you made!
Apr 14 4 tweets 1 min read
Have any of the large scale LLM training organizations - Anthropic, OpenAI, Gemini, Mistral, the Llama team - published anything notable about this idea of "model collapse" yet - the worry that LLM quality will drop as their training data becomes pollinated by model output?

I meant "polluted" not "pollinated" there
Mar 31 6 tweets 2 min read
What embedding models are there with separation between different modes of content?

I know E5-Large-V2 has that ("passage" vs "query"), and @nomic_ai have "search_query", "search_document", "clustering", "classification"

Any other good examples?

docs.nomic.ai/reference/endp…

Being able to embed your content as "passage" but questions people ask about it as "query" is useful for implementing RAG - a user's question might not naturally embed to a similar location as content that answers that question, and this trick helps fix that

til.simonwillison.net/llms/embed-par…
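To make the mode separation concrete, here's a sketch of how E5's prefixes work with sentence-transformers - the example texts are made up, but the "query: " / "passage: " convention is the documented one for E5:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("intfloat/e5-large-v2")

# E5 bakes the mode into the text itself via a prefix
passages = [
    "passage: Capybaras are the largest living rodents.",
    "passage: The Wave Organ is an acoustic sculpture in San Francisco.",
]
query = "query: what is the biggest rodent?"

passage_vecs = model.encode(passages, normalize_embeddings=True)
query_vec = model.encode(query, normalize_embeddings=True)

# Since the vectors are normalized, cosine similarity is just a dot product
print(util.cos_sim(query_vec, passage_vecs))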
Mar 26 7 tweets 3 min read
TIL about binary vector search... apparently there's a trick where you can take an embedding vector like [0.0051, 0.017, -0.0186, -0.0185...] and turn that into a binary vector just reflecting if each value is > 0 - so [1, 1, 0, 0, ...] - and still get useful cosine similarities!

... which is absurdly useful, because it means you can store those vectors in 1/32nd of the original space, and run MUCH faster comparisons against them

Then you can brute-force to get the 100 top matches, then run the more expensive full floating point ops against that subset
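Here's a quick numpy sketch of the whole trick - quantize to one bit per dimension, use Hamming distance for the cheap pass, then re-rank the survivors with full floats (the sizes and data are made up):

import numpy as np

rng = np.random.default_rng(42)
vectors = rng.normal(size=(100_000, 1024)).astype(np.float32)  # stand-in corpus
query = rng.normal(size=1024).astype(np.float32)

# One bit per dimension (is the value > 0?), packed 8 per byte:
# 32-bit floats down to 1 bit each is where the 1/32nd storage figure comes from
bits = np.packbits(vectors > 0, axis=1)
query_bits = np.packbits(query > 0)

# Hamming distance (XOR then popcount) stands in for angular distance
hamming = np.unpackbits(bits ^ query_bits, axis=1).sum(axis=1)
candidates = np.argsort(hamming)[:100]  # cheap brute-force top 100

# Re-rank just those 100 with exact cosine similarity
subset = vectors[candidates]
sims = subset @ query / (np.linalg.norm(subset, axis=1) * np.linalg.norm(query))
print(candidates[np.argsort(-sims)][:10])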
Mar 25 6 tweets 2 min read
I'm really confused about the new @MistralAI model

There's no official announcement about it anywhere that I've found

It's apparently a v0.2 of their 7B model, but they've had other things called 0.2 previously?

Then there's this commit - is this it? huggingface.co/mistralai/Mist…
A diff to the README for "Model Card for Mistral-7B-Instruct-v0.2" adds this text:

"The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.2.

Mistral-7B-v0.2 has the following changes compared to Mistral-7B-v0.1:
- 32k context window (vs 8k context in v0.1)
- Rope-theta = 1e6
- No Sliding-Window Attention"

Ollama have it as a "text completion" model, which sounds different from the instruct model I linked to above:
Mar 23 7 tweets 3 min read
Just got ChatGPT Code Interpreter to write me a SQLite extension in C from scratch, then compile it with GCC, then load it into Python and test it out, then iterate on it to fix the bugs

All on my phone while pottering around the house

Here's the full transcript - it got stuck in a bit of an error loop towards the end of this first one:

So I pasted what it had done so far into a fresh session and got it working there: chat.openai.com/share/a52ee3e4…
chat.openai.com/share/29bd1d15…
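For anyone curious what the "load it into Python" step looks like, here's a sketch using the sqlite3 module's extension loading - the .so filename is hypothetical, standing in for whatever Code Interpreter compiled:

import sqlite3

# Hypothetical filename - the real one was whatever came out of something
# like: gcc -fPIC -shared extension.c -o extension.so
conn = sqlite3.connect(":memory:")
conn.enable_load_extension(True)
conn.load_extension("./extension.so")
conn.enable_load_extension(False)

# Then exercise whatever SQL functions the extension registered
print(conn.execute("SELECT sqlite_version()").fetchone())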
Aug 3, 2023 5 tweets 2 min read
The video for my North Bay Python talk is out, and I've put together an accompanying edited transcript with annotated slides and links



If you haven't been completely immersed in this world for the last year, my hope is this can help catch you up!

simonwillison.net/2023/Aug/3/wei…

Here's the table of contents:

- What they are
- How they work
- A brief timeline
- What are the really good ones
- Tips for using them
- Using them for code
- What can we build with them?
  - Giving them access to tools
  - Retrieval augmented generation
  - Embeddings and vector search
- ChatGPT Plugins
- ChatGPT Code Interpreter
- How they're trained
  - LLaMA training data
  - Reinforcement learning from human feedback
- Openly licensed models
- My LLM utility
- Prompt injection
Jul 12, 2023 13 tweets 5 min read
Huge new release of my LLM CLI tool (and Python library) for accessing Large Language Models: it now supports additional models via plugins, so you can "llm install llm-gpt4all" to get models that run on your own machine!
simonwillison.net/2023/Jul/12/ll…

I want to make local models (and remote API-driven models) as easy to try out as possible, so I put together this detailed tutorial about how to build an LLM plugin that adds support for a new model: llm.datasette.io/en/stable/plug…
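The shape of such a plugin is pretty small - here's a rough sketch modeled on that tutorial's toy Markov example (not a real inference implementation):

import random

import llm

class Markov(llm.Model):
    model_id = "markov"

    def execute(self, prompt, stream, response, conversation):
        # A real plugin would run model inference here; this toy just
        # echoes the prompt's words back in a shuffled order
        words = prompt.prompt.split()
        random.shuffle(words)
        yield " ".join(words)

@llm.hookimpl
def register_models(register):
    # Makes the model available as: llm -m markov 'your prompt'
    register(Markov())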
Jun 13, 2023 4 tweets 2 min read
A whole bunch of big API announcements from @OpenAI today - including an API-level implementation of the ReAct pattern where you can provide custom function calls that can be triggered by the language models

openai.com/blog/function-…

The new gpt-3.5-turbo-16k model is a big deal: 16,000 token context (4x the previous limit for 3.5), charged at $0.003 per 1K input tokens and $0.004 per 1K output tokens, which is still a whole lot cheaper than GPT-4 (more than 10x that price for 8k, more than 20x that for 32k)
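A sketch of what that function-calling flow looked like against the API as it shipped that day (this uses the pre-1.0 openai Python library; the library has since moved to client.chat.completions and "tools"):

import json

import openai

# JSON schema describing a function the model is allowed to "call"
functions = [
    {
        "name": "get_current_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
            },
            "required": ["location"],
        },
    }
]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=[{"role": "user", "content": "What's the weather in San Francisco?"}],
    functions=functions,
)

message = response["choices"][0]["message"]
if message.get("function_call"):
    # The model is asking us to run the function; we'd execute it and send
    # the result back in a follow-up "function" role message
    args = json.loads(message["function_call"]["arguments"])
    print("Call get_current_weather with", args)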