It includes detailed examples, including the full Claude Code process I used to build this new Colophon page, which presents the Git commit histories for each of my collection of LLM-assisted web tools in one place tools.simonwillison.net/colophon
• • •
Missing some Tweet in this thread? You can try to
force a refresh
Gemini 2.0 Flash is now available via their API (multi-modal input and text output only for the moment) - I just released a new llm-gemini plugin version to support the new model
A weird and interesting thing about the new ChatGPT Canvas mode is that it can run Python in an entirely different way from the existing Code Interpreter - using Pyodide (Python in WASM) - which means it can make network requests from Python now!
I wrote more about this here, and how it weirdly continues the trend of chat-based LLM systems getting harder to truly master as they add more features and capabilities simonwillison.net/2024/Dec/10/ch…
@paulgauthier This Kagi one looks interesting: "The Kagi LLM Benchmarking Project uses an unpolluted benchmark to assess contemporary LLMs through diverse, challenging tasks" help.kagi.com/kagi/ai/llm-be…x.com/kepano/status/…
This sounds a lot more exciting than it is - o1 didn't have the ability to do these things, but a prompt testing team did manage to get it to spit out "sed -i 's/oversight_enabled: true/oversight_enabled: false/' project/oversight_config.yaml"
Here's a much more detailed write up of these particular tests