Gemini 2.0 Flash is now available via the Gemini API (multi-modal input and text output only for the moment) - I've just released a new version of my llm-gemini plugin to support the new model
A weird and interesting thing about the new ChatGPT Canvas mode is that it can run Python in an entirely different way from the existing Code Interpreter - using Pyodide (Python in WASM) - which means it can make network requests from Python now!
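Pyodide exposes the browser's fetch() API to Python, which is what makes those network requests possible. A minimal sketch of what that looks like - note this only runs inside a Pyodide environment like Canvas, not in regular CPython, and the page's usual CORS restrictions still apply:

```python
# Pyodide-only: pyodide.http wraps the browser's fetch() API
from pyodide.http import pyfetch

async def get_json(url):
    response = await pyfetch(url)  # goes through browser fetch(), not Python sockets
    return await response.json()   # FetchResponse.json() is itself async
```

Because the request is really the browser's fetch() under the hood, this works in a sandbox that blocks Python-level socket access - which is exactly why Canvas can do something Code Interpreter can't.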
I wrote more about this here, and how it weirdly continues the trend of chat-based LLM systems getting harder to truly master as they add more features and capabilities simonwillison.net/2024/Dec/10/ch…
@paulgauthier This Kagi one looks interesting: "The Kagi LLM Benchmarking Project uses an unpolluted benchmark to assess contemporary LLMs through diverse, challenging tasks" help.kagi.com/kagi/ai/llm-be… x.com/kepano/status/…
This sounds a lot more exciting than it is - o1 didn't have the ability to actually do these things, but a prompt-testing team did manage to get it to spit out "sed -i 's/oversight_enabled: true/oversight_enabled: false/' project/oversight_config.yaml"
Here's a much more detailed write-up of these particular tests
Wrote up some notes on the new Qwen2.5-Coder-32B model, which is the first model I've run on my own Mac (64GB M2) that appears to be highly competent at writing code simonwillison.net/2024/Nov/12/qw…
So far I've run Qwen2.5-Coder-32B successfully in two different ways: once via Ollama (and the llm-ollama plugin) and once using Apple's MLX framework and mlx-lm - details on how I ran both of those are in my article.
If you use uv on a Mac with 64GB of RAM, try this:
uv run --with mlx-lm \
  mlx_lm.generate \
  --model mlx-community/Qwen2.5-Coder-32B-Instruct-8bit \
  --max-tokens 4000 \
  --prompt 'write me a python function that renders a mandelbrot fractal as wide as the current terminal'
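For the curious, here's roughly the shape of program that prompt asks for - a minimal sketch of my own, not the model's actual output:

```python
import shutil

def mandelbrot(width=None, height=24, max_iter=30):
    """Render the Mandelbrot set as ASCII art, sized to the terminal width."""
    if width is None:
        width = shutil.get_terminal_size().columns
    chars = " .:-=+*#%@"  # darker characters = more iterations before escape
    rows = []
    for row in range(height):
        # Map the character grid onto the complex plane: real -2..0.5, imag -1.1..1.1
        y = -1.1 + 2.2 * row / (height - 1)
        line = []
        for col in range(width):
            x = -2.0 + 2.5 * col / (width - 1)
            z, c = 0j, complex(x, y)
            i = 0
            while abs(z) <= 2 and i < max_iter:
                z = z * z + c
                i += 1
            line.append(chars[(i * (len(chars) - 1)) // max_iter])
        rows.append("".join(line))
    return "\n".join(rows)

if __name__ == "__main__":
    print(mandelbrot())
```

Points that never escape within max_iter render as '@' (inside the set); fast escapers render as spaces.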