Post

This Thread may be Removed Anytime!

Twitter may remove this content at anytime! Save it as PDF for later use!

More from @simonw

Simon Willison

@simonw

Feb 28

Anyone seen any credible successs stories stories for fine-tuning LLMs to solve real-world business challenges?

I want stories where a company solved something using a fine-tuned model where previous attempts at solving it had failed

(I feel like I ask this about once a month)

https://twitter.com/simonw/status/1895315706348187966

I think fine-tuning for OCR of specific documents may be the most convincing pattern here

https://twitter.com/simonw/status/1895315706348187966

https://twitter.com/antonjustcoding/status/1895318060967567694

Fine-tuning a model because you bet your entire company's tech stack on OCaml makes a bit of sense to me

https://twitter.com/antonjustcoding/status/1895318060967567694

Read 5 tweets

Simon Willison

@simonw

Dec 31, 2024

Here's the table of contents for my end-of-year review of things we learned out about LLMs in 2024 - we learned a LOT

Blog post here: simonwillison.net/2024/Dec/31/ll…

One of the most notable trends from 2024 was the total collapse in terms of LLM pricing - the API models are absurdly inexpensive now

Generating captions for 68,000 photos using Gemini 1.5 Flash 8B costs $1.68!
simonwillison.net/2024/Dec/31/ll…

Read 5 tweets

Simon Willison

@simonw

Dec 11, 2024

https://twitter.com/GoogleDeepMind/status/1866869343570608557

Gemini 2.0 Flash is now available via their API (multi-modal input and text output only for the moment) - I just released a new llm-gemini plugin version to support the new model

llm install -U llm-gemini
llm -m gemini-2.0-flash-exp 'prompt goes here'

https://twitter.com/GoogleDeepMind/status/1866869343570608557

Plugin release notes here github.com/simonw/llm-gem…

Gemini models can write and execute Python code as part of their response - my plugin supports that via this option:

llm -m gemini-2.0-flash-exp 'write and execute python to generate a 80x40 ascii art fractal' -o code_execution 1

Full transcript here: gist.github.com/simonw/0d8225d… $An ASCII art mandelbrot fractal$

Read 7 tweets

Simon Willison

@simonw

Dec 10, 2024

A weird and interesting thing about the new ChatGPT Canvas mode is that it can run Python in an entirely different way from the existing Code Interpreter - using Pyodide (Python in WASM) - which means it can make network requests from Python now!

I wrote more about this here, and how it weirdly continues the trend of chat-based LLM systems getting harder to truly master as they add more features and capabilities simonwillison.net/2024/Dec/10/ch…

https://twitter.com/emollick/status/1866653168232231001

This stuff really is extremely confusing

https://twitter.com/emollick/status/1866653168232231001

Read 6 tweets

Simon Willison

@simonw

Dec 10, 2024

Which are the independent LLM benchmarks that reliably publish results for new models?

The three I pay attention to at the moment are lmarena.ai (previously known as LMSYS), livebench.ai and @paulgauthier's Aider code editing benchmarks aider.chat/docs/benchmark…

x.com/kepano/status/…

@paulgauthier This Kagi one looks interesting: "The Kagi LLM Benchmarking Project uses an unpolluted benchmark to assess contemporary LLMs through diverse, challenging tasks" help.kagi.com/kagi/ai/llm-be… x.com/kepano/status/…

Read 6 tweets

Simon Willison

@simonw

Dec 5, 2024

https://twitter.com/OpenAI/status/1864735517818130604

Here's the spiciest detail from the new o1 system card:

https://twitter.com/OpenAI/status/1864735517818130604

This sounds a lot more exciting than it is - o1 didn't have the ability to do these things, but a prompt testing team did manage to get it to spit out "sed -i 's/oversight_enabled: true/oversight_enabled: false/' project/oversight_config.yaml"

https://twitter.com/yellinben/status/1864741745600143825

Here's a much more detailed write up of these particular tests

https://twitter.com/yellinben/status/1864741745600143825

Read 5 tweets

Share this page!

Enter URL or ID to Unroll

Simon Willison

Try unrolling a thread yourself!

More from @simonw

Simon Willison

Simon Willison

Simon Willison

Simon Willison

Simon Willison

Simon Willison

Did Thread Reader help you today?

Don't want to be a Premium member but still want to support us?

Send Email!