Simon Willison
Creator of Datasette, co-creator Django. Fellow at @JSKstanford. Usually hanging out with @natbat and @cleopaws. He/Him
9 Jun 20
I got curious about how git knows what a Python function / class definition looks like when displaying diffs. Turns out it ships with a giant ball of regular expressions covering 22 different languages, in github.com/git/git/blob/m…
Here's where they added support for Python "async def" functions seven months ago. The way to conquer terrifying regular expressions is with neat, comprehensive unit tests: github.com/git/git/commit…
I like that those unit tests are accompanied by a README explaining how to write them: github.com/git/git/blob/0…
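As a rough illustration of the idea (not git's actual implementation, which is C code in userdiff.c), a diff tool can pick the header to show for a hunk by scanning backwards for the nearest line that matches a language-specific pattern. The pattern below is a simplified assumption for Python, and `find_hunk_header` is a hypothetical helper name.

```python
import re

# Simplified, assumed pattern: a line that starts a class, def or async def.
PYTHON_FUNCNAME = re.compile(r"^[ \t]*((class|(async[ \t]+)?def)[ \t].*)$")


def find_hunk_header(lines, changed_index):
    """Walk backwards from the changed line to the nearest definition line."""
    for i in range(changed_index, -1, -1):
        match = PYTHON_FUNCNAME.match(lines[i])
        if match:
            return match.group(1)
    return None


source = [
    "class Fetcher:",
    "    async def fetch(self, url):",
    "        response = await self.get(url)",
    "        return response",
]
header = find_hunk_header(source, 3)
```

Here a change on the last line would be labelled with the enclosing `async def` line, which is the kind of case the "async def" commit was added to handle.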
27 May 20
Hiring process for @AdHocTeam is really interesting: pick some assignments from homework.adhoc.team/assignments/ (depending on your role), your submissions are then anonymously reviewed by members of the team
The assignments are extremely well designed - here's one for testing JavaScript/API knowledge homework.adhoc.team/fetch/ - and here's one testing SRE skills building Docker containers and configuring nginx homework.adhoc.team/containerize/
They wrote about why they do this here: it's to allow them to hire for their remote-first team without needing to conduct in-person technical interviews adhocteam.us/2018/02/26/why…
3 May 20
@zzxiv @andrewparker @datasetteproj That was one of my first use-cases for twitter-to-sqlite! I have it running on a cron every ten minutes that fetches the most recent 50 tweets that I've favourited

*/10 * * * * run-one twitter-to-sqlite favorites /home/simon/twitter.db -a /home/simon/auth.json --stop_after=50
@zzxiv @andrewparker @datasetteproj Then I can use Datasette to search my favorites - I have 23,993 favorited tweets in the index now
@zzxiv @andrewparker @datasetteproj I'm not yet fetching the content of the linked articles, but I've thought about doing that - I could have a separate DB table with URLs and article contents which is also set up for full-text search pretty easily
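A minimal sketch of what such a table could look like, assuming the local SQLite build includes the FTS5 extension. The table name `articles`, the example URLs and the `search_articles` helper are all hypothetical, not part of twitter-to-sqlite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# url is stored but not indexed; only the article text is searchable.
conn.execute("CREATE VIRTUAL TABLE articles USING fts5(url UNINDEXED, content)")
conn.executemany(
    "INSERT INTO articles (url, content) VALUES (?, ?)",
    [
        ("https://example.com/a", "Notes on running Datasette in production"),
        ("https://example.com/b", "A recipe for sourdough bread"),
    ],
)


def search_articles(conn, query):
    """Return URLs of matching articles, best match first."""
    rows = conn.execute(
        "SELECT url FROM articles WHERE articles MATCH ? ORDER BY rank",
        (query,),
    )
    return [url for (url,) in rows]


results = search_articles(conn, "datasette")
```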
16 Apr 20
SQL is a better API language than GraphQL. Convince me otherwise!
To counter some obvious arguments...

You don't have to expose your entire schema, instead expose carefully designed SQL views (so you can refactor your tables without breaking your API)

Read-only, obviously!

Use time limits to cut off expensive queries (GraphQL needs this too)
(I haven't quite convinced myself yet that SQL is a great language for update APIs, but I'm working on it)
25 Feb 20
It's crazy that providing URLs that can be used to link to content is somehow an advanced technique in this weird SPA-infested world we live in these days
Like the ability to link to things is a stretch goal now
We still get to make fun of help pages that tell you to "go to menu X, then option Y, then scroll to Z" rather than just linking to the thing they're talking about, right?
27 Dec 19
What bothers me is that in Mobile Safari the Apple News app still hijacks clicks on Atom/RSS feeds - so if you click a feed icon you'll be bounced to the News app, which will then display an error message.

I don't think there's a workaround for this. Atom links just look broken.
Here's a video demonstration I made of this a while ago:
9 Jun 19
I released a fun new plugin for datasette: datasette-bplist, which adds tooling for handling values in SQLite databases that are data structures encoded using Apple’s binary plist format. github.com/simonw/dataset…
Why would you want this? Because it turns out the ~/Library/ folder on any OS X laptop is positively crammed with SQLite databases that embed binary-plist data in some of their tables.

Combine bplist_to_json() with json_extract() to pull out all kinds of interesting things...
My favourite example is the RKMaster_dataNote table tucked away in
~/Pictures/Photos\ Library.photoslibrary/database/photos.db - which contains EXIF data from every photo you've ever shared with Apple Photos. Here's a before and after shot showing what the plugin reveals
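To show how the two functions combine, here is a standalone sketch that registers its own `bplist_to_json()` SQL function using Python's plistlib. It is not the plugin's code. The `notes` table and its contents are made up for the example, and it assumes the SQLite build provides the JSON functions.

```python
import json
import plistlib
import sqlite3


def bplist_to_json(value):
    """Decode a binary plist blob to a JSON string; pass anything else through."""
    if not isinstance(value, bytes) or not value.startswith(b"bplist"):
        return value
    # default=repr is a shortcut for plist types JSON cannot represent
    # (dates, raw bytes).
    return json.dumps(plistlib.loads(value), default=repr)


conn = sqlite3.connect(":memory:")
conn.create_function("bplist_to_json", 1, bplist_to_json)
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, data BLOB)")

# Made-up sample data standing in for a real binary plist column.
blob = plistlib.dumps({"camera": {"model": "iPhone"}}, fmt=plistlib.FMT_BINARY)
conn.execute("INSERT INTO notes (data) VALUES (?)", (blob,))

model = conn.execute(
    "SELECT json_extract(bplist_to_json(data), '$.camera.model') FROM notes"
).fetchone()[0]
```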
16 May 19
TIL about "origin trials" - a neat way of trialing experimental implementations of new potential web standards in an opt-in manner, intended to avoid the chaos caused when experiments prematurely become de-facto standards github.com/GoogleChrome/O…
Here's a live list of currently active trials in Chrome developers.chrome.com/origintrials/#…
The Motivation and Explainer document for origin trials is fascinating github.com/GoogleChrome/O…
9 May 19
Slides from my Datasette presentation at #csvconf (@CSVConference) this afternoon speakerdeck.com/simon/datasette
Here's the (super-ugly) code I used to scrape the CSVConf schedule and create a database out of it nbviewer.jupyter.org/gist/simonw/1d…
... which I deployed to Google Cloud Run, allowing for this delightful SQL query to show now-and-next information csvconf-j7hipcg4aq-uc.a.run.app/csvconf-5ae783…
26 Dec 18
There is so much great #SpiderVerse behind the scenes content on Twitter if you start following some of the artists who created the film
20 Dec 18
This is one of the main benefits I see of a "service" abstraction - be it a networked microservice or just a class that encapsulates some core business logic. A lightweight layer for reading and writing the DB.
A big reason this is useful is that in any large system those updates are likely to be denormalized in some way - a write to the DB may also need to invalidate a cache or refresh a separate search index. So all updates should share a codepath.
And for reads in a large system speed is crucial. It's much easier to avoid pathological query performance if reads are all made through some kind of getPostsByIDs method that knows how to efficiently batch queries and uses optimizations like Django's select_related and prefetch_related
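A minimal sketch of that kind of layer, using sqlite3 and an in-process dict as a stand-in for a real cache. `PostService` and its method names are hypothetical, and a real system would more likely sit on an ORM and an external cache.

```python
import sqlite3


class PostService:
    """Hypothetical service layer: every read and write of posts goes through here."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}  # stand-in for an external cache

    def update_title(self, post_id, title):
        self.conn.execute(
            "UPDATE posts SET title = ? WHERE id = ?", (title, post_id)
        )
        self.conn.commit()
        # The cache invalidation lives next to the write so no caller can skip it.
        self.cache.pop(post_id, None)

    def get_posts_by_ids(self, post_ids):
        # Batch every cache miss into a single query instead of one per ID.
        missing = [pid for pid in post_ids if pid not in self.cache]
        if missing:
            placeholders = ",".join("?" for _ in missing)
            rows = self.conn.execute(
                f"SELECT id, title FROM posts WHERE id IN ({placeholders})",
                missing,
            )
            for pid, title in rows:
                self.cache[pid] = {"id": pid, "title": title}
        return [self.cache[pid] for pid in post_ids if pid in self.cache]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO posts VALUES (?, ?)", [(1, "First"), (2, "Second")])
service = PostService(conn)
```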
8 Dec 18
In important London news: 5 (five!) of the 43 surviving Imperial Fabergé Eggs are on display at two separate temporary exhibitions right now! You should totally go and see them.
The Science Museum's exhibition "The Last Tsar: Blood and Revolution" runs until 24th March 2019 and includes the Imperial Steel Easter Egg (one of my favourite eggs) and the Imperial Red Cross Fabergé Easter Egg
And across town at the Queen's Gallery the Royal Collection exhibition "Russia, Royalty & the Romanovs" has all three of the Queen's Fabergé eggs on public display! royalcollection.org.uk/collection/the…
1 Dec 18
Bit of a rant coming I'm afraid.

One of my least favourite anti-patterns is "enterprise" marketing websites that fail completely to answer the question "what does this thing do?"

Today's case in point: @KnativeProject

cloud.google.com/knative/
"Kubernetes-based platform to build, deploy, and manage modern serverless workloads."

OK, we are off to a reasonable start. So what does it actually do?
... OK, so what does it do?
28 Jul 18
I wrote about documentation unit tests: a technique I've been exploring recently to use automated tests to ensure documentation exists for the key concepts contained within a codebase: simonwillison.net/2018/Jul/28/do…
The key idea is to use introspection to figure out which concepts exist within the code (for Datasette I'm listing plugin hooks, view classes and config options) and then use regular expressions to check for matching headings or labels within the docs
They're also a fun way of exploring some of the advanced features of @pytestdotorg - in my article I demonstrate pytest fixtures, parametrized tests and how the xfail decorator can be used to ship the tests without breaking the build even when some docs have not been added yet
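The core of the technique can be sketched without pytest. Everything here is a stand-in for illustration: the `Plugins` class, the `hook_` naming convention and the `DOCS` string are assumptions, not Datasette's real code or documentation.

```python
import re


class Plugins:
    """Stand-in for code whose public hooks should all be documented."""

    def hook_prepare_connection(self): ...

    def hook_extra_css_urls(self): ...


DOCS = """
prepare_connection
------------------
Called when a new connection is created.
"""


def hook_names(cls):
    # Introspection: every method named hook_* counts as a concept needing docs.
    return sorted(
        name[len("hook_"):] for name in vars(cls) if name.startswith("hook_")
    )


def documented_headings(docs):
    # A heading here is a line of text followed by a line of dashes.
    return set(re.findall(r"^(\w+)\n-+$", docs, flags=re.MULTILINE))


def undocumented(cls, docs):
    headings = documented_headings(docs)
    return [name for name in hook_names(cls) if name not in headings]


missing = undocumented(Plugins, DOCS)
```

In a pytest suite each name from `hook_names` would become its own parametrized test case, so a missing heading shows up as a single named failure.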
21 Feb 18
Ray Bandar (1927-2017) was an artist, scientist, naturalist and an incredibly prolific collector of bones. His collection is in the process of moving to the California Academy of Sciences but Natalie managed to land us a private tour led by his great-nephew.
The collection is truly awe-inspiring, and a testament to an extraordinary life lived following a very particular passion.
All of this was in the basement of Ray's unassuming-looking house in San Francisco - 6,000+ skulls collected over a period of more than 50 years.