Finally published my article describing the Baked Data architectural pattern, which I define as "bundling a read-only copy of your data alongside the code for your application, as part of the same deployment" simonwillison.net/2021/Jul/28/ba…
I've been exploring this pattern for a few years now. It lets you publish sites to serverless hosts such as @vercel or @googlecloud Cloud Run that serve content from a read-only database (usually SQLite) - so they scale horizontally and can reboot if something breaks
It effectively gives you many of the benefits of static site publishing - cheap to host, hard to break, easy to scale - while still supporting server-side features such as search engines, generated Atom feeds and suchlike
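The core mechanic can be sketched in a few lines of Python: a build step writes content into a SQLite file, and every stateless server instance opens that same file read-only. The `content.db` filename and `pages` table here are hypothetical, just for illustration:

```python
import sqlite3

# Build step: bake the data into a SQLite file that ships with the deploy.
# In a real Baked Data app this runs in CI/build, not at request time.
build = sqlite3.connect("content.db")
build.execute("CREATE TABLE IF NOT EXISTS pages (slug TEXT PRIMARY KEY, body TEXT)")
build.execute("INSERT OR REPLACE INTO pages VALUES ('about', 'Hello, baked world')")
build.commit()
build.close()

# Serve step: each instance opens the file read-only via SQLite's URI syntax,
# so a request can never mutate shared state - instances stay stateless.
conn = sqlite3.connect("file:content.db?mode=ro", uri=True)
row = conn.execute("SELECT body FROM pages WHERE slug = ?", ("about",)).fetchone()
print(row[0])  # -> Hello, baked world
```

Because the database is opened in `mode=ro`, any accidental write raises an error instead of corrupting shared state.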
In the article I describe how datasette.io uses this pattern - it's pretty complex, with a build script that imports data from a bunch of different sources in order to provide search, download statistics and more
The biggest example of this pattern I've seen in the wild is mozilla.org - which distributes a copy of a SQLite DB containing the site content to all of the application servers. @pmclanahan wrote a great article explaining how this works here: mozilla.github.io/meao/2018/03/2…
If you're feeling nosy you can grab a copy of their SQLite DB from their healthcheck page at mozilla.org/healthz-cron/ and explore it using Datasette
Whoa, here's an example of this pattern at a much bigger scale!
Julian's comment here really captures why this technique is so useful: it lets you run your code statelessly, which dramatically reduces both operational complexity and the chances that something will go wrong
Another, less obvious example of Baked Data that I forgot to include in the article: datasette-ripgrep is a regular expression search engine over source code that shells out to the amazing ripgrep - you bake the source code in when you deploy it, e.g. ripgrep.datasette.io/-/ripgrep?patt…
Alex got Clickhouse running against some read-only baked data on Cloud Run last year - really thorough write-up
Another interesting angle on Baked Data is that it's very compatible with running your server-side code at edge CDN locations - you can geographically distribute copies of your data alongside your application code. @flydotio is one neat tool for building this.
Anyone know of examples of SaaS apps that deploy stuff to your AWS account (lambdas, S3 buckets etc) using IAM credentials that you grant to the SaaS app? Is this a pattern anywhere?
I've seen examples of apps that will write your data to an S3 bucket that you own - various logging tools do this
A silly thing that puts me off using Docker Compose a bit: I frequently have other projects running on various 8000+ ports, and I don't like having to shut those down before running "docker-compose up"
Is there a Docker Compose pattern for making the ports runtime-configurable?
I'd love to be able to run something like this:
cd someproject
export WEB_PORT=3003
docker-compose up
And have the project's server run on localhost:3003 without any risk of clashing with the various other Docker Compose and non-Docker-Compose projects I might be running
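Compose does support environment-variable substitution in docker-compose.yml, including shell-style `${VAR:-default}` defaults, which gets exactly this workflow. A sketch (the `web` service name and port numbers are hypothetical):

```yaml
services:
  web:
    build: .
    ports:
      - "${WEB_PORT:-8000}:8000"  # host port from the environment, container port fixed
```

With that in place, `WEB_PORT=3003 docker-compose up` publishes the service on localhost:3003, and running it with no variable set falls back to port 8000.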
Announcing Django SQL Dashboard, now out of alpha and ready for people to try out on their own Django+PostgreSQL projects: simonwillison.net/2021/May/10/dj…
The key idea here is to bring some of the most valuable features of Datasette to any Django+PostgreSQL project
You can execute read-only SQL queries interactively, bookmark and share the results, and write queries that produce bar charts, progress bars and even word clouds too
I recorded a three minute video demo which shows the tool in action
Out of interest: if you have a blob of JSON on your clipboard and you want to see a pretty-printed version of it, what's your fastest way to do that?
I hit Shift+Command+N in VSCode to get a new window, paste it in there, then hit Shift+Command+P to get the command palette, type JS and select the JSON pretty print option - which I think I installed as an extension at some point
Other times I'll use "pbpaste | jq", occasionally I'll use ipython like so:
s = """<paste JSON here>"""
import json
print(json.dumps(json.loads(s), indent=2))
What are the options like for "serverless" PostgreSQL these days? My definition of serverless here is that you don't have to spend any money at all if you're not getting any DB traffic, and cost then scales up as the traffic and storage you are using increases
Aurora PostgreSQL is the most obvious option, though it's not clear to me if you have to keep at least one instance running for it or if it fully "scales to zero" for projects that aren't receiving any traffic at all
Consensus in replies seems to be that this doesn't actually exist yet - scale-to-zero for a relational database server like PostgreSQL is evidently a whole lot harder than scale-to-zero for a stateless web application server as seen with things like Google Cloud Run
"Hosting SQLite databases on Github Pages" is absolutely brilliant: it adds a virtual filesystem to SQLite-compiled-to-WebAssembly in order to fetch pages from the database using HTTP range requests phiresky.github.io/blog/2021/host…
Check out this demo: I run the SQL query "select country_code, long_name from wdi_country order by rowid desc limit 100" and it fetches just 54.2KB of new data (across 49 small HTTP requests) to return 100 results - from a statically hosted database file that's 668.8MB!
Looks like the core magic here is only around 300 lines of (devastatingly clever) code github.com/phiresky/sql.j…
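The trick is that SQLite reads its file in fixed-size pages, so a virtual filesystem can satisfy each page read with an HTTP Range request against the statically hosted .db file. Here's a Python sketch of that idea - the `range_read` helper is my stand-in for the HTTP fetch (in sql.js-httpvfs it would be a request with a `Range: bytes=start-end` header), and `remote.db` is a tiny local file playing the role of the 668.8MB hosted database:

```python
import sqlite3

PAGE_SIZE = 4096  # a typical SQLite page size

def range_read(path, start, length):
    """Stand-in for an HTTP GET with a Range header: fetch only the
    bytes for one database page, never the whole file."""
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(length)

# Build a database to play the role of the statically hosted file.
# page_size must be set before the first write.
conn = sqlite3.connect("remote.db")
conn.execute("PRAGMA page_size = %d" % PAGE_SIZE)
conn.execute("CREATE TABLE wdi_country (country_code TEXT, long_name TEXT)")
conn.commit()
conn.close()

# Page 1 starts at byte 0 and begins with the SQLite header magic.
# The VFS only ever pulls the pages a query actually touches - which is
# why that 100-row query needed just 54.2KB of a 668.8MB file.
page_1 = range_read("remote.db", 0, PAGE_SIZE)
print(page_1[:16])  # -> b'SQLite format 3\x00'
```

Indexes make this efficient: a B-tree lookup touches only a handful of pages, so a well-indexed query against a huge remote database stays cheap.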