Tweet

This Thread may be Removed Anytime!

Twitter may remove this content at anytime! Save it as PDF for later use!

More from @simonw

Simon Willison

@simonw

12 Nov

@datasetteproj

I presented a 90 minute workshop on Git scraping and how to explore the resulting data using @datasetteproj at #CodaBr21 (by @EscolaDeDados) today

Here are the exercises from the workshop, plus additional screenshots and notes that I added afterwards

docs.google.com/document/d/1TC…

@datasetteproj

@datasetteproj @EscolaDeDados To save attendees from having to get a working Python environment setup on their laptop, I instead encouraged them to use a free @gitpod account (gitpod.io) - I demonstrated each exercise in GitPod too

Cloud-based development environments are SO GOOD for tutorials

(I had planned to use GitHub Codespaces for this, but then realized that those are not freely available to non-paid users outside of the beta program yet)

Read 6 tweets

Simon Willison

@simonw

21 Aug

Here's a fun challenge: given an array of datetimes, what's the best way to plot those on a frequency graph over time?

They might all be on the same day, or they might be spread out over several years - so the challenge is automatically picking the most interesting bucket size

Looks like d3.bin().thresholds() is the answer I'm looking for observablehq.com/@d3/histogram

Yup, this works perfectly: observablehq.com/@simonw/my-twe…

Read 6 tweets

Simon Willison

@simonw

9 Aug

Is there a reliable way to tell search engine crawlers that a site hasn't been updated in X days so they don't need to re-crawl it?

Do they tend to believe the <lastmod> element in sitemap.xml ? And can I set that to apply to the whole site, not just an individual page?

https://twitter.com/simonw/status/1424801199753089035

Asking because tailing logs shows a vast amount of crawler traffic to Datasette instances that haven't seen any data changes in over a year - I may have to robots.txt block crawlers from them to save in costs, but I'd rather tell them "no point in crawling, nothing has changed"

https://twitter.com/simonw/status/1424801199753089035

Datasette currently has a plugin for configuring robots.txt, but I'm beginning to think it should be part of core and crawlers should be blocked by default - having people explicitly opt-in to having their sites crawled and indexed feels a lot safer datasette.io/plugins/datase…

Read 4 tweets

Simon Willison

@simonw

28 Jul

Finally published my article describing the Baked Data architectural pattern, which I define as "bundling a read-only copy of your data alongside the code for your application, as part of the same deployment"
simonwillison.net/2021/Jul/28/ba…

@vercel

I've been exploring this pattern for a few years now. It lets you publish sites to serverless hosts such as @vercel or @googlecloud Cloud Run that serve content from a read-only database (usually SQLite) - so they scale horizontally and can reboot if something breaks

It effectively gives you many of the benefits of static site publishing - cheap to host, hard to break, easy to scale - while still supporting server-side features such as search engines, generated Atom feeds and suchlike

Read 11 tweets

Simon Willison

@simonw

26 Jul

Anyone know of examples of SaaS apps that deploy stuff to your AWS account (lambdas, S3 buckets etc) using IAM credentials that you grant to the SaaS app? Is this a pattern anywhere?

I've seen examples of apps that will write your data to an S3 bucket that you own - various logging tools do this

Here's Fastly's documentation on setting that up: docs.fastly.com/en/guides/log-…

Read 8 tweets

Simon Willison

@simonw

11 Jun

A silly thing that puts me off using Docker Compose a bit: I frequently have other projects running on various 8000+ ports, and I don't like having to shut those down before running "docker-compose up"

Is there a Docker Compose pattern for making the ports runtime-configurable?

I'd love to be able to run something like this:

cd someproject
export WEB_PORT=3003
docker-compose up

And have the project's server run on localist:3003 without any risk of clashing with various other Docker Compose AND non-Docker-Compose projects I might be running

https://twitter.com/coates/status/1403383999096143879

This looks like exactly what I want!

https://twitter.com/coates/status/1403383999096143879

Read 6 tweets

Share this page!

Simon Willison

Try unrolling a thread yourself!

More from @simonw

Simon Willison

Simon Willison

Simon Willison

Simon Willison

Simon Willison

Simon Willison

Did Thread Reader help you today?

Like this author's thread?