Simon Willison
Mar 10 · 23 tweets · 10 min read
shot-scraper is my new tool for automating screenshots, primarily for documentation but with some devious scraping applications too
simonwillison.net/2022/Mar/10/sh…
It's built on top of @playwrightweb - shot-scraper provides a CLI tool for taking a screenshot of a page (or a portion of a page):

shot-scraper datasette.io -o datasette.png

[Image: A long screenshot of the wh...]
Or of a portion of a page, specified using a CSS selector

shot-scraper simonwillison.net \
-s '#bighead' -o bighead.png

[Image: A screenshot of just the to...]
You can also define a YAML file with a sequence of screenshots and run them all using:

shot-scraper multi shots.yml

[Image of shots.yml: - output: example.com.png ...]
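
A minimal sketch of generating such a shots.yml from Python, assuming each entry is a mapping with `output` and `url` keys plus an optional `selector`. The `render_shots_yaml` helper, the example.com URL and the second entry are illustrative assumptions, not taken from the thread. Values are written with `json.dumps` because JSON strings are also valid YAML double-quoted scalars, which avoids a YAML library dependency.

```python
import json


def render_shots_yaml(shots):
    """Render a list of flat dicts as a YAML sequence of mappings."""
    lines = []
    for shot in shots:
        for i, (key, value) in enumerate(shot.items()):
            # The first key of each entry starts a new list item.
            prefix = "- " if i == 0 else "  "
            lines.append(f"{prefix}{key}: {json.dumps(value)}")
    return "\n".join(lines) + "\n"


# Hypothetical screenshot definitions (URLs and filenames are assumptions).
shots = [
    {"output": "example.com.png", "url": "http://www.example.com/"},
    {
        "output": "bighead.png",
        "url": "https://simonwillison.net/",
        "selector": "#bighead",
    },
]
yaml_text = render_shots_yaml(shots)

if __name__ == "__main__":
    with open("shots.yml", "w") as fp:
        fp.write(yaml_text)
    # Then run: shot-scraper multi shots.yml
```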
But the real fun comes when you combine the ability to snapshot a CSS selector with the ability to run extra JavaScript on the page

This annotated image (suitable for inclusion in a tutorial) is created by this YAML configuration - which drops in and positions an SVG pink arrow!

[Image: Screenshot of a section of ...]
[Image: A really long YAML example,...]
More details on shot-scraper in the GitHub repo here: github.com/simonw/shot-sc…
This is my first time using @playwrightweb for browser automation and it's /amazing/ - my entire implementation of shot-scraper is just 180 lines of Python! github.com/simonw/shot-sc…
I think my favourite thing about shot-scraper is how it works with GitHub Actions. This demo repo defines a couple of screenshot generation procedures in a YAML file, then runs them in GitHub Actions and writes the generated images back to the same repo github.com/simonw/shot-sc…
I really enjoy building tools that treat GitHub Actions as a platform - it's great knowing the software can deliver value running entirely within their stable, well documented infrastructure - protecting users from worrying about whether it will work correctly on their own machines
shot-scraper now being discussed on Hacker News: news.ycombinator.com/item?id=306218…
Anyone tried this out yet? I'm really looking forward to seeing what people use it for!
Just released shot-scraper 0.4 with a new "shot-scraper accessibility URL" command, for scraping the accessibility tree of a page
Here's an example of an image diff it captured from owlsnearme.com
github.com/simonw/shot-sc…
Just made a tiny change to my owlsnearme.com shot-scraper demo - I added code to hide the "taken 5 hours ago" text, so that it wouldn't result in every scrape being a different image just because time had passed since the last scrape github.com/simonw/shot-sc…
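
A rough sketch of the idea, assuming the screenshot entry accepts a `javascript` key that runs before the shot is taken. The `hide_elements_js` helper, the `.taken-ago` selector, the full URL and the output filename are hypothetical; the real demo repo may hide the text differently.

```python
import json


def hide_elements_js(selectors):
    """Return JavaScript that hides every element matching the CSS selectors."""
    combined = json.dumps(", ".join(selectors))
    return (
        f"document.querySelectorAll({combined})"
        ".forEach(el => { el.style.display = 'none'; });"
    )


# Hypothetical entry for a shots.yml-style configuration.
shot = {
    "url": "https://www.owlsnearme.com/",  # assumed full URL
    "output": "owlsnearme.png",  # assumed filename
    "javascript": hide_elements_js([".taken-ago"]),  # assumed selector
}
```

With the relative timestamp hidden, two scrapes of an unchanged page should produce the same image, so only real content changes show up as diffs.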
First live example of something someone else built with shot-scraper, and it is golden!
shot-scraper 0.5 is out, with a new "shot-scraper pdf" command for grabbing PDFs of web pages

shot-scraper pdf datasette.io \
--landscape -o datasette-landscape.pdf

Release notes: github.com/simonw/shot-sc…

[Image: A screenshot of a PDF gener...]
shot-scraper 0.6 is out: github.com/simonw/shot-sc…

Three really neat new features:

- Authentication! You can auth with a website and reuse the saved cookies
- Option to interact with a page manually before taking the shot
- Partial shots can now encompass multiple CSS selectors

[Image: A screenshot of the release...]
And I used shot-scraper to take the screenshot of the release notes that I included in that tweet!

shot-scraper github.com/simonw/shot-sc… \
--selector '.Box-body' --width 1000 -o box.png
shot-scraper 0.7 adds --retina mode for double the image resolution, and the tool now defaults to writing files to disk - rather than dumping binary PNG/PDF data to your terminal if you forgot to give it a "-o screenshot.png" filename

github.com/simonw/shot-sc…

[Image: 0.7 The shot-scraper s...]
shot-scraper 0.8 can now take screenshots of files on local disk:

shot-scraper index.html -o index.png

The URL protocol is also no longer required:

shot-scraper datasette.io -o datasette.png

github.com/simonw/shot-sc…
To avoid confusion between these two new modes, it checks for the existence of a local file called "datasette.io" before assuming that the text entered is a URL that needs to have http:// prepended to it
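
That check can be sketched roughly as follows. This is an illustration of the logic described above, not the tool's actual source; the `resolve_target` name and the injectable `file_exists` parameter are assumptions made so the behaviour is easy to test.

```python
import os
from pathlib import Path


def resolve_target(text, file_exists=os.path.exists):
    """Decide whether the argument is a local file or a URL."""
    # A file on disk wins, even if its name looks like a domain.
    if file_exists(text):
        return Path(text).absolute().as_uri()
    # Otherwise treat it as a URL and add a protocol if one is missing.
    if not text.startswith(("http://", "https://")):
        return "http://" + text
    return text
```

For example, `resolve_target("datasette.io")` gives `http://datasette.io` unless a file with that name exists in the current directory, in which case a `file://` URL is returned instead.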
I also upgraded the shot-scraper demo repository to use the new --retina mode from 0.7 when generating the annotated screenshot - diff illustrating that here: github.com/simonw/shot-sc…

[Image: The diff shows that the new...]

More from @simonw

Mar 14
Instantly create a GitHub repository to take screenshots of a web page - a write-up of my new shot-scraper-template GitHub repository template simonwillison.net/2022/Mar/14/sh…
Want to take and store a screenshot of a web page?

Go to github.com/simonw/shot-sc…, enter the URL of the page you want in the "description" field, pick a name for your new repo and click the button

There is no step two

[Image: The form asks you for a rep...]
I describe how this works in the blog post: your new repo will run a GitHub Action that creates a "shots.yml" file with the URL from the description, then installs and runs "shot-scraper multi shots.yml" to take the screenshot and write it back to the repo simonwillison.net/2022/Mar/14/sh…
Mar 14
Released shot-scraper 0.9 with a very fun new feature: you can now use it to execute JavaScript against a web page and return the result to the terminal as JSON!
github.com/simonw/shot-sc…

[Image: New shot-scraper java...]
Full documentation here: github.com/simonw/shot-sc…

You can scrape pages and return a JSON object with the extracted data:

shot-scraper javascript datasette.io "({
title: document.title,
tagline: document.querySelector('.tagline').innerText
})"
If a JavaScript exception occurs the exit status for the shot-scraper invocation will be 1, which means you can also now use shot-scraper to run basic tests as part of a CI workflow:

[Image: - name: Test page title   r...]
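
A sketch of driving this from a Python test script, assuming shot-scraper is installed and on the PATH. The `run_page_javascript` wrapper and its injectable `run` parameter are hypothetical; the only behaviour relied on is what the tweets describe: JSON written to the terminal and a non-zero exit status when the JavaScript throws.

```python
import json
import subprocess


def run_page_javascript(url, expression, run=subprocess.run):
    """Run JavaScript against a page via shot-scraper and return parsed JSON."""
    result = run(
        ["shot-scraper", "javascript", url, expression],
        capture_output=True,
        text=True,
    )
    # A JavaScript exception is reported through the exit status.
    if result.returncode != 0:
        raise RuntimeError(f"JavaScript check failed: {result.stderr.strip()}")
    return json.loads(result.stdout)


if __name__ == "__main__":
    data = run_page_javascript("datasette.io", "({title: document.title})")
    print(data["title"])
```

In a CI job the raised exception fails the step, which is the same effect as calling the CLI directly and letting its exit status propagate.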
Feb 2
Whoa. webvm.io runs a full Debian VM entirely in your browser via WebAssembly... and it ships with working Perl, Python, Ruby and Node.js!
It has gcc too! This works:

gcc -o helloworld examples/c/helloworld.c

And it looks like there's a virtual filesystem that stores state in your browser
Combine this with:

- JupyterLite jupyterlite.readthedocs.io/en/latest/_sta…
- SQLime sqlime.org

And it looks like one of the killer apps of WebAssembly is providing 100% safe and reliable teaching environments for people who are just getting started learning complex technologies
Jan 12
Want to know the secret to blogging more often?

Lower your standards!

A post which you don't think is ready yet is a LOT better than a giant folder full of drafts that no-one ever gets to see

(Your readers won't ever know how good the thing you wanted to write would have been)
I like to apply this classic Reid Hoffman startup/product advice to my writing, because the alternative is basically never publishing anything at all
One of the biggest productivity improvements I ever made to my blogging was when I gave up on my desire to finish everything with a sparkling conclusion that ties together the whole post

Now I embrace abruptly ending when I've run out of things to say instead
Jan 11
I've been solving so many documentation problems with @nedbat's cog tool recently - it's fantastic for keeping documentation automatically up-to-date (in Markdown or rST)

Here's a new page of sqlite-utils docs showing --help for every CLI command! sqlite-utils.datasette.io/en/latest/cli-…

[Image: Partial screenshot of that ...]
And here's how it works - I have a cog code block embedded in the .rst file which iterates through the commands and calls --help on each one, then writes the output to the page:
github.com/simonw/sqlite-…

[Image: .. contents:: :local:  .. [...]
Final trick: my GitHub Actions test.yml file calls "cog --check docs/*.rst" to confirm that the cog scripts have been run

If the test fails, I can run "cog -r docs/*.rst" to execute them, then commit the result. github.com/simonw/sqlite-…
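
The generation step inside such a cog block might look roughly like this. It is a sketch, not the code from the sqlite-utils docs: `render_help_sections` and the `get_help` callable are hypothetical stand-ins for however the real block collects each command's --help output, and the heading style is an assumption.

```python
import textwrap


def render_help_sections(commands, get_help):
    """Build rST with one heading and literal block per command."""
    parts = []
    for name in commands:
        parts.append(name)
        parts.append("=" * len(name))  # rST heading underline
        parts.append("")
        parts.append("::")  # start of a literal block
        parts.append("")
        parts.append(textwrap.indent(get_help(name).rstrip(), "    "))
        parts.append("")
    return "\n".join(parts)


# Inside a cog block the result would be emitted with cog.out(...);
# "cog --check" then fails whenever the committed output is stale.
```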
Jan 11
What’s new in sqlite-utils - annotated release notes for my SQLite Python utility library and CLI tool, v3.20 and v3.21 simonwillison.net/2022/Jan/11/sq…
A bunch of powerful new features in these releases.

The new --convert option to "sqlite-utils insert" lets you run a Python conversion function against data you are importing from JSON or CSV - and --lines lets you import raw lines of code (e.g. from log files) too
Combining the new --text option with --convert lets you load in a full unstructured/semi-structured file in one go and use a Python fragment to parse it into a list of dictionaries which then get inserted into a table
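
As an illustration of the kind of Python fragment this enables, here is a sketch that parses a whole log file into a list of dictionaries. The log format, the `LOG_LINE` pattern and the column names are assumptions for the example; how the fragment is passed to the CLI is covered in the sqlite-utils documentation.

```python
import re

# Assumed line format: "<timestamp> <LEVEL> <message>"
LOG_LINE = re.compile(r"^(?P<timestamp>\S+) (?P<level>[A-Z]+) (?P<message>.*)$")


def convert(text):
    """Parse the full file contents into a list of row dictionaries."""
    rows = []
    for line in text.splitlines():
        match = LOG_LINE.match(line)
        if match:  # skip lines that do not fit the assumed format
            rows.append(match.groupdict())
    return rows
```

Each dictionary becomes one row, with the keys `timestamp`, `level` and `message` as columns.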