I particularly like this one - it's been running for three years scraping bug bounty target domains from sites like @Hacker0x01 github.com/arkadiyt/bount…
I just noticed you can sort the git-scrapers topic on GitHub by "recently updated", which gives a live view of the scrapers people have built - there are a few new ones on there since I published my article this morning! github.com/topics/git-scr…
Handy tip from Hacker News: git show --color-words gives a more detailed word-level diff. Here's that run against the latest diff recorded by github.com/simonw/ca-fire…
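To show what "word-level" means here, this is a rough Python sketch of a word diff built on the standard library's difflib. It is not how git implements --color-words; the `word_diff` helper and the sample strings are made up for illustration, and the `[-old-]{+new+}` markers just mirror the plain-text style of git's word diff output.

```python
import difflib
import re


def word_diff(old: str, new: str) -> str:
    """Return a word-level diff, marking removals as [-x-] and additions as {+y+}."""
    # Split into runs of non-whitespace and runs of whitespace so spacing is preserved
    old_tokens = re.findall(r"\S+|\s+", old)
    new_tokens = re.findall(r"\S+|\s+", new)
    matcher = difflib.SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append("".join(old_tokens[i1:i2]))
            continue
        if tag in ("delete", "replace"):
            out.append("[-" + "".join(old_tokens[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            out.append("{+" + "".join(new_tokens[j1:j2]) + "+}")
    return "".join(out)


# Hypothetical before/after lines from a scraped file
print(word_diff("acres burned: 100", "acres burned: 250"))
```

A line-level diff would flag the whole line as changed; the word-level version isolates the one value that moved.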
Notes, links and notebook from my #JupyterCon2020 session on Datasette and sqlite-utils this morning: gist.github.com/simonw/656c21b… - I live-coded a SQLite database containing earthquakes from last week using data from USGS
Here's a Datasette instance showing the database of earthquakes I built in that notebook - facet by type shows that there were 17 ice quakes, 1 mine collapse and 1 chemical explosion in the past week! jupytercon-earthquakes-demo.vercel.app/earthquakes/qu…
Built a tool to load Evernote exported notes (ENEX files) into a SQLite database - so now I can load them into Datasette and see all of my notes on a map! github.com/dogsheep/evern…
ENEX is a slightly tricky format - each note is CDATA XML embedded in a larger XML document, plus base64 attachments (usually images).
All of my notes exported at once result in an XML file that's 2.6GB!
One neat detail is that the output of Evernote's cloud OCR system (run automatically against every image embedded in a note) is available in the XML. evernote-to-sqlite extracts out these OCR strings and uses them to populate a SQLite full-text search table.
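The approach described above can be sketched with just the Python standard library. This is not the evernote-to-sqlite code: the tiny sample export is made up, the element names (`note`, `content`, `resource`, `data`, `recognition`, `item`, `t`) reflect my understanding of the ENEX layout, and picking the highest-weighted OCR candidate per item is an assumption. Real exports also carry an XML declaration, a DOCTYPE and HTML entities inside the CDATA that this sketch does not handle. It assumes your SQLite build includes FTS5.

```python
import base64
import io
import sqlite3
import xml.etree.ElementTree as ET

# Made-up miniature export, for illustration only
SAMPLE_ENEX = b"""<?xml version="1.0" encoding="UTF-8"?>
<en-export>
  <note>
    <title>Receipt</title>
    <content><![CDATA[<en-note><div>Lunch</div></en-note>]]></content>
    <resource>
      <data encoding="base64">aGVsbG8=</data>
      <mime>image/png</mime>
      <recognition><![CDATA[<recoIndex><item><t w="90">coffee</t><t w="40">toffee</t></item><item><t w="85">bagel</t></item></recoIndex>]]></recognition>
    </resource>
  </note>
</en-export>"""


def iter_notes(fileobj):
    """Stream notes one at a time so a multi-gigabyte export never sits in memory whole."""
    for _event, elem in ET.iterparse(fileobj, events=("end",)):
        if elem.tag != "note":
            continue
        # The note body is itself XML, stored as CDATA text inside <content>
        body = ET.fromstring(elem.findtext("content", default="<en-note/>"))
        ocr_words, attachments = [], []
        for resource in elem.findall("resource"):
            attachments.append(base64.b64decode(resource.findtext("data", default="")))
            reco = resource.findtext("recognition")
            if reco:
                for item in ET.fromstring(reco).iter("item"):
                    candidates = item.findall("t")
                    if candidates:
                        # Assumption: keep only the candidate with the highest weight
                        best = max(candidates, key=lambda t: int(t.get("w", "0")))
                        ocr_words.append(best.text or "")
        yield {
            "title": elem.findtext("title", default=""),
            "text": "".join(body.itertext()),
            "ocr": " ".join(ocr_words),
            "attachments": attachments,
        }
        elem.clear()  # release the parsed note before moving on


def build_search_db(notes):
    """Load title, body text and OCR text into an FTS5 table (requires FTS5 support)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE VIRTUAL TABLE notes_fts USING fts5(title, text, ocr)")
    db.executemany(
        "INSERT INTO notes_fts (title, text, ocr) VALUES (?, ?, ?)",
        ((n["title"], n["text"], n["ocr"]) for n in notes),
    )
    return db


db = build_search_db(iter_notes(io.BytesIO(SAMPLE_ENEX)))
matches = [
    row[0]
    for row in db.execute("SELECT title FROM notes_fts WHERE notes_fts MATCH ?", ("bagel",))
]
print(matches)
```

The word "bagel" only appears in the OCR text of the attached image, so the search hit shows why indexing those strings is useful.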
I really like this post by @mc on Django Gotchas from the next day for more historical context - you had to install it by symlinking to site-packages! postneo.com/2005/07/18/dja…
I got curious about how git knows what a Python function / class definition looks like when displaying diffs. Turns out it ships with a giant ball of regular expressions covering 22 different languages, in github.com/git/git/blob/m…
Here's where they added support for Python "async def" functions seven months ago. The way to conquer terrifying regular expressions is with neat, comprehensive unit tests: github.com/git/git/commit…
I like that those unit tests are accompanied by a README explaining how to write them: github.com/git/git/blob/0…
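The basic idea can be sketched in a few lines of Python: walk upwards from a change until a line looks like a function or class definition, then use that line as the hunk header. The pattern below is a simplified one in the spirit of git's Python rule, not a verified copy of it, and `hunk_header` plus the sample source are hypothetical. Git's real logic has more moving parts.

```python
import re

# Simplified pattern modelled on the kind of rule git keeps for Python (an assumption)
PYTHON_FUNCNAME = re.compile(r"^[ \t]*((class|(async[ \t]+)?def)[ \t].*)$")


def hunk_header(lines, changed_index):
    """Return the nearest def/class line above changed_index, or None if there is none."""
    for i in range(changed_index - 1, -1, -1):
        match = PYTHON_FUNCNAME.match(lines[i])
        if match:
            return match.group(1)  # the definition with leading indentation stripped
    return None


# Made-up source file, one list entry per line
source = [
    "class Scraper:",
    "    async def fetch(self, url):",
    "        response = await get(url)",
    "        return response.text",
    "",
    "def main():",
    "    run()",
]

print(hunk_header(source, 3))
print(hunk_header(source, 6))
```

Without the optional `async` group, a change inside `fetch` would be labelled with `class Scraper:` instead of the function it actually belongs to.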
@zzxiv @andrewparker @datasetteproj That was one of my first use-cases for twitter-to-sqlite! I have it running on a cron job every ten minutes, fetching the 50 most recent tweets that I've favourited
@zzxiv @andrewparker @datasetteproj I'm not yet fetching the content of the linked articles, but I've thought about doing that - I could pretty easily have a separate DB table with URLs and article contents which is also set up for full-text search