Notes, links and notebook from my #JupyterCon2020 session on Datasette and sqlite-utils this morning: gist.github.com/simonw/656c21b… - I live-coded a SQLite database containing earthquakes from last week using data from USGS
Here's a Datasette instance showing the database of earthquakes I built in that notebook - facet by type shows that there were 17 ice quakes, 1 mine collapse and 1 chemical explosion in the past week! jupytercon-earthquakes-demo.vercel.app/earthquakes/qu…
sqlite-utils lets you create a full-featured SQLite database in as few lines of code as possible, which makes it a great fit for Jupyter. This code fetches GeoJSON, reshapes it into a list of dicts, inserts them into a new table (schema is derived from the data) and enables FTS
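A rough sketch of those steps, not the notebook's actual code. To stay dependency-free it uses the standard library's sqlite3 in place of sqlite-utils, so the schema derivation is spelled out by hand (sqlite-utils does that part for you). The sample features are made up for illustration and only mimic the shape of a USGS GeoJSON feed; `features_to_rows` and `create_and_insert` are hypothetical helper names.

```python
import sqlite3

# Made-up sample in the shape of a GeoJSON FeatureCollection (not real USGS data)
SAMPLE = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "a1",
         "properties": {"mag": 2.1, "place": "Example A", "time": 1602000000000, "type": "earthquake"},
         "geometry": {"type": "Point", "coordinates": [-150.1, 61.2, 35.0]}},
        {"type": "Feature", "id": "a2",
         "properties": {"mag": 1.4, "place": "Example B", "time": 1602000500000, "type": "ice quake"},
         "geometry": {"type": "Point", "coordinates": [-148.9, 60.8, 0.5]}},
        {"type": "Feature", "id": "a3",
         "properties": {"mag": 3.0, "place": "Example C", "time": 1602001000000, "type": "earthquake"},
         "geometry": {"type": "Point", "coordinates": [-117.5, 35.7, 8.2]}},
    ],
}


def features_to_rows(geojson):
    """Flatten each feature into one dict: id, properties, then coordinates."""
    rows = []
    for feature in geojson["features"]:
        lon, lat, depth = feature["geometry"]["coordinates"]
        rows.append({"id": feature["id"], **feature["properties"],
                     "longitude": lon, "latitude": lat, "depth": depth})
    return rows


SQL_TYPES = {int: "INTEGER", float: "REAL", str: "TEXT"}


def create_and_insert(conn, table, rows, pk):
    """Derive column types from the first value seen per key, then insert."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, SQL_TYPES.get(type(value), "TEXT"))
    column_sql = ", ".join(
        f'"{name}" {sql_type}' + (" PRIMARY KEY" if name == pk else "")
        for name, sql_type in columns.items()
    )
    conn.execute(f'CREATE TABLE "{table}" ({column_sql})')
    names = list(columns)
    quoted = ", ".join(f'"{n}"' for n in names)
    placeholders = ", ".join("?" for _ in names)
    conn.executemany(
        f'INSERT INTO "{table}" ({quoted}) VALUES ({placeholders})',
        [[row.get(n) for n in names] for row in rows],
    )


conn = sqlite3.connect(":memory:")
rows = features_to_rows(SAMPLE)
create_and_insert(conn, "quakes", rows, pk="id")

# The same grouping a "facet by type" performs
facet = conn.execute(
    'SELECT "type", COUNT(*) FROM quakes GROUP BY "type" ORDER BY COUNT(*) DESC, "type"'
).fetchall()
print(facet)
```

Flattening the coordinates into longitude and latitude columns is a design choice in this sketch; it keeps every value a plain scalar that SQLite can store and group on directly.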
• • •
Built a tool to load Evernote exported notes (ENEX files) into a SQLite database - so now I can load them into Datasette and see all of my notes on a map! github.com/dogsheep/evern…
ENEX is a slightly tricky format - each note is CDATA XML embedded in a larger XML document, plus base64 attachments (usually images).
All of my notes exported at once result in an XML file that's 2.6GB!
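A minimal sketch of reading that kind of nested format with only the standard library, not the tool's actual code. It assumes a simplified note layout (title, CDATA content, one base64 resource); real exports also carry an XML declaration and DOCTYPE inside the CDATA and would need more care. `iter_notes` is a hypothetical helper, and the sample document is invented. Streaming with `iterparse` is used here because a multi-gigabyte file is awkward to parse in one go.

```python
import base64
import io
import xml.etree.ElementTree as ET

# Invented, heavily simplified sample in the general shape of an ENEX export
SAMPLE_ENEX = b"""<?xml version="1.0" encoding="UTF-8"?>
<en-export>
  <note>
    <title>Lunch receipt</title>
    <content><![CDATA[<en-note><div>Paid in <b>cash</b></div></en-note>]]></content>
    <resource>
      <data encoding="base64">aGVsbG8=</data>
      <mime>image/png</mime>
    </resource>
  </note>
</en-export>
"""


def iter_notes(fp):
    """Yield one dict per <note>, handling a single note at a time."""
    for _event, el in ET.iterparse(fp, events=("end",)):
        if el.tag != "note":
            continue
        # The CDATA payload arrives as a plain string: parse it as its own document
        inner = ET.fromstring(el.findtext("content"))
        yield {
            "title": el.findtext("title"),
            "text": "".join(inner.itertext()),
            "attachments": [base64.b64decode(d.text) for d in el.iter("data")],
        }
        # Drop this note's children so finished notes do not pile up in memory
        el.clear()


notes = list(iter_notes(io.BytesIO(SAMPLE_ENEX)))
print(notes[0]["title"], "|", notes[0]["text"])
```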
One neat detail is that the output of Evernote's cloud OCR system (run automatically against every image embedded in a note) is available in the XML. evernote-to-sqlite extracts these OCR strings and uses them to populate a SQLite full-text search table.
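To illustrate the last step only: a small sketch of putting OCR strings into a SQLite full-text search table using the standard library. The table name, column names and sample strings are assumptions made up for this example, not the schema evernote-to-sqlite creates, and it assumes the local SQLite build includes the FTS5 extension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: filename is stored but not indexed, ocr text is searchable
conn.execute("CREATE VIRTUAL TABLE ocr_fts USING fts5(filename UNINDEXED, ocr)")
conn.executemany(
    "INSERT INTO ocr_fts (filename, ocr) VALUES (?, ?)",
    [
        ("receipt.jpg", "coffee and sandwich total"),
        ("whiteboard.png", "launch plan draft"),
    ],
)


def search(query):
    """Return filenames whose OCR text matches the full-text query."""
    rows = conn.execute(
        "SELECT filename FROM ocr_fts WHERE ocr_fts MATCH ? ORDER BY rank",
        (query,),
    )
    return [row[0] for row in rows]


print(search("coffee"))
```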
I particularly like this one - it's been running for three years scraping bug bounty target domains from sites like @Hacker0x01 github.com/arkadiyt/bount…
I really like this post by @mc on Django Gotchas from the next day for more historical context - you had to install it by symlinking to site-packages! postneo.com/2005/07/18/dja…
I got curious about how git knows what a Python function / class definition looks like when displaying diffs. Turns out it ships with a giant ball of regular expressions covering 22 different languages, in github.com/git/git/blob/m…
Here's where they added support for Python "async def" functions seven months ago. The way to conquer terrifying regular expressions is with neat, comprehensive unit tests: github.com/git/git/commit…
I like that those unit tests are accompanied by a README explaining how to write them: github.com/git/git/blob/0…
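A rough sketch of the idea behind those function-name patterns, with a couple of assertions in the spirit of the unit tests. The regular expression below is a simplified pattern written for illustration and should not be read as git's exact one; `hunk_context` is a hypothetical helper. Given a changed line, it walks upwards to the nearest line that looks like a class, def or async def.

```python
import re

# Illustrative pattern only: optional indentation, then class / def / async def
FUNCNAME = re.compile(r"^[ \t]*((class|(async[ \t]+)?def)[ \t].*)$")


def hunk_context(lines, changed_index):
    """Return the nearest definition line at or above changed_index, or None."""
    for i in range(changed_index, -1, -1):
        match = FUNCNAME.match(lines[i])
        if match:
            return match.group(1)
    return None


source = [
    "class Fetcher:",
    "    async def fetch(self, url):",
    "        response = await get(url)",
    "        return response",
]

# Small checks covering the async def case and the no-match case
assert hunk_context(source, 3) == "async def fetch(self, url):"
assert hunk_context(["x = 1"], 0) is None
print(hunk_context(source, 3))
```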
@zzxiv @andrewparker @datasetteproj That was one of my first use-cases for twitter-to-sqlite! I have it running on a cron every ten minutes, fetching the most recent 50 tweets that I've favourited
@zzxiv @andrewparker @datasetteproj I'm not yet fetching the content of the linked articles, but I've thought about doing that - I could pretty easily have a separate DB table with URLs and article contents which is also set up for full-text search