Built a tool to load Evernote exported notes (ENEX files) into a SQLite database - so now I can load them into Datasette and see all of my notes on a map! github.com/dogsheep/evern…
ENEX is a slightly tricky format - each note's content is an XML document wrapped in CDATA inside a larger XML document, plus base64-encoded attachments (usually images).
All of my notes exported at once result in an XML file that's 2.6GB!
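A 2.6GB file is too big to comfortably load into memory in one go, so a streaming parser is one sensible approach. Below is a minimal sketch using Python's `xml.etree.ElementTree.iterparse`, assuming the usual ENEX layout: `<note>` elements holding a `<title>`, a CDATA `<content>` (ENML) document, optional `<note-attributes>` with `<latitude>`/`<longitude>` (the coordinates a map needs), and `<resource>` elements with base64 `<data>` and a `<mime>` type. This is not how evernote-to-sqlite is implemented; `iter_enex_notes`, its dict keys and the sample export are hypothetical.

```python
import base64
import hashlib
import io
import xml.etree.ElementTree as ET


def _float_or_none(value):
    return float(value) if value else None


def iter_enex_notes(fileobj):
    """Hypothetical helper: yield one dict per <note>, streaming through the export."""
    context = ET.iterparse(fileobj, events=("start", "end"))
    _, root = next(context)  # first event is the start of the <en-export> root
    for event, elem in context:
        if event != "end" or elem.tag != "note":
            continue
        lat = lon = None
        attrs = elem.find("note-attributes")  # assumed optional
        if attrs is not None:
            lat = _float_or_none(attrs.findtext("latitude"))
            lon = _float_or_none(attrs.findtext("longitude"))
        resources = []
        for res in elem.findall("resource"):
            # Attachments are base64 text; b64decode discards the embedded newlines
            data = base64.b64decode(res.findtext("data") or "")
            resources.append({
                "mime": res.findtext("mime"),
                "md5": hashlib.md5(data).hexdigest(),
                "size": len(data),
            })
        yield {
            "title": elem.findtext("title"),
            # <content> is a second, complete XML (ENML) document stored as CDATA
            # text, so it needs its own parse if you want more than the raw string
            "content": elem.findtext("content"),
            "latitude": lat,
            "longitude": lon,
            "resources": resources,
        }
        # Discard finished notes so memory stays flat on multi-GB exports
        root.clear()


# Hypothetical one-note export for illustration
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<en-export>
  <note>
    <title>Lunch</title>
    <content><![CDATA[<?xml version="1.0" encoding="UTF-8"?><en-note>Tacos</en-note>]]></content>
    <note-attributes><latitude>37.77</latitude><longitude>-122.41</longitude></note-attributes>
    <resource><data encoding="base64">aGVs
bG8=</data><mime>image/png</mime></resource>
  </note>
</en-export>"""

notes = list(iter_enex_notes(io.BytesIO(SAMPLE)))
```

Clearing the root after each note is the standard `iterparse` trick for keeping memory bounded; without it the whole tree would still accumulate as parsing proceeds.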
One neat detail is that the output of Evernote's cloud OCR system (run automatically against every image embedded in a note) is available in the XML. evernote-to-sqlite extracts these OCR strings and uses them to populate a SQLite full-text search table.
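In ENEX exports the OCR output typically sits on each resource as a `<recognition>` element whose CDATA holds a small `recoIndex` XML document: `<item>` regions, each with several candidate `<t>` words carrying a `w` weight. Assuming that shape, here is a minimal sketch that keeps the highest-weighted candidate per region and indexes the result in a SQLite FTS5 table. The `notes_fts` table, the best-candidate rule and the sample payload are assumptions for illustration (evernote-to-sqlite may handle this differently), and your SQLite build needs FTS5 compiled in.

```python
import sqlite3
import xml.etree.ElementTree as ET


def ocr_text(recognition_xml):
    """Join the best-weighted guess from each <item> region (assumed recoIndex shape)."""
    root = ET.fromstring(recognition_xml)
    words = []
    for item in root.iter("item"):
        candidates = item.findall("t")
        if candidates:
            best = max(candidates, key=lambda t: int(t.get("w", "0")))
            words.append(best.text or "")
    return " ".join(words)


def build_search_index(db, rows):
    """Index (note_id, title, recognition_xml) rows; note_id becomes the FTS rowid."""
    db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS notes_fts USING fts5(title, ocr)")
    db.executemany(
        "INSERT INTO notes_fts (rowid, title, ocr) VALUES (?, ?, ?)",
        ((note_id, title, ocr_text(xml)) for note_id, title, xml in rows),
    )
    db.commit()


# Hypothetical recognition payload for a photographed receipt
RECO = (
    '<recoIndex objType="image">'
    '<item x="10" y="5" w="80" h="20"><t w="90">RECEIPT</t><t w="40">RECEIPI</t></item>'
    '<item x="10" y="30" w="60" h="20"><t w="75">coffee</t></item>'
    "</recoIndex>"
)

db = sqlite3.connect(":memory:")
build_search_index(db, [(1, "Cafe visit", RECO)])
hits = db.execute(
    "SELECT rowid, title FROM notes_fts WHERE notes_fts MATCH ?", ("receipt",)
).fetchall()
```

Keeping only the top candidate avoids polluting the index with misreads like "RECEIPI"; indexing every candidate instead would trade precision for recall.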
I just updated dogsheep.github.io with the latest Dogsheep tools - including three new ones that weren't written by me (I'd fallen behind on reviewing the pull requests for that repo).
While building evernote-to-sqlite I dusted off a 2009 project I built for analyzing an XML file and outputting a summary of the elements contained within - that's published to PyPI now as xml-analyser github.com/simonw/xml-ana…
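For a sense of what summarising an XML file's structure involves, here is a minimal streaming sketch that counts each element, collects the attribute names seen on it, and records which child elements appear inside it. It is a hypothetical illustration, not xml-analyser's implementation or output format; `summarize_xml` is an invented name.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def summarize_xml(fileobj):
    """Return {tag: {"count", "attributes", "children"}}, most common tags first."""
    counts = Counter()
    attributes = defaultdict(set)
    children = defaultdict(set)
    stack = []  # tags of the currently open elements
    for event, elem in ET.iterparse(fileobj, events=("start", "end")):
        if event == "start":
            counts[elem.tag] += 1
            attributes[elem.tag].update(elem.attrib)  # adds the attribute names
            if stack:
                children[stack[-1]].add(elem.tag)
            stack.append(elem.tag)
        else:
            stack.pop()
            elem.clear()  # drop text and children we no longer need
    return {
        tag: {
            "count": n,
            "attributes": sorted(attributes[tag]),
            "children": sorted(children[tag]),
        }
        for tag, n in counts.most_common()
    }


summary = summarize_xml(io.BytesIO(b'<root><a x="1"><b/><b y="2"/></a><a/></root>'))
for tag, info in summary.items():
    print(tag, info)
```

Because it only keeps counts and name sets, a sketch like this can scan very large files; the emptied element shells still linger under the root, so a production tool might also clear the root periodically.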
(As a resident of the USA, these days I usually spell analyzer with a Z - but since I started this project in the UK I decided to keep the S. I did register pypi.org/project/xml-an… on PyPI to guard against typo-squatting though!)
• • •
Notes, links and notebook from my #JupyterCon2020 session on Datasette and sqlite-utils this morning: gist.github.com/simonw/656c21b… - I live-coded a SQLite database containing earthquakes from last week using data from USGS
Here's a Datasette instance showing the database of earthquakes I built in that notebook - facet by type shows that there were 17 ice quakes, 1 mine collapse and 1 chemical explosion in the past week! jupytercon-earthquakes-demo.vercel.app/earthquakes/qu…
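A facet is, at heart, a count per distinct value of a column. Here is a minimal sketch using the standard library's `sqlite3`, assuming features shaped like the USGS GeoJSON feeds (an `id`, `properties.type`/`mag`/`place`, and `geometry.coordinates` as `[longitude, latitude, depth]`). The `quakes` table, the `facet` helper and the four sample features are hypothetical, and Datasette's own facet SQL differs in detail.

```python
import sqlite3


def load_features(db, features):
    """Flatten GeoJSON-style features into a quakes table (hypothetical schema)."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS quakes ("
        "id TEXT PRIMARY KEY, type TEXT, mag REAL, place TEXT, "
        "longitude REAL, latitude REAL, depth REAL)"
    )
    db.executemany(
        "INSERT OR REPLACE INTO quakes VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            (f["id"], f["properties"]["type"], f["properties"]["mag"],
             f["properties"]["place"], *f["geometry"]["coordinates"])
            for f in features
        ],
    )
    db.commit()


def facet(db, column):
    """Count rows per distinct value, most common first (roughly what a facet shows)."""
    quoted = '"' + column.replace('"', '""') + '"'  # identifiers can't be ? parameters
    return db.execute(
        f"SELECT {quoted}, COUNT(*) AS n FROM quakes "
        f"GROUP BY {quoted} ORDER BY n DESC, {quoted}"
    ).fetchall()


def feature(fid, kind, mag, lon, lat, depth):
    # Hypothetical sample data, not real USGS events
    return {"id": fid, "properties": {"type": kind, "mag": mag, "place": "Sample"},
            "geometry": {"coordinates": [lon, lat, depth]}}


db = sqlite3.connect(":memory:")
load_features(db, [
    feature("q1", "earthquake", 2.1, -117.5, 35.7, 8.2),
    feature("q2", "earthquake", 1.4, -155.3, 19.4, 2.0),
    feature("q3", "ice quake", 1.0, -150.1, 61.2, 0.0),
    feature("q4", "mine collapse", 2.5, -110.9, 39.5, 0.5),
])
print(facet(db, "type"))
```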
I particularly like this one - it's been running for three years scraping bug bounty target domains from sites like @Hacker0x01 github.com/arkadiyt/bount…
I really like this post by @mc on Django Gotchas from the next day for more historical context - you had to install it by symlinking to site-packages! postneo.com/2005/07/18/dja…
I got curious about how git knows what a Python function / class definition looks like when displaying diffs. Turns out it ships with a giant ball of regular expressions covering 22 different languages, in github.com/git/git/blob/m…
Here's where they added support for Python "async def" functions seven months ago. The way to conquer terrifying regular expressions is with neat, comprehensive unit tests: github.com/git/git/commit…
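To see why tests tame these patterns, here is a small Python sketch modelled on the kind of Python "funcname" regex git uses: optional indentation, then `class`, `def` or `async def` followed by whitespace. The regex is an approximation written for illustration and may not match git's source byte for byte; `hunk_header` is a hypothetical helper that, much as git does, walks back to the nearest preceding definition line to label a hunk. The table of cases plays the role of the unit tests.

```python
import re

# Approximation (not a verbatim copy of git's) of a Python funcname pattern.
# Group 1 is the text that would label the diff hunk.
PYTHON_FUNCNAME = re.compile(r"^[ \t]*((class|(async[ \t]+)?def)[ \t].*)$")


def hunk_header(lines, first_changed):
    """Hypothetical helper: label a hunk with the nearest definition above it."""
    for line in reversed(lines[:first_changed]):
        match = PYTHON_FUNCNAME.match(line)
        if match:
            return match.group(1)
    return ""


# A table of cases doubles as the unit tests that keep the regex honest.
CASES = [
    ("def handler(request):", "def handler(request):"),
    ("    async def fetch(url):", "async def fetch(url):"),
    ("class Note(Base):", "class Note(Base):"),
    ("\tdef  spaced(self):", "def  spaced(self):"),
    ("defined = True", None),       # "def" must be followed by whitespace
    ("asyncdef broken():", None),   # and so must "async"
    ("x = 'def f():'", None),       # only matches at the start of a line
]

for line, expected in CASES:
    match = PYTHON_FUNCNAME.match(line)
    assert (match.group(1) if match else None) == expected, line
```

Each tricky input (the `defined` variable, the missing space after `async`) is exactly the kind of edge case that is easy to break when someone later edits the regex, which is why a case table pays off.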
I like that those unit tests are accompanied by a README explaining how to write them: github.com/git/git/blob/0…
@zzxiv @andrewparker @datasetteproj That was one of my first use-cases for twitter-to-sqlite! I have it running on a cron every ten minutes that fetches the most recent 50 tweets that I've favourited
@zzxiv @andrewparker @datasetteproj I'm not yet fetching the content of the linked articles, but I've thought about doing that - I could have a separate DB table with URLs and article contents, which could also be set up for full-text search pretty easily
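One way to set up such a table for full-text search is SQLite's external-content FTS5 pattern, where triggers keep the search index in sync with the underlying table. This is a minimal sketch: the `articles` schema and the `save_article` and `search` helpers are hypothetical and are not part of twitter-to-sqlite.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    id INTEGER PRIMARY KEY,
    url TEXT UNIQUE NOT NULL,
    title TEXT,
    body TEXT
);
CREATE VIRTUAL TABLE IF NOT EXISTS articles_fts USING fts5(
    title, body, content='articles', content_rowid='id'
);
-- Triggers mirror every change to articles into the FTS index
CREATE TRIGGER IF NOT EXISTS articles_ai AFTER INSERT ON articles BEGIN
    INSERT INTO articles_fts (rowid, title, body) VALUES (new.id, new.title, new.body);
END;
CREATE TRIGGER IF NOT EXISTS articles_ad AFTER DELETE ON articles BEGIN
    INSERT INTO articles_fts (articles_fts, rowid, title, body)
    VALUES ('delete', old.id, old.title, old.body);
END;
CREATE TRIGGER IF NOT EXISTS articles_au AFTER UPDATE ON articles BEGIN
    INSERT INTO articles_fts (articles_fts, rowid, title, body)
    VALUES ('delete', old.id, old.title, old.body);
    INSERT INTO articles_fts (rowid, title, body) VALUES (new.id, new.title, new.body);
END;
"""


def save_article(db, url, title, body):
    # Upsert (SQLite 3.24+). Deliberately not INSERT OR REPLACE: the implicit
    # delete it performs skips DELETE triggers unless recursive_triggers is on.
    db.execute(
        "INSERT INTO articles (url, title, body) VALUES (?, ?, ?) "
        "ON CONFLICT(url) DO UPDATE SET title = excluded.title, body = excluded.body",
        (url, title, body),
    )
    db.commit()


def search(db, query):
    """Return matching URLs, best FTS5 rank first."""
    rows = db.execute(
        "SELECT a.url FROM ("
        "  SELECT rowid AS doc_id, rank AS score FROM articles_fts"
        "  WHERE articles_fts MATCH ?"
        ") AS hits JOIN articles a ON a.id = hits.doc_id ORDER BY hits.score",
        (query,),
    )
    return [url for (url,) in rows]


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
save_article(db, "https://example.com/a", "Datasette notes", "Exploring SQLite databases")
save_article(db, "https://example.com/b", "Dinner", "Tacos tonight")
print(search(db, "sqlite"))
```

Keeping the text in a normal table means the index can be rebuilt or dropped without losing the fetched article contents.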