Built a tool to load Evernote exported notes (ENEX files) into a SQLite database - so now I can load them into Datasette and see all of my notes on a map! github.com/dogsheep/evern…
ENEX is a slightly tricky format - each note is CDATA XML embedded in a larger XML document, plus base64 attachments (usually images).
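
To make the nesting concrete, here is a minimal standard-library sketch that pulls one note apart. It is not evernote-to-sqlite's actual code. The sample document is hand-written and heavily simplified, and `parse_note` is a hypothetical helper. Real ENML may also use DTD-defined entities such as `&nbsp;`, which a plain `ElementTree` parse of the inner document would reject; this sketch does not handle that.

```python
import base64
import xml.etree.ElementTree as ET

# Hand-written, heavily simplified ENEX sample (hypothetical, not a real export)
SAMPLE_ENEX = """<?xml version="1.0" encoding="UTF-8"?>
<en-export>
  <note>
    <title>Shopping list</title>
    <content><![CDATA[<?xml version="1.0" encoding="UTF-8"?><en-note><div>Eggs</div><div>Milk</div></en-note>]]></content>
    <resource>
      <data encoding="base64">aGVsbG8g
d29ybGQ=</data>
      <mime>text/plain</mime>
    </resource>
  </note>
</en-export>"""


def parse_note(note):
    """Convert one <note> element into plain Python values."""
    # <content> wraps a second XML document in CDATA; ElementTree exposes it as text
    en_note = ET.fromstring(note.findtext("content"))
    words = [t.strip() for t in en_note.itertext() if t.strip()]
    resources = []
    for resource in note.findall("resource"):
        resources.append({
            "mime": resource.findtext("mime"),
            # b64decode (validate=False by default) skips the newlines that wrap the data
            "data": base64.b64decode(resource.findtext("data")),
        })
    return {"title": note.findtext("title"), "text": " ".join(words), "resources": resources}


root = ET.fromstring(SAMPLE_ENEX)
note = parse_note(root.find("note"))
print(note["title"], "|", note["text"], "|", note["resources"][0]["data"])
```

The outer document is parsed once, and then each note's `<content>` string gets its own parse. That second parse is the "XML inside XML" step that makes the format awkward.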

All of my notes exported at once result in an XML file that's 2.6GB!
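
A single 2.6GB document would need a great deal of memory to load in one go, so a streaming parser is the natural fit. This is a sketch of one assumed approach using the standard library's `ElementTree.iterparse`, not a description of how evernote-to-sqlite handles it. `iter_notes` is a hypothetical name.

```python
import io
import xml.etree.ElementTree as ET


def iter_notes(source):
    """Yield a small dict per <note> while streaming through an ENEX file.

    `source` can be a filename or a binary file object.
    """
    events = ET.iterparse(source, events=("start", "end"))
    # The first event is the start of the root element (<en-export> in a real export)
    _, root = next(events)
    for event, elem in events:
        if event == "end" and elem.tag == "note":
            yield {"title": elem.findtext("title"), "created": elem.findtext("created")}
            # Discard finished notes so memory stays flat even for a multi-GB file
            root.clear()


sample = b"<en-export><note><title>A</title></note><note><title>B</title></note></en-export>"
for n in iter_notes(io.BytesIO(sample)):
    print(n["title"])
```

Clearing the root after each note, rather than only the note itself, also drops the empty note shells that would otherwise pile up under `<en-export>`.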
One neat detail is that the output of Evernote's cloud OCR system (run automatically against every image embedded in a note) is available in the XML. evernote-to-sqlite extracts these OCR strings and uses them to populate a SQLite full-text search table.
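
Here is a sketch of the OCR-to-search idea. It assumes a `<recoIndex>` layout in which each `<item>` is a recognized region holding one or more candidate `<t>` readings, each weighted by a `w` attribute. Keeping only the top-weighted candidate is a choice made for this example. The sample XML and the `notes_fts` table are hypothetical, not evernote-to-sqlite's real schema. The code needs a SQLite build with FTS5, which most Python distributions include.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical recoIndex sample: two regions, the first with two candidate readings
SAMPLE_RECO = """<recoIndex objType="image">
  <item x="10" y="10" w="80" h="20">
    <t w="85">INVOICE</t>
    <t w="40">INV0ICE</t>
  </item>
  <item x="10" y="40" w="60" h="20">
    <t w="72">Total</t>
  </item>
</recoIndex>"""


def ocr_words(reco_xml):
    """Return the highest-weighted reading for each recognized region, space-joined."""
    root = ET.fromstring(reco_xml)
    words = []
    for item in root.iter("item"):
        candidates = item.findall("t")
        if candidates:
            # On <t>, "w" is assumed to be a confidence weight
            best = max(candidates, key=lambda t: int(t.get("w", "0")))
            words.append(best.text or "")
    return " ".join(words)


db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE notes_fts USING fts5(title, ocr)")
db.execute(
    "INSERT INTO notes_fts (title, ocr) VALUES (?, ?)",
    ("Receipt photo", ocr_words(SAMPLE_RECO)),
)
# FTS5's default tokenizer is case-insensitive, so "invoice" finds "INVOICE"
rows = db.execute(
    "SELECT title FROM notes_fts WHERE notes_fts MATCH ?", ("invoice",)
).fetchall()
print(rows)
```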
I just updated dogsheep.github.io with the latest Dogsheep tools - including three new ones that weren't written by me (I'd fallen behind on reviewing the pull requests for that repo)
While building evernote-to-sqlite I dusted off a 2009 project I built for analyzing an XML file and outputting a summary of the elements contained within - that's published to PyPI now as xml-analyser github.com/simonw/xml-ana…
(As a resident of the USA, these days I usually spell analyzer with a Z - but since I started this project in the UK I decided to keep the S. I did register pypi.org/project/xml-an… on PyPI to guard against typo-squatting though!)
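
At its core, that kind of analysis is a single streaming pass that tallies what it sees. This sketch counts each element, its attribute names and its direct children. It illustrates the idea only: neither the `summarize` name nor its output shape is xml-analyser's real interface.

```python
import io
from collections import Counter, defaultdict
import xml.etree.ElementTree as ET


def summarize(source):
    """Tally each element's count, attribute names and direct child tags."""
    counts = Counter()
    attributes = defaultdict(Counter)
    children = defaultdict(Counter)
    for _, elem in ET.iterparse(source, events=("end",)):
        counts[elem.tag] += 1
        attributes[elem.tag].update(elem.attrib.keys())
        children[elem.tag].update(child.tag for child in elem)
        # Drop this element's contents now that it has been counted; the empty
        # element stays attached to its parent, so the parent's tally still works
        elem.clear()
    return {
        tag: {
            "count": n,
            "attributes": dict(attributes[tag]),
            "children": dict(children[tag]),
        }
        for tag, n in counts.items()
    }


doc = b'<notes><note id="1"><title>A</title></note><note id="2"><title>B</title><tag/></note></notes>'
for tag, info in summarize(io.BytesIO(doc)).items():
    print(tag, info)
```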

More from @simonw

12 Oct
Notes, links and notebook from my #JupyterCon2020 session on Datasette and sqlite-utils this morning: gist.github.com/simonw/656c21b… - I live-coded a SQLite database containing earthquakes from last week using data from USGS
Here's a Datasette instance showing the database of earthquakes I built in that notebook - facet by type shows that there were 17 ice quakes, 1 mine collapse and 1 chemical explosion in the past week! jupytercon-earthquakes-demo.vercel.app/earthquakes/qu…
Looks like this is the mine collapse: triblive.com/local/westmore…
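
Faceting comes down to counting rows per distinct value. The session used sqlite-utils and Datasette; this standard-library sketch only mirrors the idea and is not the notebook's code. The three sample features are made up, shaped loosely like records from a USGS GeoJSON feed (an `id` plus a `properties` object holding `mag`, `place` and `type`).

```python
import json
import sqlite3

# Three made-up features, shaped loosely like a USGS GeoJSON feed
SAMPLE = json.loads("""{"features": [
  {"id": "a1", "properties": {"mag": 1.2, "place": "Alaska", "type": "earthquake"}},
  {"id": "a2", "properties": {"mag": 0.8, "place": "Alaska", "type": "ice quake"}},
  {"id": "a3", "properties": {"mag": 1.5, "place": "Nevada", "type": "earthquake"}}
]}""")

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE quakes (id TEXT PRIMARY KEY, mag REAL, place TEXT, type TEXT)")
db.executemany(
    "INSERT INTO quakes (id, mag, place, type) VALUES (:id, :mag, :place, :type)",
    # Flatten each feature: its id plus the fields from its properties object
    [{"id": f["id"], **f["properties"]} for f in SAMPLE["features"]],
)

# A facet on "type": one row per distinct value with its count, most common first
facets = db.execute(
    "SELECT type, COUNT(*) AS n FROM quakes GROUP BY type ORDER BY n DESC, type"
).fetchall()
print(facets)
```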
9 Oct
Git scraping: track changes over time by scraping to a Git repository
simonwillison.net/2020/Oct/9/git…
I realized I've been using the term "git scraping" for a while without publishing a clear explanation of what I mean by the term

Git scraping = writing scrapers that record their results in a Git repository in order to track changes to the scraped resource over time
The Hacker News thread has some great examples of people who've discovered the same pattern news.ycombinator.com/item?id=247329…

I particularly like this one - it's been running for three years scraping bug bounty target domains from sites like @Hacker0x01 github.com/arkadiyt/bount…
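
The file-writing half of the git scraping pattern fits in a few lines. This sketch assumes the scraper already holds its data as JSON-serializable Python objects and runs inside an existing git checkout. Fetching and scheduling (cron, a CI job and so on) are left out, and `write_snapshot` and `commit_snapshot` are hypothetical names. Writing with sorted keys and fixed indentation keeps each commit's diff limited to what actually changed.

```python
import json
import subprocess
from pathlib import Path


def write_snapshot(data, path):
    """Write data as stable, pretty-printed JSON; return True if the file changed."""
    # Sorted keys and fixed indentation keep diffs between runs small and readable
    new_text = json.dumps(data, indent=2, sort_keys=True) + "\n"
    path = Path(path)
    old_text = path.read_text(encoding="utf-8") if path.exists() else None
    if new_text == old_text:
        return False
    path.write_text(new_text, encoding="utf-8")
    return True


def commit_snapshot(repo_dir, filename, message):
    """Stage and commit one file; assumes repo_dir is already a git checkout."""
    subprocess.run(["git", "add", filename], cwd=repo_dir, check=True)
    subprocess.run(["git", "commit", "-m", message], cwd=repo_dir, check=True)
```

A scheduled run would call `write_snapshot(data, "repo/data.json")` and only call `commit_snapshot("repo", "data.json", "Latest data")` when that returns `True`. The check matters because `git commit` exits with an error when there is nothing to commit.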
17 Jul
15 years ago today on my blog: Introducing Django simonwillison.net/2005/Jul/17/dj…
Adrian and I developed Django to allow us to create reasonab
I really like this post by @mc on Django Gotchas from the next day for more historical context - you had to install it by symlinking to site-packages! postneo.com/2005/07/18/dja…
9 Jun
I got curious about how git knows what a Python function / class definition looks like when displaying diffs. Turns out it ships with a giant ball of regular expressions covering 22 different languages, in github.com/git/git/blob/m…
Here's where they added support for Python "async def" functions seven months ago. The way to conquer terrifying regular expressions is with neat, comprehensive unit tests: github.com/git/git/commit…
I like that those unit tests are accompanied by a README explaining how to write them: github.com/git/git/blob/0…
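
Here is a rough Python rendering of the idea: a "funcname" regex picks out definition lines, and the hunk header shows the nearest one above the change. The pattern below is modeled on the Python entry in userdiff.c and should be treated as an approximation rather than a copy. `hunk_context` and the cases table are hypothetical, though the table follows the same spirit as git's tests. In git itself, the built-in Python driver is switched on per file type with a `.gitattributes` line such as `*.py diff=python`.

```python
import re

# Modeled on the Python "funcname" pattern in git's userdiff.c (an approximation)
PYTHON_FUNCNAME = re.compile(r"^[ \t]*((class|(async[ \t]+)?def)[ \t].*)$")

# Table-driven cases: each line paired with whether it should count as a definition
CASES = [
    ("def plain(x):", True),
    ("    async def fetch(url):", True),
    ("class Thing(Base):", True),
    ("define = 3", False),
    ("# def commented_out():", False),
]


def hunk_context(lines, hunk_start):
    """Return the nearest definition line above index hunk_start, or None."""
    for line in reversed(lines[:hunk_start]):
        match = PYTHON_FUNCNAME.match(line)
        if match:
            # Group 1 drops the indentation, roughly what git shows after the @@ markers
            return match.group(1)
    return None


for line, expected in CASES:
    assert bool(PYTHON_FUNCNAME.match(line)) == expected, line

source = ["class Greeter:", "    async def greet(self):", "        x = 1", "        return x"]
print(hunk_context(source, 3))
```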
3 May
@zzxiv @andrewparker @datasetteproj That was one of my first use-cases for twitter-to-sqlite! I have it running on a cron every ten minutes that fetches the most recent 50 tweets that I've favourited

*/10 * * * * run-one twitter-to-sqlite favorites /home/simon/twitter.db -a /home/simon/auth.json --stop_after=50
@zzxiv @andrewparker @datasetteproj Then I can use Datasette to search my favorites - I have 23,993 favorited tweets in the index now
@zzxiv @andrewparker @datasetteproj I'm not yet fetching the content of the linked articles, but I've thought about doing that - I could have a separate DB table with URLs and article contents which is also set up for full-text search pretty easily
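
Here is a sketch of that separate table, under a few assumptions. Fetched articles live in an ordinary `articles` table keyed by URL, and an FTS5 index reads its text from that table (an "external content" index). The table names and the `save_article`, `rebuild_index` and `search` helpers are hypothetical, and fetching the pages is not shown. External-content indexes are not updated automatically, so this version rebuilds the index after loading; triggers are the usual alternative.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE articles (url TEXT PRIMARY KEY, title TEXT, body TEXT);
-- FTS5 index that stores only the index and reads column values from articles
CREATE VIRTUAL TABLE articles_fts USING fts5(
    url UNINDEXED, title, body, content='articles'
);
""")


def save_article(url, title, body):
    """Insert or refresh one fetched article, keyed by its URL."""
    db.execute(
        "INSERT OR REPLACE INTO articles (url, title, body) VALUES (?, ?, ?)",
        (url, title, body),
    )


def rebuild_index():
    """Re-read every row of articles into the FTS5 index."""
    db.execute("INSERT INTO articles_fts(articles_fts) VALUES ('rebuild')")


def search(query):
    """Return matching URLs, best match first (FTS5's built-in rank)."""
    rows = db.execute(
        "SELECT url FROM articles_fts WHERE articles_fts MATCH ? ORDER BY rank",
        (query,),
    )
    return [url for (url,) in rows]


save_article("https://example.com/a", "SQLite tips", "Full-text search with FTS5")
save_article("https://example.com/b", "Weekend baking", "How to bake bread")
rebuild_index()
print(search("fts5"))
```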
16 Apr
SQL is a better API language than GraphQL. Convince me otherwise!
To counter some obvious arguments...

You don't have to expose your entire schema, instead expose carefully designed SQL views (so you can refactor your tables without breaking your API)

Read-only, obviously!

Use time limits to cut off expensive queries (GraphQL needs this too)
(I haven't quite convinced myself yet that SQL is a great language for update APIs, but I'm working on it)
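
Two of the safeguards listed above, read-only access and time limits, can be approximated with Python's sqlite3 module: a read-only connection plus a progress handler that aborts slow queries. Views are then defined in the database itself. This is an illustrative sketch, not Datasette's code; `run_readonly_query` and the half-second default are hypothetical.

```python
import sqlite3
import time


def run_readonly_query(path, sql, params=(), time_limit=0.5):
    """Run one untrusted query against a read-only connection, with a time limit."""
    # mode=ro makes SQLite itself refuse writes, whatever SQL is submitted
    conn = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
    deadline = time.monotonic() + time_limit
    # SQLite invokes the handler every 1,000 virtual machine instructions;
    # a truthy return value interrupts the query with OperationalError
    conn.set_progress_handler(lambda: time.monotonic() > deadline, 1000)
    try:
        return conn.execute(sql, params).fetchall()
    finally:
        conn.close()
```

To publish a view, run something like `CREATE VIEW public_people AS SELECT id, name FROM people` and point clients at `public_people`. The underlying `people` table can then change shape without breaking the API. On its own, this sketch does not stop callers from querying base tables directly. `Connection.set_authorizer` is one way to add that restriction.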