An unintuitive secret of reading books on computers: reading PDFs with original typesetting is much better than reading ebooks, which treat text like a 4chan shitposter and have impoverished reading software.

But… where to get the PDFs?! A survey & suggestions for future work:

Google Play:
👍 ~smooth workflow; clean pages
👎 PDFs lack text layer, so they're not searchable or selectable; only recent books available in PDF

archive.org:
👍 has many older books Play lacks; includes OCR'd text layer
👎 OCR errors; photo noise; clunkier workflow

Z-Library:
👍 occasionally has clean PDFs for books which others lack
👎 PDFs are often EPUB->PDF conversions (the worst!); more illegal

One fun project idea: maybe you could improve upon the poor text layers in Play / archive.org's PDFs by building a tool which combines EPUBs and PDFs by aligning the EPUB's original text onto the PDF pages via OCR.
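
A rough sketch of the alignment step, under heavy assumptions: the OCR words and the EPUB words have already been extracted into plain lists (both steps are out of scope here), and `align_clean_text` is a hypothetical helper name, not part of any existing tool. It uses Python's standard `difflib.SequenceMatcher` to line up the two word sequences and swaps in the EPUB spelling wherever the match is unambiguous. One output word per OCR word is kept, so any per-word bounding boxes from the OCR could be reused by index.

```python
from difflib import SequenceMatcher


def _norm(word):
    # Compare words loosely: ignore case and surrounding punctuation.
    return word.lower().strip(".,;:!?\"'()")


def align_clean_text(ocr_words, epub_words):
    """Return one word per OCR word, preferring the EPUB spelling.

    Hypothetical helper: both inputs are plain lists of strings.
    The result has the same length as ocr_words, so positions
    (and any bounding boxes stored alongside them) are preserved.
    """
    a = [_norm(w) for w in ocr_words]
    b = [_norm(w) for w in epub_words]
    matcher = SequenceMatcher(None, a, b, autojunk=False)

    result = list(ocr_words)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        same_length = (i2 - i1) == (j2 - j1)
        if tag == "equal" or (tag == "replace" and same_length):
            # One-to-one correspondence: trust the EPUB text.
            result[i1:i2] = epub_words[j1:j2]
        # Otherwise (insert, delete, uneven replace) keep the OCR word,
        # since there is no clear word-for-word mapping.
    return result


ocr = ["Tbe", "quick", "brown", "f0x", "jumps"]
epub = ["The", "quick", "brown", "fox", "jumps"]
print(align_clean_text(ocr, epub))
```

A real tool would need to run this page by page against a sliding window of the EPUB text, and would have to cope with hyphenation, running headers, and footnotes, none of which this sketch attempts.
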
Maybe you could improve the EPUB reading experience by extracting text block layout parameters from the PDFs through computer vision: i.e. try to estimate the text block width/height, line height, and font size in the original typesetting. A similar technique could map page numbers.
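
One simple way to approximate these measurements is a projection profile: count ink pixels per row and per column, then read off the text block's extent and the spacing between lines. The sketch below assumes the page has already been rendered and binarized into a list of rows of 0/1 values (1 = ink); `estimate_layout` is a hypothetical name, and all results are in pixels. Font size is not estimated here, since that would need something like x-height detection.

```python
from statistics import median


def estimate_layout(page):
    """Estimate text block size and line pitch from a binarized page.

    page: list of rows, each a list of 0/1 ints (1 = ink).
    Returns a dict of pixel measurements, or None for a blank page.
    """
    row_ink = [sum(row) for row in page]
    col_ink = [sum(col) for col in zip(*page)]

    rows = [y for y, n in enumerate(row_ink) if n > 0]
    cols = [x for x, n in enumerate(col_ink) if n > 0]
    if not rows:
        return None

    # A text line starts at an inked row whose previous row is blank.
    line_tops = [y for y in rows if y == 0 or row_ink[y - 1] == 0]
    # Line pitch: distance between the tops of consecutive lines.
    pitches = [b - a for a, b in zip(line_tops, line_tops[1:])]

    return {
        "block_width": cols[-1] - cols[0] + 1,
        "block_height": rows[-1] - rows[0] + 1,
        "line_count": len(line_tops),
        "line_height": median(pitches) if pitches else None,
    }


# Synthetic page: three 3-pixel-tall "lines" of ink, 20 pixels wide.
page = [[0] * 30 for _ in range(20)]
for top in (2, 8, 14):
    for y in range(top, top + 3):
        for x in range(5, 25):
            page[y][x] = 1

print(estimate_layout(page))
```

On real scans this would need noise filtering and a threshold instead of "any ink at all", and it would treat headers and page numbers as part of the block unless they were masked first.
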
Related: while e-book reading software is truly impoverished, PDF software is also almost universally unimaginative and unserious for the task of reading. Would love to see more work there…

More from @andy_matuschak

18 Oct
I've noticed that consciousness recedes when I'm deep in a coding phase, many back-to-back days in flow. My mind narrows to tunnel-vision, fixated on the software and its issues. My sense of self shrinks; non-code ideas cease to arise; I get less curious; writing yields little.
It's an odd feeling: flow is experientially satisfying, but the creeping self-abnegation is worrying. I also notice it takes quite a while to "reset" from this phase, to start hearing myself think again, to feel like less of an automaton.
I don't experience this feeling when I spend many days back-to-back in flow doing other work: developing an idea, writing, designing. I wonder if it's bc those activities are more creative, involve more reflective thought. Or maybe it's that I'm worse at them—so flow's less deep!
17 Oct
I enjoyed @eriktorenberg's observation that earned authentic respect is an underrated catalyst for the way one's skills, capital, brand, and network can reinforce each other. eriktorenberg.substack.com/p/see-your-car…
I've often misunderstood this b/c respect is not very legible, and the scale is unlabeled.

e.g. It feels like success when others reliably accept your coffee invites. But that's actually not very high on the scale: better if people proactively think of you in relevant moments.
Part of why this is confusing is that respect *feels* more legible than it is. Proxies like $, media appearances, followers, citations, etc *seem* like they correspond to respect, but very often they don't! Easy to accidentally internalize false lessons about what earns respect.
25 Sep
Woke up to a great paradoxical notion from @nsbarr: sometimes the main benefit of non-linear authoring (whiteboard, hypertext, Muse) is actually linear thought! These envs offer a “release valve” for tangential stuff so you can focus on your “main” idea. notes.andymatuschak.org/z3iT7pPmhbY8Wt…
@nsbarr One thing I really like about this is that it subverts the usual narrative around e.g. densely-linked note systems: maybe the value of non-linear writing isn’t (just) in the future value of the embedded links to you/readers—but rather in helping you focus in the moment.
@nsbarr In this framing, the tangential stuff and non-linear associations are ephemeral chaff, not durable future working material!

Too strong as stated, I think (see notes.andymatuschak.org/z2HUE4ABbQjUNj… for args in favor of future value of links), but a useful angle.
22 Sep
Alan Kay suggests that good inventors are like Michelangelo, both imagining the ceiling of the Sistine Chapel—and also spending years on their back painting it! Part visionary, part obsessive craftsperson.

I wonder about auteurs in film—hundreds of staff doing detail work!
Maybe one principle is that it’s possible to (partially) delegate to someone else who is themselves Michelangelo-like in that way.

Like: maybe Wes Anderson’s set dressers are just as visionary and obsessive as he is, so he can let them do some of the “painting”?
Likewise in games: maybe an auteur-like director can “outsource” only to a level designer who will themselves bring auteur-like sensibilities—and not to a “technician”? @Jonathan_Blow suggests experiences along these lines in his comments about The Witness’s team.
15 Sep
Nerd-puzzle: how might I allow sibling same-origin iframes to communicate, given…
- parent is cross-origin
- can’t execute JS on parent
- no sessionStorage, localStorage, cookies, or IDB access
- with enough security to share auth tokens?
The best I can come up with is to have each iframe open a WebSocket to a server which can coordinate, but I don’t see how to guard against an attacker posing as a sibling iframe and receiving secure data.
A concrete instantiation of the problem: imagine a page has three YouTube embeds and these security constraints. The user signs into YouTube via UI in one embed. You’d like the other embeds to also become signed in.
13 Sep
This is a thoughtful new review of the “interactive explanation” milieu: distill.pub/2020/communica…. I’m a friend of the format—I’ve written articles like this myself—but I worry it’s trapped in a limited framing, selling short the potential of computational representations.
Here's my crux: The Cartesian plane was not invented to disseminate mathematics, or to make math more engaging. It was invented to help *do math*.

The same point can be made about John Snow’s cholera map, Feynman QED diagrams, and our other most powerful representations.
If you create an interactive representation which amplifies original research, then you can often *also* use it for dissemination, journalism, etc. But if your design goal is “communicating to others,” it’s very unlikely that the representation will amplify original research.
