Linus (@thesephist) · Jul 8, 2021
NEW PROJECT — I made a "personal search engine" that lets me search all my blogs, tweets, journals, notes, contacts, & more at once 🚀

It's called Monocle, and features a full text search system written in Ink 👇

GitHub ⌨️ github.com/thesephist/mon…
Demo 🔍 monocle.surge.sh
One of my goals for this project was to learn about full text search systems, and how a basic FTS engine works. So I wrote an FTS engine in Ink.

The project's readme goes into a little detail about how each step works, and how it all fits together.

📖 github.com/thesephist/mon…
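
The readme covers the real pipeline; purely as an illustration of the core inverted-index idea behind most FTS engines (a Python sketch, not the actual Ink implementation), it might look something like this:

```python
import re
from collections import defaultdict

# Minimal sketch of the core FTS idea: tokenize documents, build an
# inverted index mapping token -> set of doc IDs, then answer a query
# by intersecting the posting sets of its tokens (AND semantics).
# Real engines also stem tokens, drop stopwords, and rank results.

def tokenize(text):
    # Lowercase and split on runs of non-alphanumeric characters.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def build_index(docs):
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index, query):
    postings = [index.get(t, set()) for t in tokenize(query)]
    if not postings:
        return set()
    return set.intersection(*postings)

# Hypothetical toy corpus for illustration only.
docs = {
    "tweet-1": "a personal search engine for my blogs and notes",
    "note-2": "full text search systems are fun to build",
}
index = build_index(docs)
print(search(index, "search engine"))  # {'tweet-1'}
```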
The more I've been using it (since Saturday, when I had an MVP), the more I realize that this kind of tool is probably my best shot at building a Memex, a system that knows about and lets me search through my entire landscape of knowledge — theatlantic.com/magazine/archi…
I've probably performed ~100 searches for various names, ideas, memories, blogs, and other random things in the last week, and the most interesting thing is how searching for one thing helps me stumble into some unexpected insight or memory from my past. Creative randomness.
Great search and recall is also a centerpiece of the "incremental note-taking" concept I discussed last week — monocle.surge.sh/?q=incremental…

Monocle is a system that doesn't need me to take notes; it gathers knowledge by looking through my existing digital footprint.
I've spent a bunch of time on this over the weekend, so I'm probably going to take a small break, but hopefully in the coming weeks and months I'll add a few more data sources to my search index:

- Browser history, YouTube watch history
- Reading list from Pocket
- Email (maybe?)
Lastly, a question I'm definitely expecting is "can I run this on my own data?"

Uhh... probably not right now? The system is pretty custom-built for my setup. But if I like it, I might make a version that's open for other people to try ✌️
Wrote up some more thoughts in a blog post :)


More from @thesephist

Mar 1
Hypothesis: information work is overwhelmingly bottlenecked on availability of high-signal context more than by correct inference over the context. If right, implies higher ROI-per-flop of context building over pure logical inference.

h/t @anandnk24
Also, virtually all of the valuable context is in the tail of the information distribution. h/t @paraga
@anandnk24 wow this blew up. i don't have a soundcloud but go check out @PatronusAI to make sure your AI systems are behaving correctly and go make some ai apps on @ValDotTown
Aug 11, 2023
had a chance last night to meet with some of the best minds in AI to discuss the most pressing challenge facing society today:

✨how to afford attending the ERAS TOUR ✨

after much discussion, we've arrived at a breakthrough, what we've termed the "Taylor Swift Scaling Laws" 👇
the Taylor Swift Scaling Laws (TS2L) take inspiration from Scaling Laws for transformer-based LLMs, and apply the same log-log regression methodology to model and understand components of Taylor's ticket prices.

dare I say, we may have found something equally impactful
some highlights from the paper, which is a joint (overnight) work between me, @jtvhk, Ashish, and Niki (+ GPT4, thanks Notion for the OpenAI credit)

That last sentence == banger.
Feb 25, 2023
I built a personal chatbot from my personal corpus[1] a couple weeks ago on fully open-source LMs. On a whim I gave it iMessage.

Didn't expect the iMessage bit to matter, but it made a huge difference in how it feels to interact. Much more natural.

[1] thesephist.com/posts/monocle/
Full write-up hopefully coming soon, but I'm using cosmo-xl for text generation with my own prompt, retrieving from an in-memory vector DB with sentence-transformers embeddings, and using @sendbluedotco for iMessage.
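
A minimal sketch of the retrieval half, assuming the sentence-transformers library; the model name and corpus below are stand-ins, and the cosmo-xl prompting and Sendblue iMessage plumbing are not shown:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Assumed model choice for illustration; the actual embedding model
# used in the bot isn't specified in the thread.
model = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical personal corpus snippets.
corpus = [
    "note about incremental note-taking",
    "journal entry about building Monocle",
]

# An "in-memory vector DB" can be as simple as a matrix of
# normalized embeddings held in a numpy array.
corpus_emb = model.encode(corpus, normalize_embeddings=True)

def retrieve(query, k=2):
    # Embed the query, score by cosine similarity (dot product of
    # normalized vectors), and return the top-k snippets.
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = corpus_emb @ q
    top = np.argsort(-scores)[:k]
    return [(corpus[i], float(scores[i])) for i in top]

# Retrieved snippets would then be fed into the generation prompt.
print(retrieve("what did I write about note-taking?"))
```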
Nov 16, 2022
Small rant about LLMs and how I see them being put, rather thoughtlessly IMO, into productivity tools. 📄

TL;DR — Most knowledge work isn't a text-generation task, and your product shouldn't ship an implementation detail of LLMs as the end-user interface

stream.thesephist.com/updates/166861…
The fact that LLMs generate text is not the point. LLMs are cheap, infinitely scalable black boxes to soft human-like reasoning. That's the headline! The text I/O mode is just the API to this reasoning genie. It's a side effect of the training paradigm.
A vanishingly small slice of knowledge work has the shape of text-in-text-out (copywriting/Jasper). The real alpha is not in generating text, but in using this new capability and wrapping it into jobs that have other shapes.
Nov 2, 2022
NEW DEMO!

Exploring the "length" dimension in the latent space of a language model ✨

By scrubbing up/down across the text, I'm moving this sentence up and down a direction in the embedding space corresponding to text length — producing summaries w/ precise length control (1/n)
Length is one of many attributes that I can control by traversing the latent space of this model — others include style, emotional tone, context...

Here's "adding positivity" 🌈

It's a continuous space, so attributes can all be mixed/dialed more precisely than by rote prompting
More to follow soon on how it works, but in brief:

- Built on a custom LM arch based on T5 checkpoints
- An "attribute direction" is found from unpaired examples of texts w/ and w/o that trait
- Simple vector math in latent space + decoding from the latent gets you this effect (see the sketch below).
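
As a rough illustration of the vector-math step only (not the actual custom T5-based system, whose encoder and decoder are left abstract here), finding and applying an attribute direction from unpaired examples might look like this:

```python
import numpy as np

def attribute_direction(with_trait_emb, without_trait_emb):
    # Unpaired examples: the direction is the difference between the
    # mean latent of texts WITH the trait and texts WITHOUT it.
    return np.mean(with_trait_emb, axis=0) - np.mean(without_trait_emb, axis=0)

def steer(latent, direction, strength):
    # "Scrubbing" = sliding the latent along the direction by a
    # continuous amount; decoding back to text is not shown.
    return latent + strength * direction

# Hypothetical usage, with enc()/dec() standing in for the model:
#   length_dir = attribute_direction(enc(long_texts), enc(short_texts))
#   shorter    = dec(steer(enc("some sentence"), length_dir, -1.5))
```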
Sep 14, 2022
Good tools admit virtuosity — they have low floors and high ceilings, and are open to beginners but support mastery, so that experts can deftly close the gap between their taste and their craft.

Prompt engineering does not admit virtuosity. We need something better.
Tools like Logic, Photoshop, or even the venerable paintbrush can be *mastered*, so that the tool imposes no ceiling on how good you can get at going from the image in your mind -> output. Masters of these tools wield them as extensions of themselves.
For this to work, the tool has to present a coherent set of abstractions, and predictable behavior about how composing them will change the user's output. Prompt engineering is not predictable, and there are no coherent abstractions. It's all just gut feelings and copy-paste.
