Ivan Leo
Jan 5, 2025 · 7 tweets · 3 min read
Ever struggled to understand how users use your product?

I just built an open-source implementation of Anthropic's internal clustering algorithm - CLIO.

With Gemini Flash, you can generate human-readable labels, which are then clustered and grouped together to spot usage patterns.

Read more to find out how it works
We first generate summaries of user conversations that redact PII.

These are then embedded and clustered using a K-Means algorithm.
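
Here's a rough sketch of what the clustering step can look like. It is not the code from the repo: the K-Means loop is a bare-bones pure-Python version, and the 2-D `summary_embeddings` are made-up stand-ins for real embedding vectors of the redacted summaries.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain K-Means: returns one cluster index per input vector."""
    rng = random.Random(seed)
    # Start from k randomly chosen input vectors as the initial centroids.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels


# Toy 2-D "embeddings" (assumed for illustration only).
summary_embeddings = [
    [0.9, 0.1], [1.0, 0.0], [0.8, 0.2],   # e.g. coding-help summaries
    [0.1, 0.9], [0.0, 1.0], [0.2, 0.8],   # e.g. travel-planning summaries
]
labels = kmeans(summary_embeddings, k=2)
print(labels)
```

In practice you'd probably swap the hand-rolled loop for a library implementation and tune `k` to your dataset size.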
We then take a cluster and sample contrastive examples from other clusters to generate a descriptive name and description for each cluster.
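
A minimal sketch of the contrastive sampling idea, under some assumptions: the thread doesn't say how the contrastive examples are picked, so this version takes the non-members closest to the cluster's centroid. The function names, the toy summaries, and the prompt wording are all hypothetical, and the actual LLM call is left out.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def contrastive_examples(summaries, embeddings, labels, target, n=2):
    """Return (summaries inside the target cluster, n nearest summaries outside it)."""
    members = [i for i, label in enumerate(labels) if label == target]
    outsiders = [i for i, label in enumerate(labels) if label != target]
    center = mean_vector([embeddings[i] for i in members])
    # Assumption: the most useful contrast comes from near-miss neighbours.
    outsiders.sort(key=lambda i: squared_distance(embeddings[i], center))
    inside = [summaries[i] for i in members]
    outside = [summaries[i] for i in outsiders[:n]]
    return inside, outside


def build_naming_prompt(inside, outside):
    """Assemble a prompt asking a model to name the cluster (wording is illustrative)."""
    lines = ["Summaries in this cluster:"]
    lines += [f"- {s}" for s in inside]
    lines.append("Summaries from nearby clusters that do NOT belong:")
    lines += [f"- {s}" for s in outside]
    lines.append("Give a short name and a one-sentence description for this cluster.")
    return "\n".join(lines)


# Toy data (made up for illustration).
summaries = [
    "Debug a Python traceback",
    "Fix a failing unit test",
    "Plan a trip to Japan",
    "Draft a packing list",
]
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.4, 0.6], [0.0, 1.0]]
labels = [0, 0, 1, 1]

inside, outside = contrastive_examples(summaries, embeddings, labels, target=0, n=1)
prompt = build_naming_prompt(inside, outside)
print(prompt)
```

Showing the model what does not belong nudges it toward a name that separates this cluster from its neighbours instead of a generic label.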
Once that's done, we recursively merge clusters together to form higher-level clusters that describe broad usage patterns without leaking user information.
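
The recursive merge could be sketched roughly like this. Assumptions to flag: grouping here is a simple greedy closest-centroid merge rather than whatever the repo actually does, `placeholder_name` is a hypothetical stand-in for the LLM call that would name each parent cluster, and the base clusters and `ratio` are made up.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def placeholder_name(children):
    # Hypothetical stand-in: a real pipeline would ask an LLM for a parent name.
    return " / ".join(child["name"] for child in children)


def group_centroid(group):
    """Centroid of a group, taken as the mean of its clusters' centroids."""
    return mean_vector([cluster["centroid"] for cluster in group])


def merge_level(clusters, target_count):
    """Greedily merge the closest groups until only target_count parents remain."""
    groups = [[cluster] for cluster in clusters]
    while len(groups) > target_count:
        pairs = [(i, j) for i in range(len(groups)) for j in range(i + 1, len(groups))]
        i, j = min(
            pairs,
            key=lambda p: squared_distance(
                group_centroid(groups[p[0]]), group_centroid(groups[p[1]])
            ),
        )
        groups[i] = groups[i] + groups[j]
        del groups[j]
    return [
        {"name": placeholder_name(g), "centroid": group_centroid(g), "children": g}
        for g in groups
    ]


def build_hierarchy(base_clusters, ratio=2):
    """Repeat the merge, shrinking each level by `ratio`, until one root is left."""
    levels = [base_clusters]
    while len(levels[-1]) > 1:
        target = max(1, len(levels[-1]) // ratio)
        levels.append(merge_level(levels[-1], target))
    return levels


# Toy base clusters (names and centroids are illustrative).
base = [
    {"name": "Python debugging", "centroid": [1.0, 0.0], "children": []},
    {"name": "Unit testing", "centroid": [0.9, 0.1], "children": []},
    {"name": "Trip planning", "centroid": [0.0, 1.0], "children": []},
    {"name": "Packing lists", "centroid": [0.1, 0.9], "children": []},
]
levels = build_hierarchy(base)
for depth, level in enumerate(levels):
    print(depth, [cluster["name"] for cluster in level])
```

Each level only ever sees cluster names and centroids, never raw conversations, which is the property that keeps the higher-level view from leaking user information.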
I've written up a blog post walking through the code in greater detail, where I talk about:

- Things I found interesting in the paper
- Implementation Details and examples
- Limitations of my approach and how you can adapt it

Read it here: ivanleo.com/blog/understan…
I've also released the code that I used, along with an application that I built to generate the clusters using @answerdotai's FastHTML package.

You can experiment with the hyper-parameters that I used to see if they give better clusters

Code: github.com/ivanleomk/chat…
@answerdotai tq to Claude too for generating the promo tweet for this haha
