Ever struggled to understand how users use your product?
I just built an open-source implementation of CLIO, Anthropic's internal clustering algorithm.
With Gemini Flash, you can generate human-readable labels, which are then clustered and grouped together to spot usage patterns.
Read on to find out how it works.
We first generate summaries of user conversations with PII redacted.
These are then embedded and clustered using K-Means.
For each cluster, we then sample contrastive examples from other clusters in order to generate a descriptive name and description for it.
Once that's done, we recursively merge clusters together to form higher-level clusters that describe broad usage patterns without leaking user information.
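The clustering and contrastive sampling steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the blog post: the toy 2D vectors stand in for real embeddings, the farthest-point initialization is a simplification chosen to keep the example deterministic, and treating "contrastive examples" as the nearest summaries outside the cluster is an assumption. The function names `kmeans` and `contrastive_examples` are hypothetical.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain K-Means. Returns (labels, centroids)."""
    # Assumption: deterministic farthest-point initialization instead of random seeds.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


def contrastive_examples(points, labels, centroids, cluster, n=1):
    """Indices of the n points outside `cluster` that sit closest to its centroid."""
    outsiders = [i for i, label in enumerate(labels) if label != cluster]
    outsiders.sort(key=lambda i: math.dist(points[i], centroids[cluster]))
    return outsiders[:n]


# Toy data: redacted summaries paired with made-up 2D "embeddings".
summaries = ["debug a Python error", "fix a JavaScript bug", "write a poem", "draft a sonnet"]
embeddings = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.1, 5.0)]

labels, centroids = kmeans(embeddings, k=2)
contrast = contrastive_examples(embeddings, labels, centroids, cluster=0)

# The in-cluster and contrastive summaries would then go into the labelling prompt.
in_cluster = [s for s, label in zip(summaries, labels) if label == 0]
out_of_cluster = [summaries[i] for i in contrast]
```

Showing the model what a cluster is not, alongside what it is, is what nudges it toward a name that is specific to that cluster rather than generic.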
I've written up a blog post that walks through the code in greater detail, where I talk about:
- Things I found interesting in the paper
- Implementation details and examples
- Limitations of my approach and how you can adapt it
I spent the weekend playing around with @v0 and generated almost 80% of this entire UI from scratch just by prompting.
Here's a quick thread on 3 things I learned about prompting v0 better.
1. Use @v0 to quickly generate and evaluate ideas for your UI.
For instance, when it came to the dashboard I wanted to create, I got it to create the following mock-ups to see what fit the best with what I had in mind.
At this point, you want to be thinking about:
1. Color schemes
2. Rough composition of the UI
3. Animations you might want to use
@v0 generates code that is often very verbose. You can and should spend some time refactoring the code that it generates.
One thing I like to do is iterate until I have a UI I like, take a screenshot of it, and then get v0 to generate it again from scratch.
Anecdotally, this results in cleaner and simpler code.
1/ If you're building a RAG application, these problems probably sound familiar:
1. Irrelevant search results
2. Insufficient data to create a database index
3. Multiple data sources that are out of sync
4. Untested LLM agents
How do these problems manifest?
2/ If you're only using embeddings for search, then without good metadata filters you're going to get results that belong to the wrong time period or category.
3/ But if you're not extracting this metadata at ingestion time, you'll never be able to build those metadata filters either.
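To make the point about metadata filters concrete, here is a minimal sketch of filtering on metadata before ranking by embedding similarity. Everything in it is illustrative: the `Doc` fields (`year`, `category`), the `search` helper, and the toy 2D embeddings are assumptions, not a specific vector database's API.

```python
import math
from dataclasses import dataclass


@dataclass
class Doc:
    text: str
    embedding: tuple
    year: int       # hypothetical metadata, extracted at ingestion time
    category: str   # hypothetical metadata, extracted at ingestion time


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def search(docs, query_embedding, year=None, category=None, top_k=3):
    """Filter on metadata first, then rank the survivors by cosine similarity."""
    candidates = [
        d for d in docs
        if (year is None or d.year == year)
        and (category is None or d.category == category)
    ]
    candidates.sort(key=lambda d: cosine(d.embedding, query_embedding), reverse=True)
    return candidates[:top_k]


docs = [
    Doc("Q3 2022 revenue report", (0.9, 0.1), 2022, "finance"),
    Doc("Q3 2023 revenue report", (0.88, 0.12), 2023, "finance"),
    Doc("2023 hiring plan", (0.2, 0.8), 2023, "hr"),
]
query = (0.9, 0.1)  # toy embedding for a question about 2023 revenue

# Embeddings alone surface the 2022 report, because the two reports look nearly identical.
unfiltered = search(docs, query, top_k=1)

# With metadata captured at ingestion, the filter removes the wrong period up front.
filtered = search(docs, query, year=2023, category="finance", top_k=1)
```

The filter only works because `year` and `category` were stored alongside each document when it was ingested; there is nothing to filter on if that step was skipped.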