Today, we're covering how pseudo labeling can be leveraged to train and optimize models.
Traditional pseudo labeling works by using a really large language model (one that might be unshippable in production, e.g. BLOOM) to augment data & generate labels.
The approach here is simple. 🧠
1️⃣ Spin up your local member of the LLM three comma club
2️⃣ Generate labels for a few thousand examples
3️⃣ Use these pseudo labels to train a smaller two comma model that you can serve in production (sketched below)
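Here's a minimal sketch of that recipe with Hugging Face transformers. The checkpoint, prompt, and data are illustrative stand-ins (a small BLOOM checkpoint fills in for a truly huge, unshippable teacher), not Neeva's actual setup:

```python
# Sketch of classic pseudo labeling: a big teacher labels raw text,
# then a smaller student is trained on those pseudo labels.
from transformers import pipeline

# Stage 1: the teacher LLM labels raw text via prompting.
# bloom-560m stands in here for a full-size, unshippable teacher.
teacher = pipeline("text-generation", model="bigscience/bloom-560m")

def pseudo_label(text: str) -> str:
    prompt = f"Summarize in one sentence: {text}\nSummary:"
    out = teacher(prompt, max_new_tokens=40, do_sample=False)[0]["generated_text"]
    return out[len(prompt):].strip()  # keep only the generated continuation

# Stage 2: fine-tune a smaller, servable model on (text, pseudo label)
# pairs, e.g. with transformers.Seq2SeqTrainer (training loop omitted).
unlabeled = ["First raw document ...", "Second raw document ..."]
train_pairs = [(text, pseudo_label(text)) for text in unlabeled]
```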
This works pretty well but here at @Neeva we aren’t easily satisfied. 😏
To push further, we explored the use of multiple stages of pseudo labeling to improve performance.
Instead of shrinking down from a really large LLM directly to a servable LLM, we first shrink down to an intermediate-size LLM.
Then, we generate a much larger set of labels (think millions) from the intermediate-size LLM and use them to train our final servable model.
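In pseudocode, the cascade looks something like this. The model names, label counts, and the two helpers are hypothetical placeholders, with sizes taken from the numbers above:

```python
# Cascade sketch: each stage's student becomes the next stage's teacher,
# and each stage labels far more data than the last.

def label_with(teacher: str, n: int) -> list[tuple[str, str]]:
    """Generate n (input, pseudo label) pairs with the named teacher
    (batched inference, as in the earlier snippet)."""
    ...

def train(student: str, data: list[tuple[str, str]]) -> None:
    """Fine-tune the named student on pseudo-labeled pairs."""
    ...

stages = [
    ("huge-llm",         5_000,     "intermediate-llm"),  # stage 1: thousands of labels
    ("intermediate-llm", 2_000_000, "servable-llm"),      # stage 2: millions of labels
]
for teacher, n, student in stages:
    train(student, label_with(teacher, n))
```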
Next, we apply our previously discussed asymmetric pruning.
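To give a flavor of the asymmetric idea, here's one plausible reading sketched in code: prune the decoder harder than the encoder, since autoregressive decoding dominates generation latency. The 2x decoder layer drop is our illustrative assumption, not the exact recipe from that earlier thread:

```python
# Illustrative asymmetric pruning: keep the encoder intact but drop
# alternate decoder layers, since decoder depth dominates generation
# latency.
import torch
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-large")
kept = [layer for i, layer in enumerate(model.decoder.block) if i % 2 == 0]
model.decoder.block = torch.nn.ModuleList(kept)
model.config.num_decoder_layers = len(kept)
# The pruned model should be fine-tuned again on pseudo labels to
# recover quality before serving.
```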
This final compressed model allows us to generate great summaries in under 400 ms! 🎊
Progressively shrinking models while increasing the amount of labeled data in each stage leaves us with...
🥁🥁🥁
A servable model with an order of magnitude fewer parameters that behaves like the giant teacher we started with!
Enjoyed this thread?
⭐ ⭐ ⭐ ⭐ ⭐
👆 Be sure to hit that follow button to keep up-to-date with @Neeva's latest LLM learnings from the wild.
• • •
We're excited to share what we've added to #NeevaAI:
✅ Answer support for verified health sites, official programming sites, blogs, etc.
✅ Availability in the News tab
🧵
First, at @Neeva we're passionate about generative search engines combining the best of search & AI.
But it's clear generative AI systems have no notion of sources or authority.
Their content is based on their reading of source material, which is often a copy of the entire Web.
On the other hand, search engines care deeply about authority.
#PageRank (the algorithm that got @Google going) built a better authority signal to score pages, based on the citations they received from other high-scoring pages.
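As a refresher, here's a minimal power-iteration sketch of that idea on a toy three-page link graph (damping factor and iteration count are the usual textbook defaults):

```python
# Power-iteration PageRank: a page's score comes from the scores of
# the pages linking to it, plus a uniform "teleport" term.
import numpy as np

def pagerank(adj: np.ndarray, damping: float = 0.85, iters: int = 50) -> np.ndarray:
    n = adj.shape[0]
    out_deg = adj.sum(axis=1, keepdims=True)
    out_deg[out_deg == 0] = 1              # simplification: ignore sink pages
    transition = adj / out_deg             # row-stochastic link matrix
    scores = np.full(n, 1.0 / n)
    for _ in range(iters):
        scores = (1 - damping) / n + damping * transition.T @ scores
    return scores

# Toy web: page 0 -> 1, page 1 -> 2, page 2 -> 0 and 2 -> 1
links = np.array([[0, 1, 0], [0, 0, 1], [1, 1, 0]], dtype=float)
print(pagerank(links))
```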
Have you seen ChatGPT combine info on multiple entities into an answer that’s completely WRONG? 😬
Generative AI models and LLMs can mix up names or concepts & confidently regurgitate frankenanswers.
Neeva is solving this problem on our AI-powered search engine.
Here’s how 🧵
FYI This is a two-part thread series.
Today, with the help of @rahilbathwal, we'll explain the technical reasons these problems happen.
Tomorrow, we’ll talk through how we’re implementing our solution with our AI/ML team.
Make sure you're following... 👀
In frankenanswers, a generative AI model combines information about multiple possible entities into an answer that’s wrong.
Ex) On this query for 'imran ahmed' from our early test builds, you see a mix-up of many intents corresponding to different entities with the same name.👇
2/ First off, we found that there are far fewer resources available for optimizing encoder-decoder models (compared to encoder-only models like BERT and decoder-only models like GPT).
We hope this thread fills that void and serves as a good resource. 📂
3/ We started with a Flan-T5-large model and fine-tuned it on our dataset. We picked the large variant because we found it generated better summaries with fewer hallucinations and fluency issues.
The problem? The latency is too high for a search product.
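For context, this is the kind of quick latency check we mean. The input text and generation settings are illustrative, and absolute timings will vary with hardware:

```python
# Rough latency check for a Flan-T5-large summarizer.
import time
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("google/flan-t5-large")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large").eval()

inputs = tok("summarize: " + "long web page text ...", return_tensors="pt")
start = time.perf_counter()
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
print(f"latency: {(time.perf_counter() - start) * 1e3:.0f} ms")
print(tok.decode(out[0], skip_special_tokens=True))
```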