Tweet

Neeva

Follow @Neeva

Mar 11 • 17 tweets • 7 min read

@Neeva

Considering using a LLM for production, but it's too slow?

Well fear not! Today you can with Double Pseudo Labeling!

Here's why the @Neeva team did, and we've never looked back!

🧵

@Neeva

Before we get started, let's briefly go over #NeevaAI and what it is...

✅ Our AI generates answers
✅ Our Search makes sure answers are timely & factual

All in all, @Neeva combines search + AI to generate a single, cited for each of your searches.

#NeevaAI works by combining search & phases of generative LLMs to generate AI answers

Phase 1️⃣: Run per-doc summarization and question-answering models on the top results

Phase 2️⃣: Run a cross-document attributed summarizer to synthesize a single answer for the query

Example⬇️

Now let's get into it...

Multi billion parameters LLMs are very effective at making summaries of documents and web pages across domains & languages.

While their quality is high, LLMs have an insatiable appetite for pods of A100s and they can be slow to generate summaries.

Although using foundational APIs can move this burden behind elsewhere, they still can be too slow or expensive for many applications.

Waiting a few seconds to summarize an email may be ok, but users can’t wait minutes to summarize all the relevant documents for a search query.

@Neeva

The bar on latency is much higher.

At @Neeva our models need to run in a few hundred milliseconds but need to behave like the models that take 10s of seconds.

@Neeva

We've already covered how we reduce latency at @Neeva, like in this thread from our co-founder and CEO @RamaswmySridhar 👇

https://twitter.com/RamaswmySridhar/status/1621870491945533440

@Neeva

As well as here, where we shared how @Neeva achieved ~10x reduction in latency of a fine-tuned encoder-decoder model. 👀

https://twitter.com/Neeva/status/1622640441064579076

@Neeva

And also here, where we explain how @Neeva dropped the tail latency of #NeevaAI by over 3x! 😲

https://twitter.com/Neeva/status/1624526258208927746

@Neeva

Most recently, we covered how @Neeva used asymmetric layer pruning of encoder-decoder models to drop the latency of #NeevaAI by another 1.5-2x. 🙌

https://twitter.com/Neeva/status/1633856484609085443

Today, we're covering how pseudo labeling can be leveraged to train and optimize models.

Traditional pseudo labeling works by using a really large language model (that might be unshippable in production – e.g. Bloom) to do data augmentation & generate labels.

The approach here is simple. 🧠

1️⃣ Spin up your local member of the LLM three comma club
2️⃣ Generate labels for a few thousand examples
3️⃣ Use these pseudo labels train a smaller two comma model, that you can serve in production

@Neeva

This works pretty well but here at @Neeva we aren’t easily satisfied. 😏

To push further, we explored the use of multiple stages of pseudo labeling to improve performance.

Instead of shrinking down from a really large LLM directly to a servable LLM, we first shrink down to an intermediate size LLM.

Then, we generate a much larger set of labels (think millions) from the intermediate size LLM and use them to train our final servable model.

Next, we apply our previously discussed asymmetric pruning.

This final compressed model allows us to generate great summaries in under 400 ms! 🎊

Progressively shrinking models while increasing the amount of labeled data in each stage leaves us with...

🥁🥁🥁

A model which behaves like a model with an order of magnitude fewer parameters!

@Neeva

Enjoyed this thread?

⭐ ⭐ ⭐ ⭐ ⭐

👆 Be sure to hit that follow button to keep up-to-date with @Neeva's latest LLM learnings from the wild.

• • •

Missing some Tweet in this thread? You can try to force a refresh

This Thread may be Removed Anytime!

Twitter may remove this content at anytime! Save it as PDF for later use!

More from @Neeva

Neeva

@Neeva

Mar 15

@Neeva

What if you could control the sources that go into your AI search engine?

At @Neeva, we're making that a reality.

We're excited to share we added #NeevaAI:
✅ Answer support for verified health sites, official programming sites, blogs, etc.
✅ Availability in the News tab

🧵

@Neeva

First, at @Neeva we're passionate about generative search engines combining the best of search & AI.

But it's clear generative AI systems have no notion of sources or authority.

Their content is based on their reading of source material, which is often a copy of the entire Web.

@Google

On the other hand, search engines care about authority very intimately.

#PageRank (the algorithm that got @Google going) was committed to a better authority signal to score pages, based on the citations they got from other high scoring pages.

Read 12 tweets

Neeva

@Neeva

Feb 10

@avinashparchuri

Yesterday, we talked about why AI chatbots generate frankenanswers.

As promised, today we're going over how NeevaAI implements our solution to frankenanswers.

BIG thanks to our talented AI/ML team: @avinashparchuri @rahilbathwal

Keep reading for how we tackle this problem ⤵️

https://twitter.com/Neeva/status/1623778167193206784

And if you didn't read our thread on this yesterday, check out part 1 below.

(we promise it makes more sense if you do! 😜)

https://twitter.com/Neeva/status/1623778167193206784

In order to fix frankenanswers, we're required to take a step back and ask ourselves... what would we like NeevaAI to do in such cases?

Read 15 tweets

Neeva

@Neeva

Feb 9

Have you seen ChatGPT combine info on multiple entities into an answer that’s completely WRONG? 😬

Generative AI and LLM models can mix up names or concepts & confidently regurgitate frankenanswers.

Neeva is solving this problem on our AI-powered search engine.

Here’s how 🧵

@rahilbathwal

FYI This is a two-part thread series.

Today, with the help of @rahilbathwal, we’ll explain why the problems happen technically.

Tomorrow, we’ll talk through how we’re implementing our solution with our AI/ML team.

Make sure you're following... 👀

In frankenanswers, a generative AI model combines information about multiple possible entities into an answer that’s wrong.

Ex) On this query for `imran ahmed’ from our early test builds, you see a mix up of many intents corresponding to different entities with the same name.👇

Read 13 tweets

Neeva

@Neeva

Feb 6

@asimshankar

1/ NeevaAI serves abstractive summaries of web pages that are generated in real-time.

We achieved this by a ~10x reduction in latency of a fine-tuned t5-large encoder-decoder model.

TY @asimshankar, @rajhans_samdani, @AshwinDevaraj3 + @spacemanidol

See our lessons learned.. 🧵

2/ First off, we found that there are far fewer resources available for optimizing encoder-decoder models (when compared to encoder models like BERT and decoder models like GPT).

We hope this thread will fill in the void and serve as a good resource. 📂

3/ We started with a flan-T5-large model and tuned it on our dataset. We picked the large variant because we found it to generate better summaries with fewer hallucinations and fluency issues.

The problem? The latency is too high for a search product.

Read 14 tweets

Neeva

@Neeva

Jan 18

1/ At #Neeva, design is the act of giving form to an idea: we gather data and inspiration, think, make, and iterate through feedback. 💡

Here's how our team, working alongside the ✨Neeva Community✨, shaped our latest news tool, #BiasBuster...

(read on 📖)

2/ To improve the news experience on Neeva, we solicited insights from users of various news outlets.

One early finding 👉 the journey to get daily news typically started from news providers' sites and apps, but NOT from a search engine.

🤔

3/ So we asked ourselves, when does a search engine becomes necessary and helpful in the journey? 💭

Several users shared that they searched for specific events and stories about which they wanted to learn more.

An avid news user put, "Search is for focused topics.".

Read 13 tweets

Neeva

@Neeva

Jan 17

1/ Have you heard? Bias Buster is now available in #Neeva's main search tab!

🔎 Try a search here: neeva.com/search?q=calif…

And if you're wondering how we crawled and evaluated topics to create our 5 point scale slider, stay tuned! 🤓

We dive into it on this thread 🧵…

2/ Our goal? 👉 Show a variety of POVs on particular news topics.

To reach this goal, we categorized results based on 5 buckets to ensure a smooth experience while interacting with the slider. This includes:

🪣 Far Left
🪣 Left Leaning
🪣 Center
🪣 Right Leaning
🪣 Far Right

@AllSidesNow

3/ So, how do we categorize our results to fit these buckets?

By using third party media bias tools, such as @AllSidesNow and @MBFC_News.

Each result is categorized by its respective domain.

Read 11 tweets

Support us! We are indie developers!

This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Don't want to be a Premium member but still want to support us?

Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal

Or Donate anonymously using crypto!

Ethereum

0xfe58350B80634f60Fa6Dc149a72b4DFbc17D341E copy

Bitcoin

3ATGMxNzCUFzxpMCHL5sWSt4DVtS8UqXpi copy

Thank you for your support!

Share this page!

Neeva

People who liked this thread also liked...

Try unrolling a thread yourself!

More from @Neeva

Neeva

Neeva

Neeva

Neeva

Neeva

Neeva

Did Thread Reader help you today?

Don't want to be a Premium member but still want to support us?

Send Email!