Delip Rao
Dec 3 · 22 tweets · 6 min read
Despite the amazing results I’ve experienced with ChatGPT, this is not the correct way to look at LLMs vs. Google search. Since several other tweets have made this equivalence and have been eager to spell doom for Google, let’s examine the details:
1. Google has more LLMs deployed internally than any place I know. If private communication is to be believed, that number is on the order of a few dozen. And I’m not talking about BERT/T5-sized models here.
2. Google also has more compute than anyone. The joke is that probably only the NSA has an estimate of Google’s compute. So they are not compute-limited: they can build as big a model as they want.
3. Nor does Google lack the talent to build it. Together with DeepMind, Brain probably corners most of the AI talent today. Yes, it’s true that many authors of the attention paper left to build startups, but that’s opportunism, not a sign of the state of affairs at Google.
4. It is very, very likely that the largest/most performant model in industry today is at either Google or DeepMind. (Baidu is, effectively, the Chinese government.)
5. So the natural question is: why hasn’t Google replaced the traditional search page with a ChatGPT-like interface, with an ad stuck underneath? For many reasons:
6. Search satisfaction is a reputation business. Once folks are let down by a search result on something critical, they will use that search interface less and less. For example, Twitter search on twitter.com sucks, so I use other options to look up my own tweets.
7. LLMs, despite their recent impressive performances, still hallucinate a lot, and are temperamental about their starting conditions: the prompt, the sampling parameters, the random seed. LLMs are less trustworthy in that sense.
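A minimal sketch of that temperament, assuming nothing about any real model: the same toy next-token distribution, decoded at different temperatures, gives anywhere from near-deterministic to erratic samples. All numbers below are invented purely for illustration.

```python
import numpy as np

def sample_next_token(logits, temperature, rng):
    """Temperature-scaled softmax sampling over a toy vocabulary."""
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    return rng.choice(len(logits), p=probs)

# Made-up next-token logits for a 5-token vocabulary.
logits = np.array([2.0, 1.8, 1.5, 0.5, 0.1])

for temperature in (0.2, 1.0, 2.0):
    rng = np.random.default_rng(seed=0)
    draws = [sample_next_token(logits, temperature, rng) for _ in range(10)]
    print(f"T={temperature}: {draws}")
# At T=0.2 the argmax token dominates; at T=2.0 tail tokens win often.
# In an autoregressive model, one unlucky early token can derail
# everything generated after it.
```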
8. Most importantly, LLMs fail in strange, unpredictable ways. We are okay with things that fail, and we learn how to adapt to them, as long as they fail predictably.
9. For instance, in its early days Google did not answer natural language queries, so humans adapted with query-rewriting strategies to satisfy their intent. Google, to its credit, recognized this and doubled down on implementing natural language query answering.
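For flavor, here is a toy version of the “keyword-ese” rewriting users learned to do in their heads. The stopword list and example query are mine, not anything Google actually shipped:

```python
# A tiny, illustrative stopword list; real systems used far richer signals.
STOPWORDS = {"how", "do", "i", "a", "an", "the", "what", "is", "to", "my", "on", "in"}

def keywordese(natural_query: str) -> str:
    """Mimic early-2000s user behavior: drop function words, keep content terms."""
    terms = (t.strip("?.,!") for t in natural_query.lower().split())
    return " ".join(t for t in terms if t and t not in STOPWORDS)

print(keywordese("How do I fix a flat tire on my bike?"))
# -> "fix flat tire bike"
```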
10. Search users have multiple intents. Besides information seeking, the browsing intent hasn’t gone anywhere. The internet, specifically the Dmoz/Yahoo directories, killed the Yellow Pages. Google then killed internet directories by generating an intent-specific directory: the search results page.
11. We haven’t abandoned browsing behavior; instead, we have readjusted our search expectations to sit somewhere between information seeking and browsing. This is also why Google shows both a “knowledge panel” and a list of search results.
12. Google’s knowledge panel today is powered by a combination of language models, ontologies, and an array of information extraction techniques. But all of them can not only answer your question but also list the sources the answer comes from.
13. For many queries, this provenance information is as valuable to the user as the exact answer, if not more so. LLMs, in the traditional formulation of an LM, cannot reliably tell you where their answers are coming from.
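To make the contrast concrete, here is a minimal sketch of the retrieve-then-answer pattern, where every answer arrives with its source attached. The corpus, URLs, and snippets are invented; the point is that a plain LLM has no equivalent lookup to point back to.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-in for an indexed web corpus (contents invented).
corpus = {
    "en.wikipedia.org/wiki/Mount_Everest":
        "Mount Everest is Earth's highest mountain above sea level.",
    "en.wikipedia.org/wiki/K2":
        "K2 is the second-highest mountain on Earth.",
}

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(corpus.values())
urls = list(corpus.keys())

def answer_with_source(query: str):
    """Return the best-matching snippet *and* where it came from."""
    sims = cosine_similarity(vectorizer.transform([query]), doc_matrix)[0]
    best = int(sims.argmax())
    return corpus[urls[best]], urls[best]

snippet, source = answer_with_source("highest mountain on earth")
print(snippet)  # the answer...
print(source)   # ...and its provenance
```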
14. In addition, Google rapidly generates related questions to satisfy query refinements.
15. In fact, when it has the information, Google is so eager to satisfy your intent that it doesn’t even want you to land on its ad-filled results page. This is because they understand users well: people will abandon anything that is not convenient.
16. All this said, LLMs, and even small LMs like BERT, can dramatically change Google’s search traffic. These open LMs (thanks to @huggingface) make it easy to improve a lot of site-specific search interfaces. But those account for a small fraction of Google searches.
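As a sketch of what that improvement looks like, here is site-specific semantic search in a few lines with the open sentence-transformers library. The model name is one common choice among many, and the documents are made up:

```python
from sentence_transformers import SentenceTransformer, util

# A small open model from the Hugging Face hub (one option among many).
model = SentenceTransformer("all-MiniLM-L6-v2")

# Imagine these are a site's own FAQ entries or page titles.
docs = [
    "Resetting your password from the account settings page",
    "Shipping times and tracking numbers for international orders",
    "How to request a refund within 30 days of purchase",
]
doc_embeddings = model.encode(docs, convert_to_tensor=True)

query_embedding = model.encode("I want my money back", convert_to_tensor=True)
hits = util.semantic_search(query_embedding, doc_embeddings, top_k=1)
print(docs[hits[0][0]["corpus_id"]])
# Matches the refund entry with zero keyword overlap, something a
# plain keyword index on the site would likely miss.
```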
17. Interestingly, these LLM-based answering engines will kill traffic to a lot of small and not-so-small websites. If such a product becomes widely accessible and free, traffic to sites like Wikipedia will drop, and so will traffic to a lot of how-to sites and other niche websites.
18. By now it should be abundantly clear that Google is not going anywhere. The rumors of its death are greatly exaggerated. It is not for lack of resources or talent that Google has not turned on a ChatGPT-like interface.
19. Google understands search better than anyone. But, funnily enough, they are not a search company; they are a company seeking to understand and organize the world’s knowledge. So far they have plenty of products demonstrating that, the search engine being just one of them.
20. This opinion that ChatGPT is “killing” search is not one most experienced NLP and IR researchers share. It’s typically held by startup folks pushing products, by investors pushing their portfolio companies, or by developers who think we don’t need other #nlproc methods because we can prompt large language models. While this is a very critical, “foundational” technology, a better way to look at these things is as complementary tools. /fin

