To understand why ChatGPT can't replace Google Search, it helps to look back at the early days of web search and the role PageRank played. 1/n
Before PageRank, a search would return a slew of websites of mixed utility, quality, and veracity. The results were directly tied to matches between what you queried and the text on the pages. 2/n
A web search (roughly) meant putting in a sequence of text as a query, and getting back the websites containing the most likely sequences of text to follow that query. 3/n
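To make that concrete, here's a toy sketch of pre-PageRank, pure text-matching search. The pages and scoring function are made up for illustration; no real engine worked exactly like this:

```python
# Toy pre-PageRank search: rank pages purely by how well their text
# matches the query terms. Pages here are hypothetical.
pages = {
    "site-a.com": "cheap flights cheap hotels cheap cheap cheap",
    "site-b.com": "a guide to finding flights and hotels on a budget",
}

def text_match_score(query, text):
    # Count how often each query word appears in the page text.
    words = text.lower().split()
    return sum(words.count(term) for term in query.lower().split())

query = "cheap flights"
ranked = sorted(pages, key=lambda url: text_match_score(query, pages[url]), reverse=True)
print(ranked)  # site-a.com wins by keyword-stuffing, regardless of quality
```

Note how keyword-stuffed pages float to the top: exactly the "mixed utility, quality, and veracity" problem.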
That's similar to where we are with ChatGPT today.
Except the websites are erased, and instead you get snippets of likely response text extracted from different websites. 4/n
But then came a fundamental breakthrough in search tech: PageRank.
With PageRank, the fact that websites link to one another could be used to identify which websites were *the most* linked to.
The *most linked* sites are the ones people tend to want. 5/n
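A toy sketch of the PageRank idea: each page spreads its score across the pages it links to, iterated until scores stabilize. The link graph and damping factor here are illustrative, not Google's actual system:

```python
# Toy PageRank via power iteration over a made-up link graph.
links = {
    "a.com": ["b.com", "c.com"],
    "b.com": ["c.com"],
    "c.com": ["a.com"],
}
damping = 0.85  # standard "random surfer" damping trick
rank = {url: 1 / len(links) for url in links}

for _ in range(50):  # iterate until scores stabilize
    new_rank = {url: (1 - damping) / len(links) for url in links}
    for url, outlinks in links.items():
        share = damping * rank[url] / len(outlinks)
        for target in outlinks:
            new_rank[target] += share
    rank = new_rank

print(sorted(rank.items(), key=lambda kv: -kv[1]))  # most-linked-to pages rise
```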
This breakthrough was built on the *traceability* of information from the web: the linking between sources and their content. 6/n
But with ChatGPT, this traceability is erased.
The linking between sites that has been the bedrock of uncovering (somewhat) reliable information on the web is removed. 7/n
Crudely, what this means is that ChatGPT is in a stage similar to the early days of web search: yes, it can give a lot of information; no, there is not a great match between what you want and useful or reliable results. 8/n
We will likely get to a place where there is a ChatSearch app that provides reasonable information. But it will require a fundamental change in how we train models like ChatGPT: 9/n
It will require training on the network of linked web sources & leveraging how they point to one another.
10/10
• • •
Last week, a major AI milestone was hit: the BLOOM model was released, for everyone (including you!) to examine. What is this, and why is this important? 🧵👇 huggingface.co/bigscience/blo…
Recently, many around the world have been introduced to what a "Large Language Model" (LLM) is because of recent news that some people think AI has become sentient. What "AI" has meant in these discussions is based in large part on LLMs.
You can think of an LLM as a laaarrggge list of probabilities (a "model") of different words appearing together in sentences, paragraphs etc.
The thing is, this "list" can be used to construct sequences of words that are likely: This means generating humanlike language!
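A toy sketch of that "list of probabilities" idea, using word pairs from a tiny made-up corpus. Real LLMs do this over subword tokens, at vastly larger scale, with learned neural weights:

```python
# Toy "language model": a table of word-pair counts from a tiny corpus,
# used to generate a likely next word at each step.
import random
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the cat ate . the dog sat on the rug .".split()

# Count which words follow which (the crude "list of probabilities").
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def generate(word, n=6):
    out = [word]
    for _ in range(n):
        counts = following[out[-1]]
        if not counts:
            break
        words, weights = zip(*counts.items())
        out.append(random.choices(words, weights=weights)[0])
    return " ".join(out)

print(generate("the"))  # e.g. "the cat sat on the mat ."
```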
Nerd news: I am SO STOKED that *all* @huggingface-contributed models on the Hub (huggingface.co/models) have model cards; popular models & those w >10k downloads have high-quality cards, written & combed through manually. Amazing work from @mkgerchick and @_____ozo__ !
Note that many of these are in PR state still. Ahem ahem ahem.
We @huggingface have also developed a bunch of tools for automating model card creation & helping people from different backgrounds to fill different parts of them out (eg, whether you want to do it via raw .md or via a GUI or a mix). This release and announcement coming soon.
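For folks reading this later: a rough sketch of what programmatic model card creation can look like, assuming the ModelCard API that huggingface_hub exposes in later releases (not necessarily the exact tooling announced here; the repo id and metadata below are hypothetical):

```python
# Hedged sketch of programmatic model card creation with huggingface_hub.
from huggingface_hub import ModelCard, ModelCardData

card_data = ModelCardData(language="en", license="apache-2.0")
card = ModelCard.from_template(
    card_data,
    model_id="my-org/my-model",  # hypothetical repo id
    model_description="What the model does and how it was trained.",
)
card.save("README.md")  # model cards live as a repo's README.md
```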
More minor things giving me mixed feelings about the article.
1-Google comms' continued attempt to say they have "ethicists" working deeply on these issues. I agree that Ben Zevenbergen & Johnny Søraker are awesome, but also (apologies) Google's inability to hire women here...
is multi-faceted, and not unrelated to demeaning practices, particularly towards women in that organization. Given how they've poisoned Google as a place for tech ethics + women, I don't think ethics-informed women would agree to join (& I'd encourage them not to; can discuss).
Relatedly, their attempt to give themselves ethics legitimacy is destroyed by their actions related to **this very issue** of LLMs. It would be laughable if the stakes weren't so high.
A lot of mixed feelings about what's being reported in this great article from @nitashatiku; everything from my appreciation of @cajundiscordian, to my anger at a few ppl in Google leadership, to my position as a relatively advanced researcher in this field. washingtonpost.com/technology/202…
In 2020, before @timnitGebru and I were fired, we saw some things w large language models that deeply concerned us.
For me, 1 concern was connected to my training in psycholinguistics & how we process language.
We wrote a paper trying to explain; Google fired us. Here's the deal:
@timnitGebru You can simply see Sections 6.1 and 6.2 in the parrots paper. dl.acm.org/doi/pdf/10.114…
We had our careers thrashed (at least in part) for writing it, so it's worth reading...
Have had a few conversations recently with Googlers about whether building on the foundations of what I had set up there with ethical/responsible/fair AI is *normalizing* what Google did to me or *carrying the torch* of my vision. 🧵 below for those this is relevant to.
The issue I feel is that, while it is definitely carrying the torch, it is *also* normalizing. It would be different if Google apologized for what it did, or recognized it has done any wrong. Google did (rightfully) give most of you raises/promotions/more influence/etc.
Which makes it even harder to leave -- the sense that what we were all working on is suddenly "flourishing". But it has come at a cost of violent trauma to arguably one of the founders of what many of you are working on.
On this date 1 year ago, under Google's employment, my life was about to change. Tomorrow would be the day that Google publicly put out a statement about me that many have understood to be "smearing". There's a lot to say...(1/n)
First, if it were possible for me to share nitty gritties of everything that happened w/o G combing through to find ways to sue me (as I assume they've continued to do), I would be sharing how f'ing principled I was in dealing with, and speaking up about, discrimination. (1/n)
How my job was quite literally to steer the path at the intersection of ((legal)) and ((ethical)) in the face of uncertainty...(2/n)