Matt Shumer Profile picture
May 4 24 tweets 4 min read Twitter logo Read on Twitter
I used GPT-4-32K (+ other models) to analyze hundreds of files and explain how @Twitter's open-source algorithm works.

Now, I'm sharing the code I used, so you can do this on ANY Github repo!

Here's the AI's explanation, my approach, and the code for your own use:
Before I show you the AI's explanation, let me explain how it works:
First, I used this awesome repo (github.com/mpoon/gpt-repo…) to flatten the Twitter algorithm into a single text file.
Then, I uploaded that file to Colab, and split it up into hundreds of chunked strings, small enough for GPT-3.5-Turbo to process.

Why GPT-3.5-Turbo?

Cost.

The Twitter algorithm is over 5 million tokens.

If I ran this step through GPT-4, I'd go broke.
Next, I used GPT-3.5-Turbo to summarize each chunk, keeping only the important details.
I combined those chunks into one long string, but it still was nearly half a million tokens long.

So, I repeated the process above, and broke that string up into chunks.

This time, the costs were feasible, so I made the chunks long enough to take advantage of GPT-4-32K.
Then, I summarized the chunks using GPT-4-32K, and combined them into a string.
Finally, I had a string that could be passed in its entirety to GPT-4-32K.

I did just that, and asked it for an explanation of how Twitter's algorithm works.
Here's what it gave me (note, Twitter is making me break this up into 280-character sections):
The Twitter algorithm connects multiple components and techniques to deliver a personalized user experience. Data processing and feature extraction lay the foundation by organizing various data types and identifying key attributes for content curation.
Machine learning models predict user engagement, while the real-time search engine ensures relevant and timely results. The recommendation system identifies and ranks potential content, and the Home Mixer integrates these components to create personalized home timelines.
Load testing and performance evaluation components ensure the system's efficiency and scalability. Through seamless collaboration, these elements work together to analyze vast amounts of data, curate relevant content, and present a tailored, engaging experience for Twitter users.
Here's a detailed yet concise overview of its workings:
1. **Data Processing**: The algorithm handles data types (tweets, user info, media) and events (client, social graph, user modification, email notification, ads callback engagements) using Kafka.
2. **Feature Extraction**: It extracts features (tweet content, user interests, engagement) from data to evaluate tweet visibility and tailor recommendations.
3. **Machine Learning Models**: Lightweight linear models and TensorFlow predict user engagement and rank tweets based on relevance, using data from external services like User Signal Service.
4. **Real-time Search**: A search engine with Lucene index and faceted search manages, searches, and updates index segments, optimizing search queries and results.
5. **Recommendation System**: A bipartite graph generates candidates, employing algorithms like GatingConfig and EarlybirdSimilarityEngine to filter and rank based on recency, popularity, and similarity.
6. **Home Mixer**: This framework fetches tweets from sources like Conversation Service, UTEG, FRS, processes them through transformers and filters, and ranks them for personalized home timelines.
7. **Load Testing & Performance Evaluation**: Components like EmbeddingSamplingJob, KnnTruthSetGenerator, and AnnLoadTestWorker ensure system performance and scalability, and test the Approximate Nearest Neighbor (ANN) query service.
8. **Configurations**: The algorithm provides settings for different environments, clusters, and Kafka configurations.
In summary, the Twitter algorithm uses diverse techniques like machine learning, real-time search, and recommendation systems to process and analyze data, delivering an engaging and tailored experience for each user.
As you can see, it's not perfect. Not even close.

I wrote this code quickly, with the help of GPT-4.

With a few small improvements, the results could be really, really great.

If someone from Twitter can weigh in on how the AI did, I'd be interested to hear... cc @elonmusk :)
If you want to try this for yourself, you can use this Colab: colab.research.google.com/drive/1HJoAKc5…

Feel free to modify/use it however you want, on any repo.

If you don't have access to GPT-4-32K, you easily can adjust it to use GPT-4 or GPT-3.5-Turbo.

• • •

Missing some Tweet in this thread? You can try to force a refresh
 

Keep Current with Matt Shumer

Matt Shumer Profile picture

Stay in touch and get notified when new unrolls are available from this author!

Read all threads

This Thread may be Removed Anytime!

PDF

Twitter may remove this content at anytime! Save it as PDF for later use!

Try unrolling a thread yourself!

how to unroll video
  1. Follow @ThreadReaderApp to mention us!

  2. From a Twitter thread mention us with a keyword "unroll"
@threadreaderapp unroll

Practice here first or read more on our help page!

More from @mattshumer_

May 1
AI videos are getting realistic, fast.

We’re less than three years away from generating entire movies.

Here are some early examples:
Read 5 tweets
Apr 24
GPT-4-32K makes regular GPT-4 look like a toy.

Here are some of the things it can do:
Summarize and answer questions about an *entire* research paper.

I literally just pasted the whole paper in the prompt.

No embeddings required. Image
Take in an entire codebase + supporting documentation, and make changes and improvements.

Long context length opens up fundamentally new opportunities to make ridiculously powerful developer tools. Image
Read 7 tweets
Oct 12, 2022
Stable Diffusion is a powerful tool for creating images.

But generating high-quality results can be extremely difficult.

I created a better way.

Now, anyone can get results on par with the best prompt engineers, without needing to learn how to prompt.

Let's dive in... Image
Stable Diffusion is an AI that creates images from a simple text prompt.

You can say something like "a dog sitting in a field" and it will generate an image like this: Image
The problem is that getting good results with SD can be difficult.

It often requires careful tuning of the prompt, and even then the results can be hit or miss.
Read 9 tweets
Mar 3, 2022
Last week, our friends at @BananaDev_ 🍌 released Carrot 🥕 (GPT-3 for computer vision).

I wanted to push it to the limit to see what it could do.

So, I combined Carrot 🥕 and @OpenAI's GPT-3 to create the most powerful image understanding system I'm aware of.

👇👇👇
The system uses GPT-3 to figure out what questions we need to ask about the image to understand it.

We then use Carrot 🥕 to get those answers, and then ask GPT-3 to interpret them.

Lastly, we 'fact-check' our interpretation of the image, and remove inaccuracies.
The result?

Just give the system an image, and it'll describe it in detail.
Read 6 tweets
Jul 15, 2021
Raising your first funding round is incredibly difficult.

But armed with some simple knowledge, it can be 10x easier.

Here are 9 videos that helped me raise my first round of VC funding.

/thread
Fundraising Overview | @andrewfarah and @Jason

Complete breakdown of the fundraising process, tips to raise a competitive round, create great pitch decks, and manage investors.

Fundamentals | @gralston

Determining how much to raise, calculating and understanding dilution, different structures (SAFEs, priced rounds, crowdfunding, etc.), meeting etiquette.

Read 12 tweets

Did Thread Reader help you today?

Support us! We are indie developers!


This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Don't want to be a Premium member but still want to support us?

Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal

Or Donate anonymously using crypto!

Ethereum

0xfe58350B80634f60Fa6Dc149a72b4DFbc17D341E copy

Bitcoin

3ATGMxNzCUFzxpMCHL5sWSt4DVtS8UqXpi copy

Thank you for your support!

Follow Us on Twitter!

:(