I used GPT-4-32K (+ other models) to analyze hundreds of files and explain how @Twitter's open-source algorithm works.
Now, I'm sharing the code I used, so you can do this on ANY GitHub repo!
Here's the AI's explanation, my approach, and the code for your own use:
Before I show you the AI's explanation, let me explain how it works:
First, I used this awesome repo (github.com/mpoon/gpt-repo…) to flatten the Twitter algorithm into a single text file.
Then, I uploaded that file to Colab, and split it up into hundreds of chunked strings, small enough for GPT-3.5-Turbo to process.
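The chunking step can be sketched like this (a minimal sketch, not my exact Colab code — I'm using a rough heuristic of ~4 characters per token; a tokenizer like tiktoken would be more precise):

```python
# Split a flattened repo dump into chunks small enough for GPT-3.5-Turbo.
# Rough heuristic: ~4 characters per token, so a ~3,000-token budget per
# chunk is about 12,000 characters (leaving headroom for prompt + reply).

def chunk_text(text: str, max_chars: int = 12_000) -> list[str]:
    """Split text into consecutive slices of at most max_chars characters."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

# e.g. repo_dump = open("twitter-algorithm.txt").read()
repo_dump = "x" * 30_000  # stand-in for the flattened repo text
chunks = chunk_text(repo_dump)
```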
Why GPT-3.5-Turbo?
Cost.
The Twitter algorithm is over 5 million tokens.
If I ran this step through GPT-4, I'd go broke.
Next, I used GPT-3.5-Turbo to summarize each chunk, keeping only the important details.
I combined those chunks into one long string, but it was still nearly half a million tokens long.
So, I repeated the process above, and broke that string up into chunks.
This time, the costs were manageable, so I made the chunks long enough to take advantage of GPT-4-32K's context window.
Then, I summarized the chunks using GPT-4-32K, and combined them into a string.
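That repeated chunk → summarize → combine loop is really one generic reduction. Here's a sketch (`summarize` is a placeholder for the actual OpenAI chat call; the chunk sizes and the toy summarizer are purely illustrative, not what I ran):

```python
from typing import Callable

def reduce_text(text: str,
                summarize: Callable[[str], str],
                max_chars: int,
                target_chars: int) -> str:
    """Repeatedly chunk and summarize text until it fits in target_chars.

    `summarize` stands in for a chat-completion call, e.g. asking
    GPT-3.5-Turbo (or GPT-4-32K on the later pass) to summarize a chunk,
    keeping only the important details.
    """
    while len(text) > target_chars:
        chunks = [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
        text = "\n".join(summarize(c) for c in chunks)
    return text

# Toy summarizer for demonstration: keep the first 10% of each chunk.
toy = lambda chunk: chunk[: max(1, len(chunk) // 10)]
condensed = reduce_text("a" * 100_000, toy, max_chars=12_000, target_chars=4_000)
```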
Finally, I had a string that could be passed in its entirety to GPT-4-32K.
I did just that, and asked it for an explanation of how Twitter's algorithm works.
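That final call looks roughly like this (a sketch assuming the OpenAI Python client; the system prompt and helper names are illustrative, only the model name and question come from my actual run):

```python
def build_messages(summary: str) -> list[dict]:
    """Assemble the chat messages for the final explanation request."""
    return [
        {"role": "system", "content": "You are an expert code analyst."},
        {"role": "user",
         "content": "Explain how Twitter's algorithm works:\n\n" + summary},
    ]

def explain(summary: str) -> str:
    """One GPT-4-32K call over the fully condensed summary."""
    from openai import OpenAI  # third-party: pip install openai; needs OPENAI_API_KEY
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4-32k",
        messages=build_messages(summary),
    )
    return resp.choices[0].message.content
```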
Here's what it gave me (note, Twitter is making me break this up into 280-character sections):
The Twitter algorithm connects multiple components and techniques to deliver a personalized user experience. Data processing and feature extraction lay the foundation by organizing various data types and identifying key attributes for content curation.
Machine learning models predict user engagement, while the real-time search engine ensures relevant and timely results. The recommendation system identifies and ranks potential content, and the Home Mixer integrates these components to create personalized home timelines.
Load testing and performance evaluation components ensure the system's efficiency and scalability. Through seamless collaboration, these elements work together to analyze vast amounts of data, curate relevant content, and present a tailored, engaging experience for Twitter users.
Here's a detailed yet concise overview of its workings:
1. **Data Processing**: The algorithm handles data types (tweets, user info, media) and events (client, social graph, user modification, email notification, ads callback engagements) using Kafka.
2. **Feature Extraction**: It extracts features (tweet content, user interests, engagement) from data to evaluate tweet visibility and tailor recommendations.
3. **Machine Learning Models**: Lightweight linear models and TensorFlow predict user engagement and rank tweets based on relevance, using data from external services like User Signal Service.
4. **Real-time Search**: A search engine with Lucene index and faceted search manages, searches, and updates index segments, optimizing search queries and results.
5. **Recommendation System**: A bipartite graph generates candidates, employing algorithms like GatingConfig and EarlybirdSimilarityEngine to filter and rank based on recency, popularity, and similarity.
6. **Home Mixer**: This framework fetches tweets from sources like Conversation Service, UTEG, FRS, processes them through transformers and filters, and ranks them for personalized home timelines.
7. **Load Testing & Performance Evaluation**: Components like EmbeddingSamplingJob, KnnTruthSetGenerator, and AnnLoadTestWorker ensure system performance and scalability, and test the Approximate Nearest Neighbor (ANN) query service.
8. **Configurations**: The algorithm provides settings for different environments, clusters, and Kafka configurations.
In summary, the Twitter algorithm uses diverse techniques like machine learning, real-time search, and recommendation systems to process and analyze data, delivering an engaging and tailored experience for each user.
As you can see, it's not perfect. Not even close.
I wrote this code quickly, with the help of GPT-4.
With a few small improvements, the results could be really, really great.
If someone from Twitter can weigh in on how the AI did, I'd be interested to hear... cc @elonmusk :)