Aman Sanger
May 17, 2022 · 13 tweets · 3 min read
The age of pure software is over.

As predicted, software has eaten the world. But AI will subsume it.

The next wave of generational companies will be pure AI-shops (0/n)
Tech falls into 4 eras: mainframes, chips, personal computers, and software (pre-web, Web1/2, SaaS)

In each era, there were tailwinds for company building. Later, these tailwinds became headwinds: you were competing against well-established incumbents in saturated markets. (1/n)
What makes the "software era" unique is its low costs. Building a chip company requires at least tens of millions in capital. Building a software app costs a handful of AWS credits.

As a result, the software era has birthed a record number of generational companies (2/n)
But tailwinds have begun to fade. This is largely driven by:

1. Market saturation for both SaaS & consumer ad-based products
2. A changing macro environment (high inflation and rising interest rates).

There’s less alpha in starting a traditional software company (3/n)
Meanwhile, new eras of tech are marked by a number of factors.

1. Higher barriers to entry & slowing market growth in the previous era
2. Research breakthroughs / massive consumer behavior shift
3. Bigger market opportunities than ever before (4/n)
Human Augmentation + Automation w/ AI will be the next era of tech.

1. Existing moats are massive, and market growth is slowing for SaaS & consumer-attention products
2. Breakthroughs in scaling neural networks
3. Replacing human labor is the largest market opportunity in humanity's history (5/n)
We are already seeing the augmentation/replacement of graphic artists with @OpenAI’s #dalle2 and coders with #copilot. Another great example is @AdeptAILabs, building language-based tools to boost productivity on top of existing software products. (6/n)
Most existing “AI companies” use AI to improve their products. There has yet to be a massive company that is a pure AI play. Such companies mark a departure from modern venture and a return to an older era of large capital costs (for compute & data acquisition) (7/n)
In the short term, with labor costs high, these companies will have a massive advantage, given their core product replaces labor hours. And in eng/researcher hiring, AI companies hold an edge of “excitement” over traditional software. (8/n)
They require fast growth to acquire data moats and train even larger models, building an expensive yet powerful flywheel and raising barriers for new competitors.

Even more enticing: the market winners will have accumulated enough data & researchers to have a shot at AGI (9/n)
This doesn't mean that the current dominant players (FAAMG+) will become obsolete. Many of them have survived substantial era changes. To remain relevant, they will need to incorporate more and more sophisticated AI (as many of them are doing). (10/n)
And this is not to say it will be impossible to start traditional software companies. It will just be more difficult than in the previous few decades, and less value-additive than AI. (11/12)
But my money is on the next generation of AI companies fueled by years of incredible Deep Learning research. (12/12)

More from @amanrsanger

Mar 26
Long context models with massive custom prompts (~2M tokens) may soon replace fine-tuning for new knowledge!

Let’s explore why:
(1/10)
The fine-tuning we care about is learning new, useful information not in pretraining.

For example: a company’s codebase or internal documentation.
(2/10)
We prefer this to RAG because we want sophisticated reasoning over this knowledge rather than single-hop q&a.

Once fine-tuned, these models could be served on custom instances for a full company.

But we can do better!
(3/10)
Mar 5
With a 256K token prompt, a 7b model can generate tokens as quickly as codellama-7b with an 8K prompt.

How? The model must use multi-query attention.

Here's why...
(1/10)
For large context windows and large-batch inference, generation speed is bottlenecked by KV cache size.

To illustrate this, let’s look at our 8K context window with multi-head attention at a batch size of 16.
(2/10)
To generate each token, we need to spend 7e9*2*16 = 224 GFLOPs. [1]

We won’t be compute-bound, as a single A100 has ~300 TFLOPS, meaning it could sustain >1000 tokens/s if memory weren't the bottleneck.
(3/10)
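To make the arithmetic concrete, here is a back-of-envelope sketch. The model shapes (32 layers, 32 heads of dimension 128) and fp16 KV caching are my assumptions for a codellama-7b-like model, not numbers from the thread:

```python
# Back-of-envelope math for the claims above. Model shapes (32 layers,
# 32 heads of dim 128) and fp16 caching are assumed, not from the thread.

n_params, batch = 7e9, 16
flops_per_step = n_params * 2 * batch        # 7e9 * 2 * 16 = 224 GFLOPs
a100_flops = 300e12                          # ~300 TFLOPS on a single A100
print(f"{flops_per_step / 1e9:.0f} GFLOPs per generation step")
print(f"compute ceiling: {a100_flops / flops_per_step:.0f} tokens/s per sequence")

def kv_cache_bytes(layers, kv_heads, head_dim, ctx, bytes_per_elem=2):
    """2x (keys and values) * layers * kv heads * head_dim * context * fp16."""
    return 2 * layers * kv_heads * head_dim * ctx * bytes_per_elem

mha = kv_cache_bytes(layers=32, kv_heads=32, head_dim=128, ctx=8_192)
mqa = kv_cache_bytes(layers=32, kv_heads=1, head_dim=128, ctx=262_144)
print(f"MHA, 8K ctx:   {mha / 2**30:.1f} GiB per sequence")
print(f"MQA, 256K ctx: {mqa / 2**30:.1f} GiB per sequence")
# identical cache sizes -> same memory traffic, hence similar generation speed
```

With these (assumed) shapes, a 256K-context multi-query cache is exactly the size of an 8K-context multi-head cache, which is the equivalence the tweet is pointing at.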
Jan 12
One magical part of Cursor’s internal tech stack is a prompt compilation library called priompt (github.com/anysphere/prio…)

Here's why it works so well... (1/12)
Standard prompting libraries use variants of “f-strings” with subbed-in inputs.

For us, a prompt is defined as a function that maps some set of inputs X and a token budget n to some string, s:

p(X, n) = s

We call this operation "rendering"

(2/12)
For example, my inputs, X, could include conversation history, contents of the current file, chunks of documentation, and codebase context we deem relevant.

This sums to 100K tokens. But the budget we are working with may just be 4000 tokens.

(3/12)
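A minimal sketch of what such a rendering function might look like. The greedy priority scheme and all names here are my illustration of p(X, n) = s, not priompt's actual API:

```python
# Hypothetical sketch of rendering p(X, n) = s: include the
# highest-priority parts that fit the token budget, then emit the
# survivors in their original order. Not priompt's real API.

def render(parts, budget, count_tokens=lambda s: len(s) // 4):
    """parts: list of (priority, text); budget: the token budget n."""
    kept, used = set(), 0
    # Consider parts from highest to lowest priority
    for idx in sorted(range(len(parts)), key=lambda i: -parts[i][0]):
        cost = count_tokens(parts[idx][1])
        if used + cost <= budget:
            kept.add(idx)
            used += cost
    return "\n".join(text for i, (_, text) in enumerate(parts) if i in kept)

# e.g. squeezing 100K tokens of candidate context into a 4,000-token budget:
prompt = render(
    [(3, "System: you are a coding assistant."),
     (2, "<contents of the current file>"),
     (1, "<relevant documentation chunks>"),
     (0, "<older conversation history>")],
    budget=4000,  # count_tokens above is a crude ~4-chars-per-token stand-in
)
```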
Dec 5, 2023
At Cursor, we've built very high-quality retrieval datasets (for training embeddings/rerankers).

To do this, we use GPT-4 grading and the TrueSkill rating system (a better version of Elo)

Here’s how... (1/10)
We start with an out-of-the-box dataset of coding queries and get the top 100 ada embedding results in their repositories.

But we need much better ground truth labels than cosine similarity.
(2/10)
Our goal is to determine an approximate ground-truth ordering over the 100 code blocks for each query.

This is where an effective technique from the literature comes in handy: “Listwise Reranking” [1]
(3/10)
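A rough sketch of how listwise rankings can feed TrueSkill ratings. `gpt4_rank` is a hypothetical stand-in for the GPT-4 grading call, and the random-sampling loop is my guess at the pipeline, not Cursor's exact method:

```python
# Sketch: turning GPT-4 listwise rankings into TrueSkill ratings.
import random
import trueskill  # pip install trueskill

env = trueskill.TrueSkill(draw_probability=0.0)

def gpt4_rank(query, candidates):
    """Stand-in: a real system would ask GPT-4 for a best-to-worst ordering
    of `candidates`; here we fake one so the sketch runs end to end."""
    return sorted(range(len(candidates)), key=lambda i: len(candidates[i]))

def rank_blocks(query, blocks, rounds=50, list_size=5):
    """Approximate a ground-truth ordering of `blocks` for `query`."""
    ratings = [env.create_rating() for _ in blocks]
    for _ in range(rounds):
        group = random.sample(range(len(blocks)), list_size)
        order = gpt4_rank(query, [blocks[i] for i in group])
        # Treat each ranked list as a multiplayer "match": rank 0 is best.
        teams = [(ratings[group[j]],) for j in order]
        updated = env.rate(teams, ranks=list(range(len(teams))))
        for j, (r,) in zip(order, updated):
            ratings[group[j]] = r
    # Sort by a conservative skill estimate (mu - 3*sigma), best first.
    return sorted(range(len(blocks)),
                  key=lambda i: ratings[i].mu - 3 * ratings[i].sigma,
                  reverse=True)
```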
Dec 2, 2023
After switching our vector db to @turbopuffer, we're saving an order of magnitude in costs and dealing with far less complexity!

Here's why...
(1/10)
We've seen two key advantages of Turbopuffer, with no performance degradation:

1. Normal vector database pricing makes no sense for our workloads (lots of moderate-sized indices).
2. The usual “pods” or cluster-based indices (Pinecone's, for example) add unnecessary complexity

(2/10)
Most vector databases store the indices in memory.

For older use cases, this made sense: a given customer would have several large vector indices with consistently high usage on each index.

And the index should be in memory for high-throughput/low-latency querying.

(3/10)
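A toy illustration of why RAM-resident indices are expensive for this workload shape. Every size and price below is a made-up assumption for the sketch, not Turbopuffer's or anyone else's actual pricing:

```python
# Toy cost comparison: RAM-resident indices vs. object storage.
# All sizes and $/GB-month prices are illustrative assumptions.

dims, vectors_per_index, n_indices = 1536, 200_000, 5_000
gb_total = dims * 4 * vectors_per_index * n_indices / 1e9  # fp32 vectors only

ram_per_gb_month, s3_per_gb_month = 3.00, 0.02             # assumed prices
print(f"{gb_total:,.0f} GB of vectors across {n_indices:,} indices")
print(f"in memory:      ${gb_total * ram_per_gb_month:,.0f}/month")
print(f"object storage: ${gb_total * s3_per_gb_month:,.0f}/month")
# Many moderate-sized, mostly-idle indices -> paying RAM prices for cold data.
```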
Nov 28, 2023
People claim LLM knowledge distillation is trivial with logprobs, but that's not quite right...

It's very tricky to distill between different tokenizers. [1]

Internally, we've solved this with a clever algorithm we call tokenization transfer
(1/7)
To start, we needed to build a sophisticated primitive called the "Logmass Trie"

It's an extended trie where each edge carries not only a character but also a weight representing the "log probability" of that character conditional on the string so far
(2/7)
This edge weight is just an estimate.

But it must satisfy the constraint that for a contained string X, summing the log probabilities of the edges on the path to X gives the log probability of X
(3/7)
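Here is my reconstruction of the idea as a sketch. The even-split weighting is just one way to pick the edge estimates, and none of this is Cursor's actual implementation:

```python
# Sketch of the "Logmass Trie": edges carry log-probability estimates, and
# the weights along the path to a stored string sum to its log-probability.
# The even-split heuristic below is one choice of estimate, not Cursor's.

class LogmassTrie:
    def __init__(self):
        self.children = {}  # char -> [edge_log_weight, child_node]

    def insert(self, s, logprob):
        """Store string s with total log-probability `logprob`. New edges get
        an even share of the mass still unaccounted for, so a fresh path sums
        exactly to `logprob`; shared prefixes reuse their existing weights."""
        node, remaining = self, logprob
        for i, ch in enumerate(s):
            if ch not in node.children:
                node.children[ch] = [remaining / (len(s) - i), LogmassTrie()]
            weight, child = node.children[ch]
            remaining -= weight
            node = child

    def logprob(self, s):
        """The trie's estimate of log P(s): sum of edge weights along s."""
        node, total = self, 0.0
        for ch in s:
            if ch not in node.children:
                return float("-inf")
            weight, node = node.children[ch]
            total += weight
        return total

trie = LogmassTrie()
trie.insert("the", -1.2)     # e.g. a token and its log-prob in one tokenizer
trie.insert("their", -4.0)   # reuses the "t"-"h"-"e" prefix edges
print(trie.logprob("the"))   # -1.2: path weights sum to the stored log-prob
```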
