Aman Sanger @ NeurIPS 2024
May 17, 2022, 13 tweets, 3 min read
The age of pure software is over.

As predicted, software has eaten the world. But AI will subsume it.

The next wave of generational companies will be pure AI-shops (0/n)
Tech falls into 4 eras: mainframes, chips, personal computers, software (pre-web, Web1/2, SaaS)

In each era, there were tailwinds for company building. Later, these tailwinds became headwinds - you were competing against well-established incumbents in saturated mkts. (1/n)
The uniqueness of the "software era" is its low costs. Building a chip company requires at least 10s of millions in capital. Building a software app costs a handful of AWS credits.

As a result, the software era has birthed a record number of generational companies (2/n)
But tailwinds have begun to fade. This is largely driven by:

1. Market saturation for both SaaS & consumer ad-based products
2. Changing macro environment (high inflation with rising interest rates).

There’s less alpha in starting a traditional software company (3/n)
Meanwhile, new eras of tech are marked by a number of factors.

1. Higher barriers to entry & slowdown of mkt growth in the previous era
2. Research breakthroughs / massive consumer behavior shift
3. Bigger market opportunities than ever before (4/n)
Human Augmentation + Automation w/ AI will be the next era of tech.

1. Existing moats are massive & the market is slowing for SaaS & consumer-attention products
2. Breakthroughs in Scaling NN's
3. Replacement of human labor is the largest mkt opportunity in humanity's history (5/n)
We are already seeing the augmentation/replacement of graphic artists with @OpenAI’s #dalle2 and coders with #copilot. Another great example is @AdeptAILabs, building language-based tools to boost productivity when using existing software products. (6/n)
Most existing “AI companies” use AI to improve their products. There has yet to be a massive company that is a pure AI play. Pure AI plays mark a departure from modern venture and a return to older times of large capital costs (for compute & data acquisition) (7/n)
In the short term, with high labor costs, these companies will have a massive advantage given their core product is replacing labor hours. For eng/researcher hiring, AI companies gain an edge of “excitement” over traditional software. (8/n)
They require fast growth to acquire data moats, to train even larger models, building an expensive yet powerful flywheel and barriers for new competitors.

Even more enticing is that the market winners will have accumulated enough data & researchers to have a shot at AGI (9/n)
This doesn't mean that the current dominant players (FAAMG+) will become obsolete. Many of them have survived substantial era changes. To remain relevant, they will need to incorporate more and more sophisticated AI (as many of them are doing). (10/n)
And this is not to say that it will be impossible to start traditional software companies. It will just be more difficult than in the previous few decades and less value additive than AI. (11/12)
But my money is on the next generation of AI companies fueled by years of incredible Deep Learning research. (12/12)


More from @amanrsanger

Mar 26, 2024
Long context models with massive custom prompts (~2M) may soon replace fine-tuning for new knowledge!

Let’s explore why:
(1/10)
The fine-tuning we care about is learning new, useful information not in pretraining.

For example: a company’s codebase or internal documentation.
(2/10)
We prefer this to RAG because we want sophisticated reasoning over this knowledge rather than single-hop q&a.

Once fine-tuned, these models could be served on custom instances for a full company.

But we can do better!
(3/10)
Read 11 tweets
Mar 5, 2024
With a 256K token prompt, a 7b model can generate tokens as quickly as codellama-7b with an 8K prompt.

How? The model must use multi-query attention.

Here's why...
(1/10)
For large context windows and large batch inference, generation speed is bottlenecked by KV cache size.

To illustrate this, let’s look at our 8K context window with multi-head attention at a batch size of 16.
(2/10)
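A rough sketch of why KV cache size dominates here. The model shape below (32 layers, 32 attention heads, head dimension 128, 2-byte fp16 values) is an assumption based on a Llama-2-7B-style architecture and is not stated in the thread; `kv_cache_bytes` is a hypothetical helper for illustration.

```python
def kv_cache_bytes(context_len, batch_size, n_kv_heads,
                   n_layers=32, head_dim=128, bytes_per_value=2):
    """Size of the KV cache in bytes for an assumed 7B-class model shape."""
    # Each token stores one key and one value vector per layer per KV head.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token * context_len * batch_size


GIB = 2 ** 30

# Multi-head attention: one KV head per attention head (32 assumed here).
mha_8k = kv_cache_bytes(context_len=8 * 1024, batch_size=16, n_kv_heads=32)

# Multi-query attention: a single KV head shared by all attention heads.
mqa_256k = kv_cache_bytes(context_len=256 * 1024, batch_size=16, n_kv_heads=1)

print(f"MHA, 8K context:   {mha_8k / GIB:.0f} GiB")
print(f"MQA, 256K context: {mqa_256k / GIB:.0f} GiB")
```

Under these assumed dimensions, the two caches come out the same size, because multi-query attention shrinks the cache by the head count (32x) while the context grows by 32x.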
To generate each token, we need to spend 7e9*2*16 = 224 GFLOPs. [1]

We won’t be compute bound as a single A100 has 300TFLOPs, meaning it could sustain >1000 tokens/s without the memory bottleneck.
(3/10)
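The arithmetic above can be checked with a few lines. This is only a back-of-the-envelope sketch: it uses the thread's figures (7e9 parameters, 2 FLOPs per parameter per token, batch size 16, 300 TFLOPs peak) and ignores attention FLOPs and real-world utilization.

```python
PARAMS = 7e9            # model parameters
FLOPS_PER_PARAM = 2     # one multiply and one add per parameter per token
BATCH_SIZE = 16         # sequences decoded together
PEAK_FLOPS = 300e12     # A100 peak throughput quoted in the thread

# FLOPs to generate one token for every sequence in the batch.
step_flops = PARAMS * FLOPS_PER_PARAM * BATCH_SIZE

# Decode steps per second if compute were the only limit.
# Each step yields one new token per sequence.
steps_per_second = PEAK_FLOPS / step_flops

print(f"{step_flops / 1e9:.0f} GFLOPs per decode step")
print(f"{steps_per_second:.0f} decode steps/s if purely compute bound")
```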
Read 13 tweets
Jan 12, 2024
One magical part of Cursor’s internal tech stack is a prompt compilation library called priompt

Here's why it works so well... (1/12) github.com/anysphere/prio…
Standard prompting libraries use variants of “f-strings” with subbed-in inputs.

For us, a prompt is defined as a function that maps some set of inputs X and a token budget n to some string, s:

p(X, n) = s

We call this operation "rendering"

(2/12)
For example, my inputs, X, could include conversation history, contents of the current file, chunks of documentation, and codebase context we deem relevant.

This sums to 100K tokens. But the budget we are working with may just be 4000 tokens.

(3/12)
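A minimal sketch of the idea that a prompt is a function p(X, n) = s. This is not priompt's API. The `Part` type, the per-part `priority` field, the greedy selection, and the whitespace-based `count_tokens` are all assumptions made for illustration; a real system would use the model's tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Part:
    text: str
    priority: int  # hypothetical: higher means more important to keep


def count_tokens(text: str) -> int:
    # Stand-in tokenizer: counts whitespace-separated words.
    return len(text.split())


def render(parts: list[Part], budget: int) -> str:
    """Map inputs X (parts) and a token budget n to a string s."""
    chosen = set()
    used = 0
    # Consider the most important parts first; skip any that do not fit.
    for i in sorted(range(len(parts)), key=lambda i: -parts[i].priority):
        cost = count_tokens(parts[i].text)
        if used + cost <= budget:
            chosen.add(i)
            used += cost
    # Emit the surviving parts in their original order.
    return "\n".join(parts[i].text for i in range(len(parts)) if i in chosen)


parts = [
    Part("system: you are a coding assistant", priority=3),
    Part("history: " + "hello " * 10, priority=1),
    Part("file: def add(a, b): return a + b", priority=2),
]
rendered = render(parts, budget=15)
print(rendered)
```

With a budget of 15, the low-priority history part is dropped and the other two parts are kept in their original order.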
Read 13 tweets
Dec 5, 2023
At Cursor, we've built very high-quality retrieval datasets (for training embeddings/rerankers).

To do this, we use GPT-4 grading and the TrueSkill rating system (a better version of Elo)

Here’s how.. (1/10)
We start with an out-of-the-box dataset of coding queries and get the top 100 ada embedding results in their repositories.

But we need much better ground truth labels than cosine similarity.
(2/10)
Our goal is to determine an approximate ground-truth ordering over the 100 code blocks for each query.

This is where an effective technique from the literature comes in handy: “Listwise Reranking” [1]
(3/10)
Read 11 tweets
Dec 2, 2023
After switching our vector db to @turbopuffer, we're saving an order of magnitude in costs and dealing with far less complexity!

Here's why...
(1/10)
We've seen two key advantages of Turbopuffer with no perf degradation:

1. Normal vector database pricing makes no sense for our workloads (lots of moderate-sized indices).
2. The normal “pods” or cluster-based indices (of Pinecone for example) add unnecessary complexity

(2/10)
Most vector databases store the indices in memory.

For older use-cases, this made sense. A given customer will have several large vector indices with consistently high usage on each index.

And the index should be in memory for high-throughput/low-latency querying.

(3/10)
Read 11 tweets
Nov 28, 2023
People claim LLM knowledge distillation is trivial with logprobs, but that's not quite right...

It's very tricky to distill between different tokenizers. [1]

Internally, we've solved this with a clever algorithm we called tokenization transfer
(1/7)
To start, we needed to build a sophisticated primitive called the "Logmass Trie"

It's an extended Trie where each edge contains not only a character but also a weight that represents the "log probability" of that character conditional on the string thus far
(2/7)
This edge weight is just an estimate.

But it must satisfy the constraint that for a contained string X, summing the log probabilities of the edges on the path to X gives the log probability of X
(3/7)
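One way to picture a trie that satisfies this constraint, as a sketch only. This is not the implementation described in the thread: here edge weights are computed exactly from a known distribution over strings rather than estimated, and the `END` marker edge is an assumption added so that a string which is also a prefix of another string still gets its own probability.

```python
import math

END = "\0"  # hypothetical end-of-string marker, not part of the thread


class Node:
    def __init__(self):
        self.mass = 0.0      # total probability of strings passing through here
        self.children = {}   # character -> Node


def build(dist):
    """Build a trie from a dict mapping string -> probability."""
    root = Node()
    for s, p in dist.items():
        node = root
        node.mass += p
        for ch in s + END:
            node = node.children.setdefault(ch, Node())
            node.mass += p
    return root


def edge_logprob(parent, ch):
    # Log probability of `ch` conditional on the string so far.
    return math.log(parent.children[ch].mass / parent.mass)


def path_logprob(root, s):
    """Sum edge weights along the path to s (including the END edge)."""
    total, node = 0.0, root
    for ch in s + END:
        total += edge_logprob(node, ch)
        node = node.children[ch]
    return total


dist = {"a": 0.5, "ab": 0.25, "b": 0.25}  # example distribution summing to 1
root = build(dist)
for s, p in dist.items():
    print(s, path_logprob(root, s), math.log(p))
```

Because each edge weight is a ratio of masses, the sum along a path telescopes to the log probability of the full string whenever the input probabilities sum to 1.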
Read 8 tweets
