Nathan Labenz
Feb 27
OpenAI's leaked Foundry pricing says a lot – if you know how to read it – about GPT4, The Great Implementation, a move from Generative to Productive AI, OpenAI's safety & growth strategies, and the future of work.

Another AI-obsessive megathread on what to expect in 2023 🧵
Disclaimer: I'm an OpenAI customer, but this analysis is based purely on public info

I asked our business contact if they could talk about Foundry, and got a polite "no comment"

As this is outside analysis, I'm sure I'll get some details wrong, and trust you'll let me know 🙏
If you prefer to read this on a substack, sign up for mine right here.

In the future I might publish something there and not here, and you wouldn't want to miss that. :)
So without further ado… what is "Foundry"?

Quoting the pricing sheet, it's the "platform for serving" OpenAI's "latest models", which will "soon" come with "more robust fine-tuning" options.
"Latest models" – plural – says a lot

GPT4 is not a single model, but a class of models, defined by pre-training scale & parameter count, and perhaps some standard RLHF.

MSFT's Prometheus is the first public GPT4-class model, and does some crazy stuff!

Safe to say these latest models have more pre-training than anything else we've seen, and it doesn't sound like they've hit a wall.

Nevertheless, I'll personally bet that the "robust fine-tuning" will drive most of the adoption, value, and transformation in the near term.
Conceptually, an industrial foundry is where businesses make the tools that make their physical products. OpenAI's Foundry will be where businesses build AIs to perform their cognitive tasks – also known as *services*, also known as ~75% of the US economy!
The "Foundry" price range – from ~$250K/year for "3.5-Turbo" (ChatGPT scale) to $1.5M/year for the 32K-context-window "DV" – suggests that OpenAI can readily demonstrate GPT4's ability to do meaningful work in corporate settings, in a way that inspires meaningful financial commitments.
This really should not be a surprise, because ChatGPT can pass the Bar Exam, and fine-tuned models such as MedPaLM are starting to approach human professional levels as well.

Most routine cognitive work is not considered as demanding as these tests!

The big problems with LLMs, of course, have been hallucinations, limited context windows, and inability to access up-to-date information or use tools. We've all seen flagrant errors and other strange behaviors from ChatGPT and especially New Bing.
Keep in mind, though, that the worst failures happen in a zero-shot, open domain setting, often under adversarial conditions

People are working increasingly hard to jailbreak and embarrass them. Considering this, it's amazing how well they do perform.

When you know what you want an AI to do and have the opportunity to fine-tune models to do it, it's an entirely different ball game.
For the original "davinci" models (now three generations behind, if you count Instruct, ChatGPT, and the upcoming "DV"), OpenAI recommends "Aim for at least ~500 examples" as a starting point for fine-tuning.
Personally, I've found that as few as 20 examples can work for very simple tasks, which again, isn't surprising given that LLMs are few-shot learners and that there's a conceptual (near-?) equivalence between few-shot and fine-tuning approaches.

In any case, in corporate "big data" terms, whether you need 50 or 500 or 5000 examples, it's all tiny! Imagine what corporations will be able to do with all those call center records they've been keeping … for, as it turns out, AI training purposes.
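To make the fine-tuning data point concrete: a tiny sketch of the prompt/completion JSONL shape that OpenAI's davinci-era fine-tuning docs describe. The examples themselves are entirely invented.

```python
import json

# Hypothetical call-center examples in the prompt/completion JSONL
# format used by OpenAI's davinci-era fine-tuning endpoint.
# All names and content here are invented for illustration.
examples = [
    {"prompt": "Customer: My invoice is wrong.\nAgent:",
     "completion": " Sorry about that – can you share the invoice number so I can take a look?"},
    {"prompt": "Customer: How do I reset my password?\nAgent:",
     "completion": " Click 'Forgot password' on the login page and follow the emailed link."},
]

with open("finetune_data.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Sanity check: every line parses and carries the two required keys
with open("finetune_data.jsonl") as f:
    rows = [json.loads(line) for line in f]
assert all({"prompt", "completion"} <= set(row) for row in rows)
print(len(rows), "training examples written")
```

Scale that file up from 2 rows to 500, or 5000, and you have a fine-tuning job – still a rounding error next to a typical call-center archive.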
The new 32000-token context window is also a huge deal. This is enough for 50 pages of text or a 2-hour conversation. For many businesses, that's enough to contain your entire customer profile and history.
Others will need retrieval & "context management" strategies

That's where embeddings, vector databases, and dev frameworks like @LangChainAI and @PromptableAI come in!
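"Retrieval" in one toy sketch: embed your documents, embed the query, and pull back the nearest chunks to stuff into the prompt. The 3-d vectors here are made up – in practice they'd come from an embeddings model and live in a vector database.

```python
import math

# Made-up 3-d "embeddings" standing in for real embedding vectors
docs = {
    "refund policy":  [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.8, 0.1],
    "account login":  [0.0, 0.2, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, k=1):
    """Return the k doc names most similar to the query embedding."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)
    return ranked[:k]

# A query embedding that should land nearest "refund policy"
query = [0.85, 0.15, 0.05]
print(retrieve(query))  # ['refund policy']
```

The retrieved chunks then get prepended to the prompt, so the model answers from your data rather than its training set.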

Watch out for upcoming @CogRev_Podcast conversation with @trychroma founder @atroyn

Aside: this pricing also suggests the possibility of an undisclosed algorithmic advance

Attention-mechanism inference costs rise with the square of the context window. Yet we see a 4X jump in context – which would suggest a 16X increase in cost – with just a 2X jump in price 🤔
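The arithmetic behind that hunch, in a few lines. It works in ratios only, and assumes naive quadratic attention dominates inference cost – a simplification, since other terms scale linearly.

```python
# Naive self-attention compute grows with the square of context length.
# 4x the context implies ~16x the attention compute, yet the quoted
# price only ~doubles. Ratios only; no absolute prices assumed here.
ctx_ratio = 32_000 / 8_000          # 4x more context
naive_cost_ratio = ctx_ratio ** 2   # 16x, if attention dominates
price_ratio = 2.0                   # the ~2x price jump the thread cites
gap = naive_cost_ratio / price_ratio
print(f"implied efficiency gap: {gap:.0f}x")  # 8x
```

An ~8x gap between naive cost scaling and pricing is what hints at an algorithmic advance (or at least heavy optimization).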
In any case, the combination of fine-tuning and context-window expansion, especially as supported by the rapidly evolving LLM tools ecosystem, means customers will be able to achieve human performance & reliability – or better! – on many economically valuable tasks – in 2023!
Meanwhile, image & video generation, speech generation, and speech recognition have all recently hit new highs, and there's more coming from eg @play_ht, heard here as the voice of ChatGPT.

Mahmoud is another upcoming @CogRev_Podcast guest!

So what might corporations train models to do in 2023?

In short, tasks with a documented, standard operating procedure will be transformed first.

Work that requires original thought, sophisticated reasoning, or advanced strategy will be much less affected in the short term.
This is a bit of a reversal from how things are usually understood. LLMs are celebrated for their ability to write creative poems in seconds, but dismissed when it comes to doing anything that matters.

I think that's about to change, and I'm not alone

The AI UX paradigm will shift from one that delivers a response but puts the onus on the user to evaluate and figure out what to do with it, to one where AIs are directly responsible for getting things done, and humans supervise.

I call this The Great Implementation
Specifically, within 2023, I expect custom models will be trained to….
Create, re-purpose, and localize content – full brand standards fit into 32K tokens, with plenty of room to write some tweets.

Amazingly, my own company @waymark is mentioned alongside Patrón, Spectrum, Coke, and OpenAI in this article.
Handle customer interactions – natural-language Q&A, appointment setting, account management, and even tech support – available 24/7, picking up where you left off, by text or voice.

Customer service and experience will improve dramatically!…
Streamline hiring – in such a hot market, personalizing outreach, assessing resumes, summarizing & flagging profiles, suggesting interview questions. For companies who have an overabundance of candidates, perhaps even conducting initial interviews?
Code – with knowledge of private code bases, following your coding standards. Copilot is just the beginning here.

Conduct research using a combination of public search and private retrieval.

See this master class, must-read thread from @jungofthewon about best-in-class @elicitorg – it really does meaningful research for you!

Analyze data, and generate, review, summarize reports – all sorts of projects can now "talk to data" – another of the leaders is @gpt_index

Execute processes by calling a mix of public and private APIs – sending emails, processing transactions, etc, etc, etc. We're starting to see this in the research as well.
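That last item deserves a sketch. The common pattern is for the model to emit a structured action, which application code then routes to a real API. The tool names and action format here are invented for illustration.

```python
import json

# "Execute processes" pattern: the model emits a structured action;
# application code routes it to a real API. Tool names and the
# action JSON below are invented for illustration.
def send_email(to, body):
    return f"email queued for {to}"

def process_refund(order_id):
    return f"refund issued for order {order_id}"

TOOLS = {"send_email": send_email, "process_refund": process_refund}

# Pretend this JSON came back from a fine-tuned model
model_action = '{"tool": "process_refund", "args": {"order_id": "A-1001"}}'

action = json.loads(model_action)
result = TOOLS[action["tool"]](**action["args"])
print(result)  # refund issued for order A-1001
```

Keeping the model's output constrained to a small vocabulary of tools is also a natural safety boundary: the AI can only do what the dispatcher allows.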

How will this happen in practice? And what will the consequences be for work and jobs??

For starters, it's less about AI doing jobs and more about AI doing tasks.
Many have argued that human jobs require more context and physical dexterity than AIs currently have, and thus that AIs will not be able to do most jobs.

This is true, but misses a key point, which is that the way work is organized can and will change to take advantage of AI.
What's actually going to happen is not that AIs will be dropped into human roles, but that the tasks which add up to jobs will be pulled apart into discrete bits that AIs can perform.
There is precedent for such a change in the mode of production. As recently as ~100 years ago, physical manufacturing looked a lot more like modern knowledge work: pieces fit together loosely, and people solved lots of small production problems on the fly with skilled machining.
Interchangeable parts & assembly lines changed all that; standardization and tighter tolerances unlocked reliable performance at scale. This transformation is largely complete in manufacturing – people run machines that do almost all of the work with very high precision.
Services lag manufacturing because services are mediated by language, and the art of mapping conversation onto actions is hard to standardize. Businesses try, but people struggle to consistently use best practices. Every CMO complains that people don't respect brand standards.
The Great Implementation will bring about a shift from humans doing the tasks that constitute "Services" to the humans building, running, maintaining, and updating the machines that do the tasks that constitute services.
In many cases, those will be different humans. And this is where OpenAI's "global services alliance" with Bain comes in.
The core competencies needed to develop and deploy fine-tuned GPT4 models in corporate settings include:
*Problem definition* – what are we trying to accomplish, and how do we structure that as a text prompt & completion? What information do we need to include in the prompt to ensure that the AI has everything it needs to perform the task?
*Training data curation / adaptation / creation* – what constitutes a job well done? do we have records of this? do the records reflect implicit knowledge that the AI will need, or perhaps contain certain information (eg - PII) that should not be trained into a model at all?
*Validation, Error Handling, and Red Teaming* – how does model performance compare to humans? how do we detect failures, and what do we do about them? and how can we be confident that we'll avoid New Bing type behaviors?
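That last competency lends itself to a toy sketch: compare model outputs to human "gold" answers, score them, and flag mismatches for review. The hard-coded outputs stand in for a real fine-tuned model.

```python
# Minimal validation-harness sketch: score model outputs against
# human "gold" answers and flag failures for human review.
# All data here is invented for illustration.
gold = {
    "q1": "refund approved",
    "q2": "escalate to billing",
    "q3": "password reset sent",
}
model_outputs = {
    "q1": "refund approved",
    "q2": "refund approved",       # a failure worth flagging
    "q3": "password reset sent",
}

failures = {q for q in gold if model_outputs.get(q) != gold[q]}
accuracy = 1 - len(failures) / len(gold)
print(f"accuracy={accuracy:.2f}, flagged for review: {sorted(failures)}")
```

Real harnesses use fuzzier matching than string equality, but the loop is the same: measure against humans, route failures back to humans.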
There is an art to all of these, but they are not super hard skills to learn. Certainly a typical Bain consultant will be able to get pretty good at most of them. And the same basic approach will work across many environments. Specialization makes sense here.
Additionally, the hardest part about organizational change is often that organizations don't want to change. With that in mind, it's no coincidence that leadership turns to consultants who are known for helping corporations manage re-organizations, change, and yes – layoffs.
Consultants have talked like this forever, but this time it's literally true.

Btw @BainAlerts ... may I suggest Cognitive Revolution instead of "industrial revolution for knowledge work"?

:)
So, no, AI won't take jobs, but it will do parts of jobs, and many jobs may cease to exist in their current form. There's precedent for this too, from when mechanization came to agriculture.

Will the people affected find other jobs? I don't know

Finally … WHY is OpenAI going this route with pricing? It's a big departure from the previous "API first", usage-based pricing strategy, and the technology would be no less transformative with that model. I see 2 big reasons for this approach: (1) safety/control and (2) $$$
re: Safety – @sama has said OpenAI will deploy GPT4 when they are confident it's safe to do so.

A $250K entry point suggests a "know your customer" approach to safety, likely including vetting customers, reviewing use cases, etc.

They did this for GPT3 and DALLE2 too.
Of course, this doesn't mean things will be entirely predictable or safe.

I doubt that the OpenAI team that shipped ChatGPT would have signed off on "Sydney" – my guess is that MSFT ran their own fine-tuning & QA processes.

More here:
re: $$$ – this is a natural way for OpenAI to use access to their best models to protect their lower-end business from cheaper / open source alternatives, and to some degree discourage / crowd out in-house corporate investments in AI.
I am always going on about threshold effects, and how application developers will generally want to use the smallest/cheapest/fastest models that suffice for their use case.

This is already starting to happen – see @jungofthewon again
With Meta having just (sort-of) released new ~SOTA models, and @StabilityAI soon to release even better, the stage is set for a lot more customers to go this way – at least as long as OpenAI has such customer-friendly, usage-based, no-commitment pricing.

OpenAI can't prevent other projects from hitting key thresholds, but they can change customer calculus. Why chase pennies on base use cases when you've already spent big bucks on dedicated capacity? And is there budget for an ML PhD after we just dropped $1.5M on DV 32K?
In conclusion, economically transformative AI is not only here, but OpenAI is already selling it

We'll feel it once models are fine-tuned and integrated into existing systems

A lot will happen in 2023, but of course The Great Implementation will go on for years

Buckle up!
If you made it this far, I'll of course appreciate your retweets, as well as your critical commentary.

And if you want more AI-obsessed analysis, check out the podcast. As I hope you can tell, I do the work!
