Andrew Côté
Mar 11 · 24 tweets · 9 min read
AGI must be decentralized and cheap to be accessible for all

Yet scaling laws in data and energy mean it will take trillions of dollars, leading to centralized control

The solution is a total hardware revolution

Here's the Thermodynamic Computing Explainer 🧵 w/ @Extropic_AI
I've spent the last few months getting to know @BasedBeffJezos, @trevormccrt1 and his team at @Extropic_AI.

What they're building is the Transistor of the AI era - the most natural physical embodiment of probabilistic learning.

To appreciate how, we need to dive deep:
The essence of machine learning is to accurately model the statistical distributions governing natural phenomena

You start with a guessed distribution, and slowly shape it to a target distribution - reality - through repeated observations.

Each sample helps better fit reality https://blog.ml.cmu.edu/2021/12/17/learning-observation-models/
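That fit-by-observation loop can be sketched in a few lines of Python. This is a toy illustration of the idea, not anyone's actual training code: "reality" is a hidden Gaussian, and repeated samples pull the maximum-likelihood estimate toward it.

```python
import math
import random

random.seed(0)

# "Reality": a hidden distribution we want to model.
TRUE_MU, TRUE_SIGMA = 3.0, 1.5

def observe(n):
    """Draw n observations from the underlying phenomenon."""
    return [random.gauss(TRUE_MU, TRUE_SIGMA) for _ in range(n)]

def fit_gaussian(samples):
    """Maximum-likelihood estimates: sample mean and standard deviation."""
    mu = sum(samples) / len(samples)
    var = sum((x - mu) ** 2 for x in samples) / len(samples)
    return mu, math.sqrt(var)

# More observations -> the guessed distribution drifts toward reality.
for n in (10, 100, 10_000):
    mu, sigma = fit_gaussian(observe(n))
    print(f"n={n:>6}: mu={mu:.2f}, sigma={sigma:.2f}")
```

With a handful of samples the estimates wobble; by ten thousand they sit close to the true mean and spread.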
The goal is to accurately predict what the underlying phenomena will be, even without having observed that particular case before.

Tuning the model on training and testing data can end in over-fitting, under-fitting, or a genuinely useful fit.

A good model knows hot dog or no hot dog
The different ways of ingesting data, making guesses, rejecting them based on criteria, and updating the guess-making process account for the entire panoply of machine learning models today.

It's a complete zoo with a very common flaw - the over-reliance on Gaussians
A Gaussian is a particular type of statistical distribution that is like the vanilla ice-cream of probabilities.

It's the default guess for how something behaves, and comes up often due to the Central Limit Theorem.

This classic bell curve is ubiquitous in nature
The issue is that many complex phenomena are fundamentally not Gaussian-shaped - they might have uneven tails, skew to one side, or have more than one 'bump'.

The simplest example is needing two Gaussians to fit the graph below, each with its own mean and variance.
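A minimal sketch of how such a two-Gaussian fit works in practice: an expectation-maximization loop in pure Python. The bimodal toy data and the initial guesses here are my own choices for illustration.

```python
import math
import random

random.seed(1)

# Bimodal data: two Gaussians, each with its own mean and variance.
data = ([random.gauss(-2.0, 0.6) for _ in range(500)] +
        [random.gauss(3.0, 1.0) for _ in range(500)])

def pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# EM: start with guessed parameters, then alternate between
# soft-assigning points to components and re-estimating each component.
mu, sigma, weight = [-1.0, 1.0], [1.0, 1.0], [0.5, 0.5]

for _ in range(50):
    # E-step: responsibility of component 0 for each point.
    r = []
    for x in data:
        p0 = weight[0] * pdf(x, mu[0], sigma[0])
        p1 = weight[1] * pdf(x, mu[1], sigma[1])
        r.append(p0 / (p0 + p1))
    # M-step: weighted mean/variance/weight updates for each component.
    for k, rk in enumerate((r, [1 - ri for ri in r])):
        total = sum(rk)
        mu[k] = sum(ri * x for ri, x in zip(rk, data)) / total
        sigma[k] = math.sqrt(sum(ri * (x - mu[k]) ** 2
                                 for ri, x in zip(rk, data)) / total)
        weight[k] = total / len(data)

print(f"fitted means: {sorted(mu)}")  # should land near -2.0 and 3.0
```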
Doubling the number of Gaussians to fit the curve also doubles the parameters, but the number of possible combinations of parameters is squared.

Therefore the amount of data needed to learn the underlying distribution grows much faster than the number of parameters
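The squaring claim is just arithmetic: if each parameter is tuned over k candidate settings (k here is an arbitrary grid resolution I chose for illustration), then d parameters give k^d joint settings, and 2d parameters give k^(2d) = (k^d)².

```python
k = 10                 # candidate settings per parameter (arbitrary)
d = 3                  # parameters in the small model
small = k ** d         # joint settings for d parameters
large = k ** (2 * d)   # doubling the parameter count...
print(small, large, large == small ** 2)  # ...squares the joint settings
```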
Most phenomena are vastly more complicated than the simple example above, needing larger and larger models with more and more parameters to represent.

Modern LLMs have trillions of parameters and are trained on tens of trillions of tokens.

And then there's the energy cost...
The International Energy Agency forecasts that, driven partly by AI, global data-centre electricity demand could roughly double between 2022 and 2026.

@sama has invested $500m into @Helion_Energy while @Microsoft is building its own nuclear energy program

And then there's the chips...
We reached the limits of clock frequency in silicon transistors decades ago, and now we're approaching the limits of size as features reach the single-digit nanometer scale.

We've skirted these issues by scaling things massively in parallel, driving the demand for GPUs
Here's where we stand at the precipice of AGI:

- Massive models with even more massive datasets
- Enormous compute facilities reaching the limits of physical hardware
- Requiring the energy and financial budget of nations

Here's how thermodynamic computing changes everything:
First, regular transistors aren't the 'Transistor of the AI era'.

Digital logic is ideally suited for deterministic gate operations, but machine learning is inherently probabilistic.

The ideal hardware for machine learning is not deterministic but probabilistic
@Extropic_AI harnesses the inherently probabilistic nature of physical systems at the hardware level.

Their system sits at the meso-scale between classical and quantum computing.

Where entropy is a competitive advantage. Let me explain:
The Thermodynamic Advantage comes when the size of chip-elements is comparable to the background thermal fluctuations in energy you get at any finite temperature.

When you need to generate a new sample, you just measure the system.

Your random-number generators are electrons
The randomness is truly random, and the shape of the statistical distribution is set by the shape of the potential energy well in which the electrons sit

You can tune these potentials into complex shapes - non-Gaussians - with just a few parameters.

Escaping the dimensionality curse https://link.springer.com/chapter/10.1007/978-3-030-20726-7_17
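In software you'd have to simulate this thermal sampling step by step. A minimal Metropolis sampler over a hypothetical double-well potential (the quartic well is my stand-in, not Extropic's actual device physics) shows how just two parameters shape a two-bump, decidedly non-Gaussian distribution:

```python
import math
import random

random.seed(2)

def energy(x, a=1.0, b=2.0):
    """A double-well potential with minima near +/- sqrt(b / (2a)).
    Two parameters tune a distinctly non-Gaussian distribution."""
    return a * x**4 - b * x**2

def metropolis(n_samples, step=0.5, temp=1.0):
    """Software stand-in for thermal sampling: accept moves with
    probability exp(-dE/T), so samples follow exp(-E(x)/T)."""
    x, out = 0.0, []
    for _ in range(n_samples):
        x_new = x + random.uniform(-step, step)
        dE = energy(x_new) - energy(x)
        if dE <= 0 or random.random() < math.exp(-dE / temp):
            x = x_new
        out.append(x)
    return out

samples = metropolis(50_000)
left = sum(1 for s in samples if s < 0) / len(samples)
print(f"fraction in left well: {left:.2f}")  # roughly half: two 'bumps'
```

A physical thermo chip would get these samples for free from ambient noise; the point of the sketch is only the shape of the resulting distribution.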
In a transistor, the maximum speed of operation is limited by the time it takes enough charge carriers to get moving to reach above-unity gain.

For a thermo chip, the speed is only limited by the time it takes ambient heat to enter the system and re-randomize its state
It's far faster and takes less energy to simply re-randomize a bunch of electrons than to induce a net current to flow with a voltage.

Therefore thermo chips can use trillions of times less energy and run millions of times faster than transistor junctions.

But it gets even better than this:
The process of tuning the energy potential of an electron random-number generator is inherently an 'Energy-Based Model' (EBM)

Again unlike silicon, on a thermo chip the EBM isn't emulated by massive numbers of digital, deterministic operations

It's baked into the physics itself https://openai.com/research/energy-based-models
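For intuition, an EBM in its simplest form: the energy function defines the probability of each state through the Boltzmann rule p(x) ∝ exp(−E(x)). A toy 1-D version, where the quartic energy function is my illustrative choice and the normalizer Z is approximated on a grid:

```python
import math

def energy(x, a=1.0, b=2.0):
    # A tunable energy function; its shape *is* the model.
    return a * x**4 - b * x**2

# An energy-based model assigns p(x) = exp(-E(x)) / Z.
# Approximate the normalizer Z with a Riemann sum on a 1-D grid.
dx = 0.001
xs = [-4 + i * dx for i in range(8001)]
Z = sum(math.exp(-energy(x)) for x in xs) * dx

def p(x):
    return math.exp(-energy(x)) / Z

# Low-energy states are high-probability states: the minima of E
# at x = +/-1 are the modes of p.
print(f"p(0) = {p(0):.3f}, p(1) = {p(1):.3f}")
```

Learning in an EBM means reshaping E so that observed data sits in its low-energy valleys; on a thermo chip that reshaping is a physical tuning of the potential rather than a simulated one.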
Why do EBMs matter?

Recently, the Godfather of Deep Learning @ylecun spoke with @lexfridman about how EBMs will be the way forward for LLMs

They provide the shortest path to learning how the world works - again, it's baked into @Extropic_AI's hardware

The "Brain" @Extropic_AI is developing is one where each thermodynamic neuron learns a complex probability distribution, encoding it in an energy potential

Allowing the fastest possible learning path, using trillions of times less energy and operating millions of times faster
How this manifests in physical hardware is super-specialized ASICs that perform the sole function integral to any probabilistic learning process:

Tuning and adapting a statistical model by repeated sampling to learn an underlying process in as few observations as possible https://arxiv.org/pdf/1911.01968.pdf
This is the truest definition of "Deep Tech" one can imagine.

An ambitious and demanding engineering problem that if successful, unblocks fundamental progress, relaxes resource constraints, and forever changes the world.

And would mint another multi-trillion dollar company
@Extropic_AI is a team forged in the depths of Google's most secretive quantum machine learning skunkworks.

Leveraging the intrinsic properties of physical systems to deliver decentralized, abundant AI for all of humanity.

Developing the Transistor of the AGI era


