⿻ Andrew Trask
https://t.co/RwdKhp38uK @openminedorg, @GoogleDeepMind, @OxfordUni, @UN, @GovAI_, @CFR_org member, made #GrokkingDL, NALU & sense2vec
Sep 12 5 tweets 3 min read
This is a bigger deal than you might think it is.

Analog computers were the *original* computers used to train neural networks in the early NN waves.

Analog computers were abandoned because they weren't general purpose enough — despite being WAY faster.

But if you just want to do a single really fast program... (e.g. a Transformer)... they're incredible.

They can be *crazy* fast and *crazy* parallel because they aren't burdened by a clock cycle.

So if you thought GPUs were fast... more 1960s tech is here to blow your mind

Put another way... linear circuits have a clock cycle that's effectively the speed of light.

Want to add the number 3 and 5 together?

Put 3 volts down one wire

Put 5 volts down another

Combine the wires... now you've got 8 volts.

Shrink that down to microchip scale and put a zillion of those wires in parallel until you can compute neural network circuits.
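
(An illustrative sketch, not from the thread: in an analog "crossbar", weights live as conductances, inputs arrive as voltages, Ohm's law gives each cell's current, and Kirchhoff's current law sums those currents on each output wire. A whole matrix-vector product falls out of the physics at once.)

```python
import numpy as np

# Illustrative sketch (not from the thread): an analog "crossbar" array.
# Weights are stored as conductances G (siemens), inputs are applied as
# voltages v (volts). Ohm's law gives each cell's current (i = G_ij * v_j),
# and Kirchhoff's current law sums those currents on each output wire,
# so a whole layer's dot products happen at once, limited by signal
# propagation rather than by a digital clock.

rng = np.random.default_rng(0)

G = rng.uniform(0.0, 1.0, size=(4, 8))  # conductance matrix == layer weights
v = rng.uniform(0.0, 1.0, size=8)       # input voltages == activations

output_currents = G @ v                 # what the physics computes "in one shot"
print(output_currents)
```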

It has the potential to be crazy.... crazy fast.

Like... forward prop in a ~single clock cycle... kind of speed

I don't think this paper is *quite* to that part yet... but it's a big step forward.

And unlike quantum computing... this stuff is quite practical and pragmatic, and based on known electronics theory.

It's mostly an engineering issue at this point. To underscore how big a deal this might be... *the* reason deep learning is here is NOT because the algorithms are fundamentally better than before.

It's because we could fit them on GPUs better...

If analog computing really takes off — there's a fair chance that the whole AI field gets a reset with way faster, way sparser, way more powerful AI models.

And sparsity has the potential to have a really surprising side effect — attribution.

If our neural networks become sparser, it increases the chances that we can tell which data sources are being used for which predictions.

(e.g. like RAG!... but on steroids)

Truly... the AI world is built around NVIDIA, CUDA, and a huge moat of products and services.

If someone comes forward with the right analog computing + training algorithm breakthrough... that's a house of cards that will come crAAAAAASHING down.

Big deal. Watch this space.
Sep 22, 2023 29 tweets 6 min read
This is the 1st rigorous treatment (and 3rd verification) I've seen

IMO - this is great for AI safety!

It means that LLMs are doing *exactly* what they're trained to do — estimate next-word probability based on data.

Missing data?

P(word)==0

So where is the AI logic?

1/🧵 Current hypothesis: LLMs are a lot like surveys.

When they see a context ("The cat and the") they basically conduct a *survey* over every datapoint in a training dataset.

It's like asking every datapoint "what do YOU think the next word might be"?

And then...
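
(A toy sketch of that "survey" framing, my own gloss rather than anything from the paper: next-word probability as a tally of continuations across the training data, which makes P(word) == 0 exactly when the data is missing.)

```python
from collections import Counter

# Toy "survey" over a training corpus: for a given context, ask every
# datapoint what word followed it, then turn the tally into probabilities.
# Words never observed after this context get probability exactly 0.

corpus = [
    "the cat and the dog",
    "the cat and the mouse",
    "the cat and the dog ran",
]

def next_word_survey(context, corpus):
    votes = Counter()
    ctx = context.split()
    for line in corpus:
        words = line.split()
        for i in range(len(words) - len(ctx)):
            if words[i:i + len(ctx)] == ctx:
                votes[words[i + len(ctx)]] += 1
    total = sum(votes.values())
    return {w: c / total for w, c in votes.items()} if total else {}

print(next_word_survey("cat and the", corpus))
# {'dog': 0.666..., 'mouse': 0.333...}  -- every other word has P == 0
```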
Sep 13, 2022 7 tweets 2 min read
Wow - in 8 tweets I just learned and un-learned more about the mysteries of deep neural networks than I've probably learned or un-learned about them in the last two years.

This is the start of something really really big... and a huge door opened for federated learning. This technique really seems to get a foothold on managing the intelligence in an AI model. Imagine training 10,000 small models on 10,000 different topic areas and being able to decide exactly which collection of specialties a model should have.
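
(A hypothetical sketch of that idea, not anything from the tweets being summarized: a registry of small per-topic models, and a composite built from exactly the specialties you pick.)

```python
# Hypothetical sketch: compose a model out of small, per-topic specialists.
# The names and the selection rule are placeholders to show the shape of
# the idea, not a description of any real system.

specialists = {
    "chemistry": lambda prompt: f"[chemistry model answers: {prompt}]",
    "law":       lambda prompt: f"[law model answers: {prompt}]",
    "poetry":    lambda prompt: f"[poetry model answers: {prompt}]",
}

def compose(topics):
    """Build a composite that only 'knows' the chosen specialties."""
    chosen = {t: specialists[t] for t in topics}
    def model(prompt, topic):
        if topic not in chosen:
            raise ValueError(f"this composite was built without '{topic}'")
        return chosen[topic](prompt)
    return model

chem_law_model = compose(["chemistry", "law"])
print(chem_law_model("What is a covalent bond?", topic="chemistry"))
```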

Heads up #AISafety community!