James Wang
Aug 20, 2019 · 8 tweets · 4 min read
9/ Neural nets can consume GBs of memory, but GPUs only have MBs of on-chip memory. So GPUs store neural nets in external memory soldered next to the GPU on the PCB.

The problem is that external memory is 10-100x slower & more power-hungry than on-chip memory. It's also very expensive.
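A rough way to see the GB-vs-MB gap is to multiply a model's parameter count by the bytes each parameter takes. The sketch below is a back-of-envelope estimate under stated assumptions: 32-bit floats (4 bytes per weight), weights only (no activations, gradients or optimizer state), and made-up capacity budgets (`ON_CHIP_BYTES`, `EXTERNAL_BYTES`) that are hypothetical and not the specs of any particular GPU.

```python
# Back-of-envelope memory footprint of a neural net's weights.
# Assumptions (hypothetical, for illustration only): fp32 weights,
# weights only (no activations/gradients/optimizer state),
# and the made-up capacity budgets below.

MB = 10**6
GB = 10**9

ON_CHIP_BYTES = 20 * MB    # hypothetical on-chip memory budget
EXTERNAL_BYTES = 16 * GB   # hypothetical external memory budget


def weight_bytes(num_params: int, bytes_per_param: int = 4) -> int:
    """Bytes needed to store the weights alone (fp32 = 4 bytes each)."""
    return num_params * bytes_per_param


def fits(num_bytes: int, capacity_bytes: int) -> bool:
    """True if the weights fit within the given memory capacity."""
    return num_bytes <= capacity_bytes


if __name__ == "__main__":
    # A model with roughly BERT-Large's parameter count (~340M).
    w = weight_bytes(340_000_000)
    print(f"{w / GB:.2f} GB of weights")                 # 1.36 GB of weights
    print("fits on-chip:", fits(w, ON_CHIP_BYTES))       # False
    print("fits externally:", fits(w, EXTERNAL_BYTES))   # True
```

Even this weights-only figure is roughly 70x larger than the hypothetical on-chip budget, which is why the weights end up in external memory.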
10/ Large models like Google’s Neural Machine Translation don’t even fit in one GPU’s external memory. Often they have to be split up across dozens of GPUs/servers. This increases latency by another 10-100x.

Ideally the whole model fits on a single chip—that's Cerebras' WSE.
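A toy way to picture the split is to pack a model's layers, in order, onto devices until each device's memory is full. This is a minimal greedy sketch with hypothetical layer sizes and a hypothetical 16 GB per-device capacity. It is not how GNMT or any real framework actually partitions models. Every boundary between devices is a hop over a board or network interconnect, which is where the extra latency comes from.

```python
# Greedy pipeline-style split of a model's layers across devices.
# Assumptions (hypothetical): layer sizes and per-device capacity are
# illustrative numbers, and layers must stay in order (layer i+1 runs
# after layer i), so each device gets a contiguous run of layers.

from typing import List

GB = 10**9


def partition_layers(layer_bytes: List[int], capacity: int) -> List[List[int]]:
    """Pack consecutive layers onto devices; return layer indices per device."""
    if not layer_bytes:
        return []
    devices: List[List[int]] = [[]]
    used = 0
    for i, size in enumerate(layer_bytes):
        if size > capacity:
            raise ValueError(f"layer {i} alone exceeds one device's memory")
        if used + size > capacity:
            # Current device is full: start a new one.
            devices.append([])
            used = 0
        devices[-1].append(i)
        used += size
    return devices


def cross_device_hops(devices: List[List[int]]) -> int:
    """Activations must cross an interconnect at each device boundary."""
    return max(len(devices) - 1, 0)


if __name__ == "__main__":
    layers = [5 * GB] * 8                      # hypothetical: 8 layers of 5 GB
    plan = partition_layers(layers, 16 * GB)   # hypothetical 16 GB per device
    print(plan)                                # [[0, 1, 2], [3, 4, 5], [6, 7]]
    print(cross_device_hops(plan))             # 2
```

For a fixed layer order, greedy packing uses as few devices as possible. Real model-parallel systems also have to balance compute per device and overlap communication with computation, which this sketch ignores.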
11/ Cerebras’ Wafer Scale Engine (WSE) is *one chip* holding 400,000 cores and 18GB of on-chip memory. Neural network training happens on one piece of silicon rather than being spread across dozens of boards, servers, and interconnects. If it works, one chip could replace a rack of GPU servers.
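Dividing the thread's two headline numbers gives a feel for the scale per core. The sketch below assumes the 18 GB is split evenly across the 400,000 cores and shows both readings of "GB" (decimal 10^9 bytes and binary 2^30 bytes), since the thread does not say which one is meant. Either way, each core gets only tens of kilobytes to itself.

```python
# Spread 18 GB of on-chip memory evenly across 400,000 cores.
# The two totals come from the thread; the even split and the
# decimal vs. binary reading of "GB" are assumptions.

TOTAL_CORES = 400_000


def per_core_bytes(total_bytes: int, cores: int = TOTAL_CORES) -> float:
    """Average memory per core if the total is divided evenly."""
    return total_bytes / cores


if __name__ == "__main__":
    decimal_total = 18 * 10**9   # "GB" read as 10^9 bytes
    binary_total = 18 * 2**30    # "GB" read as 2^30 bytes (GiB)
    print(per_core_bytes(decimal_total) / 1000)            # 45.0 (KB per core)
    print(round(per_core_bytes(binary_total) / 1024, 1))   # 47.2 (KiB per core)
```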
12/ Healthy skepticism is warranted. The industry has never seen anything like this before. It might not live up to these lofty goals for all kinds of reasons. Cerebras says it has customers in trials now and will publish official benchmarks in November, so we’ll see.
13/ Lastly, while it’s easy to get excited about what this will do for existing AI algorithms (whether it's a 10x or 100x speedup on GPT-2 or BERT), the real excitement is what it could enable.
14/ As @ylecun has pointed out, hardware informs software. The kinds of neural nets we have today are a function of the GPUs we have. If wafer-sized chips become the norm, we could invent entirely new classes of algorithms. Exciting days ahead! /fin
P.S. Credit goes to @anandtech for the slide photos from Hot Chips.
anandtech.com/show/14758/hot…
Specs comparison (image): Cerebras Wafer Scale Engine vs. Nvidia Volta GPU vs. Nvidia DGX-1 server


