Semiconductor #VentureCapital easily bought the idea of ASICs replacing GPUs for AI, based on the argument that GPUs were primarily built for graphics & would not be efficient for AI in the long run.
Let's bust that myth. (1/n)
The hard thing about hardware is actually software.
2016 saw a Cambrian explosion of AI chip startups raising their 1st VC rounds. 5 years later, most have launched their 1st-gen chip but are still struggling to build a robust SW stack that supports diverse AI workloads. (2/n)
NVIDIA introduced CUDA in 2006 to leverage GPUs for general-purpose computation.
Since then, applications in astronomy, biology, chemistry, physics, data mining, manufacturing, finance & other computationally intensive fields have used CUDA to accelerate computation. (3/n)
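To make "leveraging GPUs for computation" concrete, here is a minimal CUDA sketch (illustrative only, not tied to any workload named in the thread): an element-wise kernel of the kind scientific & financial codes have offloaded to GPUs since CUDA's debut.

```cuda
// Minimal CUDA sketch: a general-purpose element-wise kernel, illustrative only.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per element
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    // Unified memory keeps the sketch short; real codes often manage copies explicitly.
    cudaMallocManaged((void**)&a, bytes);
    cudaMallocManaged((void**)&b, bytes);
    cudaMallocManaged((void**)&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    vecAdd<<<(n + 255) / 256, 256>>>(a, b, c, n);  // launch ~4096 blocks of 256 threads
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);  // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```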
GPUs have been positioned for compute-intensive workloads for a long time, well before the advent of DL (often conflated with AI), which is just another compute-intensive workload.
The GPU SW stack has kept improving release after release over the last decade & a half. (4/n)
Besides a more-than-decade-long lead on SW, each GPU generation has also pushed the boundary of peak compute capability, a trend that will continue. (5/n)
It is premature to claim that ASICs will improve the efficiency of AI compute, because no one really knows what future AI workloads will look like.
Designing highly specialized ASICs for AI efficiency right now is equivalent to shooting arrows in the dark. (6/n)
DL workloads are continuously changing.
On an aggressive schedule, it takes ~2 years for a chip to be spec'ed, designed, fabricated & brought up.
DL workloads go stale roughly every 18 months. (7/n)
In this fast-changing landscape, certain startups already find themselves stuck with the wrong design choices, or with designs optimized for outdated workloads. (8/n)
E.g., a plethora of startups focused only on accelerating convolutions or matrix multiplications, only to find out later that DL networks had evolved to include a fair share of memory-bound operations. (9/n)
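A back-of-the-envelope roofline comparison makes the point (illustrative numbers assuming FP16 operands at 2 bytes each, not figures from the thread): arithmetic intensity, i.e. FLOPs per byte moved, is what separates compute-bound matmuls from memory-bound element-wise ops.

```latex
% Square matmul C = A*B with N x N FP16 matrices:
%   FLOPs ~ 2N^3, bytes moved ~ 3 * N^2 * 2
\[
\mathrm{AI}_{\text{matmul}} \approx \frac{2N^3}{6N^2} = \frac{N}{3}
\quad (N = 4096 \Rightarrow \approx 1365\ \mathrm{FLOP/byte}:\ \text{compute-bound})
\]
% Element-wise op (e.g. an activation) over N FP16 values: ~1 FLOP per element,
% ~4 bytes read + written per element.
\[
\mathrm{AI}_{\text{elementwise}} \approx \frac{N}{4N} = 0.25\ \mathrm{FLOP/byte}
\quad (\text{memory-bound: throughput set by DRAM bandwidth, not ALUs})
\]
```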
E.g., a plethora of startups ruled out external memory in the name of power efficiency & cost, only to find out later that DL networks had grown to the order of trillions of parameters and would no longer fit within their on-chip memories. (10/n)
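A quick sanity check on the sizes involved (a rough estimate assuming FP16 weights at 2 bytes per parameter; not a figure from the thread):

```latex
\[
10^{12}\ \text{parameters} \times 2\ \tfrac{\text{bytes}}{\text{parameter}} = 2\ \text{TB of weights alone}
\]
% versus on-chip SRAM budgets typically measured in tens to hundreds of MB per die,
% i.e. roughly four orders of magnitude short of holding the model on chip.
```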
The fast-changing AI landscape needs a general-purpose compute platform that can run diverse workloads, at least until those workloads mature.
When they do, ASICs might deliver higher efficiency for highly specialized Edge scenarios. (n/n)
While acceleration of key workloads is desirable, it is general compute horsepower that will provide the flexibility needed to program solutions for the world's next challenges. (2/n)
Good software abstraction of foundational building blocks allows engineers to iterate faster on different sophisticated algorithms.
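As one illustration of that point (a hedged sketch, not anything from the thread): an application that leans on a well-abstracted building block such as a cuBLAS GEMM, rather than a hand-written kernel, keeps its own code unchanged while the library tracks each new GPU generation.

```cuda
// Illustrative sketch: the caller asks for C = alpha*A*B + beta*C and lets
// cuBLAS pick whatever kernel is fastest on the installed GPU.
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>

int main() {
    const int N = 1024;                     // square matrices, illustrative size
    const float alpha = 1.0f, beta = 0.0f;

    std::vector<float> hA(N * N, 1.0f), hB(N * N, 1.0f), hC(N * N, 0.0f);
    float *dA, *dB, *dC;
    cudaMalloc((void**)&dA, N * N * sizeof(float));
    cudaMalloc((void**)&dB, N * N * sizeof(float));
    cudaMalloc((void**)&dC, N * N * sizeof(float));
    cudaMemcpy(dA, hA.data(), N * N * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), N * N * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    // The abstraction, not the application, is what gets retuned for every
    // new GPU generation.
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                N, N, N, &alpha, dA, N, dB, N, &beta, dC, N);

    cudaMemcpy(hC.data(), dC, N * N * sizeof(float), cudaMemcpyDeviceToHost);
    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```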