LaurieWired · Oct 14
GPU computing before CUDA was *weird*.

Memory primitives were graphics-shaped, not computer-science-shaped.

Want to do math on an array? Store it as an RGBA texture.

Fragment Shader for processing. *Paint* the result in a big rectangle.
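To make that concrete, here's a minimal sketch of the era's workflow (illustrative only, not code from any particular project; legacy GLSL embedded in a C string, with the host-side OpenGL plumbing only described in comments):

```c
/* Sketch: "add two arrays" the pre-CUDA way. Each element lives in
 * one RGBA texel; drawing a full-screen rectangle runs the fragment
 * shader once per texel, i.e. once per array element. */
static const char *add_arrays_fs =
    "uniform sampler2D arrayA;                            \n"
    "uniform sampler2D arrayB;                            \n"
    "void main() {                                        \n"
    "    vec4 a = texture2D(arrayA, gl_TexCoord[0].st);   \n"
    "    vec4 b = texture2D(arrayB, gl_TexCoord[0].st);   \n"
    "    gl_FragColor = a + b;   /* the actual 'math' */  \n"
    "}                                                    \n";

/* Host side (omitted for brevity): upload both arrays with
 * glTexImage2D, attach the result texture to a framebuffer,
 * bind this shader, *paint* one big rectangle, then read the
 * result back with glReadPixels. */
```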
As you hit the more theoretical sides of Computer Science, you start to realize almost *anything* can produce useful compute.

You just have to get creative with how it’s stored.

The math might be stored in a weird box, but the representation is still valid.
BrookGPU (Stanford) is widely considered the birth of pre-CUDA GPGPU frameworks.

By virtualizing CPU-style primitives, it hid a lot of the graphical “weirdness”.

By extending C with stream, kernel, and reduction constructs, Brook let GPUs act more like a co-processor.
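The paper's running example is saxpy. From memory, Brook code looked roughly like this (a sketch; details approximate):

```c
/* Brook extends C: '<>' declares a stream, 'kernel' marks a function
 * the compiler maps onto fragment shaders, one call per element. */
kernel void saxpy(float a, float x<>, float y<>, out float result<>) {
    result = a * x + y;
}

/* Reductions (sums, maxima, ...) get their own construct. */
reduce void sum(float x<>, reduce float acc<>) {
    acc += x;
}

int main(void) {
    float X[1000], Y[1000], R[1000];   /* ordinary CPU arrays    */
    float x<1000>, y<1000>, r<1000>;   /* streams (GPU textures) */

    /* ... initialize X and Y ... */
    streamRead(x, X);                  /* CPU memory -> stream   */
    streamRead(y, Y);
    saxpy(2.0f, x, y, r);              /* runs on the GPU        */
    streamWrite(r, R);                 /* stream -> CPU memory   */
    return 0;
}
```

Under the hood, each stream is a texture and each kernel call is a render pass; streamRead/streamWrite hide the texture uploads and readbacks.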
It’s a bit sad we don’t do (scientific) compute with textures much anymore.

With Brook, it was super simple to bind the compute stream and render the output.

Theoretically, if you mapped an LLM to Brook, you’d be able to visually watch the intermediates rendered as textures.

It’s significantly less cool now.

Check out the original paper here; it’s an interesting glimpse at how scientific GPU compute got some traction:
graphics.stanford.edu/papers/brookgp…


More from @lauriewired

Oct 13
Colleges do a terrible job of teaching C++.

It’s not “C with Classes”. Injected into curricula as a demonstration of early CS concepts, it leaves many with a sour taste.

Students then immediately fall in love with the first language that *doesn’t* feel that way.
Admittedly, professors are in a tough spot.

To teach the concept, you fundamentally have to constrain the scope of the language. Many schools choose C++ out of practicality.

Controversially, I think toy languages that *aren't* industry standards are better suited for this.
Imagine learning the fundamentals of carpentry, but for teaching reasons an otherwise reputable brand's kit is artificially constrained to hand tools.

Of course, the moment a student jumps into the real world and experiences their first power tool, it blows their mind!
Oct 3
DDR5 is unstable garbage.

Max out your memory channels? Flaky.
Temperature a bit too hot? Silent throttling with no logs.
Too “dense” of a stick? Good luck training.

Last gen was rock solid by comparison. Here's what happened.
More than ever, manufacturers have been pushing memory to the absolute limits.

JEDEC, the standards committee, is pretty conservative.

Yet the moment DDR5 launched, everyone threw JEDEC out the window.

Intel + AMD's memory controllers were *not* ready to handle it.
DDR5-4800 was the baseline.

Day one kits were pushing 6000+. Today, even 8000+.

On-die error correction is masking chips that would have been binned as trash in the DDR4 era.

The gap between JEDEC spec and retail has never been wider.
Oct 2
Virtual Machines render fonts. It’s kind of insane.

TrueType has its own instruction set, memory stack, and function calls.

You can debug it like assembly. It’s also exploitable:
Anytime you can run code (albeit very limited code), someone will take advantage of it.

TrueType (TT) is unfortunately famous for many Windows Kernel zero days.

TT has bounded memory, and is therefore not Turing-complete…but you can still do crazy things with it.
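To make the “VM” claim concrete, here's a toy stack machine in plain C (opcodes invented for illustration; real TrueType bytecode has the same shape, with instructions like PUSHB, ADD, and DUP):

```c
#include <stdio.h>

/* Toy interpreter in the spirit of TrueType's hinting VM. Real fonts
 * carry bytecode like this in their 'fpgm'/'prep'/'glyf' tables. */
enum { OP_PUSH, OP_ADD, OP_DUP, OP_HALT };

static int run(const unsigned char *code) {
    int stack[64], sp = 0;
    for (int pc = 0;; pc++) {
        switch (code[pc]) {
        case OP_PUSH: stack[sp++] = code[++pc];         break;
        case OP_ADD:  sp--; stack[sp - 1] += stack[sp]; break;
        case OP_DUP:  stack[sp] = stack[sp - 1]; sp++;  break;
        case OP_HALT: return stack[sp - 1];
        }
    }
}

int main(void) {
    /* push 2, push 3, add -> prints 5 */
    const unsigned char prog[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_HALT };
    printf("%d\n", run(prog));
    return 0;
}
```

Bounded stack, bounded storage: that's why TT isn't Turing-complete. And note the missing bounds checks on `sp`; bugs of exactly that shape are what made real interpreters exploitable.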
Fontemon is a fun one: a Pokémon-style game packaged as a TTF.

llama.ttf is even more insane. A 60MB font that runs a 15M parameter llama model to generate stories.

Seemingly normal at first, when you use excessive exclamation points it starts to generate text!
Oct 1
This processor doesn’t (officially) exist.

Pre-production Engineering Samples sometimes make it into the grey market.

Rarer still are Employee Loaner Chips. Ghosts abandoned before ever becoming products:
A few days ago, someone found an Intel Pentium Extreme 980.

No laser-etched model number; just some scribbled Sharpie.

In 2004, Intel (very publicly) canceled the 4GHz Pentium 4…yet here it is.

It's a hint at some internal politics.
The Pentium group was all-in on single-core performance.

In the early 2000s, Intel advertised wild charts expecting to hit 10GHz.

Meanwhile, the Core 2 Duo team was the backup plan.

An underdog team in Haifa, focused on laptops.
Sep 29
A common programmer brag is being extremely adept at keyboard shortcuts.

Tiling WMs, TUIs, Vim keybindings everywhere, etc...

But is it actually faster?

Apple spent $50 million on R&D in 1989 to prove otherwise:
Bruce “Tog” Tognazzini, head of UI testing at Apple, claimed their research showed:

1. Users *felt* keyboard was faster
2. Stopwatch tests proved mouse was faster

Hold on. Apple had a huge conflict of interest; they were trying to sell the public on the idea of the mouse.
Modern human factors research adds some nuance.

At first, mouse GUIs are much faster: ~50% better latency than a CLI.

After 200 repetitions of the same task, the CLI (keyboard) just barely edges out mouse latency.
Sep 26
Modern radio communication is crazy good.

On the Apollo moon landings, the spacecraft used a ~20W downlink.

Today, we can get that down to about 0.001W.

Waveguides, phased arrays, and of course software make the difference:
First things first, keep it cold. Crazy cold.

Thermal noise kills SNR. Keep an amplifier at ~10 Kelvin, and we get close to fundamental limits (the quantum noise floor)!

For S-Band (common for space), that’s about a 5x power reduction.
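For reference, the physics behind “keep it cold” (standard formula, not from the thread):

```latex
% Johnson–Nyquist (thermal) noise power at the receiver:
%   k_B : Boltzmann's constant (~1.38e-23 J/K)
%   T   : effective noise temperature of the amplifier (K)
%   B   : bandwidth (Hz)
% Lower T means proportionally less thermal noise, until other
% contributions (antenna, sky, the quantum floor) dominate.
N = k_B \, T \, B
```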
Time for some software.

If you’re an EE, you might be familiar with Shannon’s limit.

Modern encoders (LDPC) get *really* close to this mathematical bound.

Fun fact: your cellphone (on 5G) uses this encoding. 6x power reduction.
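For reference, the bound in question (standard form; the 6x figure is the author's):

```latex
% Shannon–Hartley limit: C is the maximum error-free data rate over
% a channel of bandwidth B (Hz) at signal-to-noise ratio S/N.
% LDPC codes used in deep space and 5G get remarkably close to it.
C = B \log_2\!\left(1 + \frac{S}{N}\right)
```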
