Pavan Jayasinha
ECE @UWaterloo // prev: gpu perf @Modular, ml @extropic_ai, @UntetherAI, @QuantumIQC, @ZapataComputing
May 4
I implemented an LLM end-to-end in hardware, and ran it on an FPGA.

Zero Python. Zero CUDA. Just pure SystemVerilog.

All my progress + everything I learned from 200 hours of LLM chip design (demo at the end) 👇

Before we dive into the project log, some context:

I built this as a lab project for the most cracked course at Waterloo: ECE 327. The lab was designed by the prof, Nachiket Kapre.

Thanks to this course, I went from knowing ZERO Verilog to squashing delta-cycle races caused by #0 delays in fork/join_any constructs.

This lab is so goated that I decided to write a detailed thread on how anyone with some RTL skills could build it from absolute scratch (including choosing the architecture and which models to support).

DISCLAIMER: this thread is detailed and likely won't make sense to anyone with zero RTL experience.
Sep 5, 2024
Got nerd-sniped into robotics research 2 months ago with no prior experience

Ended up reimplementing computer vision on my Roomba with SLAM, and now it's mapping the lab like a mini-explorer

Here are my 3 most surprising takeaways:

#1 - Many tasks in robotics are better done without ML

Coming from a software/ML background, I asked myself why traditional robotics hadn't been replaced by ML.

I learned that, compared with many traditional statistical approaches, ML often sacrifices speed and predictability.