I implemented an LLM end-to-end in hardware, and ran it on an FPGA.
Zero Python. Zero CUDA. Just pure SystemVerilog.
All my progress + everything I learned from 200h of LLM chip design (demo at the end)👇
Before we dive into the project log, some context:
I made this as part of a lab project for the most cracked course at Waterloo: ECE 327. Nachiket Kapre (the Prof) designed this lab.
Because of it, I went from knowing ZERO Verilog to now squashing delta‑cycle races caused by #0 delays in fork/join_any constructs.
This lab is so goated that I decided to write a detailed thread on how anyone with some RTL skills could build this from absolute scratch (including deciding on the architecture and which models to support).
DISCLAIMER: this thread is detailed and likely won't make sense to those with zero RTL experience.
Sep 5, 2024 • 6 tweets • 2 min read
Got nerd-sniped into robotics research 2 months ago with no prior experience
Ended up reimplementing computer vision on my Roomba with SLAM and now it's mapping the lab like a mini-explorer
Here are my 3 most surprising takeaways:
#1 - Many tasks in robotics are better done without ML
Coming from a software/ML background, I asked myself why traditional robotics hasn't been replaced with ML.
I learned that ML often costs you speed & predictability compared with many traditional statistical approaches.