How much DooM can fit in a USB port? Quite a bit, it turns out! A minuscule #Fomu #fpga board hosts my hardware/software re-implementation of the DooM render loop within the confines of a USB port (uses ~4200 LUTs and < 128 kB of internal RAM). (1/n)
This is a tiny piece of DooM in a 2.1x2.7 mm #fpga. That is pretty small! (can you see it below on the #Fomu board? you might have to zoom ...).
Within it, I created a #riscv computer with specialized texturing and column-drawing hardware, designed to render DooM 1994 levels! (2/n)
The OLED screen is connected to the #Fomu through jumper wires soldered on the pads (a trick inspired by @brunolevy01 Fomu vga mod). (3/n)
CPU-side, the renderer traverses the BSP tree and determines the order of the sub-sectors (SSECTORS). For each screen column, it then intersects the walls of the sub-sectors (SEGS), front to back. The hardware takes care of the actual column drawing and texturing. (4/n)
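The front-to-back ordering described above can be sketched in a few lines. This is a minimal illustration in Python, not the Silice/RISC-V code of the project: the `Node` and `Leaf` classes and the `front_to_back` function are hypothetical names, and the side test assumes the usual DooM convention that the "right" child lies to the right of the partition line direction.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    subsector: int  # index of a sub-sector (SSECTOR)


@dataclass
class Node:
    # Partition line: origin (x, y) and direction (dx, dy).
    x: float
    y: float
    dx: float
    dy: float
    right: Union["Node", Leaf]
    left: Union["Node", Leaf]


def on_right_side(node: Node, px: float, py: float) -> bool:
    # Sign of a 2D cross product: >= 0 means the point is on the
    # right of the partition direction (assumed convention).
    return (px - node.x) * node.dy - (py - node.y) * node.dx >= 0


def front_to_back(root: Union[Node, Leaf], px: float, py: float) -> List[int]:
    """Return sub-sector indices ordered from nearest to farthest."""
    order = []
    stack = [root]
    while stack:
        n = stack.pop()
        if isinstance(n, Leaf):
            order.append(n.subsector)
            continue
        if on_right_side(n, px, py):
            near, far = n.right, n.left
        else:
            near, far = n.left, n.right
        # Push the far child first so the near child is visited first.
        stack.append(far)
        stack.append(near)
    return order
```

An explicit stack is used instead of recursion, which is closer to what a small CPU or a hardware state machine would do.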
SSECTORS? SEGS? Time for a refresher on DooM BSP! Check out @fabynou's excellent DooM Black Book and @FSouchu's post on his amazing PICO-8 port. (5/n)
The design fits nicely within the #ice40 UP5K #fpga. See the DSP usage? 8 / 8 ;-) (most are used for ground/ceiling texturing, aka flats). (6/n)
Command buffers (FIFOs) are used between the CPU, hardware renderer, and OLED SPI controller. This keeps everyone happily busy. (7/n)
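To illustrate why a command FIFO decouples a producer from a slower consumer, here is a small software model in Python. It is only a sketch: the `CommandFifo` class, its depth, and the command tuples are made up for the example and do not reflect the actual hardware buffers.

```python
from collections import deque


class CommandFifo:
    """Bounded FIFO: the producer must wait when it is full."""

    def __init__(self, depth: int):
        self.depth = depth
        self.queue = deque()

    def push(self, command) -> bool:
        # Returns False when full, so the producer knows it must retry.
        if len(self.queue) >= self.depth:
            return False
        self.queue.append(command)
        return True

    def pop(self):
        # Returns None when empty, so the consumer simply idles.
        return self.queue.popleft() if self.queue else None
```

As long as the FIFO is neither full nor empty, both sides make progress every step without waiting for each other.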
Everything is written from scratch in #Silice, from the #riscv CPU to the texture unit. Original game data is extracted from doom1.wad by the Lua pre-processor of Silice. (8/n)
The #Fomu SPI-flash is used to initialize the fpga (bitstream + SPRAM: code, level data). However, with the Fomu we cannot write at arbitrary locations in SPI-flash from a host computer (using dfu-util), except ...
I found a simple trick to safely store data on the #Fomu SPI-flash: concatenate the data to the bitstream file, and dfu-util is happy to upload everything. The data is then at address 262144 (warmboot slot) + 104106 (bitstream size).
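The address arithmetic of this trick is simple enough to write down. The sketch below (Python) only builds the concatenated image and computes where the payload ends up; the function name `append_payload` is hypothetical, and the warmboot offset is the value quoted above. The upload step itself (dfu-util) is not shown.

```python
WARMBOOT_SLOT_OFFSET = 262144  # warmboot slot address quoted in the thread


def append_payload(bitstream: bytes, payload: bytes):
    """Concatenate payload after the bitstream; return (image, flash address)."""
    image = bitstream + payload
    # The payload starts right after the bitstream inside the slot.
    address = WARMBOOT_SLOT_OFFSET + len(bitstream)
    return image, address
```

With the 104106-byte bitstream mentioned above, the payload lands at 262144 + 104106 = 366250.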
The textures are heavily downsampled, but I have a couple ideas to improve that. Also many details are incorrect, but these are mostly minor (texture scale + alignments, e.g. upper unpegged, lower unpegged, offsets, etc.).
This DooM demo should run on any #ice40 UP5K, and it does run as-is on the mighty IceBreaker board by @1bitsquared which I also used for development.
The port by @tnt is particularly impressive. It is based on a #riscv architecture targeting an #ice40 UP5K on an IceBreaker board. It does however require a mod to add a PSRAM chip to the IceBreaker. Check out the great explanatory video!
(?+1/n)
(edit: that's DooM 1993 ;) )
(edit: @FSouchu POOM is a complete remake, rather than a port, as it fully revisits the game for the PICO-8)
Of course now I am thinking about many optimizations!
- texture compression (have a block based prototype, ala S3TC)
- better handling of flats drawing (ground/ceiling), which uses too much hardware for my taste
- improved CPU code (skipping some redundancies)
Stay tuned ;-)
• • •
The DooM-chip! It will run E1M1 until the end of time (or until power runs out, whichever comes first).
Algorithm is burned into wires, LUTs and flip-flops on an #FPGA: no CPU, no opcodes, no instruction counter.
Running on Altera CycloneV + SDRAM. (1/n)
Everything is described in a language I am working on: SDRAM controller, divider, BSP traversal, texture unit, etc.
Main renderer (w/o data) is 666 lines of code (!).
A great test case: I made quite a few improvements, fixed some issues, and learned a lot about CycloneV + Quartus.
(2/n)
Rendering uses the original BSP tree (of course!) but is modified to better fit a hardware implementation; columns are raycast and drawn immediately front-to-back, stopping as soon as fully filled.
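The "stop as soon as fully filled" idea can be sketched for a single screen column as follows. This is a generic Python illustration with a per-pixel coverage check, not the hardware renderer: the `draw_column` function and the `(y0, y1, color)` span format are assumptions made for the example.

```python
def draw_column(spans, height):
    """Fill one column from spans given front to back.

    spans: iterable of (y0, y1, color), covering rows y0 <= y < y1.
    Returns (column, visited), where visited counts the spans examined.
    """
    column = [None] * height
    remaining = height  # pixels not yet covered
    visited = 0
    for y0, y1, color in spans:
        visited += 1
        for y in range(max(0, y0), min(height, y1)):
            if column[y] is None:  # nearer spans are never overwritten
                column[y] = color
                remaining -= 1
        if remaining == 0:
            break  # column fully filled: farther spans cannot be visible
    return column, visited
```

Because spans arrive nearest first, every pixel is written at most once and everything behind a filled column is skipped entirely.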
(3/n)
Wolfenstein 3D render loop in pure hardware! No CPU, no instruction pointer, no opcodes, only wires and flip-flops. Here it runs on a Mojo V3 board (Xilinx Spartan 6) + SDRAM. Reading @fabynou's black books while learning about #FPGA could only lead to this ;-)
(1/n)
Implemented from scratch using my language, from the SDRAM double-framebuffer to the Wolf3D DDA algorithm (and this is the original one: fixed point, DDA loop with only adds and shifts, tangent table!). 320x200, a palette of 256 18-bit colors, and VGA output -- old school!
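For readers unfamiliar with a fixed-point grid DDA, here is a simplified sketch in Python. It is not the original Wolf3D code nor the hardware version: it assumes 16.16 fixed point, handles only rays pointing towards +x and +y, takes the tangent and cotangent as parameters (`ystep`, `xstep`) instead of reading a tangent table, and requires a map enclosed by walls. All names are hypothetical.

```python
FRAC_BITS = 16
ONE = 1 << FRAC_BITS  # 1.0 in 16.16 fixed point


def cast_ray(grid, px, py, ystep, xstep):
    """Return (side, tile_x, tile_y) of the first wall hit.

    px, py: viewer position, 16.16 fixed point.
    ystep:  tan(angle), xstep: 1 / tan(angle), both 16.16 and > 0.
    grid[y][x] is truthy for wall tiles; the border must be all walls.
    """
    # Setup (uses multiplies): first crossings of vertical/horizontal lines.
    xtile = (px >> FRAC_BITS) + 1
    yintercept = py + ((ystep * ((xtile << FRAC_BITS) - px)) >> FRAC_BITS)
    ytile = (py >> FRAC_BITS) + 1
    xintercept = px + ((xstep * ((ytile << FRAC_BITS) - py)) >> FRAC_BITS)
    # Main loop: only adds, shifts and compares.
    while True:
        if (yintercept >> FRAC_BITS) < ytile:
            # Next crossing is with the vertical grid line x = xtile.
            if grid[yintercept >> FRAC_BITS][xtile]:
                return ("vertical", xtile, yintercept >> FRAC_BITS)
            xtile += 1
            yintercept += ystep
        else:
            # Next crossing is with the horizontal grid line y = ytile.
            if grid[ytile][xintercept >> FRAC_BITS]:
                return ("horizontal", xintercept >> FRAC_BITS, ytile)
            ytile += 1
            xintercept += xstep
```

Once the two intercepts are initialized, each step advances to the next grid line with a single addition, which is what makes this loop map so well to simple hardware.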
(2/n)