Bruno Levy
Apr 4, 2023, 168 tweets
1/N
Have you ever wondered how a microprocessor works ? How many sheets of paper do you need to write the schematics or description of a processor ? How many years do you need to study before being able to design your own processor ? How much money does it cost to try that ?
2/N
>Yes;1;0.05;$50
>What's that ?
>Answers to the questions in the previous post
3/N
Want to know more ?
4/N
(I love democracy 😎)
Let's go on.
To create a CPU on your own, you can:
- assemble a lot of ICs on breadboards, like Ben Eater, but it's going to cost you a lot of time and a lot of money. It is also super interesting, but not what I'm proposing
- there is also this super nice risc-V computer made of integrated circuits that I wanted to mention. Love the design (both the logic design and aesthetic design):
pineapple-one.github.io
- and also this one, that has a single NOR gate and that does all the computations in a serial manner, one bit at a time:
mynor.org
5/N
But that's not what I'm going to talk about. There are 3 different things that make things much easier:

1) FPGAs (Field Programmable Gate Arrays). You may think of them as a big collection of logic gates embedded in an integrated circuit, that you can connect as you want
<pause> the rest is coming shortly, stay tuned...
6/N
I like to think about FPGAs as "electronic clay" that you can "morph" into whatever you want,
see this article in @Nature tech review by @j_perkel with quotes from me, from @sylefeb and others: nature.com/articles/d4158…
7/N
Okay, so there are two other things that help a lot,
Normally, these FPGAs come with companion software, used to turn your design into connections configured in the chip. These tools are good, but big and slow (there are many jokes about Vivado's weight)
8/N
So now the second thing that makes things easier:
@oe1cxw from @YosysHQ created open-source tools (github.com/YosysHQ/yosys and github.com/YosysHQ/nextpnr). These tools are small, lightweight, fast, and with them, configuring an FPGA feels like working with a compiled language.
9/N
And now the third thing. Suppose you want to build a microprocessor. What will be the instruction set ? You may think of picking an existing one, Intel for instance, but there are two problems
- first, Intel's instruction set is super complicated
- second, it is copyrighted
10/N
So what about using ARM instruction set ?
- OK, it is much simpler, it's a RISC (reduced instruction set)
- but it is copyrighted
11/N
So what about inventing your own instruction set ?
OK, but that will be *a lot* of work, because you will need to create a set of tools to program it (assembler, linker, and maybe even operating system). With your own CPU, you are like Crusoe, you need to create everything.
12/N
But now there is RISC-V, that has several interesting characteristics:
- first, it's free (as in freedom), so anybody can create a RISC-V CPU without risking being sued or needing to pay royalties to anybody
- second, as a RISC instruction set, it's simple
13/N
- third, it's modular. You can choose to implement the smallest RV32I core (integer instruction set), or a full-featured RV32IMAFZicsr, with an FPU, and the privileged instruction set / interrupts / exception support to run Linux, or anything in-between ...
14/N
... or even invent your own instructions if need be
- and fourth, last but not least, it is super well designed, with hardware implementation in mind. I was impressed by the elegance/beauty/simplicity of this instruction set.
15/N
So let's pack our luggage for our journey. You will need hardware and software. My favourite FPGA boards are:
- IceStick ($35), minimalistic, just enough to start
- IceBreaker ($60), more comfortable
- #ULX3S ($100), fantastic ! large FPGA, SDCard, HDMI, SDRAM
16/N
(but there are many other ones). You can even do that with software only, using simulation, but the experience is not as intense as with real hardware.
Second thing you will need is installing some software,
(Yosys, NextPNR), see this page:
github.com/BrunoLevy/lear…
17/N
Ooh, and one important thing: there is a very active and friendly community on FPGAs: @sylefeb @suarezvictor @splinedrive @enjoy_digital @ultraembedded @samsoniuk @BruceHoult @mithro @dolu1990 @jangray (and I'm sure I'm forgetting many others, please forgive me and chime in !)
18/N
when you ask a question, it rarely takes more than 10 minutes before you get the answer !
Okay, I will stop there for today, and tomorrow serious things will start and we will make our first blinky !
😃🙏 ... OMG, there is a crowd here !!!
Will post the next episode in a few hours (tonight when I come back from work).
Note: most of the material posted in this thread is covered in my "learn-fpga" github project
github.com/BrunoLevy/lear…
If you are interested in this stuff, make sure you follow @OlofKindgren, @matthewvenn, @wren6991, @hansfbaier, @gojimmypi
19/N
Okay, so let's continue (fasten your seatbelt)
First thing we'll need to do is installing some software,
so you'll need Yosys and NextPNR, if you are under Linux (which I recommend), instructions are here:
github.com/BrunoLevy/lear… (probably works also on MacOS)
20/N
If you are under Windows, my friend @sylefeb has written some instructions here:
github.com/sylefeb/Silice…
21/N
Now plug your FPGA board into a USB port. If it is an IceStick, I strongly recommend using an extension cord (else there is a high risk that your cat/kid/yourself destroys the USB connector by running into the IceStick)
22/N
So now we are going to blink the LEDs of this thing. Maybe they blink already (if you have a brand new IceStick it has a default design preloaded in it), let's see how to do that on our own...
23/N
The first thing we need to do is to indicate which pin of the FPGA (the big square chip on the IceStick) is connected to what. We have 6 pins to identify:
- the clock: the IceStick has a 12 MHz oscillator connected to one of the pins
- the five tiny SMD (surface-mounted) LEDs
24/N
So we create a file "icestick.pcf" with the following content:
set_io CLK 21
set_io LEDS[0] 99
set_io LEDS[1] 98
set_io LEDS[2] 97
set_io LEDS[3] 96
set_io LEDS[4] 95
25/N
How can you figure this out ?
In the user manual (here: latticesemi.com/icestick) you can find which pin of the FPGA is connected to which LED / to the 12 MHz oscillator
26/N
You can choose your own names instead of LEDS[] and CLK if you wish (just pick something easier to remember than the pin numbers !)
27/N
We can write the SOC.v file, that will be transformed into a circuit in the FPGA:
module SOC (
input CLK,
output [4:0] LEDS
);
reg [4:0] count = 0;
always @(posedge CLK) begin
count <= count + 1;
end
assign LEDS = count;
endmodule
28/N
To send this to the FPGA, there are four commands to type (you can create a script if you wish). First command:
yosys -q -p "synth_ice40 -top SOC -json SOC.json" SOC.v
29/N
Second command:
nextpnr-ice40 --json SOC.json --pcf icestick.pcf --asc SOC.asc --hx1k --package tq144
30/N
Third command:
icepack SOC.asc SOC.bin
31/N
Fourth (and last) command:
iceprog SOC.bin
32/N
- the first command converts your logic into a list of connected gates and flipflops (something called a "netlist")
- the second command computes the way the netlist will be organized into the FPGA (something called "place and route") ...
33/N
- the third command converts the result of place and route into a binary file (internal representation for the FPGA), something called a "bitstream"
- the fourth command sends the bitstream to the FPGA
34/N
> "Nice ! It does something ... but Bruno, there is something wrong, the LEDs are not blinking"
Any idea of what happens ?
A hint: the six million dollar man may see something...
35/N
Yes they are blinking, but at 12 MHz. If we want to see something, we need to "slow down" the clock. Any idea about how to do that ?
36/N
Yes, Alice (second row, fourth seat in the audience) got it right, just count on a larger number of bits, and wire the LEDs to the most significant bits.
37/N
So our clock is at 12 MHz. Dividing the frequency by a million will be OK: the fastest LED will change state about 12 times per second, and the slowest one about 0.7 times per second.
A million is approx 2^20, so we will insert 20 additional "counting bits" in there and this will do the job.
38/N
New version. Note that "assign" just draws some wires from the MSBs to the LEDS.

module SOC (
input CLK,
output [4:0] LEDS
);
reg [24:0] count = 0;
always @(posedge CLK) begin
count <= count + 1;
end
assign LEDS = count[24:20];
endmodule
39/N
Redo the 4 commands (if you have not created a script yet it is the moment to do so), aaannnnd:
40/N
A question from Isabel (first row, second seat)
> What can I do if I do not have an IceStick but another FPGA board ?
> I've written scripts and pcf files for popular boards here: github.com/BrunoLevy/lear…
41/N
> and what if my board is not listed here ?
> in the datasheet of the board, you'll find which pin is plugged to what, or easier, just ask on Twitter (community is friendly and responsive)
42/N
Another question from Jay (second row, first seat):
> Q: Blinking a couple of LEDs is cute, but this seems so far away from a CPU ? Aren't we wasting our time ?
> A: our SOC.v is 10 lines long. For a fully working CPU, it will not be longer than 200 lines !
<pause for a while, stay tuned>
... (simpler) you can also download pre-compiled binaries from here (Linux/Mac/Windows), wonderful (thank you Matthias Koch for the link)
github.com/YosysHQ/oss-ca…
43/N
If you installed Yosys/NextPNR, generated a bitstream, made the little LEDs blink, then you are not that far away from a fully functional microprocessor. Am I joking ? Let me show you: this image is the complete VERILOG for #femtorv (200 lines).
44/N What we will see now is how to write these 200 lines. Aaaa, we got a question, yes Anna ?
> so we are going to create a toy CPU, but this will be just a toy right ? It won't be able to run real programs besides educational fibonacci or factorial right ?
> Excellent question
45/N
> May I answer with another question ? is Doom a real program ? 😉
Yes, you'll be able to run Doom on your own CPU !
(well, Doom uses some RAM, so you'll need a SDRAM controller for that, I'm using the excellent one in LiteX by @enjoy_digital)
More information on how to run Doom here: github.com/BrunoLevy/lear…
github.com/BrunoLevy/lear…
I'm using a tweaked version of the excellent MC1-Doom by @m_bitsnbites: github.com/mbitsnbites/mc…
46/N
But let's get back to our blinky, the super fast one, that blinks at 12 MHz. Is there a way of seeing something / debugging / inserting print statements ? Not directly, but one can use a Verilog *simulator*, that will emulate the gates in your design while you can print().
47/N
Create a file bench_SOC.v with: [image: source of bench_SOC.v]
48/N
Next you need to install iverilog/icarus
On linux: apt-get install iverilog
On Windows: there are probably some precompiled packages somewhere (if not included in the packages prepared by YosysHQ)
49/N
Then type the following two commands:
iverilog bench_SOC.v SOC.v
vvp a.out
50/N
Then you will see how it counts on the LEDs.
Press <ctrl><c>
Then type 'finish' (without the quotes) <enter>
51/N
Let us take a closer look at bench_SOC.v, what it does:
- declares a "wrapper module" around our SOC
- wiggles the clock forever (CLK = ~CLK), "~" stands for NOT. The "#1" is a little delay.
- displays the value of the LEDS wires whenever they change (with LEDS != prev_LEDS)
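The testbench itself was posted as an image. Here is a minimal sketch reconstructed from the three points above, assuming SOC has the CLK and LEDS[4:0] ports shown earlier. The module name `bench` and the instance name `uut` are my own choices, and the author's actual file may differ in the details.

```verilog
// bench_SOC.v (sketch): simulation-only wrapper around SOC
module bench();
   reg CLK;
   wire [4:0] LEDS;

   // The "wrapper": instantiate the design under test.
   SOC uut(
      .CLK(CLK),
      .LEDS(LEDS)
   );

   reg [4:0] prev_LEDS = 0;

   initial begin
      CLK = 0;
      forever begin
         #1 CLK = ~CLK;                  // wiggle the clock forever
         if(LEDS != prev_LEDS) begin     // print only when the LEDs change
            $display("LEDS = %b", LEDS);
         end
         prev_LEDS <= LEDS;
      end
   end
endmodule
```

With the 25-bit counter version of SOC.v, the first change only shows up after about a million simulated clock cycles, so for simulation it can be convenient to wire the LEDs to lower bits of the counter.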
52/N
Note that you can also use $display() in SOC.v for displaying the value of any signal. It is super useful for debugging a design.
When I started logic design, I did not know about that, and debugged everything using the 5 LEDs of the IceStick, don't do that, it's insane !
53/N
OK, so we are just at the beginning of our journey,
Consider a CPU, it fetches instructions from memory, then executes each instruction. We will consider something similar, but much simpler: a programmable christmas tinsel, that fetches the pattern to be displayed from a ROM
54/N
Does not fit in a single tweet, so I'll "slice it", first the
interface of the module, just as before:

module SOC (
input CLK,
output [4:0] LEDS
);
55/N
Then, make the clock slower: just count on SLOW+1 bits (21 bits here), and use the most significant bit of the counter as the new clock:
parameter SLOW = 20;
reg [SLOW:0] gearbox = 0;
always @(posedge CLK) begin
gearbox <= gearbox + 1;
end
wire slow_clk = gearbox[SLOW];
56/N
Then the ROM that contains the patterns to be displayed, and its initialization data:

reg [4:0] MEM [0:20];
initial begin
MEM[0] = 5'b00000;
MEM[1] = 5'b00001;
MEM[2] = 5'b00010;
...
MEM[20] = 5'b11111;
end
57/N
And finally, on each slow_clk, fetch a LED pattern from the ROM, and update the "program counter". Make it wrap around to 0 when it reaches 20:
reg [4:0] PC = 0;
reg [4:0] leds = 0;
always @(posedge slow_clk) begin
leds <= MEM[PC];
PC <= (PC==20) ? 0 : (PC+1);
end
assign LEDS = leds;
Here is what it looks like ! (not very spectacular, but interesting !)
Full source here.
Try this: program your own LED pattern.
58/N
OK, so we have understood
- how to declare a memory and initialize it with numbers
- how to create a circuit that fetches a number from this memory at each clock tick
A processor is a bit more complicated, but not that much: it will fetch a series of instructions from memory
59/N
These instructions are encoded as numbers. For the 32-bit version of RISC-V, these are 32-bit numbers (instead of 5 bits in our programmable tinsel). So we need to know which bit means what. For RISC-V, there is a reference document, it's here:
riscv.org/wp-content/upl…
60/N
So we are going to take a close look at this document, but not everything, because remember, RISC-V is modular, so we will - for now - only look at the most basic instruction set, called RV32I (for Integer).
61/N
RV32I is described in Chapter 2.
There we learn that there are 32 32-bit general-purpose registers (and register 0 is special, more on this later).
By general-purpose, they mean that an instruction can read any register and write to any register
62/N
Now let's jump to the table page 104 (in chapter "RV32/64G Set Listings")
The little table on the top of the page indicates that there are six different instruction encodings (that is, ways of indicating which bit means what in the instruction). We'll talk about that later
63/N
Then there is a big table with the (only) 48 instructions. And it is in fact much simpler than that, because there are big categories:
- First two rows are LUI,AUIPC (will talk about them)
- Then JAL and JALR (J as in Jump), they are used by function calls and gotos
64/N
Then the Branch instructions, that compare two registers and jump to an address based on the result of the test. There are 6 of them (==,!=,<,>=,<u,>=u), where 'u' is for unsigned number.
Yes, Anna has a question !
65/N
Anna> so we have <, >=, but there is no >,<= test ... ? Hrrmm, let me think ...
[seems that Anna is going to answer her own question]
Anna> Aaah, yes, of course, I can replace a test "if(r1 > r2)" with "if(r2 < r1)", just swap the two registers to be compared !
66/N
Anna got it right, there is a general "minimalism" principle in RISC-V: if something can be done with an existing instruction, then there will be no additional instruction...
67/N
... but sometimes it can be easier to write BGT r1,r2 rather than BLT r2,r1
This is done by the programming tools (the assembler), that do the translation for you automatically (BGT is called a "pseudo-instruction").
68/N
Then we have five L... instructions (as in LOAD, that is, transfer a value from memory to a register) and three S... instructions (STORE, from register to memory). The three variants of STORE are for bytes, 16-bit values (called halfwords) and 32-bit values (called words)
69/N
And the LOAD instructions can do sign extension or not. What is sign extension ? It comes from the way negative numbers are represented in integer arithmetic (two's complement).
For instance, -1 as a byte is encoded 11111111, and -1 as a halfword is 1111111111111111...
70/N
Hence if you load a smaller number (byte or halfword) into a 32-bit register, and it is a signed number, you need to copy its leading bit (the sign bit) into all the upper bits of the register (this is called sign extension). And of course, you don't want to do that if it is an unsigned number (versions of L suffixed by U).
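To make this concrete, here is a small simulation-only sketch (module and signal names are mine, for illustration) that extends the byte 11111111 to 32 bits both ways, as a signed load and an unsigned load would:

```verilog
module sign_extend_demo;
   reg  [7:0]  b = 8'b11111111;               // -1 as a signed byte, 255 as unsigned
   wire [31:0] as_signed   = {{24{b[7]}}, b}; // copy the sign bit 24 times
   wire [31:0] as_unsigned = {24'b0, b};      // fill the upper bits with zeros

   initial begin
      #1; // let the wires settle
      $display("signed load:   %h", as_signed);   // ffffffff
      $display("unsigned load: %h", as_unsigned); // 000000ff
   end
endmodule
```

The `{{24{b[7]}}, b}` replication is the same trick used for the immediates of the instructions.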
71/N
Then there are 9 instructions (ADDI ... SRAI). These instructions take a value from a register and an immediate value (extracted from some bits of the instruction), combine them with an operation (+, or, xor, shift ...) and store the result in another register
72/N
Then there are 10 instructions (ADD ... AND) that take values from two registers, combine them with an operation, and store the result in another register.
Aaah, somebody has a question, yes Clara ?
73/N
Clara> Why are there 9 instructions that take a register and an immediate value, and 10 instructions that take two registers ?
To answer your question, I'd suggest you try finding the register-immediate instruction that is missing
Clara> Aaahh, let me look at it ... SUBI !
74/N
Clara> Oooh I see, still this minimalism principle, we can replace SUBI r1,imm with ADDI r1,-imm of course !
75/N
Oooh, I almost forgot to tell you about the special register number zero. Its value is always zero, whatever you write to it. At first sight it seems stupid, but it is in fact super smart:
For instance, if you want to copy register rs to register rd (MOV rd,rs), in fact ...
76/N ... there is no MOV instruction, it is a pseudo-instruction translated into ADD rd,rs,r0. Same thing for
BZ "branch if zero" (replaced with BEQ r1,r0) etc...
And writing to r0 is ignored, useful when using JAL and JALR to implement GOTO (more on this later)
77/N
Nearly there, there is FENCE, FENCE.I (used in multicores, we can ignore them for now), ECALL (used for system calls, we can ignore), EBREAK (stops execution) and six CSRxxx instructions (used to read/write special registers, we can ignore for now)
78/N
Let us summarize, in fact we only have 10 instructions to implement ! (I consider that OR, AND, ADD ... are the same instruction, the operation is just a parameter), so we have:
1) register,register ALU (rd <- rs1 OP rs2)
2) register,immediate ALU (rd <- rs1 OP imm)
79/N
3) branch (if(rs1 OP rs2) PC <- PC + imm)
4) load (rd <- mem[rs1+imm])
5) store (mem[rs1+imm] <- rs2)
6) EBREAK (stop execution)
80/N
JAL and JALR, used to implement function calls and GOTO, each of them counts for one instruction
7) JAL (rd<-PC+4; PC<-PC+imm)
8) JALR (rd<-PC+4; PC<-rs1+imm)
They store the address of the next instruction in rd, useful to implement function call (return address)
81/N
Then the last two instructions, LUI and AUIPC, a bit weird, but simple
LUI (Load Upper Immediate): rd <- imm << 12
AUIPC (Add Upper Immediate to PC): rd <- PC + (imm << 12)
They are there because the immediate values for all other instructions are taken from bits in the ...
... instructions, and have a limited range (another alternative would have been to have the immediates as additional words, hence instructions of variable length, as in Intel processors, but it makes everything more complicated). So we needed instructions to modify ...
... the 20 upper bits of a register. For instance, loading an arbitrary 32-bit value in a register can be done with a combination of LUI and ADDI (and there is a pseudo-instruction LI that does that for you).
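Here is a small sketch of how a 32-bit constant can be split between LUI and ADDI. One detail to keep in mind: ADDI sign-extends its 12-bit immediate, so when bit 11 of the constant is 1, the upper part has to be adjusted. The module and signal names are mine, and this only illustrates the arithmetic, not how any particular assembler implements LI.

```verilog
module li_split_demo;
   reg  [31:0] value = 32'hDEADBEEF;
   wire [31:0] lo    = {{20{value[11]}}, value[11:0]}; // what ADDI will actually add
   wire [31:0] upper = value - lo;                     // its low 12 bits are zero
   wire [19:0] hi    = upper[31:12];                   // immediate for LUI
   wire [31:0] check = {hi, 12'b0} + lo;               // LUI followed by ADDI

   initial begin
      #1; // let the wires settle
      $display("LUI  imm = %h", hi);          // deadc
      $display("ADDI imm = %h", value[11:0]); // eef
      $display("rebuilt  = %h", check);       // deadbeef
   end
endmodule
```

For 0xDEADBEEF, the low 12 bits 0xEEF are read as -273 by ADDI, which is why LUI gets 0xDEADC rather than 0xDEADB.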
82/N
So now we know our "homework": create hardware for decoding and executing the 10 possible types of instructions we have. Let us take another look at the big table in the RISC-V manual, page 104. It is easy to see that the 7 least significant bits of the instruction word...
... indicate which one we have among ALUreg, ALUimm, Branch, JALR, JAL, AUIPC, LUI, Load, Store or EBREAK.
Looking at the table, there is something else that one can observe: each register is encoded in 5 bits, and it is always the same 5 bits of the instruction word that encode...
...the destination register rd and the two source registers rs1 and rs2. Finally, immediate values are encoded by some bits interleaved with the rest, depending on which of rd,rs1,rs2 are used by the instruction. It defines the 6 different instruction encodings in the small table
83/N
and bits 12,13,14 (a field called "funct3") tell you whether the encoded instruction is an OR, an AND, an ADD etc... (or a BEQ, a BNE etc... for branches).
There are also some subtleties that we will keep for later.
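The fixed fields described above can be sketched as plain wires picked out of the instruction word. The bit positions come from the RISC-V table; the signal names are illustrative, and the test instruction is add x3,x1,x2 assembled by hand.

```verilog
module decode_fields_demo;
   //                     funct7  rs2   rs1  f3   rd   opcode
   reg [31:0] instr = 32'b0000000_00010_00001_000_00011_0110011; // add x3,x1,x2

   wire [6:0] opcode = instr[6:0];   // which of the 10 instruction types
   wire [4:0] rdId   = instr[11:7];  // destination register
   wire [2:0] funct3 = instr[14:12]; // operation (ADD, OR, BEQ, ...)
   wire [4:0] rs1Id  = instr[19:15]; // first source register
   wire [4:0] rs2Id  = instr[24:20]; // second source register
   wire [6:0] funct7 = instr[31:25]; // extra operation bits (e.g. ADD vs SUB)

   initial begin
      #1; // let the wires settle
      $display("opcode=%b rd=x%0d funct3=%b rs1=x%0d rs2=x%0d funct7=%b",
               opcode, rdId, funct3, rs1Id, rs2Id, funct7);
   end
endmodule
```

No logic is generated for this: each of these wires is just a name given to a few bits of instr.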
OK, so let's implement the 10 instructions, one by one !
... but we'll see that tomorrow ! Let's call it a day 😉
Well, this is a looonnnng thread, and I'm unsure posting everything here is super useful, because everything is in github: github.com/BrunoLevy/lear…
On the other hand, maybe it is refreshing for you to have your daily dose of riscv-on-FPGA ?
So I'm asking you, shall I
84/N
Ok so I'll continue a bit :-)
Instead of LED patterns for our xmas tinsel, we are going to fill our memory with RISC-V instructions, then it will start having a RISC-V flavor.
So we create a memory with enough room for 256 instructions (for now):
reg [31:0] MEM [0:255];
85/N
Now let us see how to create the following RISC-V program:
add x0,x0,x0
add x1,x0,x0
addi x1,x1,1
addi x1,x1,1
addi x1,x1,1
addi x1,x1,1
lw x2,0(x1)
sw x2,0(x1)
ebreak
86/N
Using the table
//add x0,x0,x0
MEM[0] =
// rs2 rs1 add rd ALUREG
32'b0000000_00000_00000_000_00000_0110011;
- 7 lsb's are 0110011 (ALU reg instr)
- rd is 0
- funct3 is ADD (000)
- rs1 and rs2 are 0
- 7 msb's are 0
87/N
Second instruction: add x1,x0,x0
Just the same, but rd is x1 (00001) instead of x0
88/N
Third instruction: addi x1,x1,1
MEM[2] =
// imm rs1 add rd ALUIMM
32'b000000000001_00001_000_00001_0010011;
- 7 lsbs: ALU imm
- rd is 1
- funct3 is add
- rs1
- 12 msbs are imm
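If you want to check your hand-assembled words, here is a sketch that builds the whole little program with Verilog concatenations, one field at a time. The LW, SW and EBREAK encodings are not detailed in the thread; I took them from the same RISC-V table (opcodes 0000011, 0100011 and 1110011, funct3 010 for word accesses), so double-check them against the manual.

```verilog
module hand_assembly_demo;
   reg [31:0] MEM [0:255];
   integer i;

   initial begin
      //        funct7      rs2   rs1   funct3  rd    opcode
      MEM[0] = {7'b0000000, 5'd0, 5'd0, 3'b000, 5'd0, 7'b0110011}; // add  x0,x0,x0
      MEM[1] = {7'b0000000, 5'd0, 5'd0, 3'b000, 5'd1, 7'b0110011}; // add  x1,x0,x0
      //        imm[11:0]   rs1   funct3  rd    opcode
      MEM[2] = {12'd1,      5'd1, 3'b000, 5'd1, 7'b0010011};       // addi x1,x1,1
      MEM[3] = {12'd1,      5'd1, 3'b000, 5'd1, 7'b0010011};       // addi x1,x1,1
      MEM[4] = {12'd1,      5'd1, 3'b000, 5'd1, 7'b0010011};       // addi x1,x1,1
      MEM[5] = {12'd1,      5'd1, 3'b000, 5'd1, 7'b0010011};       // addi x1,x1,1
      MEM[6] = {12'd0,      5'd1, 3'b010, 5'd2, 7'b0000011};       // lw   x2,0(x1)
      //        imm[11:5]   rs2   rs1   funct3  imm[4:0] opcode
      MEM[7] = {7'd0,       5'd2, 5'd1, 3'b010, 5'd0,    7'b0100011}; // sw x2,0(x1)
      MEM[8] = {12'd1,      5'd0, 3'b000, 5'd0, 7'b1110011};       // ebreak

      for(i = 0; i < 9; i = i + 1) begin
         $display("MEM[%0d] = %b", i, MEM[i]);
      end
   end
endmodule
```

Each concatenation adds up to exactly 32 bits, which is an easy sanity check when assembling by hand.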
89/N
so now you are able to assemble by hand (not very comfortable, we will see a better way later).
Next steps:
- create a "program counter" (PC)
- at each clock tick, the instruction at address PC is copied into an "instr" register, and PC is incremented
- and ...
... a small set of circuitries recognize the instruction in the "instr" register (one of ALUreg, ALUimm, JAL, JALR, AUIPC, LUI, Load, Store, Branch, SYSTEM). For this we will create 10 is_XXX signals.
90/N
so we have something like:

reg [31:0] PC = 0;
reg [31:0] instr = 32'b0000000_00000_00000_000_00000_0110011;

always @(posedge clk) begin
instr <= MEM[PC];
PC <= PC+1;
end
91/N
and now the 10 wires that recognize the 10 instructions: [image: declarations of the 10 is_XXX wires]
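The wires were shown as an image. A sketch of what they can look like, comparing the 7 least significant bits of instr with the opcodes listed in the RISC-V table (the isXXX names follow the ones used in the text; the small test around them is mine):

```verilog
module opcode_decode_demo;
   reg [31:0] instr = 32'b000000000001_00001_000_00001_0010011; // addi x1,x1,1

   wire isALUreg = (instr[6:0] == 7'b0110011); // rd <- rs1 OP rs2
   wire isALUimm = (instr[6:0] == 7'b0010011); // rd <- rs1 OP Iimm
   wire isBranch = (instr[6:0] == 7'b1100011); // if(rs1 OP rs2) PC <- PC+Bimm
   wire isJALR   = (instr[6:0] == 7'b1100111); // rd <- PC+4; PC <- rs1+Iimm
   wire isJAL    = (instr[6:0] == 7'b1101111); // rd <- PC+4; PC <- PC+Jimm
   wire isAUIPC  = (instr[6:0] == 7'b0010111); // rd <- PC + Uimm
   wire isLUI    = (instr[6:0] == 7'b0110111); // rd <- Uimm
   wire isLoad   = (instr[6:0] == 7'b0000011); // rd <- mem[rs1+Iimm]
   wire isStore  = (instr[6:0] == 7'b0100011); // mem[rs1+Simm] <- rs2
   wire isSYSTEM = (instr[6:0] == 7'b1110011); // EBREAK (and ECALL, CSRxxx)

   initial begin
      #1; // let the wires settle
      $display("isALUreg=%b isALUimm=%b isSYSTEM=%b", isALUreg, isALUimm, isSYSTEM);
   end
endmodule
```

Exactly one of these wires is high for any valid instruction of the base set (FENCE aside, which we ignore for now).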
92/N
but our processor is incrementing PC at each clock tick, whatever happens. We could at least stop when EBREAK is encountered:

always @(posedge clk) begin
if(!isSYSTEM) begin
instr <= MEM[PC];
PC <= PC+1;
end
end
93/N
(EBREAK is the only SYSTEM instr that we implement, hence we just need to test isSYSTEM).
Oooh, and I forgot to say it, instr is initialized with an instruction, do you see what it does ?
Yes Anna ?
Anna> the 7 lsbs: ALUreg. rd,rs1,rs2 are all register 0, and funct3 is add
94/N
Anna> I see, add x0,x0,x0, it does nothing
Yes, it is a "nop" (no-operation). It is reasonable to initialize "instr" with it.
So we implemented one instruction ! only 9 to go !
95/N
Wow, Isabel and Anna are already done, that was fast !
So what you can do now is:
- propel this "CPU" with a slow clock, as we did yesterday
- light a different LED for different instructions
- do a testbench with $display() statements that show the current instruction
96/N
For the testbench, you can do something like: [image: testbench with $display() statements]
97/N
Supposing you also have: [image: supporting declarations]
More explanations and example files in step 4 of the tutorial, here:
github.com/BrunoLevy/lear…
So we implemented 1 instruction (SYSTEM/EBREAK) out of 10. Tomorrow we will see how to implement ALUreg and ALUimm instructions (19 instructions) !
98/N
Good morning folks !
So now we are able to assemble (by hand) RISC-V instructions in a ROM, load them one by one to a 32-bits register, and recognize them. Let us see now how to execute them, starting from the easiest ones.
99/N
So we got ALUreg instructions, that take the value of two 32-bit registers, combine them with an operation, and write the result to another register, and we got ALUimm instructions, that take the value of a register and an immediate value encoded in the instruction...
100/N
So to execute both types of instructions, we can imagine something that works in different steps
- step 1: load the instruction from memory
- step 2: get the values of the (up to two) source registers rs1 and rs2 and/or the imm
- step 3: compute result and write it to rd
101/N
So we are going to have a state machine:

parameter FETCH_INSTR = 0;
parameter FETCH_REGS = 1;
parameter EXECUTE = 2;
reg [1:0] state = FETCH_INSTR;
always @(posedge clk) begin
case (state)
FETCH_INSTR: ....
FETCH_REGS: ....
EXECUTE: ....
endcase
end
102/N
Then we do different things in the different states:

For FETCH_INSTR, we do:
instr <= MEM[PC];
103/N

For FETCH_REGS, we get the values of rs1 and rs2 from the register file:

wire [4:0] rs1Id = instr[19:15];
wire [4:0] rs2Id = instr[24:20];

reg [31:0] registerFile [0:31];
reg [31:0] rs1;
reg [31:0] rs2;
...
rs1 <= registerFile[rs1Id];
rs2 <= registerFile[rs2Id];

(risc-v is cool, always the same bits of instr encode rs1 and rs2)
104/N

For EXECUTE, we will need to work a little bit (we will need an ALU).
Then we need to "write back" the computed value to rd:

wire writeBackEn =....;
wire [31:0] writeBackData = ...;

if(writeBackEn && rdId != 0) begin
registerFile[rdId] <= writeBackData;
end
105/N
writeBackEn will go high when:
- the value is computed (in EXECUTE state)
- the instruction was ALUreg or ALUimm
We also need to make sure that rdId is not 0 (remember, writing to register 0 is systematically ignored)
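Putting the three states and the write-back together, the skeleton can be sketched as below. This is my own assembly of the fragments above, not the author's step 5 file: there is no ALU yet, so writeBackData is a placeholder, PC counts in words (PC+1) as in the earlier tweets, and the two-instruction program in MEM is only there so that the simulation has something to fetch.

```verilog
module cpu_skeleton(input clk);
   reg [31:0] MEM [0:255];
   reg [31:0] registerFile [0:31];
   reg [31:0] PC    = 0;
   reg [31:0] instr = 32'b0000000_00000_00000_000_00000_0110011; // nop
   reg [31:0] rs1;
   reg [31:0] rs2;

   wire [4:0] rs1Id = instr[19:15];
   wire [4:0] rs2Id = instr[24:20];
   wire [4:0] rdId  = instr[11:7];
   wire isALUreg = (instr[6:0] == 7'b0110011);
   wire isALUimm = (instr[6:0] == 7'b0010011);
   wire isSYSTEM = (instr[6:0] == 7'b1110011);

   localparam FETCH_INSTR = 0, FETCH_REGS = 1, EXECUTE = 2;
   reg [1:0] state = FETCH_INSTR;

   // Placeholder: the real value will come from the ALU.
   wire [31:0] writeBackData = 32'b0;
   wire writeBackEn = (state == EXECUTE) && (isALUreg || isALUimm);

   integer i;
   initial begin
      for(i = 0; i < 32; i = i + 1) registerFile[i] = 0;
      MEM[0] = 32'b000000000001_00001_000_00001_0010011; // addi x1,x1,1
      MEM[1] = 32'b000000000001_00000_000_00000_1110011; // ebreak
   end

   always @(posedge clk) begin
      if(writeBackEn && rdId != 0) begin
         registerFile[rdId] <= writeBackData;
      end
      case(state)
        FETCH_INSTR: begin
           instr <= MEM[PC];
           state <= FETCH_REGS;
        end
        FETCH_REGS: begin
           rs1   <= registerFile[rs1Id];
           rs2   <= registerFile[rs2Id];
           state <= EXECUTE;
        end
        EXECUTE: begin
           if(!isSYSTEM) PC <= PC + 1; // stay on EBREAK forever
           state <= FETCH_INSTR;
        end
      endcase
   end
endmodule
```

The register file is zeroed in an initial block so that reading x0 gives 0 in simulation; writes to x0 are filtered by the rdId != 0 test.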
106/N
Now you can either implement it on your own or download step 5 of the course, here:
github.com/BrunoLevy/lear…
Then if you wire the LEDs to the state, you will see your (wannabe) CPU dancing a waltz, 1-2-3 1-2-3 1-2-3 (or FETCH_INSTR, FETCH_REGS, EXECUTE, FETCH_INSTR, FETCH_R...)
107/N
Ok so let us see how to make our CPU do actual work.
To implement the 10 ALUreg and the 9 ALUimm instructions, the ALU needs two sources:
- source 1 is always rs1
- source 2 is rs2 for ALUreg and Iimm for ALUimm

which we can write as:
108/N
wire [31:0] ALUin1 = rs1;
wire [31:0] ALUin2 = isALUimm ? Iimm : rs2;
( The "? :" operator works like its C counterpart, and generates a MUX).
Then there will be something that computes ALUin1 OP ALUin2 and a big MUX that will select the result according to OP.
...
109/N
... OP is mostly determined by funct3, but not only, because a 3 bit code makes 8 different operations.
And there are 10 different ALUreg operations and 9 different ALUimm operations, so how does it work ?
Let us take a closer look at the big table in the RISC-V manual...
110/N
Let us first see the 10 ALUreg operations (opcode 7'b0110011). In addition to funct3, bit 30 of the instruction word discriminates between ADD/SUB and between SRL/SRA (shift right logical / shift right arithmetic, that is, a shift that copies the sign bit)
111/N
And for the 9 ALUimm operations, the difference between SRLI and SRAI is encoded by bit 30 of the instruction (that is not used by the immediate value, because a shift amount cannot be larger than 31, so 5 bits are enough).
OK... subtle things...
112/N
So the operation is encoded by funct3 and by bit 30,
so our ALU will be some circuitry to compute all the operations (addition, subtraction, and, or, shifts etc...) and a big mux that will select the one to use based on funct3 and bit 30.
Tweet-size ALU !
always @(*) begin
case(f3)
3'b000:o=(f7[5]&i[5])?in1-in2:in1+in2;
3'b001:o=in1<<shamt;
3'b010:o=($signed(in1)<$signed(in2));
3'b011:o=(in1<in2);
3'b100:o=in1^in2;
3'b101:o=f7[5]?$signed(in1)>>>shamt:$signed(in1)>>shamt;
3'b110:o=in1|in2;
3'b111:o=in1&in2;
endcase
end
Important remark:
- f7[5] (bit 30 of the instruction) is 1 for SUB and 0 for ADD, but we also need to test instr[5] (i[5] above), which is 1 for ALUreg and 0 for ALUimm, to make the difference with ADDI, where bit 30 is part of the immediate !! (overlooked that in my early design and it was super hard to find)
- f7[5] is 1 for SRA/SRAI and 0 for SRL/SRLI
Another remark:
This ALU takes a large number of LUTs in the FPGA.
We will see in a later episode entitled "the incredible shrinking core" how to use a much smaller number of LUTs, thanks to different tricks (indicated by Matthias Koch).
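The tweet-size ALU uses short names that are not declared anywhere in the thread. Here is a self-contained sketch with one plausible set of declarations (f3, f7, shamt and Iimm are my guesses at what the short names stand for), plus a tiny test of the ADD/SUB/ADDI subtlety mentioned above.

```verilog
module alu_demo;
   reg [31:0] instr, rs1, rs2;

   wire [2:0]  f3       = instr[14:12];
   wire [6:0]  f7       = instr[31:25];
   wire        isALUimm = (instr[6:0] == 7'b0010011);
   wire [31:0] Iimm     = {{21{instr[31]}}, instr[30:20]}; // sign-extended 12-bit imm
   wire [31:0] in1      = rs1;
   wire [31:0] in2      = isALUimm ? Iimm : rs2;
   wire [4:0]  shamt    = isALUimm ? instr[24:20] : rs2[4:0];

   reg [31:0] o;
   always @(*) begin
      case(f3)
        3'b000: o = (f7[5] & instr[5]) ? in1-in2 : in1+in2;  // SUB only for ALUreg
        3'b001: o = in1 << shamt;                            // SLL
        3'b010: o = ($signed(in1) < $signed(in2));           // SLT
        3'b011: o = (in1 < in2);                             // SLTU
        3'b100: o = in1 ^ in2;                               // XOR
        // Both branches are kept signed on purpose: mixing a signed and an
        // unsigned branch would turn >>> into a logical shift.
        3'b101: o = f7[5] ? $signed(in1) >>> shamt : $signed(in1) >> shamt;
        3'b110: o = in1 | in2;                               // OR
        3'b111: o = in1 & in2;                               // AND
      endcase
   end

   initial begin
      #1 rs1 = 10; rs2 = 3;
      instr = 32'b0100000_00010_00001_000_00011_0110011; // sub  x3,x1,x2
      #1 $display("sub : %0d", o);                       // 7
      instr = 32'b111111111111_00001_000_00011_0010011;  // addi x3,x1,-1
      #1 $display("addi: %0d", o);                       // 9
   end
endmodule
```

In the ADDI case bit 30 is 1 (it belongs to the immediate -1), and without the instr[5] test the ALU would wrongly subtract.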
113/N
Complete source and instructions are available in the github "learn-fpga" project (this step: step 6)
github.com/BrunoLevy/lear…
114/N
Assembling RISC-V by hand is super painful, am I wrong ?
Of course, one can use gnu asm to generate a binary file that one can load in a design, we will see later how to do that, but for now, it would be cool to embed little asm programs directly in VERILOG, no ?
115/N
So I wrote a (super simple) RISC-V assembler in VERILOG, that you can use as follows, much better than writing all these zeroes and ones by hand no ? [image: example program written with the assembler]
116/N
So we reached step 7 of the tutorial:
github.com/BrunoLevy/lear…
Now our core is able to execute a stream of ALU instructions. In simulation, you can display the instruction being executed and the content of the registers. On device, you can display the result on the LEDs.
117/N
we are not completely done, but we are making progress !
This little table shows the instructions we have already implemented (3 opcodes, 20 instructions). We have 7 opcodes to go (18 instructions). We are somewhere near the half of our journey.
<let's stop, for a while, rest a bit, and admire the view>
118/N
Ok, let us see now how to implement jumps
119/N
There are two jump instructions: JAL and JALR
Yes Clara ?
Clara> why are they called JAL and JALR, and not just JMP or something ?
Excellent question !
JAL stands for Jump And *Link*
and JALR, Jump And Link Register.
Why "and link",
because these instructions do two things:
120/N
- set the program counter to a new address (computed in different ways, we shall see)
- before that, save the current value of the program counter + 4 (the address of the next instruction) to another register
Hence JAL/JALR can be used to implement both GOTO and function calls ...
121/N
... because the saved PC+4 is the return address (this is what they mean by "And Link"). And if what you wanted to do was just GOTO, just dump the return address to x0 (ignore it) !
Let us see what JAL and JALR are supposed to do in more details:
122/N
JAL rd,imm: rd <- PC+4; PC <- PC + Jimm
JALR rd,rs1,imm: rd <- PC+4; PC <- rs1 +Iimm
123/N
JAL does a jump relative to PC, and JALR relative to a reg. JAL does not need to say relative to what it jumps (PC), it has more bits to encode the jump, so there is a new immediate format:
wire [31:0] Jimm={{12{instr[31]}}, instr[19:12],instr[20],instr[30:21],1'b0};
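To see the bit shuffling at work, here is a sketch that decodes the offset of a hand-assembled "jal x0,-4" (a jump back to the previous instruction). The encoding of the test word is my own hand assembly from the J-type layout in the RISC-V table (imm[20], imm[10:1], imm[11], imm[19:12], rd, opcode).

```verilog
module jimm_demo;
   //                     imm[20] imm[10:1]  imm[11] imm[19:12] rd    opcode
   reg [31:0] instr = 32'b1_1111111110_1_11111111_00000_1101111; // jal x0,-4

   wire [31:0] Jimm = {{12{instr[31]}}, instr[19:12], instr[20], instr[30:21], 1'b0};

   initial begin
      #1; // let the wire settle
      $display("Jimm = %0d", $signed(Jimm)); // -4
   end
endmodule
```

The trailing 1'b0 is there because jump targets are always even, so bit 0 of the offset is not stored in the instruction.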
124/N
Now we compute the next value of the program counter:
wire [31:0] nextPC = isJAL ? PC+Jimm :
isJALR ? rs1+Iimm :
PC+4;
and then in the EXECUTE state, update PC like that:
if(!isSYSTEM) begin
PC <= nextPC;
end
125/N
Where do we stand ? slowly making progress ! we implemented 5 opcodes (ALUreg, ALUimm, JAL, JALR, SYSTEM) out of 10 !
126/N
So let us see now how to implement branches
127/N
There are 6 branch instructions, what they do:
if(rs1 OP rs2) PC <- PC+Bimm
In other words: do a test on rs1 and rs2, and add an offset to PC if the test is true.
Test can be ==, !=, <, >=, <u, >=u (u for unsigned int comparison).
128/N
And of course, one can do '>' by swapping the arguments of '<'.
So we got a new immediate format for branch (exploiting the unused parts of instr)
wire [31:0] Bimm={{20{instr[31]}}, instr[7],instr[30:25],instr[11:8],1'b0};
129/N
The test to be done is encoded in funct3. We declare a signal 'takeBranch' that will be asserted each time the test is satisfied:
reg takeBranch;
always @(*) begin
...
case(funct3)
3'b000: takeBranch = (rs1 == rs2); // BEQ
3'b001: takeBranch = (rs1 != rs2); // BNE
3'b100: takeBranch = ($signed(rs1) < $signed(rs2)); // BLT
3'b101: takeBranch = ($signed(rs1) >= $signed(rs2)); // BGE
3'b110: takeBranch = (rs1 < rs2); // BLTU (unsigned)
3'b111: takeBranch = (rs1 >= rs2); // BGEU (unsigned)
default: takeBranch = 1'b0;
endcase
end
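The difference between the signed and unsigned tests is easy to get wrong, so here is a hedged Python model of the same funct3 decoding. The names `to_signed` and `take_branch` are made up for the illustration, and register values are assumed to be 32-bit patterns stored as non-negative Python ints.

```python
def to_signed(x):
    """Reinterpret a 32-bit pattern as a two's complement signed integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def take_branch(funct3, rs1, rs2):
    """Return True if the branch selected by funct3 is taken."""
    rs1 &= 0xFFFFFFFF
    rs2 &= 0xFFFFFFFF
    tests = {
        0b000: rs1 == rs2,                        # BEQ
        0b001: rs1 != rs2,                        # BNE
        0b100: to_signed(rs1) < to_signed(rs2),   # BLT
        0b101: to_signed(rs1) >= to_signed(rs2),  # BGE
        0b110: rs1 < rs2,                         # BLTU
        0b111: rs1 >= rs2,                        # BGEU
    }
    return tests.get(funct3, False)               # same role as the 'default:' case

minus_one = 0xFFFFFFFF
signed_less = take_branch(0b100, minus_one, 1)     # -1 < 1
unsigned_less = take_branch(0b110, minus_one, 1)   # 4294967295 < 1
```

The same bit pattern 0xFFFFFFFF is "smaller than 1" for BLT and "larger than 1" for BLTU, which is exactly what $signed() changes in the Verilog.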
130/N
Side technical note: takeBranch is declared as a reg and assigned in a combinational block, so we need the 'default:' case (funct3 values 3'b010 and 3'b011 are not covered by the six tests), else it will generate a latch that *we absolutely do not want !!!*
131/N
So now we just need to update nextPC, and that's all !

wire [31:0] nextPC =
(isBranch && takeBranch) ? PC+Bimm :
isJAL ? PC+Jimm :
isJALR ? rs1+Iimm :
PC+4;
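The chain of '?:' is a priority multiplexer: the first condition that holds wins. A minimal Python sketch of the same selection, with a hypothetical function name `next_pc` and the decoded signals passed in as plain arguments:

```python
MASK32 = 0xFFFFFFFF  # PC is 32 bits wide

def next_pc(pc, rs1, bimm, jimm, iimm, is_branch, take_branch, is_jal, is_jalr):
    """Mirror of the nextPC expression: same conditions, same order."""
    if is_branch and take_branch:
        return (pc + bimm) & MASK32
    if is_jal:
        return (pc + jimm) & MASK32
    if is_jalr:
        return (rs1 + iimm) & MASK32
    return (pc + 4) & MASK32              # default: next instruction

taken = next_pc(pc=0x20, rs1=0, bimm=-4, jimm=0, iimm=0,
                is_branch=True, take_branch=True, is_jal=False, is_jalr=False)
not_taken = next_pc(pc=0x20, rs1=0, bimm=-4, jimm=0, iimm=0,
                    is_branch=True, take_branch=False, is_jal=False, is_jalr=False)
```

A branch whose test fails simply falls through to PC+4, like any non-jumping instruction.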
132/N
[Ooh, Clara raised her hand two tweets ago, I did not notice]
Yes Clara
Clara> I noticed we compute rs1+Iimm, which we also did when we implemented ALUimm instructions. If I understood correctly, each time we write '+', it generates an adder, couldn't we reuse the ALU for that ?
133/N
Excellent ! (OMG, they are good this year, and it is always the *girls* !)
Yes, we are going to talk about that in the "incredible shrinking core" episode, on how to make a core that is as small as possible, once we have completed all the instructions.
134/N
We are here !
Wow, our little core is able to execute 28 different RISC-V instructions, and we have 10 instructions left to implement: LUI, AUIPC, Load (5 variants) and Store (3 variants).
Before doing that, is there something we can do with the 28 instructions we already have ?
ADD(x10,x0,x0);
Label(L0_);
ADDI(x10,x10,1);
JAL(x1,LabelRef(wait_)); // call(wait_)
JAL(zero,LabelRef(L0_)); // jump(L0_)
EBREAK();

Label(wait_);
ADDI(x11,x0,1);
SLLI(x11,x11,slow_bit);
Label(L1_);
ADDI(x11,x11,-1);
BNE(x11,x0,LabelRef(L1_));
JALR(x0,x1,0);
135/N
So what does this tweet-size program do ?
If you wire the LEDs of the IceStick to x10, and if you keep the 12 MHz clock (no gearbox), what we have is a blinker with a software delay loop.
Jay> After 134 tweets, what we have is just another blinky ???
Yes, you are right Jay ! But consider this: this blinky has a strong RISC-V flavor, and this time, it is your own microprocessor executing RISC-V that counts on the 5 LEDs of the IceStick !
Be reassured, the ability to run much more interesting programs is not that far away.
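To see what the delay loop costs, here is a rough Python sketch that replays the program above instruction by instruction. The function name `blinky` is made up, `slow_bit` is left as a parameter, and showing the 5 low bits of x10 on the LEDs is an assumption about the wiring. It counts executed instructions only, not clock cycles.

```python
def blinky(slow_bit, ticks):
    """Replay the blinky for `ticks` increments of x10.
    Returns (LED values after each increment, number of instructions executed)."""
    executed = 0
    x10 = 0
    executed += 1                             # ADD x10,x0,x0
    leds = []
    for _ in range(ticks):
        x10 = (x10 + 1) & 0xFFFFFFFF
        executed += 1                         # L0_: ADDI x10,x10,1
        leds.append(x10 & 0b11111)            # assumed: LEDs show the 5 low bits
        executed += 1                         # JAL x1,wait_
        x11 = 1
        executed += 1                         # wait_: ADDI x11,x0,1
        x11 = (x11 << slow_bit) & 0xFFFFFFFF
        executed += 1                         # SLLI x11,x11,slow_bit
        while True:
            x11 = (x11 - 1) & 0xFFFFFFFF
            executed += 1                     # L1_: ADDI x11,x11,-1
            executed += 1                     # BNE x11,x0,L1_
            if x11 == 0:
                break
        executed += 1                         # JALR x0,x1,0 (return)
        executed += 1                         # JAL zero,L0_ (loop forever)
    return leds, executed

leds, executed = blinky(slow_bit=3, ticks=4)
```

Each increment of x10 costs 6 + 2^(slow_bit+1) instructions, almost all of them spent in the two-instruction loop at L1_. The EBREAK is never reached, since the JAL before it always jumps back to L0_.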
Let's:
(1) resume and implement the remaining instructions ?
(2) proceed directly to pipelined processors ?
(3) that's enough and you prefer to read the tutorial here: github.com/BrunoLevy/lear…
Ooops, silly me, unintentionally started a separate thread instead of continuing this one a while ago, so in this "sub-thread" we got jumps and branches.
Thread continued here:

(I have some difficulties with Twitter GUI, and sometimes I create a new thread instead of continuing the current thread)
"Thread forking" wasn't intentional, but it is a way of illustrating how JAL (Jump and Link) can be used to implement subroutines 😀
Since JAL and JALR store PC+4 in a target register, this can be used to implement call and ret:
call func -> jal x1,func
ret -> jalr x0,x1,0
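For the curious, here is a hedged Python sketch of how these two expansions turn into 32-bit instruction words. The function names `encode_jal` and `encode_jalr` are invented for the illustration; the field layout is the standard RISC-V J-type and I-type encoding, with opcodes 0x6F (JAL) and 0x67 (JALR).

```python
def encode_jal(rd, offset):
    """Encode 'jal rd,offset' (offset in bytes, relative to PC, must be even)."""
    imm = offset & 0x1FFFFF                    # 21-bit two's complement
    return ((((imm >> 20) & 0x1) << 31)        # imm[20]
            | (((imm >> 1) & 0x3FF) << 21)     # imm[10:1]
            | (((imm >> 11) & 0x1) << 20)      # imm[11]
            | (((imm >> 12) & 0xFF) << 12)     # imm[19:12]
            | (rd << 7)
            | 0x6F)

def encode_jalr(rd, rs1, imm):
    """Encode 'jalr rd,rs1,imm' (funct3 is 000)."""
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (rd << 7) | 0x67

call_word = encode_jal(1, 8)        # call to a function 8 bytes ahead: jal x1,8
ret_word = encode_jalr(0, 1, 0)     # ret: jalr x0,x1,0
```

The word for ret is always the same, since it contains no offset: everything it needs is in x1.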
