Fabian Giesen @rygorous
I had not seen it and it's a really bad idea. (And I don't really get what the point is in writing a couple thousand words of speculation about it instead of just asking in the first place.)
Here are just some of the substantial differences between memory and register operands:
1. Memory addresses are virtual. There can be synonyms (multiple virtual addresses referencing the same location). There can be unmapped addresses. Accessing anything through a memory address requires a complicated translation process that has multiple failure modes.
Having multiple memory references in the same instruction requires this to be done multiple times, which is a massive slowdown, source of hardware complexity, and can make it incredibly hard to guarantee forward progress in the first place. (It's already being done at least 2x, once on the instruction and once on the data side, in most processors.) Every extra reference increases the headache, especially wrt. forward progress.
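To make point 1 concrete, here's a toy sketch of address translation. Everything in it is an assumption for illustration: the single-level `page_table` dict, the 4 KiB `PAGE_SIZE`, and the names `translate` and `operands_ok` are made up and don't correspond to any real MMU. It only shows two things: two virtual addresses can be synonyms for the same physical location, and an instruction with several memory operands needs every one of its translations to succeed.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class PageFault(Exception):
    """Raised when a virtual page has no mapping."""


def translate(page_table, vaddr):
    """Translate a virtual address using a toy single-level page table."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn not in page_table:
        raise PageFault(f"unmapped virtual page {vpn:#x}")
    return page_table[vpn] * PAGE_SIZE + offset


# Hypothetical mapping: virtual pages 0x10 and 0x20 both map to physical page 0x7.
page_table = {0x10: 0x7, 0x20: 0x7}

a = translate(page_table, 0x10 * PAGE_SIZE + 0x18)
b = translate(page_table, 0x20 * PAGE_SIZE + 0x18)
synonyms = (a == b)  # different virtual addresses, same physical location


def operands_ok(page_table, vaddrs):
    """Return all physical addresses, or None if any single operand faults."""
    try:
        return [translate(page_table, v) for v in vaddrs]
    except PageFault:
        return None
```

With one memory operand there is one way for this to fail; with three there are three, and the instruction has to be restartable after any of them.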
2. This goes double when you ask where the addresses come from to begin with. Also memory? Now you have double-indirect memory accesses in your pipeline. Another complexity explosion, and one of the things that killed both VAX and the Motorola 68000.
3. What is your memory like? Is it byte-addressed? Then on a 64-bit machine, 7/8ths of your addresses mapped to GPRs are between registers. What do you do with those? Do you allow these unaligned reads? Good luck register-renaming that. You have to, at the very least, check for it on every reference. Do you disallow it? Well, then do you want unaligned reads to work on regular memory? If so, now you have some memory that is magic unalign-able and some that isn't. Is that a separate failure mode? What do you do about it?
Do you make the memory word-addressable only? Machines used to be like that, and it turns out that manipulating anything small with that is a giant pain in the ass and requires massive amounts of code.
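A quick sketch of the arithmetic in point 3, under made-up assumptions: 16 hypothetical 64-bit registers mapped at byte address 0, and little-endian byte lanes within a word. The names `registers_touched` and `store_byte_word_addressed` are illustrative only. The first part counts how many byte addresses in the register window fall between registers; the second shows the read-mask-merge dance a word-addressed machine needs just to store one byte.

```python
WORD_BYTES = 8  # 64-bit registers
NUM_REGS = 16   # hypothetical register count

# Byte addresses covering the hypothetical memory-mapped register window.
window = range(NUM_REGS * WORD_BYTES)
aligned = [addr for addr in window if addr % WORD_BYTES == 0]
fraction_unaligned = 1 - len(aligned) / len(window)  # 7/8 of the addresses


def registers_touched(addr, size=WORD_BYTES):
    """Register numbers overlapped by a `size`-byte access at byte address `addr`."""
    first = addr // WORD_BYTES
    last = (addr + size - 1) // WORD_BYTES
    return list(range(first, last + 1))


def store_byte_word_addressed(memory, byte_index, value):
    """Store one byte into word-addressed memory: read word, mask, merge, write back."""
    word, lane = divmod(byte_index, WORD_BYTES)
    shift = 8 * lane  # assumes little-endian lane numbering
    mask = 0xFF << shift
    memory[word] = (memory[word] & ~mask) | ((value & 0xFF) << shift)
```

An aligned access touches exactly one register, while `registers_touched(12)` straddles two, which is the case a renamer would have to detect on every reference.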
4. The address disambiguation problem already exists on regular memory references. There's a substantial amount of HW dedicated to dealing with the fallout. If you look at uArch details for current machines, you'll find that there is a max number of outstanding instructions, around 200 these days, a max number of outstanding loads, these days around 80 (for beefy machines), and a max number of outstanding stores, around 50 or so.
Stores are further split in two halves, one half to compute the address and one half to store the data; before you know the address, any load that comes after can't actually tell whether it overlaps with one of the pending stores or not.
Never mind multiprocessor memory model issues, purely maintaining the appearance of instructions executing in program order within a single thread requires substantial amounts of tracking for memory references. Which is another reason to minimize memory refs/instr.
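Here's a minimal sketch of the check described in point 4: a load compared against older stores that haven't committed yet. The `PendingStore` type and `load_status` function are hypothetical, and this is the conservative version; real hardware typically speculates past unknown store addresses and recovers if it guessed wrong, which is not modeled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PendingStore:
    addr: Optional[int]  # None until the address half of the store has executed
    size: int


def load_status(older_stores, load_addr, load_size):
    """Classify a load against older pending stores.

    Returns "wait" if any older store address is still unknown,
    "conflict" if a known store overlaps the load, else "independent".
    """
    overlap = False
    for st in older_stores:
        if st.addr is None:
            return "wait"  # unknown address: overlap cannot be ruled out
        if st.addr < load_addr + load_size and load_addr < st.addr + st.size:
            overlap = True
    return "conflict" if overlap else "independent"
```

Every load pays for this comparison against every older in-flight store, which is part of why the store and load queue sizes are so much smaller than the instruction window.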
5. Speaking about memory, how do you encode these memory references? Physically, within the instruction? Register numbers are relatively small (usually 3 to 5 bits), so you don't mind having several of them per instruction. Memory addresses are quite a bit bigger than that.
If you allow several per instruction, your instructions become giant. Historically, archs without a regular register file are usually either accumulator or stack machines.
This is, in part, for that reason. An accumulator machine has one register that is implicitly used for most operations. (So it doesn't need to be encoded in the instructions.) Stack machines have, well, a stack, and instructions don't have explicit operands at all.
The whole 2-operand/3-operand instruction thing is something you can only afford if operands can be specified very compactly.
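The encoding argument in point 5 is just arithmetic. As a rough sketch, assume an 8-bit opcode, 5-bit register numbers, and full 64-bit absolute addresses for memory operands. All three numbers are assumptions for illustration (real ISAs use shorter base+displacement forms), and `instruction_bits` is a made-up helper.

```python
def instruction_bits(opcode_bits, operand_bits, num_operands):
    """Total encoding size for an instruction with explicit operands."""
    return opcode_bits + operand_bits * num_operands


OPCODE_BITS = 8  # assumed opcode width

three_regs = instruction_bits(OPCODE_BITS, 5, 3)       # 3 register operands
three_addrs = instruction_bits(OPCODE_BITS, 64, 3)     # 3 full memory addresses
accumulator = instruction_bits(OPCODE_BITS, 64, 1)     # 1 address, implicit accumulator
stack_op = instruction_bits(OPCODE_BITS, 0, 0)         # no explicit operands
```

Under these assumptions a three-register instruction fits in 23 bits, while three absolute addresses need 200 bits, which is why designs without a register file tend to cut the number of explicit operands instead.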
So why not accumulator or stack machines? There's no _fundamental_ reason you can't go that route; however, both of these types of architectures really really want your programs to be expressed in long serial dependency chains, because otherwise you get tons of extra instructions to move data in/out of the accumulator (or shuffle around the stack). If half your instructions end up moving data around the accumulator/stack, you might as well use a 2+-operand encoding. But if you _don't_ do that and stick with the serial code, you have no opportunities for instruction-level parallelism, and that's no good either!
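A small sketch of that trade-off for `(a + b) * (c + d)`. The instruction lists are hand-written for hypothetical machines, and `critical_path` is a made-up helper that assumes unit latency and ideal renaming (each write to `acc` starts a fresh value). It compares a 3-operand encoding against an accumulator encoding on instruction count and longest dependency chain.

```python
def critical_path(instrs):
    """Longest dependency chain for a list of (dest, sources) instructions.

    Assumes every instruction takes one step and destinations are renamed,
    so only true read-after-write dependencies count.
    """
    depth = {}
    longest = 0
    for dest, srcs in instrs:
        d = 1 + max((depth.get(s, 0) for s in srcs), default=0)
        depth[dest] = d
        longest = max(longest, d)
    return longest


# 3-operand machine: add t0,a,b ; add t1,c,d ; mul t2,t0,t1
three_operand = [("t0", ["a", "b"]), ("t1", ["c", "d"]), ("t2", ["t0", "t1"])]

# Accumulator machine: load a ; add b ; store t ; load c ; add d ; mul t
accumulator = [
    ("acc", ["a"]),
    ("acc", ["acc", "b"]),
    ("t", ["acc"]),
    ("acc", ["c"]),
    ("acc", ["acc", "d"]),
    ("acc", ["acc", "t"]),
]
```

In this toy model the 3-operand version is 3 instructions with a chain of length 2, and the accumulator version is 6 instructions with a chain of length 4; half of the accumulator instructions exist only to move data in and out of `acc`.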
6. Another issue with memory: everything's in it. So your "registers" are memory. What happens when you jump into whatever memory your "registers" live in? Is your program counter memory mapped? What happens if you jump into your program counter?
You need to define what even happens in cases like that. What order do the logical memory accesses for reading PC, instruction fetch, reading operands, writing results happen in? What state is the machine in if an exception occurs in the middle? (Quite likely.)
You're likely going to forbid the particular example of running your registers as code, say by flagging the memory area as non-executable. (There's another fun issue here with "oh, so now you have privilege checks on individual operands", but let's not even go there).
But still, is the magic memory area with your working data that you want to be able to rename etc. part of the regular memory view? Is it coherent? Can other processors look at it? Can your NIC DMA into or out of it? What happens when they do?
Do you want coherency traffic for that? You don't have anywhere near that kind of bandwidth. When, logically, do such events happen? You need to at least define a set of ordering rules. Or do you just bail on the entire thing and force it private?
7. Fun renaming issue: the whole point with register renaming is that you can do it as soon as you see the instruction. You can't rename memory references before you actually know what the address is (see notes on disambiguation above). So what do you do when someone indexes the region where your working data lives with a dynamic index? What value do you rename that to? You can't do that renaming until you know the value of the dynamic index. (Really this is just re-stating that memory disambiguation is hard, but worth pointing out separately.)
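To illustrate point 7, here's a minimal rename-table sketch. The `Renamer` class is hypothetical and heavily simplified (no free list, no recovery), and the `("indexed", base, index_reg)` tuple is an invented way to mark an operand whose location depends on a run-time value. Static register numbers can be renamed at decode; an indexed operand cannot, because the table doesn't know which entry to look up yet.

```python
class Renamer:
    """Toy rename table mapping architectural registers to physical registers."""

    def __init__(self, num_arch_regs):
        self.table = list(range(num_arch_regs))  # arch reg -> physical reg
        self.next_phys = num_arch_regs           # next unused physical register

    def rename(self, dest, srcs):
        """Rename one instruction at decode time.

        Operands are ints (static register numbers) or tuples of the form
        ("indexed", base, index_reg) for dynamically indexed references.
        Returns (new_dest_phys, src_phys_list), or None if renaming must wait.
        """
        if any(isinstance(op, tuple) for op in [dest, *srcs]):
            return None  # target depends on a run-time value; cannot rename yet
        phys_srcs = [self.table[s] for s in srcs]
        self.table[dest] = self.next_phys
        self.next_phys += 1
        return self.table[dest], phys_srcs
```

For example, after `r = Renamer(4)`, the call `r.rename(1, [2, 3])` succeeds immediately, while `r.rename(0, [("indexed", 0, 3)])` returns `None`: it has to stall until register 3's value is known.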
You can make all these and many other issues go away if you put your working set in a special memory area that is:
a) Not visible or accessible from outside (so the coherency/snooping/ordering issues go away)
b) Not subject to address translation (so no synonyms, no waiting for translation to finish, easier disambiguation)
c) Word-addressed (preferably), or at least with very clear rules of who can overlap what, and when, that can be decided purely from the addresses
d) With no support for dynamic indexing (so you can rename/disambiguate easily)
e) with compact addresses (so instruction size is not an issue),
f) with no privilege architecture for the high-traffic accesses (to get that headache away),
g) with no possibility of any missing pages or other faults occurring from accessing said memory,
h) with a forward progress guarantee (in particular, no concern that TLB/store buffer/load buffer associativity limits can prevent a particular instruction from completing sometimes)
...but every single one of these gets you away from "just like regular memory".
What it boils down to is that "regular memory" is in fact anything but. It's a fantastically complicated fiction that a Rube Goldberg device comprising >70% of your average CPU's area and budget is busy maintaining, and the last thing you want to do is to lean on it even more.