Thread by Fabian Giesen (16 tweets)
@Reedbeta @kamidphish You can't just skip NOPs! They're architecturally defined instructions. Their mere presence has visible effects.
@Reedbeta @kamidphish Example 1: x86 has a single-step mode (set TF in EFLAGS) that raises a debug trap after every instruction. A NOP is defined as one instruction, so single-stepping is required to stop at every NOP. You can't just fuse across them without changing visible behavior.
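A toy model of the single-step argument follows. It is plain Python, not real hardware. Each list entry stands for one architectural instruction. The hypothetical `fuse_nops` decoder collapses every run of NOPs into a single internal op. The model assumes a debugger stepping with TF set sees one trap per op the CPU treats as an instruction, so fusing changes the count visibly.

```python
# Toy model, not real hardware: one list entry per architectural instruction.
program = ["mov", "nop", "nop", "nop", "add", "nop", "ret"]

def fuse_nops(insns):
    """Hypothetical decoder that collapses each run of NOPs into one op."""
    ops = []
    for insn in insns:
        if insn == "nop" and ops and ops[-1].startswith("nop"):
            ops[-1] = "nop-run"  # extend the current run instead of adding an op
        else:
            ops.append(insn)
    return ops

def single_step_stops(ops):
    """With TF set, the debugger gets one #DB trap per op treated as an instruction."""
    return len(ops)

architectural = single_step_stops(program)        # what the ISA requires
observed = single_step_stops(fuse_nops(program))  # what the fused decoder would produce
print(architectural, observed)  # 7 5
```

The two counts differ, and that difference is exactly the visible behavior change the tweet says is not allowed.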
@Reedbeta @kamidphish Example 2: everyone's favorite, self-modifying (or cross-modifying) code. Suppose you have a REALLY long string of NOPs (more than a cache line's worth). Now suppose someone modifies instruction bytes in the middle of that cache line.
@Reedbeta @kamidphish (Yes, x86 lets you do that, and, unlike most ISAs, it has fairly strong rules on what has to happen in that case.)
For the recovery mechanisms to work, you need to have at least one instruction per $UNIT (uArch dependent). Let's say per 16-byte IFETCH block.
@Reedbeta @kamidphish This one doesn't stop you, but it means you can't fuse arbitrarily long runs. In practice you'd probably never fuse more than 15 bytes' worth. (15 bytes is the architecturally defined maximum x86 instruction length.)
@Reedbeta @kamidphish Example 3: suppose you have a run of NOPs spanning a page boundary, and the first page gets unmapped, then a pipeline flush happens.
@Reedbeta @kamidphish The scenario is:
cmp eax, 123
nop ; <-- say this one is byte 0 of the next page
je do_stuff ; <-- gets predicted incorrectly
; <other code here>
@Reedbeta @kamidphish On a flush, the machine state gets rolled back to the last retired instruction. If you eliminate the NOPs, that instruction is the CMP, which is in another page! If the OS was unmapping that page at the time (for whatever reason), the re-fetch of the NOPs accesses memory
@Reedbeta @kamidphish that a non-speculating processor wouldn't have touched, and does so visibly. Not allowed. (Like the previous one, this one can be sidestepped by not allowing fusion across certain boundaries; these would typically be I$ (instruction cache) lines or an integer fraction of them.)
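Here is a minimal sketch of the boundary rule from the parenthetical above. It assumes 64-byte I$ lines and 4 KiB pages, and `may_fuse` is a hypothetical policy. Because the line size divides the page size evenly, a group that never crosses a line also never crosses a page, which rules out the page-spanning case in the scenario. The example also assumes `cmp eax, 123` uses its 3-byte imm8 encoding (83 F8 7B).

```python
PAGE = 4096  # x86 base page size in bytes
LINE = 64    # assumed I$ line size; it divides PAGE evenly

def crosses(addr, length, granule):
    """True if the bytes [addr, addr + length) touch more than one granule."""
    return addr // granule != (addr + length - 1) // granule

def may_fuse(addr, length):
    """Hypothetical rule: only fuse a group that stays inside one I$ line."""
    return not crosses(addr, length, LINE)

# The thread's scenario: the CMP ends exactly at a page boundary and the NOP
# is byte 0 of the next page. Folding the NOP into the CMP's record would
# create a group spanning two pages, which the line rule refuses.
cmp_addr = 0x2000 - 3
print(crosses(cmp_addr, 4, PAGE), may_fuse(cmp_addr, 4))  # True False
```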
@Reedbeta @kamidphish Now that macro-fusion exists, I see no strong reason why you couldn't fuse multiple NOPs in principle, _but_:
1. long NOPs were added to x86 >10 years before macro-fusion
2. macro-fusion only covers pairs (e.g. CMP/TEST + Jcc), and I suspect allowing >2 instructions might cause problems elsewhere.
@Reedbeta @kamidphish Specifically, it depends on how things like x86's single-step are implemented. This one's a doozy, since it might affect the behavior of instructions that never pass through the x86 decoder because they come from the uop cache.
@Reedbeta @kamidphish If you set the TF bit, does that flush the pipe and stop instructions from being supplied by the uop cache, so they go through the decoder, which then isn't allowed to fuse? Or are fused uops in the cache annotated so they can be split into separate uops for cases like this? (No clue!)
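The thread leaves this question open, so the sketch below only models the second option it raises. Every name here (`CachedOp`, `issue`) is hypothetical. In this model, a uop-cache entry remembers the architectural instructions it covers, and it is split back into one op per instruction whenever single-step is active. It says nothing about how any real uop cache behaves.

```python
from dataclasses import dataclass

@dataclass
class CachedOp:
    """Hypothetical uop-cache entry: one internal op plus the architectural
    instructions it covers, as (address, length) pairs."""
    kind: str
    insns: list

def issue(entry, single_step):
    """Return the ops the back end sees for one cache entry.

    With TF clear, a fused entry issues as a single op. With TF set, it is
    split into one op per architectural instruction so that a single-step
    trap can be taken after each of them."""
    if single_step and len(entry.insns) > 1:
        return [CachedOp(entry.kind, [insn]) for insn in entry.insns]
    return [entry]

nop_run = CachedOp("nop", [(0x400, 1), (0x401, 1), (0x402, 1)])
print(len(issue(nop_run, False)), len(issue(nop_run, True)))  # 1 3
```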
@Reedbeta @kamidphish NB: it's not just TF. Other debug facilities, like the debug registers (used for breakpoints), result in precise exceptions on fetch of a given instruction byte, which makes this harder too.
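One conservative way a fuser could respect hardware breakpoints is sketched below. It is a hypothetical policy, not a documented mechanism. x86 has four breakpoint address registers, DR0 through DR3, enabled through DR7. The sketch never lets a fused chunk hide an instruction whose address is armed as a breakpoint: every armed address inside a NOP run starts a new chunk, so the exception can still be raised at exactly that instruction.

```python
def split_at_breakpoints(addr, count, breakpoints):
    """Split a run of `count` single-byte NOPs at `addr` into (start, length)
    chunks so that every breakpoint address inside the run starts a chunk.

    `breakpoints` stands in for the up to four linear addresses in DR0-DR3
    (hypothetical policy, not a description of real hardware)."""
    assert len(breakpoints) <= 4, "x86 has four breakpoint address registers"
    end = addr + count
    cuts = sorted({addr, end} | {b for b in breakpoints if addr < b < end})
    return [(lo, hi - lo) for lo, hi in zip(cuts, cuts[1:])]

print(split_at_breakpoints(0x500, 8, [0x503]))
# [(1280, 3), (1283, 5)], i.e. NOPs 0x500-0x502 and 0x503-0x507
```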
@Reedbeta @kamidphish Generally speaking, all of this relies on your reorder buffer holding a precise record of every instruction byte that was notionally executed. So you can never just drop NOPs. You can in principle collapse runs of them (à la macro-fusion), but you must do so in ways
@Reedbeta @kamidphish that make all the architecturally defined mechanisms (debugging etc.) behave as if you weren't doing it.
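The "precise record" requirement can be phrased as a simple invariant, sketched below under assumed, simplified data structures (`covers_precisely` and the entry layout are hypothetical). The reorder buffer's entries, in order, must account for every instruction byte in the executed range with no gaps. A collapsed NOP run passes if its single entry still lists each NOP it stands for. Dropping the NOPs outright leaves a hole.

```python
def covers_precisely(start, end, rob_entries):
    """Return True if the ROB entries account for every instruction byte in [start, end).

    Each entry is a list of (address, length) architectural instructions. A
    collapsed NOP run is one entry that still lists each NOP it stands for."""
    expected = start
    for entry in rob_entries:
        for insn_addr, length in entry:
            if insn_addr != expected:  # gap or overlap: the record is not precise
                return False
            expected += length
    return expected == end

# 3-byte instruction at 0x100, two NOPs at 0x103/0x104, 2-byte branch at 0x105.
collapsed = [[(0x100, 3)], [(0x103, 1), (0x104, 1)], [(0x105, 2)]]
dropped = [[(0x100, 3)], [(0x105, 2)]]  # NOPs simply skipped
print(covers_precisely(0x100, 0x107, collapsed), covers_precisely(0x100, 0x107, dropped))
# True False
```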
All that makes long NOPs easier to deal with than fusing runs of single-byte NOPs, since a long NOP relies exclusively on mechanisms that are already required for other instructions.
@Reedbeta @kamidphish (It's a regularly encoded x86 instruction with a "memory operand" that isn't actually accessed, like LEA's. The x86 prefixes and addressing modes give you the various sizes, and the decoder doesn't care. Then you don't execute anything but still retire it.)
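To make the last point concrete, the table below lists the recommended multi-byte NOP sequences from Intel's SDM (NOP instruction page). From 3 bytes up, they all use the same `0F 1F /0` opcode, and only the 0x66 prefix and the ModRM/SIB/displacement addressing form change. The `nop_length` helper is an illustrative sketch, not a full x86 length decoder. It handles only these forms and applies the generic ModRM rules, showing that each size falls out of ordinary operand decoding. `pad` is a hypothetical helper that fills a gap using only the table's sizes.

```python
# Recommended multi-byte NOP sequences (Intel SDM, NOP instruction).
LONG_NOPS = {
    1: bytes.fromhex("90"),                  # NOP
    2: bytes.fromhex("6690"),                # 66 NOP
    3: bytes.fromhex("0f1f00"),              # NOP DWORD ptr [EAX]
    4: bytes.fromhex("0f1f4000"),            # NOP DWORD ptr [EAX + 00H]
    5: bytes.fromhex("0f1f440000"),          # NOP DWORD ptr [EAX + EAX*1 + 00H]
    6: bytes.fromhex("660f1f440000"),        # 66 NOP DWORD ptr [EAX + EAX*1 + 00H]
    7: bytes.fromhex("0f1f8000000000"),      # NOP DWORD ptr [EAX + 00000000H]
    8: bytes.fromhex("0f1f840000000000"),    # NOP DWORD ptr [EAX + EAX*1 + 00000000H]
    9: bytes.fromhex("660f1f840000000000"),  # 66 NOP DWORD ptr [EAX + EAX*1 + 00000000H]
}

def nop_length(code: bytes) -> int:
    """Length of one NOP via generic ModRM rules (32/64-bit addressing only)."""
    i = 0
    while code[i] == 0x66:        # operand-size prefixes
        i += 1
    if code[i] == 0x90:           # classic one-byte NOP
        return i + 1
    assert code[i:i + 2] == b"\x0f\x1f", "not a NOP r/m encoding"
    modrm = code[i + 2]
    mod, rm = modrm >> 6, modrm & 7
    length = i + 3                # prefixes + 0F 1F + ModRM
    has_sib = mod != 3 and rm == 4
    if has_sib:
        length += 1               # SIB byte
    if mod == 1:
        length += 1               # disp8
    elif mod == 2:
        length += 4               # disp32
    elif mod == 0 and (rm == 5 or (has_sib and (code[i + 3] & 7) == 5)):
        length += 4               # disp32 with no base register (or RIP-relative)
    return length

def pad(n):
    """Fill n bytes with as few NOPs from the table as possible."""
    out = []
    while n > 0:
        k = min(n, 9)
        out.append(LONG_NOPS[k])
        n -= k
    return out

print([len(x) for x in pad(20)])  # [9, 9, 2]
```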