TIL: Why are M1 Macs so fast? πŸ€”

Quick, bite-sized thread that answers the above question. Based on my notes from reading: debugger.medium.com/why-is-apples-…

Thread πŸ§΅πŸ‘‡
As per the benchmarks, the M1 beats almost every processor available on the market.

Stats (from Apple):
- 3.5x faster CPU perf (vs i7 Mac Air)
- 3x CPU perf / watt
- 2x faster GPU
- 15x faster ML (vs i3 Mac)

(will link benchmarks at the end)
I'm assuming familiarity with basic concepts of computer architecture.

Otherwise, you can read about them in my previous thread:
The M1 isn't just a processor, it's a System on a Chip (SoC).

It includes:
🟫 processor cores
πŸš• I/O
🧠 Neural Engine
πŸ—’οΈ memory

This unlocks simplicity, greater efficiency, and amazing performance.

Traditional processors contain only CPU cores; the rest are separate components connected via the motherboard.
βœ… Reason 1:

Instead of having only general-purpose CPU cores, the M1 has specialized chips built in.

Basically, for almost every workload, the M1 has a guy who does one thing, but does it really well and really fast! 🚄

GPU, Neural Engine, ML accelerators.
βœ… Reason 2:

Unified Memory Arch. (UMA)

- Usually, the CPU and GPU have different memories (or different memory segments).
- Handing a workload over means the CPU has to copy data into the GPU's memory space.

With UMA:
- specially designed high-bandwidth, low-latency memory
- a shared memory space, so no copying
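To make the difference concrete, here's a toy sketch (hypothetical functions, not real driver code) contrasting a discrete-GPU handover, where the CPU must copy a buffer into the GPU's separate memory, with a unified-memory handover, where both devices reference the same bytes:

```python
def discrete_handover(cpu_buffer: bytearray) -> bytearray:
    """Discrete GPU: allocate GPU-side memory and copy the data across."""
    gpu_buffer = bytearray(cpu_buffer)  # full copy over the bus
    return gpu_buffer

def unified_handover(cpu_buffer: bytearray) -> bytearray:
    """Unified memory: hand the GPU a reference to the same memory."""
    return cpu_buffer  # no copy; both sides see one memory space

data = bytearray(b"frame pixels")

copied = discrete_handover(data)
shared = unified_handover(data)

print(copied is data)  # False: a separate GPU-side copy was made
print(shared is data)  # True: same underlying memory, zero-copy
```

The copy in the discrete case is exactly the overhead UMA eliminates.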
For those of you wondering why these were separate in the first place:

- Different data consumption patterns: CPUs want small amounts of data really quickly (low latency), while GPUs want lots of data at once (high bandwidth).

- GPUs produce a lot of heat.
βœ… Reason 3:

Out-of-Order Execution (OoOE) abilities of the ARM CPU

- Each core executes multiple instructions in parallel.
- Note that this isn't multithreading.

M1 Firestorm cores can process twice as many instructions as x86 cores (assuming the same clock frequency and same-size instructions).
How does OoOE work?
[very high level]

The CPU analyzes dependencies between instructions: if the output of instruction 1 doesn't affect instruction 2, then instruction 2 can be executed ahead of time. 🤯

This is invisible to the developer / end user.
Why can't Intel and AMD do this?

- The x86 architecture does OoOE too, but it's inferior because of a limit on the number of decoders in a single processor.

- This limit exists because of the variable size of CISC instructions (1–15 bytes), which makes the start and end of each instruction difficult to identify in parallel.

- SoCs are not just processors; Intel makes only the processor, while companies like HP and Dell build the rest of the machine.

- The SoC world works by buying intellectual property (designs), e.g. the ARM architecture, and putting the pieces together.

- Intel is unlikely to lend its IP to third-party companies.
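The decoder point can be sketched in a few lines (a made-up byte format for illustration, where each "instruction" begins with its own length byte — real x86 encoding is far messier): with fixed 4-byte ARM-style instructions, every decoder can compute its start offset independently; with variable lengths, you only learn where instruction N+1 starts after decoding instruction N.

```python
def fixed_width_offsets(code: bytes, width: int = 4) -> list:
    """Fixed-width ISA: instruction N starts at width*N, trivially parallel."""
    return list(range(0, len(code), width))

def variable_width_offsets(code: bytes) -> list:
    """Variable-width ISA: must walk the stream sequentially to find boundaries."""
    offsets, i = [], 0
    while i < len(code):
        offsets.append(i)
        length = code[i]  # first byte encodes this instruction's length
        i += length
    return offsets

arm_like = bytes(16)                               # four 4-byte instructions
x86_like = bytes([3, 0, 0, 5, 0, 0, 0, 0, 2, 0])   # lengths 3, 5, 2

print(fixed_width_offsets(arm_like))    # [0, 4, 8, 12]
print(variable_width_offsets(x86_like)) # [0, 3, 8]
```

That sequential dependence is why adding more x86 decoders yields diminishing returns, while Apple could afford to go wide.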
Apple controls everything in its ecosystem end to end: every piece of hardware and software, sometimes even including the ML libraries that provide the abstractions.

Freedom to innovate!

Steve Jobs πŸ™‡β€β™‚οΈ
That's all, thanks for reading 😊

If you found this helpful, please like and retweet for better reach. ❀️

Thank you!

β€’ β€’ β€’

Missing some Tweet in this thread? You can try to force a refresh

Keep Current with Yash Verma

Yash Verma Profile picture

Stay in touch and get notified when new unrolls are available from this author!

Read all threads

This Thread may be Removed Anytime!


Twitter may remove this content at anytime! Save it as PDF for later use!

Try unrolling a thread yourself!

how to unroll video
  1. Follow @ThreadReaderApp to mention us!

  2. From a Twitter thread mention us with a keyword "unroll"
@threadreaderapp unroll

Practice here first or read more on our help page!

More from @vermayash8

1 Apr
Unpopular opinion:

Competitive programming for getting into FAANGs is highly overrated.

It shows good problem solving skills, application of DS/Algos.

However, that's not enough for a good software engineer! πŸ‘©β€πŸ’»

Why? πŸ§΅πŸ‘‡

#programming #Software #SoftwareEngineer #codinglife
It tells nothing about understanding of other CS fundamentals.

A deep, well-versed understanding of these concepts becomes essential when you're building resilient, well architected systems.
Building such complex systems always involves understanding constraints, trade-offs and making design decisions based on that.

These decisions, most of the times, are NOT modeled only around time complexity analysis.
Read 18 tweets

Did Thread Reader help you today?

Support us! We are indie developers!

This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Too expensive? Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal Become our Patreon

Thank you for your support!

Follow Us on Twitter!