Stephen Shankland (@stshank)

I'm tuning in to #HotChips2020 today, the 32nd year of the IEEE conference. My favorite graphic so far shows AMD's 8-core Ryzen 4000 family of chips, the subject of a later presentation today. Thread:

https://twitter.com/stshank/status/1295117839108431872

I watched the quantum computing presentations yesterday from IBM, Intel, Microsoft, and Google. Pretty interesting if you can handle the high gorpiness factor. The technology is very green, but is maturing. Here's that thread:

https://twitter.com/stshank/status/1295117839108431872

Right now we're starting with server chip news from Intel and IBM. First up: Intel's Ice Lake Xeon processor for two-socket servers (Ice Lake-SP). Vs. last-gen Cascade Lake, it delivers an 18% boost in instructions per clock tick.

IBM's Power10 server chip will arrive in servers in a little more than a year. It's enormous, with 18 billion transistors and a 602mm² die. Each core has 8 threads, and the high-end model will have 15 cores (16 with one spare!). It's built with Samsung's 7nm process.

Power10-based servers can handle up to 2 petabytes of memory. Holy mackerel. (DDR4 to start, but it can be upgraded to DDR5 later.) You can put up to 16 processors into a single system. Big Iron!

Big Blue has Big Ideas about sharing enormous piles of memory within a pod of servers. "Power10 opens the door to memory disaggregation," says Bill Starke.

Now IBM's Brian Thompto is talking about performance gains vs. today's Power9 server chip. Estimates based on pre-silicon analysis predict some big boosts. Also notable: a 2.6x increase in performance per watt, for better energy efficiency.

Expect Power10 to deliver 10X supercomputing speed (Top500 supercomputer ranking uses Linpack as benchmark) and better AI performance too (Resnet is AI image recognition software).

Now we're on to Marvell's upcoming ThunderX3 server chip, one of the upstart contenders vs. Intel and AMD in scale-out servers (the kind that are stacked up by the thousands in data centers). #HotChips2020

Marvell's Rabin Sugumar promises a 30% increase in single-thread performance for ThunderX3 vs. ThunderX2 at a given clock speed.

Overall, though, expect a sizable performance increase, helped by higher clock frequencies.

Now Anthony Saporito at IBM is talking about IBM's chip for the latest z15 mainframe. Like all things mainframe, it starts with a no-really-it's-still-relevant pitch: "Programs written for the first mainframes back in 1964 still run on today's models."

Mainframes are good for computing loads with very high transaction volume, like credit card purchases or hotel reservations. IBM mainframes actually use 2 chips: the CP (central processor) for computing and the SC (system controller) for cache storage. They're both honking big processors. #HotChips2020

IBM mainframes have lots of error detection & correction features. The system can even roll back a chip to previous known-good state or automatically migrate one core's entire state to a backup core. IBM blasts its machines with proton beams to make sure they work. Neat!

Back when I wrote about servers a lot, it was a big deal when shiny new Linux came to hoary old IBM mainframes. Now it's pretty ordinary. Linux runs in virtual machine compartments organized by the hypervisor, but now there's a new ultravisor beneath for better security controls.

Now onto AMD's Ryzen 4000 series chips, aka "Renoir," presented by its architect, Sonu Arora. 8 cores, more I/O, and 2X performance per watt, which is a pretty big improvement. The die is 156mm², about 1/4 the area of those big IBM server chips.
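A quick sanity check on that "1/4" comparison, using only the two die areas quoted in this thread (Renoir's 156mm² and Power10's 602mm²). Just arithmetic, nothing official:

```python
RENOIR_MM2 = 156   # AMD Ryzen 4000 "Renoir" die area, as presented
POWER10_MM2 = 602  # IBM Power10 die area, as presented

ratio = RENOIR_MM2 / POWER10_MM2
print(f"Renoir is {ratio:.0%} of Power10's die area")     # Renoir is 26% of Power10's die area
print(f"Power10 is {POWER10_MM2 / RENOIR_MM2:.1f}x the area")  # Power10 is 3.9x the area
```

So "1/4" is a fair round number.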

Performance increased a bunch vs. the earlier "Picasso" chip design. Instructions per clock tick are up 15%. Single-thread performance is up 25% at the same 15W power limit, and multithread speed is up 200%. #HotChips2020
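Rough way to read those two numbers together: if single-thread performance is approximately IPC times clock frequency (a simplification that ignores memory effects and other factors), the 25% gain divided by the 15% IPC gain leaves the rest to clock speed. A back-of-the-envelope sketch under that assumption, not AMD's breakdown:

```python
def implied_clock_gain(perf_gain: float, ipc_gain: float) -> float:
    """Assuming single-thread performance ~ IPC x clock frequency,
    return the fractional frequency gain implied by the two measured gains."""
    return (1 + perf_gain) / (1 + ipc_gain) - 1

# Renoir vs. Picasso figures from the talk: +25% single-thread, +15% IPC.
gain = implied_clock_gain(0.25, 0.15)
print(f"{gain:.1%}")  # 8.7%
```

In other words, roughly 9% would have to come from higher clocks (or whatever the simple model leaves out).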

Ryzen's Vega graphics got a big boost, too, which will matter now that Intel's competing Tiger Lake is getting Xe graphics. The boost isn't as big as Xe's jump, but the graphics need a lot less chip surface area.

Renoir consumes 59% less power during app execution than AMD's earlier Picasso, in part because it spends more time in low-power states. That means better battery life — or better options for boosting CPU/GPU performance when needed.

Ryzen 4000 has built-in support for USB 3.2 (most often delivered via USB-C ports these days). Intel's Tiger Lake, by contrast, has USB 4 and Thunderbolt 4.

"I'm so proud of what we have achieved with Renoir." –architect Sonu Arora.

Now onto Intel Tiger Lake, the new mobile chip coming this fall. Xavier Vera: "The top goal was higher performance for the same power budget," for power levels of 9W to 65W. The challenge: balancing power-constrained multithread workloads with power-unconstrained single-thread workloads.

Intel last week detailed lots of Tiger Lake improvements based on new SuperFin manufacturing technology: cnet.com/news/intel-tig…

Today's slides show the core improvements to the transistors and the metal layers above them. Expect a significant performance boost vs. current Ice Lake chips.

You want block diagrams? We got block diagrams! This is an Intel Tiger Lake example with four Willow Cove cores, but Intel can employ different core counts. It integrates Thunderbolt 4, USB 4, DisplayPort 1.4, 6 cameras, 8K video, and PCIe Gen4. #HotChips2020

You probably can't overstate the importance of Xe graphics to Intel Tiger Lake. Integrated graphics have been meh, but Xe should mean less need for discrete AMD or Nvidia GPUs. (Intel will offer discrete Xe GPUs, too: a first option this year and a gamer option in 2021.)

With PCIe Gen4 built straight into the Tiger Lake CPU, you'll be able to directly attach an SSD to the CPU.

Intel likes the idea of its "non-coherent fabric," but coincidentally those are the exact words my editor used to criticize my writing style.

(OK not really.)

DVFS (dynamic voltage and frequency scaling) power management means Intel can slosh computing priorities around to get the most work done for a given level of power consumption. DVFS states change frequently, but Intel wouldn't say how frequently.
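Why DVFS pays off: a chip's dynamic power is classically estimated as P = C·V²·f, and lower clock speeds also let the chip run at lower voltage, so power falls faster than frequency. Here's a toy governor that picks the fastest operating point fitting a power budget. The voltage/frequency table, the 3 nF effective capacitance, and the budgets are all made-up illustration values; Intel didn't disclose its real ones, and real governors also account for leakage, temperature, and workload.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OperatingPoint:
    freq_ghz: float  # clock frequency in GHz
    volts: float     # supply voltage needed to run stably at that frequency

# Hypothetical voltage/frequency table, lowest to highest (illustration only).
OPERATING_POINTS = [
    OperatingPoint(1.2, 0.70),
    OperatingPoint(2.0, 0.80),
    OperatingPoint(3.0, 0.95),
    OperatingPoint(4.0, 1.10),
]

def dynamic_power_watts(op: OperatingPoint, capacitance_nf: float) -> float:
    """Classic CMOS estimate P = C * V^2 * f, ignoring leakage.
    With C in nanofarads and f in GHz, the 1e-9 and 1e9 factors cancel, giving watts."""
    return capacitance_nf * op.volts ** 2 * op.freq_ghz

def pick_operating_point(budget_watts: float, capacitance_nf: float) -> OperatingPoint:
    """Return the fastest operating point whose estimated dynamic power fits the budget."""
    fitting = [op for op in OPERATING_POINTS
               if dynamic_power_watts(op, capacitance_nf) <= budget_watts]
    if not fitting:
        # Nothing fits: fall back to the slowest point rather than refusing to run.
        return OPERATING_POINTS[0]
    return max(fitting, key=lambda op: op.freq_ghz)

print(pick_operating_point(9.0, 3.0))  # OperatingPoint(freq_ghz=3.0, volts=0.95)
```

With these made-up numbers, stepping down from 4.0GHz to 3.0GHz gives up 25% of the clock but cuts estimated power from about 14.5W to about 8.1W, roughly 44% less.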

This is peeking ahead to this afternoon's talk on Intel's Xe graphics arriving with Tiger Lake, but here's a diagram showing how much chip real estate the GPU takes up on the SoC. It's outlined in orange.

The keynote from Intel's Raja Koduri starts with a tribute to Fran Allen, who did a lot of work on compilers (which translate human-written programming languages into machine code). "She was the pioneer of this idea 'no transistor left behind,'" which is also the title of Koduri's talk.

I wonder, with Apple now moving from Intel chips to its own Arm-based chips, whether Intel execs will be wearing Apple Watches less often when on stage. (This is Intel's @Rajaontheedge.)

Demand for compute is doubling every 3-4 months these days, Koduri says. And of course we're generating piles of data to store and process. "We need more capacity and bandwidth at every level of the memory hierarchy."
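To put "doubling every 3-4 months" in perspective: something that doubles every m months grows by 2^(12/m) per year. This just converts Koduri's figure into annual terms, with the classic roughly two-year Moore's Law cadence for comparison:

```python
def annual_growth(doubling_months: float) -> float:
    """Growth factor over 12 months for a quantity that doubles every `doubling_months`."""
    return 2 ** (12 / doubling_months)

# Koduri's 3-4 months vs. the classic ~24-month Moore's Law cadence.
for months in (3, 4, 24):
    print(months, round(annual_growth(months), 2))
```

That's 16x or 8x more compute demand per year, vs. about 1.41x per year for transistors on a two-year doubling schedule.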

On to Moore's Law, whose imminent death people have predicted for decades. Koduri: "We definitely haven't explored the full entitlement of Moore's Law."

Koduri cites the vision of Jim Keller (who recently left Intel) for increasing transistor density by a factor of 50, perhaps over a decade. "We firmly believe there is a lot more transistor density to come. We are a persistent bunch. We are very good at compounding gains 1% at a time."
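What would 50x in a decade take? This sketch treats the 10 years as an assumption (Koduri only said "perhaps") and takes "1% at a time" literally, which is surely more literal than he meant:

```python
import math

TARGET = 50  # Keller's density factor, as cited by Koduri
YEARS = 10   # "perhaps over a decade" (assumed exactly 10 here)

annual = TARGET ** (1 / YEARS)  # required per-year density multiplier
steps_of_1pct = math.ceil(math.log(TARGET) / math.log(1.01))  # 1% gains needed to reach 50x

print(f"{annual:.3f}x per year")               # 1.479x per year
print(f"{steps_of_1pct} compounded 1% gains")  # 394 compounded 1% gains
```

So roughly 48% more density every year, or about 394 stacked 1% improvements.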

Koduri emphasizes that we need general-purpose computing performance increases, not just specialized chips (read: CPUs, not just GPUs and AI accelerators).

Architecture impact = performance X generality.

One of the tributes to Jim Keller that Koduri offered: a back-of-the-envelope calculation that a rack of servers could have a million CPU cores to keep Fortnite gamers happy with their cloud gaming. #HotChips2020

Ninja programmers are great, but Intel's oneAPI initiative writes software abstraction layers to shield programmers from the complexities of having lots of processor types (CPU, GPU, AI, etc.). Goal: increase their productivity and use every transistor effectively.

Then (adapted from the famed Hennessy and Patterson processor textbook): Software has fun on top of the hardware.

Koduri updates the cartoon for modern computing:

Xbox Series X processor is all about the graphics (as you might expect). #HotChips2020

Here are the hardware details of the chip for the game console.

You can push Moore's Law, but it'll cost you, Microsoft's Jeff Andrews says. The Xbox chips have had about the same area across three generations, but the cost went way up, in particular with new TSMC 7nm manufacturing process.

There's a growing gap between shader calculation needs and memory size and bandwidth, says Microsoft's Mark Grossman.

One way to fix the problem is to reduce processing for areas that are lower priority, for example because of little color-value change compared to neighboring pixels. Like so many compression/efficiency moves, it's all about throwing out data humans won't know is missing.
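The idea sounds like variable-rate shading: shade every pixel where the image is busy, and shade once per 2x2 block where neighboring pixels barely differ. Here's a toy sketch of that decision on a grayscale image. The 4-pixel tiles, the 0.05 contrast threshold, and the two rates are hypothetical choices for illustration, not Microsoft's implementation:

```python
from typing import List

Image = List[List[float]]  # grayscale luminance values in [0, 1]

def tile_contrast(img: Image, y0: int, x0: int, size: int) -> float:
    """Largest luminance difference between horizontally or vertically adjacent pixels in a tile."""
    y1 = min(y0 + size, len(img))
    x1 = min(x0 + size, len(img[0]))
    max_diff = 0.0
    for y in range(y0, y1):
        for x in range(x0, x1):
            if x + 1 < x1:
                max_diff = max(max_diff, abs(img[y][x] - img[y][x + 1]))
            if y + 1 < y1:
                max_diff = max(max_diff, abs(img[y][x] - img[y + 1][x]))
    return max_diff

def shading_rates(img: Image, tile: int = 4, threshold: float = 0.05) -> List[List[int]]:
    """Per-tile shading rate: 1 = shade every pixel, 2 = shade once per 2x2 block."""
    rates = []
    for y0 in range(0, len(img), tile):
        row = []
        for x0 in range(0, len(img[0]), tile):
            busy = tile_contrast(img, y0, x0, tile) > threshold
            row.append(1 if busy else 2)
        rates.append(row)
    return rates

# 8x8 test image: flat gray on the left half, a black/white checkerboard on the right.
img = [[0.5] * 4 + [float((x + y) % 2) for x in range(4, 8)] for y in range(8)]
print(shading_rates(img))
```

The flat tiles get the coarse rate and the checkerboard tiles keep full detail. A 2x2-rate tile needs roughly a quarter of the shading work, which is the "throwing out data humans won't miss" trade-off.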

A small amount of extra silicon area on the Xbox Series X processor boosts AI/machine learning tasks 3x-10x for jobs like super-resolution graphics or character behavior.

Microsoft promises slick and groovy graphics for the Xbox Series X, of course. No video demo, but here's a still shot they offered up.

But personally, I'm more interested in this high-resolution graphic. Amazing to realize what happens when you can plop down 15.3 billion transistors onto something the size of a skinny postage stamp. That's it for me and @hotchipsorg 32 tonight.
