# **Advanced Systems Lab**

Spring 2022 Lecture: Architecture/Microarchitecture and Intel Core

Instructors: Markus Püschel, Ce Zhang

TAs: Joao Rivera, several more

Eidgenössische Technische Hochschule Zürich Swiss Federal Institute of Technology Zurich

# Organization

Research project: Deadline March 11th

Finding team: <a href="mailto:fastcode-forum@lists.inf.ethz.ch">fastcode-forum@lists.inf.ethz.ch</a>

# Today

Architecture/Microarchitecture: What is the difference?

In detail: Intel Skylake

Crucial microarchitectural parameters

Peak performance

**Operational intensity** 

Brief: Apple M1 processor

# Definitions

Architecture (also instruction set architecture = ISA): The parts of a processor design that one needs to understand to write assembly code

*Examples:* instruction set specification, registers

*Counterexamples:* cache sizes and core frequency

Example ISAs:

- **x86**
- MIPS
- POWER
- SPARC
- ARM

Some assembly code:

    ipf:
        xorps   %xmm1, %xmm1
        xorl    %ecx, %ecx
        jmp     .L8
    .L10:
        movslq  %ecx, %rax
        incl    %ecx
        movss   (%rsi,%rax,4), %xmm0
        mulss   (%rdi,%rax,4), %xmm0
        addss   %xmm0, %xmm1
    .L8:
        cmpl    %edx, %ecx
        jl      .L10
        movaps  %xmm1, %xmm0
        ret
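To connect the `ipf` assembly above to source code, here is a minimal sketch of a C function that could compile to it. The slide does not show the C source, so the parameter names `x`, `y`, and `n` are assumptions; the mapping to registers follows the x86-64 System V calling convention (`%rdi`, `%rsi`, `%edx` hold the first three arguments, `%xmm0` the float return value).

```c
/* Inner product of two float arrays (names x, y, n are assumed). */
float ipf(const float x[], const float y[], int n) {
    float result = 0.0f;            /* xorps %xmm1, %xmm1           */
    for (int i = 0; i < n; i++) {   /* cmpl %edx, %ecx ; jl .L10    */
        result += x[i] * y[i];      /* movss, mulss, addss          */
    }
    return result;                  /* movaps %xmm1, %xmm0 ; ret    */
}
```

Everything visible in this function (scalar SSE instructions, registers, calling convention) belongs to the architecture; how fast the loop runs depends on the microarchitecture.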


| Intel x86 | Processors (subset)      |
|-----------|--------------------------|
| x86-16    | 8086, 286                |
| x86-32    | 386, 486, Pentium        |
| MMX       | Pentium MMX              |
| SSE       | Pentium III              |
| SSE2      | Pentium 4                |
| SSE3      | Pentium 4E               |
| x86-64    | Pentium 4F, Core 2       |
| SSE4      | *Penryn*, Core i3/5/7    |
| AVX       | Sandy Bridge             |
| AVX2      | Haswell                  |
| AVX-512   | Skylake-X                |

Rows are ordered by time, oldest first.

- **MMX:** Multimedia extension
- **SSE:** Streaming SIMD extension
- **AVX:** Advanced vector extensions

Backward compatible: Old binary code (≥ 8086) runs on newer processors.

New code to run on old processors? Depends on compiler flags.
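As a rough illustration of "depends on compiler flags": GCC and Clang predefine macros such as `__SSE2__`, `__AVX__`, `__AVX2__`, and `__AVX512F__` according to the target selected with flags like `-msse2`, `-mavx2`, or `-march=native`. The sketch below reports the widest extension the compiler was allowed to use; the function name `widest_simd_extension` is hypothetical, and other compilers may define different macros.

```c
/* Reports the widest x86 SIMD extension enabled at compile time.
 * Relies on GCC/Clang predefined macros; the function name is made up
 * for this example. */
const char *widest_simd_extension(void) {
#if defined(__AVX512F__)
    return "AVX-512";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE4_1__) || defined(__SSE4_2__)
    return "SSE4";
#elif defined(__SSE3__)
    return "SSE3";
#elif defined(__SSE2__)
    return "SSE2";
#elif defined(__SSE__)
    return "SSE";
#else
    return "none detected";
#endif
}
```

A binary built with `-mavx2` may contain AVX2 instructions and would then fail with an illegal-instruction fault on a processor older than Haswell, whereas the same source built without that flag still runs there.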







# Definitions

Microarchitecture: Implementation of the architecture

*Examples:* Caches, cache structure, CPU frequency, details of the virtual memory system

# Examples

- Intel processors (<u>Wikipedia</u>)
- AMD processors (<u>Wikipedia</u>)







































# **Firestorm Microarchitecture**

### Integer ports:

- 1: alu + flags + branch + addr + msr/mrs nzcv + mrs
- 2: alu + flags + branch + addr + msr/mrs nzcv + ptrauth
- 3: alu + flags + mov-from-simd/fp?
- 4: alu + mov-from-simd/fp?
- 5: alu + mul + div
- 6: alu + mul + madd + crc + bfm/extr

### Load and store ports:

- 7: store + amx
- 8: load/store + amx
- 9: load
- 10: load

### FP/SIMD ports:

- 11: fp/simd
- 12: fp/simd
- 13: fp/simd + fcsel + to-gpr
- 14: fp/simd + fcsel + to-gpr + fcmp/e + fdiv + ...

| Instruction | Latency [cycles] | Gap [cycles/issue] |
|-------------|------------------|--------------------|
| add         | 3                | 0.25               |
| mul         | 4                | 0.25               |
| div         | 10               | 1                  |
| load        |                  | 0.33               |
| store       |                  | 0.5                |

Latency and gap of FP instructions in double precision. The numbers are the same for scalar and vector instructions.
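The table can be read as follows: throughput is the reciprocal of the gap, so a gap of 0.25 means up to 4 instructions per cycle, which is consistent with the four FP/SIMD ports listed above (likewise 0.33 with three load-capable ports and 0.5 with two store-capable ports). The sketch below turns latency and gap into simple cycle estimates; the function names are made up for this example, and the model ignores everything except latency and gap.

```c
/* Throughput in instructions per cycle is the reciprocal of the gap. */
double throughput_per_cycle(double gap) {
    return 1.0 / gap;
}

/* Smallest number of independent dependency chains needed so that a new
 * instruction can issue every `gap` cycles despite the latency. */
int chains_to_hide_latency(double latency, double gap) {
    int chains = (int)(latency / gap);
    if (chains * gap < latency) {
        chains++;                       /* round up */
    }
    return chains;
}

/* Lower bound on cycles for n operations spread evenly over `chains`
 * independent dependency chains: the larger of the latency bound and
 * the throughput bound. */
double cycles_lower_bound(long n, int chains, double latency, double gap) {
    double latency_bound = (double)n / chains * latency;
    double throughput_bound = (double)n * gap;
    return latency_bound > throughput_bound ? latency_bound : throughput_bound;
}
```

With the Firestorm numbers for add (latency 3, gap 0.25), `chains_to_hide_latency(3, 0.25)` gives 12: a single dependent chain of 1200 adds needs at least 3600 cycles, while 12 independent chains can approach 300 cycles.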

This information is based on black-box reverse engineering. https://dougallj.github.io/applecpu/firestorm.html

# **Icestorm Microarchitecture**

### Integer ports:

- 1: alu + br + mrs
- 2: alu + br + div + ptrauth
- 3: alu + mul + bfm + crc

### Load and store ports:

- 4: load/store + amx
- 5: load

### FP/SIMD ports:

- 6: fp/simd
- 7: fp/simd + fcsel + to-gpr + fcmp/e + fdiv + ...

| Instruction  | Latency [cycles] | Gap [cycles/issue] |
|--------------|------------------|--------------------|
| add          | 3                | 0.5                |
| mul          | 4                | 0.5                |
| div (scalar) | 10               | 1                  |
| div (2-way)  | 11               | 2                  |
| load         |                  | 0.5                |
| store        |                  | 1                  |

Latency and gap of FP instructions in double precision. The numbers are the same for scalar and vector instructions except for div.
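For Icestorm, an add latency of 3 cycles and a gap of 0.5 suggest that about 3 / 0.5 = 6 independent additions must be in flight to keep both FP/SIMD ports busy. A minimal sketch of a sum reduction written that way follows; the function name `sum6` and the choice of scalar accumulators are assumptions for illustration, not code from the lecture.

```c
#include <stddef.h>

/* Sum of n doubles using six independent accumulators, so that six
 * additions can overlap instead of forming one long dependency chain. */
double sum6(const double *x, size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0, s4 = 0.0, s5 = 0.0;
    size_t i = 0;
    for (; i + 6 <= n; i += 6) {
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
        s4 += x[i + 4];
        s5 += x[i + 5];
    }
    for (; i < n; i++) {    /* remaining 0 to 5 elements */
        s0 += x[i];
    }
    return ((s0 + s1) + (s2 + s3)) + (s4 + s5);
}
```

Because floating-point addition is not associative, a compiler will generally not make this transformation on its own unless flags that relax floating-point semantics are given, and the result can differ slightly from a strictly sequential sum.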

This information is based on black-box reverse engineering. https://dougallj.github.io/applecpu/icestorm.html

© Markus Püschel ETH Computer Science Swiss Federal Institute of Technology Zurich

# **Apple's Microarchitectures**

### Example: A series (used in iPhones and iPads)

| Year | Microarchitecture   | Technology |
|------|---------------------|------------|
| 2013 | Cyclone             | 28 nm      |
| 2014 | Typhoon             | 20 nm      |
| 2015 | Twister             | 16 nm      |
| 2016 | Hurricane, Zephyr   | 10 nm      |
| 2017 | Monsoon, Mistral    | 10 nm      |
| 2018 | Vortex, Tempest     | 7 nm       |
| 2019 | Lightning, Thunder  | 7 nm       |
| 2020 | Firestorm, Icestorm | 5 nm       |
| 2021 | Avalanche, Blizzard | 5 nm       |

Firestorm and Icestorm are the only ones currently used in M series processors (used in MacBooks, iMacs, and iPad Pros).

https://en.wikipedia.org/wiki/Apple_silicon
