#### **Performance Measurements**

Improving Latency and Bandwidth of Your DDR4 System

Barbara Aichinger Vice President New Business Development FuturePlus Systems Corporation

#### **MEMCON 2014**

### Outline

- Performance measurements or Analytics?
  - Events on every cycle: at 1867 MT/s that is ~1 billion events per second
- Work Harder or Smarter?
  - Should we go faster or better use what we have?
- What should we measure to know if we are working hard or working smart?
  - Power Management, Latency, Bandwidth
  - New Metrics




# How do I know if I'm working smart?

- Measure it!
- Power Management, Latency, Bandwidth
- But wait...there's more!
  - Page Hit Analysis
  - Multiple Open Banks
  - Bank Group Analysis
  - Bank Utilization
  - Boot Analysis




### The Target used

- Asus X99
  - DDR4 1867, Crucial/Micron DIMMs 2Rx8 8Gb
  - Running Google StressApp memory test








#### Work Smarter not Harder

- For performance metrics the DDR Detective<sup>®</sup> uses counters instead of the traditional trace memory
  - Capturing one second of DDR4 traffic would take 4.5 Gbytes of logic analyzer/protocol analyzer trace depth \$\$\$\$!
    - 1 minute = 270 Gbytes of trace depth (1 hour = 16.2 Tbytes), and then time to sift through it and post-process!
  - By using large counters and counting events and the time between events we can achieve hours and days worth of metrics with no trace buffer memory and with no time consuming post processing
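
The trade-off on this slide can be sketched in a few lines of Python. The 4.5 Gbytes-per-second figure comes from the slide; the `EventCounters` class and the command names fed to it are hypothetical illustrations and not the DDR Detective implementation.

```python
from collections import Counter

TRACE_BYTES_PER_SECOND = 4.5e9  # trace depth per second, as quoted on the slide


def trace_depth_bytes(seconds: float) -> float:
    """Trace memory needed to store `seconds` of bus traffic."""
    return TRACE_BYTES_PER_SECOND * seconds


class EventCounters:
    """Hypothetical counter bank: storage stays fixed however long it runs."""

    def __init__(self) -> None:
        self.counts = Counter()
        self.cycles = 0

    def observe(self, command: str) -> None:
        # One call per command-bus cycle; only the tallies are kept.
        self.counts[command] += 1
        self.cycles += 1


counters = EventCounters()
for cmd in ["ACT", "RD", "RD", "DES", "WR", "PRE", "DES", "DES"]:
    counters.observe(cmd)

print(trace_depth_bytes(60) / 1e9)             # Gbytes for 60 seconds of trace
print(counters.counts["RD"], counters.cycles)  # tallies only, no trace stored
```

The trace requirement grows linearly with capture time, while the counter bank needs one integer per event type regardless of how long the measurement runs.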




## Power Management Metrics

- Idle
- Active
- PreCharge Power Down
- Active Power Down
- Self Refresh
- Max Power Down
- DLL Enable




#### **Power Management**

#### while running StressApp






### System Idle






#### **Power Management**

- ~50M servers worldwide
- Each Server averages 16-24 DIMMs
  - 800M to 1.2B DIMMs
- Even a small power savings per DIMM can add up

Every time **Facebook**'s data center engineers figure out a way to **reduce** server consumption **by a single watt**, the improvement, at Facebook's scale, has the potential to add **millions of dollars** to the company's bottom line.



Yevgeny Sverdlik Editor in Chief Data Center Knowledge
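
The DIMM count above is simple multiplication, and the same arithmetic shows how a small per-DIMM saving scales. In the sketch below the server and DIMM counts are from the slide, while `ASSUMED_SAVING_W` is a purely hypothetical value chosen for illustration.

```python
SERVERS = 50_000_000          # ~50M servers (from the slide)
DIMMS_PER_SERVER = (16, 24)   # average range per server (from the slide)
ASSUMED_SAVING_W = 0.1        # hypothetical saving per DIMM, in watts


def fleet_dimms(servers: int, dimms_per_server: int) -> int:
    """Total DIMMs across the fleet."""
    return servers * dimms_per_server


def fleet_savings_megawatts(dimms: int, watts_saved_per_dimm: float) -> float:
    """Aggregate power saving in megawatts."""
    return dimms * watts_saved_per_dimm / 1e6


low, high = (fleet_dimms(SERVERS, n) for n in DIMMS_PER_SERVER)
print(low, high)
print(fleet_savings_megawatts(low, ASSUMED_SAVING_W),
      fleet_savings_megawatts(high, ASSUMED_SAVING_W))
```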




#### Latency

- Several JEDEC parameters apply:
  - RD to WR same Rank tSR\_RTW
  - RD to PRE/PREA same Rank tRTP
  - WR to PRE (SB) or PREA (SR) tWR
  - RD to RD different Rank tDR\_RTR
  - RD to WR different Rank tDR\_RTW
  - WR to RD different Rank tDR\_WTR
  - WR to WR different Rank tDR\_WTW




#### Measure it!






#### Latency Measurements

| V#  | Parameter | Description             | Spec | Measured |
|-----|-----------|-------------------------|------|----------|
| V2  | tSR_RTW   | RD to WR same Rank      | 8    | 10       |
| V11 | tRTP      | RD to PRE same Rank     | 8    | 8        |
| V12 | tWR       | WR to PRE SB or PREA SR | 31   | 31       |
| V53 | tDR_RTR   | RD to RD diff Rank      | 5    | 6        |
| V57 | tDR_WTR   | WR to RD diff Rank      | 3    | 6        |
| V59 | tDR_WTW   | WR to WR diff Rank      | 5    | 8        |
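
One way to read this table is to compute the slack between the measured value and the spec value for each parameter. The sketch below copies the numbers from the table and assumes that Spec is a minimum spacing, that Measured is the smallest spacing observed, and that both are in the same units; the `Timing` type and `slack` helper are illustrative names only.

```python
from typing import NamedTuple


class Timing(NamedTuple):
    vid: str
    name: str
    spec: int
    measured: int


# Values copied from the table above.
ROWS = [
    Timing("V2", "tSR_RTW", 8, 10),
    Timing("V11", "tRTP", 8, 8),
    Timing("V12", "tWR", 31, 31),
    Timing("V53", "tDR_RTR", 5, 6),
    Timing("V57", "tDR_WTR", 3, 6),
    Timing("V59", "tDR_WTW", 5, 8),
]


def slack(row: Timing) -> int:
    """Measured spacing minus spec spacing (assumed same units)."""
    return row.measured - row.spec


at_spec = [r.name for r in ROWS if slack(r) == 0]
with_slack = {r.name: slack(r) for r in ROWS if slack(r) > 0}
violations = [r.name for r in ROWS if slack(r) < 0]

print("running at the spec limit:", at_spec)
print("slack left on the table:", with_slack)
print("violations:", violations)
```

Parameters with zero slack are already at the edge of the spec, while those with positive slack show where the controller is more conservative than the spec requires.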


#### **Intervening Commands**

*Figure: DDR Detective waveform capture (image not reproduced). The Command row reads DES, WR-R0, PRE-R0, DES, PRE-R0, DES, WR-R1, DES, ACT-R0, DES, WR-R1, and ViolationID 59 appears alongside the first WR-R1. Other rows shown: Trigger, Address, RAddr, CAddr, R0 to R3 RPS, ODT0, ODT1, RESETn, ALERTn, PAR. Begin to End = 5,415 states [5.788635 µS].*




#### Latency Measurements

| V#  | Parameter | Description                | Spec | Measured |
|-----|-----------|----------------------------|------|----------|
| V1  | tCCD_L    | RD to RD Same Bank Group   | 5    | 6        |
| V3  | tCCD_L    | WR to WR Same Bank Group   | 5    | 6        |
| V4  | tCCD_S    | RD to RD diff Bank Group   | 4    | 4        |
| V5  | tCCD_S    | WR to WR diff Bank Group   | 4    | 4        |
| V6  | tRRD_L    | ACT to ACT Same Bank Group | 5    | 5        |
| V7  | tRRD_S    | ACT to ACT diff Bank Group | 4    | 4        |
| V9  | tWTR_L    | WR to RD Same Bank Group   | 22   | 23       |
| V10 | tWTR_S    | WR to RD Diff Bank Group   | 17   | 19       |




#### Latency

- Good designs operate on the edge of the spec
- Architectural tradeoffs will occur
- Do I need margin?
  - Design for the worst case and buy quality parts


#### Bandwidth

- Overhead
  - Any use of the bus other than a Read or a Write
  - Command Bus Utilization
- Data Bus
  - Utilization: the % of the time that Read or Write Data is being transferred
  - Bandwidth: the amount of data transferred per second
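
The two data bus definitions above can be turned into formulas over event counters. The sketch below is a minimal illustration that assumes a 64-bit data bus and BL8 bursts; the counter values are hypothetical and are not measurements from the target system.

```python
TRANSFER_RATE_MT_S = 1867   # DDR4 speed of the target system
BUS_BYTES = 8               # assumption: 64-bit data bus
BURST_LENGTH = 8            # assumption: every access is a BL8 burst


def peak_bandwidth_mbytes() -> int:
    """Theoretical peak in Mbytes per second."""
    return TRANSFER_RATE_MT_S * BUS_BYTES


def data_bus_utilization(reads: int, writes: int, clock_cycles: int) -> float:
    """Fraction of clock cycles in which read or write data is on the bus."""
    cycles_per_burst = BURST_LENGTH // 2  # DDR moves two transfers per clock
    return (reads + writes) * cycles_per_burst / clock_cycles


def bandwidth_mbytes(reads: int, writes: int, seconds: float) -> float:
    """Mbytes actually transferred per second."""
    return (reads + writes) * BURST_LENGTH * BUS_BYTES / seconds / 1e6


# Hypothetical counter values for a one-second window.
CLOCK_CYCLES = 933_500_000
READS, WRITES = 70_000_000, 46_687_500

utilization = data_bus_utilization(READS, WRITES, CLOCK_CYCLES)
bandwidth = bandwidth_mbytes(READS, WRITES, 1.0)
print(peak_bandwidth_mbytes(), utilization, bandwidth)
```

With these made-up counts the bus carries data in half of the clock cycles, so the delivered bandwidth is half of the theoretical peak.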




#### **Command Bus Utilization**






### DES, REF, ZQCL






#### Summarize Command Bus Utilization




#### **Data Bus Utilization**






#### **Data Bus Utilization Summary**



#### Data Bus Bandwidth

#### Mbytes transferred in 1 second



#### **Total Bandwidth**






### Insight beyond the basics

- Page Hit Analysis
  - Read Miss
  - Write Miss
  - Unused
- Multiple Open Banks
  - Open banks make for faster access if you're going there...performance hit if you're not
  - Power hit when banks are open
- Bank Group Analysis
  - New for DDR4: back-to-back accesses to the same bank group take a performance hit
  - Faster to go back to back to different bank groups
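
The page hit and open bank metrics above can be derived from a command stream with a small state machine. The sketch below is only an illustration: the definitions of miss, hit and unused in the docstring are assumptions, not definitions taken from the slides, and the `(command, bank)` tuple format is hypothetical.

```python
from collections import Counter


def page_hit_analysis(commands):
    """Classify accesses from (command, bank) tuples.

    Assumed definitions (not taken from the slides):
      miss   - first RD/WR after an ACT opened the page
      hit    - any later RD/WR to the same open page
      unused - page closed by PRE with no RD/WR in between
    Returns the tallies and the largest number of banks open at once.
    """
    open_pages = {}   # bank -> accesses since the page was opened
    stats = Counter()
    max_open = 0
    for cmd, bank in commands:
        if cmd == "ACT":
            open_pages[bank] = 0
            max_open = max(max_open, len(open_pages))
        elif cmd in ("RD", "WR") and bank in open_pages:
            kind = "miss" if open_pages[bank] == 0 else "hit"
            stats[f"{cmd}_{kind}"] += 1
            open_pages[bank] += 1
        elif cmd == "PRE" and bank in open_pages:
            if open_pages.pop(bank) == 0:
                stats["unused"] += 1
    return stats, max_open


trace = [("ACT", 0), ("RD", 0), ("RD", 0), ("WR", 0),
         ("ACT", 1), ("PRE", 1), ("ACT", 2), ("WR", 2), ("PRE", 0)]
stats, max_open = page_hit_analysis(trace)
print(dict(stats), max_open)
```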




### Page Hit Analysis






### Page Hit by percentages


**RD/WR Page Hit/Miss** 



#### **Multiple Open Banks**

How many are open at any one time






### Bank Group Access Analysis

- tCCD\_L
  - Takes longer for back to back RD/WR accesses to the same bank group
- tCCD\_S
  - Can reduce latency by going to different bank groups
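
The effect of bank group ordering can be sketched by counting same-group and different-group transitions in a sequence of reads. The tCCD values below are the spec values from the latency table in this deck; the bank group sequences are hypothetical, and the spacing estimate ignores every other timing constraint.

```python
T_CCD_L = 5  # RD to RD, same bank group (spec value from the latency table)
T_CCD_S = 4  # RD to RD, different bank group (spec value from the latency table)


def bank_group_transitions(bank_groups):
    """Count (same, different) bank-group transitions between consecutive reads."""
    pairs = list(zip(bank_groups, bank_groups[1:]))
    same = sum(1 for prev, cur in pairs if prev == cur)
    return same, len(pairs) - same


def min_spacing(bank_groups) -> int:
    """Lower bound on the spacing imposed by tCCD alone."""
    same, different = bank_group_transitions(bank_groups)
    return same * T_CCD_L + different * T_CCD_S


clustered = [0, 0, 1, 2, 2, 3]     # hypothetical access order
interleaved = [0, 1, 2, 3, 0, 1]   # same number of reads, rotated across groups

print(bank_group_transitions(clustered), min_spacing(clustered))
print(bank_group_transitions(interleaved), min_spacing(interleaved))
```

Rotating across bank groups keeps every transition on the shorter tCCD\_S spacing, which is the latency reduction the slide refers to.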




#### **Bank Group Access Analysis**

*Relative to the previous transaction how many times did the following transaction go to the same/different bank group* 






### **Bank Group Access Analysis**

by Percentage






#### But Wait there's more!

- Bank Utilization
  - What happens during a chip kill or page retirement scenario?
  - How does the traffic reallocate?
  - What are the performance implications?
- Do I have system hot spots?
  - Row Hammer (excessive Activates)
- Fast Boot
  - Why does the system take so long to boot?
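
The hot spot question above (excessive Activates) comes down to counting ACT commands per row within an observation window. The sketch below is a hypothetical illustration: the `(rank, bank, row)` tuple format, the sample addresses and the threshold are all assumptions, not values from the slides or from any specification.

```python
from collections import Counter


def activate_hot_spots(activates, threshold):
    """Return rows whose ACT count in the window exceeds `threshold`.

    `activates` is an iterable of (rank, bank, row) tuples, one per ACT.
    """
    counts = Counter(activates)
    return {row: n for row, n in counts.items() if n > threshold}


# Hypothetical window of Activate commands.
window = [(0, 1, 0x1A00)] * 5 + [(0, 2, 0x0B10)] * 2 + [(1, 0, 0x0010)]
hot = activate_hot_spots(window, threshold=3)
print(hot)
```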




#### **Bank Utilization**

#### Number of cycles the banks are open






#### Bank Utilization when the system is sitting idle




#### **Boot Analysis**






#### Advancing the State of the Art

- Memory Controller/System Architecture
  - Can this insight lead to better designs?
  - Dynamic architecture based on workload?
- Which software best stresses the system and exposes architectural flaws? (benchmarking)
  - Looking for feedback from the industry...we can test using the DDR Detective<sup>®</sup>




### Summary

- Power Management, Bandwidth, Latency
- NEW Metrics:
  - Page Hit Analysis
  - Multiple Open Banks
  - Bank Group Analysis
  - Bank Utilization
  - Boot Analysis
  - New Measurements give insight into new designs and better architectures




#### **Contact Information**

Barbara Aichinger, Vice President New Business Development, FuturePlus Systems, Barb.Aichinger@FuturePlus.com, www.FuturePlus.com

Check out our new website dedicated to DDR Memory! www.DDRDetective.com


