2020 Q3/4

For the last several years, I have written on the impact of round-trip memory latency on software characterized by pointer-chasing code, e.g., b-tree index navigation. Most of this was based on the observed behavior of a 2-socket Xeon E5-2680 (Sandy Bridge) system running a transaction processing workload on SQL Server 2008 R2, at the normal 2.7GHz (+turbo) and with the BIOS set to power-save, which ran the cores at 135MHz.

A performance difference of 3X for a 20X frequency span actually makes sense if a few percent of the code incurs a memory round-trip averaging 115ns (90ns local and 140ns remote node). There were no lower-latency options that I was aware of for server system ECC memory (I did find out that Kingston offered the E model with lower CL, ~13.75ns versus the normal 14-14.5ns).
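The arithmetic behind "a few percent" can be sketched with a very simple model. This is an illustration, not the original analysis: it assumes every instruction that does not wait on memory takes one cycle (`cpi=1.0`), that the 115ns round-trip does not change with core frequency, and that stalls do not overlap. The function names are hypothetical.

```python
def time_per_instruction(p, f_ghz, latency_ns, cpi=1.0):
    """Average ns per instruction when a fraction p stalls for a full
    memory round-trip and the rest take cpi cycles (assumed model)."""
    return (1.0 - p) * cpi / f_ghz + p * latency_ns


def memory_fraction(perf_ratio, f_hi_ghz, f_lo_ghz, latency_ns, cpi=1.0):
    """Solve the model above for p, given the observed performance ratio
    between the high and low core frequencies."""
    t_hi = cpi / f_hi_ghz  # ns per non-stalling instruction at high clock
    t_lo = cpi / f_lo_ghz  # ns per non-stalling instruction at low clock
    core_gap = t_lo - perf_ratio * t_hi
    return core_gap / (core_gap + (perf_ratio - 1.0) * latency_ns)


# Figures from the text: 3X performance over 2.7GHz vs 135MHz, 115ns average.
p = memory_fraction(perf_ratio=3.0, f_hi_ghz=2.7, f_lo_ghz=0.135, latency_ns=115.0)
slow = time_per_instruction(p, 0.135, 115.0)
fast = time_per_instruction(p, 2.7, 115.0)
print(f"fraction of instructions stalling on memory: {p:.2%}")
print(f"model performance ratio: {slow / fast:.2f}")
```

Under these assumptions the fraction comes out at roughly 2.7%, which is in line with "a few percent". A different assumed CPI for the non-stalling instructions would shift the number.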

There are, however, memory latency options for gaming systems using the Intel Core K model processors without ECC. In the fall of 2020, I acquired two desktops, one with an i7-10700K and the other with an i9-10850K. Standard non-ECC memory with CL timing of 14.5ns was the baseline, and specialty memory with timings of 10 and even 9ns was also tested.

Baseline: DDR4 2933 MT/s, CL 21 clocks, 14.32ns, memory latency 63.25ns as measured with 7-cpu.com,
G.Skill TridentZ Royal DDR4 4000, CL 17, 8.50ns, for memory latency 44.74ns (posted here),
and DDR4-3200 CL 14, 8.75ns, for memory latency 50.79ns.
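The CL figures in nanoseconds above follow from the transfer rate and the CL clock count. A minimal sketch of the conversion, assuming only that DDR memory makes two transfers per clock (the function name is hypothetical):

```python
def cas_latency_ns(transfer_rate_mts, cl_clocks):
    """Convert CAS latency from clocks to ns for DDR memory."""
    clock_mhz = transfer_rate_mts / 2.0  # DDR: two transfers per clock
    return cl_clocks / clock_mhz * 1000.0


# The three configurations listed above: (label, MT/s, CL clocks).
configs = [
    ("DDR4-2933 CL 21", 2933, 21),
    ("DDR4-4000 CL 17", 4000, 17),
    ("DDR4-3200 CL 14", 3200, 14),
]
for label, rate, cl in configs:
    print(f"{label}: {cas_latency_ns(rate, cl):.2f}ns")
```

Note that CL is only one component of the full round-trip memory latency, which is why the measured latencies above are several times larger than the CL times.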

The test environment was Windows 10 and Windows Server 2019 (no apparent difference between them) with SQL Server 2019. The test kit is a series of index seek + key lookup tests in which successive lookups go to different leaf-level pages of a table much larger than L3 cache (typically 1GB), with measurements taken after a warm-up so that all pages are in the buffer cache (no disk accesses). In some tests, even successive accesses to the lowest index intermediate level were to different pages.

These tests showed no difference in performance with memory latency, from 63.25ns down to 44.74ns. Surprisingly, performance was dependent on frequency (almost linear?) between the base frequency with turbo-boost disabled, 3.6GHz, and the all-cores turbo-boost frequency, observed at 4.8GHz(?), on the i9-10850K, and between the 3.8GHz base and 5GHz turbo on the i7-10700K. I have no explanation for this observed behavior and would like to hear any explanations.

2019 Q3/4

 Heat Sinks for Server Memory (2019 Oct)

 Optane Persistent Memory (2019 Sep, preliminary)

Other people writing about Single Processor
The Next Platform Why Single-Socket Servers Could Rule the Future, April 24, 2019 Robert W Hormuth, Vice President/Fellow, CTO, Server and Infrastructure Solutions, Dell EMC
Robert's top 10 List for why 1-socket could rule the future.
    More than enough cores per socket and trending higher
    Replacement of underutilized 2S servers
    Easier to hit binary channels of memory, and thus binary memory boundaries (128, 256, 512...)
    Lower cost for resiliency clustering (less CPUs/memory...)
    Better software licensing cost for some models
    Avoid NUMA performance hit — IO and Memory
    Power density smearing in data center to avoid hot spots
    Repurpose NUMA pins for more channels: DDRx or PCIe or future buses (CxL, Gen-Z)
    Enables better NVMe direct drive connect without PCIe Switches
    (ok I'm cheating to get to 10 as this is resultant of #8)

Gartner Use Single-Socket Servers to Reduce Costs in the Data Center, 5 December 2018

PASS Summit 2018, Seattle, Nov 6-9


Rethink Server Sizing     Joe Chang   Date: Nov 7, Time: 4:45PM - 6:00PM, Room: 618
slides pdf   pptx

2018 Q3/4

 Fast DRAM (2018 Oct)



 Too Much Memory (2018 Sep)



 RISC vs. CISC (2018 Aug)



 Multi-Processors Must Die (2018 Aug)


 Intel 10nm Delay Assessment (2018-08)

2018 Q1/2

 Memory Latency (2018-04)

 TPC-E Benchmarks (2018-04)

 DRAM (Updated 2018-10),   original  DRAM (2018-03)


 SRAM as Main Memory (2018-03)

System Architecture Review
 Front-side Bus

 System Architecture Review 2016



Earlier versions of articles, replaced by the above.
  The Case for Single Processor (unfinished) see: Multi-Processors Must Die

  Low Latency Memory (2018-03)  Memory Latency (2018-02)  SRAM as Main Memory (2018-02)

  Rethink Server Sizing 2017 (2017-Dec)    SRAM as Main Memory (2017-Dec)

  Rethinking System Architecture (2017-Jan),   Memory Latency, NUMA and HT (2016-Dec),

  The Case for Single Socket (2016-04)


Posted on LinkedIn (but needs to be updated):
  SRAM as Main Memory Cost Benefit   Rethink Server Sizing 2017


System Architecture

System Architecture has been split into multiple sections:
  Historical,   AMD Opteron,   Intel QPI,   Dell PowerEdge,   HP ProLiant,   IBM x Series

  NUMA (never finished),   Sandy Bridge,   Knights Landing 2016-08,

  Asymmetric Processor Cores


Older Server System material

 NEC Express5800/A1080a (2010-06),  Server Sizing (Interim) (2010-08),
 Big Iron Revival III (2010-09),  Big Iron Revival II (2009-09),  Big Iron Revival (2009-05),
 Intel Xeon 5600 and 7500 series (2010-04, this material has been updated in the new links above)

 Historical Systems (incomplete)


Other System Architecture Articles

New Items 2016-12   Memory-IO Performance (in progress),   Memory Latency, NUMA and HT 2016-12,  The Case for Single Socket (2016-04)

Additional related topics:   Amdahl Revisited 2015-05,

  Server Strategy Shift with Sandy Bridge (formerly part of Systems Architecture 2011Q3),

  Systems Architecture 2012Q4,   2011Q3,   2010Q3,   2009,
  NEC Express5800/A1080a (2010-06),

  High Call Volume SQL on NUMA (pre-SQL 2005?),

  Big Iron Revival III (2010-09),  Big Iron Revival II (2009-09),  Big Iron Revival (2009-05),
  Intel Microarchitecture Diagrams

I will try to sort out the material and redistribute it over several articles as appropriate. For now, see:
  Knights Landing 2016-08,   Memory-IO Performance,   Memory Latency, NUMA and HT 2016-12,    The Case for Single Socket (2016-04),   Amdahl Revisited,   and also   Cost-Based Optimizer



Onur Mutlu, Professor of Computer Science at ETH Zurich website, lecture-videos

Mark Clark, AMD A New X86 Core ... ,  AMD Memory Technology