
Update  2019 Oct

I have not been keeping up with AMD developments. Their EPYC architecture launched a few years ago. In brief, AMD is eschewing a very large die in favor of tying multiple medium-size die together in a single processor package/socket.

The economics of manufacturing favor multiple smaller die over one large die. The advantage is sufficient to offset the larger total area of the combined smaller die, which arises in part because signals that go off-die must be amplified, consuming some additional die area.
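As a rough illustration of why smaller die can win despite a larger combined area, the sketch below uses the simple Poisson yield model, in which the fraction of defect-free die falls off exponentially with die area. Yield is only one common explanation for the cost difference, and every number here (the defect density, the 700mm² large die, the 10% area overhead for off-die links) is an assumption chosen for illustration, not a figure from AMD or Intel.

```python
import math

def poisson_yield(area_mm2, defects_per_cm2):
    """Fraction of defect-free die under the simple Poisson yield model."""
    area_cm2 = area_mm2 / 100.0
    return math.exp(-area_cm2 * defects_per_cm2)

def silicon_per_good_processor(die_area_mm2, die_per_processor, defects_per_cm2):
    """Wafer area (mm^2) consumed per working processor.

    Assumes each die is tested before packaging, so only good die are
    combined. Ignores wafer edge loss, packaging cost and partial-good
    die sold as lower-core-count parts.
    """
    good_fraction = poisson_yield(die_area_mm2, defects_per_cm2)
    return die_per_processor * die_area_mm2 / good_fraction

DEFECTS_PER_CM2 = 0.2        # assumed defect density (hypothetical)
LARGE_DIE_MM2 = 700.0        # assumed single large die (hypothetical)
OFF_DIE_OVERHEAD = 1.10      # assumed 10% extra area for off-die links

# Four smaller die, each a quarter of the large die plus the link overhead.
small_die_mm2 = LARGE_DIE_MM2 / 4 * OFF_DIE_OVERHEAD

large = silicon_per_good_processor(LARGE_DIE_MM2, 1, DEFECTS_PER_CM2)
small = silicon_per_good_processor(small_die_mm2, 4, DEFECTS_PER_CM2)

print(f"one large die : {large:8.1f} mm^2 of wafer per good processor")
print(f"four small die: {small:8.1f} mm^2 of wafer per good processor")
```

Under these assumptions the four-die package uses more raw silicon per package but less wafer area per working processor, because a defect scraps only a small die rather than the whole large one.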

The other effect is that we now have NUMA in a single socket. Note: even the Intel Xeon SP has a single-die NUMA option in which half the cores and one of the two memory controllers form a NUMA node. For applications that have been highly optimized for NUMA and have relatively low dependence on inter-node communication, the AMD approach could have advantages, including cost.

In applications not meeting both criteria (optimization for NUMA and low inter-node communication), the single large die has an advantage in being able to present a unified memory model within a die. However, any multi-socket system is inherently NUMA. So the question is whether there is a significant difference between a 2-socket Intel system with two NUMA nodes of large-die processors and a 2-socket AMD system in which each socket is 4 NUMA nodes, for a total of 8 nodes, with some nodes 1 hop away and others 2 hops.
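To make the 2-node versus 8-node comparison concrete, the sketch below counts hops between NUMA nodes with a breadth-first search. The 8-node link layout is an assumption for illustration: the four nodes in a socket are fully connected, and each node has one link to a single peer node in the other socket. It also assumes memory accesses are spread uniformly across all nodes, which is the case of an application with no NUMA optimization at all.

```python
from collections import deque

def build_links(edges):
    """Turn a list of undirected (a, b) links into an adjacency map."""
    links = {}
    for a, b in edges:
        links.setdefault(a, set()).add(b)
        links.setdefault(b, set()).add(a)
    return links

def hop_counts(links, start):
    """Breadth-first search: number of hops from start to every node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for peer in links[node]:
            if peer not in dist:
                dist[peer] = dist[node] + 1
                queue.append(peer)
    return dist

def summarize(links):
    """Return (local fraction, {hops: node count}, mean hops) seen from one node.

    Both topologies below are symmetric, so the view from the first
    node is representative of every node.
    """
    nodes = sorted(links)
    dist = hop_counts(links, nodes[0])
    histogram = {}
    for hops in dist.values():
        histogram[hops] = histogram.get(hops, 0) + 1
    mean_hops = sum(dist.values()) / len(nodes)
    return 1.0 / len(nodes), histogram, mean_hops

# 2 sockets, one NUMA node per socket.
two_node = build_links([(0, 1)])

# 2 sockets, four NUMA nodes per socket (assumed link layout, see above):
# nodes 0-3 in socket A, nodes 4-7 in socket B.
edges = [(a, b) for s in (0, 4) for a in range(s, s + 4) for b in range(a + 1, s + 4)]
edges += [(i, i + 4) for i in range(4)]
eight_node = build_links(edges)

for name, links in (("2 nodes", two_node), ("8 nodes", eight_node)):
    local, histogram, mean_hops = summarize(links)
    print(f"{name}: local fraction {local:.3f}, nodes by hop count {histogram}, mean hops {mean_hops:.2f}")
```

With uniform access, half of all memory references are local in the 2-node system but only one in eight in the 8-node system, and some of the remote references cross two links. Actual latency per hop is not modeled here.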

In real-world database transaction processing, almost no one has optimized for NUMA. Also, there is likely to be significant inter-thread communication, limiting the effectiveness of NUMA optimization, which would probably require a significant rearchitecture of both the database and application-side code. In this world, avoiding NUMA has significant value, to the extent that almost any price premium on the processor is worth paying if NUMA can be avoided.

My proposal for database transaction processing is to abandon multi-socket systems and focus significant effort on tuning to run on a single socket, even if this means a large-die processor at almost any cost. Then press manufacturers to offer low-latency DRAM, again at almost any cost.

Some links to AMD EPYC articles:
AMD EPYC Naples vs Rome and vSphere CPU Scheduler Updates, Oct 2019, Frank Denneman.
AMD EPYC and vSphere vNUMA, Feb 2019, Frank Denneman.

Update  2018 Oct

AMD and Intel Comparisons

Below are some Intel and AMD Opteron processors on nominally matching process nodes. Where possible, models with the same L2, L3 or combined cache size are shown. In some cases, the number of cores differs.

| Process | Intel | AMD |
|---|---|---|
| 130nm | Banias, 2003, 83mm², 1M L2 | Opteron, 2003, 193mm², 1M L2 |
| 90nm | Prescott 2M, 2005, 135mm², 2M L2 | Opteron DC, 2005, 199mm², 2×1M L2 |
| 65nm | Conroe, 2006 Apr, 143mm², 4M L2 | Barcelona, 2007?, 285mm², 4×512K L2, 2M L3 |
| 45nm | Nehalem, 2008 Aug, 263mm², 4×2M L3 | Shanghai, 2008, 258mm², 4×512K L2, 6M L3 |
| 45nm | | Istanbul, 2009, 346mm², 6×512K L2, 6M L3 |
| 32nm | Westmere, 2010 Jan, 240mm², 6×2M L3 | Bulldozer, 2012, 315mm², 4×2M L2, 8M L3 |

I am not sure if the Bulldozer image shown above is a single die or two die next to each other.

[Image: Bulldozer_8c3]

The Piledriver image shown above has four double-"cores"?

 

 

Additional notes:
Pentium Pro - 133MHz on 0.6 µm
150MHz on 0.50 µm
166, 180 and 200MHz on 0.35 µm
256K L2 die on 0.50 µm
512K L2 die on 0.35 µm