I am usually pretty reserved with cash, but after working full-time for six months, I finally decided to spend some of my money on building a new research development server. This process was long overdue and the reason it took me so long to commit to this project was all of the new technology developed since building my last server. This “new technology” can be pretty confusing unless one specializes in computer architecture. I want to share what I have learned throughout this process, while giving some background. These are only my opinions, and I may be wrong on some things as I am not a hardware expert. I encourage you to read and learn more on your own.
If you are reading this article, I probably do not need to explain what the CPU/processor does. For high-performance computing, you will want a CPU that is very “fast” and also has multiple cores. The definition of “fast” is in the eye of the beholder and typically refers to more than just clock speed (GHz). In the constant war between AMD and Intel, I stick with Intel. AMD processors are powerful, but they seem to have more of a market with gamers. Intel is my preference, and I have not yet run into anyone who feels strongly in favor of AMD for high-performance computing (HPC). There are two main processor lines under Intel: standard and Xeon. Standard processors are the run-of-the-mill CPUs found in consumer desktop machines. Xeon processors are designed for non-consumer server, workstation and embedded systems use. I do not consider researchers “consumers”; we are producers, so the Xeon family is better suited to our needs. On the other hand, you may find that a standard CPU fits your particular research or use case. Xeon processors typically have more cache and more multiprocessing capabilities…and they are a lot more expensive. For high-performance computing, I strongly suggest Intel Xeon.
After months of research, I have concluded that multiple Intel Xeon processors are better than one Intel Core i7. As of the time of this writing, it seems that i7 processors cannot be installed two (or more) to a motherboard the way Xeons can. Like AMD processors, the i7 seems to be favored by gamers and those needing a richer multimedia experience.
In 2011, most CPUs in new systems have multiple cores. Each core can run one process at a time, so a system with n cores can run n processes simultaneously. Many CPUs have hyperthreading enabled, meaning that each core can run 2 threads simultaneously, bringing the total number of threads to 2n. But can’t the system already run multiple processes concurrently? We can run Firefox, TweetDeck, Thunderbird etc. at the same time, right? To the user, it seems that a single core is processing multiple threads simultaneously. If we could slow down time to the microsecond level, we would see that the core works on one process at a time, then does a context switch to another process. This rapid switching gives the illusion that the CPU is running multiple processes simultaneously.
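The thread arithmetic above is easy to check. The following is a minimal sketch, not a measurement tool: `logical_threads` is a hypothetical helper written for this article, and it assumes exactly two hardware threads per core when hyperthreading is on. The standard library call `os.cpu_count()` reports how many logical CPUs the operating system sees on the current machine (it can return `None` if that cannot be determined).

```python
import os


def logical_threads(cores, hyperthreading=True):
    """Hardware threads available on `cores` physical cores.

    Assumption: hyperthreading gives exactly 2 threads per core.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return cores * 2 if hyperthreading else cores


if __name__ == "__main__":
    # A hypothetical 4-core CPU with hyperthreading: n = 4, so 2n = 8.
    print("4 cores with hyperthreading:", logical_threads(4))
    # Logical CPUs visible to the OS on this machine (may be None).
    print("This machine reports:", os.cpu_count())
```

If the number reported for your machine is twice the core count on the spec sheet, hyperthreading is most likely enabled.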
While Intel makes great products, its inventory is a nightmare to navigate. There are several things that you must know to ballpark a particular CPU model.
- the model number (the most reliable!)
- the brand name specifies a group of CPU models satisfying similar use cases (Core [i3/i5/i7/i9], Core 2 Duo, Quad Core, Pentium, Xeon).
- the architecture/subarchitecture specifies a type of processor within a brand, each containing many series (Nehalem, Westmere, Sandy Bridge are common ones these days)
- the chipset (not commonly referred to, examples: Tylersburg, Cougar Point, Panther Point)
- the platform refers to a set of models (e.g. Harpertown, Jasper Forest, Gainestown, Prescott, Gulftown). Models within a platform are typically only differentiated by clock speed (GHz).
- the socket type specifies the shape and size of the CPU. The CPU and the motherboard must have the same socket type (e.g. LGA1366, Socket 775)
As if this were not confusing enough, each Intel Xeon model number is prefixed with a letter for different use cases. The letter distinguishes CPUs with differing thermal design power (TDP). (source)
- W stands for “Workstation” and is meant to be installed in pairs. This designation does not seem very common anymore. They typically run the fastest (clock speed) and the hottest. They require significant cooling.
- E is “mainstream (rack mount)” and the standard model of CPU. Although it is “standard,” there is nothing wrong with it performance-wise, but it will run hot even when idle.
- X stands for “performance.” These CPUs are similar to E but provide extra overclocking capabilities and have lower idle power draw.
- L stands for “power optimized.” These are low-voltage CPUs (60W or less) that are typically only used in data centers or rack servers. They typically do not come in the higher clock speeds.
For the Intel Xeon, model numbers indicate what configuration it is compatible with on the motherboard (source):
- 3xxx Xeons are designed to be used by themselves, as the only CPU on the motherboard.
- 5xxx Xeons are designed to be used in pairs; two CPUs on the motherboard.
- 7xxx Xeons are designed to be used in pairs, or in larger groups.
The 2 CPUs that I purchased are model Intel Xeon E5645. The Intel Xeon E5645 is part of the Westmere-EP platform of the Xeon family. It uses the Westmere subarchitecture, which is the 32 nm shrink of the Nehalem architecture, and plugs into socket LGA1366. (To make it more confusing, this is the same architecture used for the six-core i7-9xx models.) The E means that it is a “mainstream” CPU. Since it is a 5xxx model, it is installed with another identical CPU on the same board.
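The decoding I just did by hand for the E5645 can be sketched in a few lines. This is only an illustration of the naming scheme described in the two lists above; `describe_xeon` and the two lookup tables are hypothetical names of my own, not anything provided by Intel, and the scheme only applies to Xeon model numbers of this era (one letter followed by four digits).

```python
import re

# Meanings copied from the lists above (2011-era Xeon naming only).
PREFIXES = {
    "W": "workstation",
    "E": "mainstream (rack mount)",
    "X": "performance",
    "L": "power optimized",
}
CONFIGURATIONS = {
    "3": "single CPU",
    "5": "pairs (two CPUs)",
    "7": "pairs or larger groups",
}


def describe_xeon(model):
    """Split a model such as 'E5645' into its prefix letter and series digit."""
    cleaned = model.strip().upper()
    match = re.fullmatch(r"([A-Z])(\d)\d{3}", cleaned)
    if match is None:
        raise ValueError(f"unrecognized model number: {model!r}")
    letter, series = match.groups()
    return {
        "model": cleaned,
        "class": PREFIXES.get(letter, "unknown"),
        "configuration": CONFIGURATIONS.get(series, "unknown"),
    }


if __name__ == "__main__":
    print(describe_xeon("E5645"))
```

Letters or series digits that are not in the tables come back as "unknown" rather than raising an error, since Intel uses more designations than the ones I have covered here.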
The number of cores is important. Most chips in current desktops contain 2 or 4 cores. Higher end systems and servers may have 6, 8 or 10 cores per chip. Xeons with 8 and 10 cores per unit debuted in Q2 of 2011 and are very expensive (about $2000 for 8 cores). They also require a different socket type (LGA1567), which means a new, expensive motherboard. A CPU with more cores lets a multithreaded application work on several units of work at once, which raises overall throughput.
The clock speed (GHz) used to be the deciding factor for most people, until clock speeds stopped climbing from one generation to the next. All else being equal, a higher clock speed allows a single process to complete faster. Since games typically use a limited number of threads and require quick performance, a single i7 is a good choice. The i7 has multiple cores, and also has a very high clock speed.
The cache size and speed are also important. The cache keeps copies of frequently accessed data from RAM on the CPU itself, where it can be read much faster. Modern systems typically have three levels of cache: L1, L2 and L3. L1 cache is said to be the “closest” to the CPU, meaning the CPU queries the L1 cache first when performing a memory access. The L1 cache is the smallest. The L2 and L3 caches are accessed next in order, and L3 cache is larger than L2 cache. Very simply put, CPUs with larger caches (especially L1) are better.
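To make the lookup order concrete, here is a toy model of it. It is a sketch under heavy simplification: real caches work on fixed-size lines with eviction policies, whereas this version just treats each level as a set of addresses. The function name `lookup` and the example addresses are made up for illustration.

```python
def lookup(address, levels):
    """Return the name of the first cache level that holds `address`.

    `levels` is a list of (name, addresses) pairs ordered from closest
    to the CPU to farthest. If no level holds the address, the access
    falls through to RAM.
    """
    for name, contents in levels:
        if address in contents:
            return name
    return "RAM"


# Hypothetical contents: each level is larger than the one before it.
LEVELS = [
    ("L1", {0x10}),
    ("L2", {0x10, 0x20}),
    ("L3", {0x10, 0x20, 0x30}),
]

if __name__ == "__main__":
    for addr in (0x10, 0x20, 0x30, 0x40):
        print(hex(addr), "served by", lookup(addr, LEVELS))
```

The farther down the list an access has to go, the slower it is, which is why data that stays in the levels closest to the CPU matters so much.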
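Newer processors report the throughput of the link between the CPU and the rest of the system in gigatransfers per second (GT/s) which, like GHz, quantifies some measure of “speed.” Using GT/s, one can compute the number of bits the CPU can transfer per second as the transfer rate multiplied by the number of bits moved in each transfer (the width of the link).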
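A minimal sketch of that conversion follows, assuming the link moves a fixed number of bits in every transfer. The function names are my own, and the 5.0 GT/s and 16-bit figures in the example are placeholder values chosen for round numbers, not the specification of any particular CPU.

```python
def bits_per_second(gigatransfers_per_sec, bits_per_transfer):
    """Bits moved per second: transfers per second times bits per transfer."""
    if gigatransfers_per_sec < 0 or bits_per_transfer < 0:
        raise ValueError("inputs must be non-negative")
    return gigatransfers_per_sec * 1e9 * bits_per_transfer


def gigabytes_per_second(gigatransfers_per_sec, bits_per_transfer):
    """Same quantity expressed in GB/s (8 bits per byte, 1e9 bytes per GB)."""
    return bits_per_second(gigatransfers_per_sec, bits_per_transfer) / 8 / 1e9


if __name__ == "__main__":
    # Placeholder values: 5.0 GT/s over a link that is 16 bits wide.
    print(bits_per_second(5.0, 16), "bits/s")
    print(gigabytes_per_second(5.0, 16), "GB/s")
```

Check the width of the link for your own CPU before plugging numbers in, since the GT/s figure alone does not tell you how many bits move per transfer.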
Think of the cores vs. clock speed decision as a highway. Suppose the clock speed is the speed limit on a single-lane highway. A faster CPU corresponds to a single-lane highway with a high speed limit. You will get to your destination faster. On the other hand, consider a one-lane vs. a two-lane highway, both with identical speed limits. If one lane is too busy for you, take the other lane. An increase in the number of cores increases the number of lanes you can move into. On the single-lane highway, you would need to slow down and wait for the cars in front of you to move forward. By switching lanes, you may get to your destination faster, or you may not, but more driving is completed overall.
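The highway analogy can be turned into a back-of-the-envelope model. This is a deliberately crude sketch: it assumes every task is independent, needs the same number of clock cycles, and runs on exactly one core, and it ignores caches, memory and scheduling overhead. The function `batch_time` and all of the numbers are hypothetical.

```python
import math


def batch_time(tasks, cores, clock_ghz, cycles_per_task=1e9):
    """Seconds to finish `tasks` equal, independent tasks.

    Tasks run in rounds of up to `cores` at a time (the lanes); each
    task takes cycles_per_task / clock seconds (the speed limit).
    """
    if tasks < 0 or cores < 1 or clock_ghz <= 0:
        raise ValueError("need tasks >= 0, cores >= 1, clock_ghz > 0")
    rounds = math.ceil(tasks / cores)
    seconds_per_task = cycles_per_task / (clock_ghz * 1e9)
    return rounds * seconds_per_task


if __name__ == "__main__":
    for tasks in (1, 8):
        fast_single = batch_time(tasks, cores=1, clock_ghz=4.0)
        slow_quad = batch_time(tasks, cores=4, clock_ghz=2.0)
        print(f"{tasks} task(s): 1 core @ 4 GHz = {fast_single} s, "
              f"4 cores @ 2 GHz = {slow_quad} s")
```

With a single task the fast one-lane road wins; with eight tasks the slower four-lane road finishes the batch first, which is the trade-off to weigh for your own workload.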