Important: this article was written before Intel “Haswell” and AMD “Richland” architectures came out.
So you want to start developing for OpenCL? When you focus on developing OpenCL for X86, you have three options: CPUs, GPUs and CPUs with an embedded GPU. This article is for you and represents the current state of hardware – if you want the best hardware for your specific algorithm, the information below is probably not sufficient.
In 2013 we focus on 4 groups: servers/cloud (FirePro, Tesla, XeonPhi), workstations (discussed here), low-power devices (SoCs) and special accelerators (FPGAs and DSPs). This article does not discuss high-end accelerators costing a few thousand euros, which are laid out here.
Before reading on, you need to set the goal for your workstation.
- If you want to learn the basics of OpenCL-programming, first check if your current machine has OpenCL-support.
- If you need more processing power, be sure you select the right hardware for the job. Don’t buy the most expensive hardware (FirePro, Tesla or XeonPhi), but take your time to find out which hardware supports your algorithms best. Feel free to ask us.
- If you want to make sure your software works on various types of accelerators, you can choose between:
- swapping PCIe-cards – disadvantage is the driver hassle and the time it takes.
- more accelerators in one machine – disadvantage is that only GPU 1 can do OpenGL/DirectX.
- identical machines with different accelerators – disadvantage is the price.
- If you want to focus on multi-GPU development, you need:
- either a sufficiently strong power supply and a motherboard that supports many PCIe lanes,
- or a videocard with two GPUs.
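For the first point in the list above, one quick way to check whether a machine has OpenCL support is to list the platforms and devices that the installed drivers expose. The sketch below assumes the third-party PyOpenCL package is installed; it is not mentioned in this article, and `list_opencl_devices` is a name made up for this illustration. Without PyOpenCL, or without any OpenCL driver, the function simply returns an empty list.

```python
def list_opencl_devices():
    """Return (platform, device, kind) tuples for every visible OpenCL device."""
    try:
        import pyopencl as cl  # assumption: PyOpenCL is installed
    except ImportError:
        return []

    try:
        platforms = cl.get_platforms()
    except cl.Error:
        # Raised when no OpenCL driver is installed at all.
        return []

    found = []
    for platform in platforms:
        try:
            devices = platform.get_devices()
        except cl.Error:
            continue  # platform without usable devices
        for device in devices:
            # device.type is a bitfield, so test the individual bits.
            if device.type & cl.device_type.GPU:
                kind = "GPU"
            elif device.type & cl.device_type.CPU:
                kind = "CPU"
            elif device.type & cl.device_type.ACCELERATOR:
                kind = "ACCELERATOR"
            else:
                kind = "OTHER"
            found.append((platform.name, device.name, kind))
    return found


if __name__ == "__main__":
    devices = list_opencl_devices()
    if not devices:
        print("No OpenCL devices found (or PyOpenCL is missing).")
    for platform_name, device_name, kind in devices:
        print(f"{platform_name}: {device_name} [{kind}]")
```

An empty result does not prove the hardware is incapable: it may only mean that the vendor's OpenCL driver has not been installed yet.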
This article aims to help you buy a good machine for OpenCL-development. Prices are as of January 2013. If you think I make the wrong suggestions, please give feedback via the comments.
My contacts at various companies can tell you: I want to stay independent no matter what. No deals have been made nor was there any outside influence, except from the friendly people of the local computer shops. I was surprised that I ended up suggesting so much AMD hardware, and felt quite uncomfortable with it – I finally decided to keep to my first conclusions and leave the comments completely open.
For memory it’s easy: buy the fastest available memory – then find a motherboard which supports this memory. And buy lots of it: 8GB (especially under memory-hungry Windows) plus the total amount of GPU-memory in the system (for pinned memory). 8GB of 2400MHz DDR3 costs €60 to €100. See also Tom’s Hardware on why you must buy the fastest memory available. Since harddrives are a bottleneck, extra memory can be used for buffering. It is OK to spend the same amount of money on memory as on the CPU.
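The sizing rule above (8GB plus the total GPU-memory) is easy to turn into a small calculation. This is only a sketch: the function names are hypothetical, and rounding up to a multiple of 8GB is an assumption about typical memory kits, not something the rule itself prescribes.

```python
import math


def minimum_ram_gb(gpu_memory_gb, base_gb=8):
    """Base system memory plus room to mirror all GPU memory (pinned memory)."""
    return base_gb + sum(gpu_memory_gb)


def kit_size_gb(required_gb, step_gb=8):
    """Round up to the next multiple of step_gb (assumed kit granularity)."""
    return math.ceil(required_gb / step_gb) * step_gb


if __name__ == "__main__":
    needed = minimum_ram_gb([3, 3])  # e.g. two videocards with 3GB each
    print(needed, kit_size_gb(needed))
```

For two 3GB cards this gives 14GB, which rounds up to a 16GB kit.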
Recently I was reminded again that normal harddisks are a huge bottleneck. If PCIe 2.0 x16 can do 6GB per second, then reading the input-data at 120-
Z, H, Q, B and P chipsets. So when you buy “a motherboard for Ivy Bridge” and focus on specific characteristics like the number of PCIe-ports, maximum memory, etc, you might get the wrong board. In short: go for a Z-chipset. There are also several numbers per series – in the 7X-series you have the 77, 75, 73 and 71. Simply put: higher = more features, but these are different features from the ones the letters indicate. This makes it a big mess, as it forces you to find out exactly what you need. I suggest you play it safe and buy a Z77. See this table of some feature-differences of various chipsets, and this overview for what Intel has to say about the Z77 chipset.
You might think you don’t need all these overclocking options – it is not about actually using them, but about the quality of the components that comes with them. Also, some chipsets do not support the embedded GPU – and that is exactly what is very interesting for OpenCL-development.
But we’re not finished! We have motherboards by Asus, ASRock, Biostar, ECS, Intel, Gigabyte and MSI – and they all have different features, although limited by the Intel-chipset. Altogether it means that you need to do some preparation to get a good one – Tom’s Hardware has three articles on Z77-motherboards: budget, mid-price, higher-priced.
For current GPUs you need a motherboard with at least PCIe 2.0 x16 or PCIe 3.0 x8 per GPU. If you want more than one GPU in your system, you need to check the supported slot configurations. With two GPUs, you must have two full PCIe 2.0 x16 or PCIe 3.0 x8 slots to avoid a bottleneck.
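Why PCIe 2.0 x16 and PCIe 3.0 x8 are treated as equivalent becomes clear when you compute the theoretical bandwidth per direction. The sketch below uses the per-lane transfer rates and line encodings of the two PCIe generations; the function and table names are made up for this illustration.

```python
# Per lane and per direction:
# (transfer rate in GT/s, payload bits, bits on the wire)
PCIE_GENERATIONS = {
    "2.0": (5.0, 8, 10),     # 8b/10b encoding
    "3.0": (8.0, 128, 130),  # 128b/130b encoding
}


def pcie_bandwidth_gb_s(generation, lanes):
    """Theoretical one-direction bandwidth in GB/s for a PCIe link."""
    rate_gt_s, payload_bits, wire_bits = PCIE_GENERATIONS[generation]
    gbit_per_lane = rate_gt_s * payload_bits / wire_bits
    return gbit_per_lane / 8 * lanes


if __name__ == "__main__":
    for generation, lanes in [("2.0", 16), ("3.0", 8), ("3.0", 16)]:
        bandwidth = pcie_bandwidth_gb_s(generation, lanes)
        print(f"PCIe {generation} x{lanes}: {bandwidth:.2f} GB/s")
```

These are theoretical peaks of about 8 GB/s for both configurations; real transfers are slower because of protocol overhead.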
A-series processors go with the FM2-socket (notice the F), and FX-CPUs go with AM3+ sockets. A hobby of AMD is to rename products several times. For those who do not know the A-series: it was formerly known as Fusion. There are three chipsets: the A55, A75 and A85. The A55 is the budget option, has far fewer features and should be avoided. The difference between A75 and A85 is that the latter has 2 more SATA ports.
For the FX-processor we currently have reached the 9-series chipset (AM3+ socket). There are 3 main types: 70, 80 and 90 – I could not find real differences between the chipsets. More interesting are the descriptors F, X and G. F = more PCIe-lanes, X = ATI CrossFireX support, G = ATI Radeon HD 4250 on the motherboard. As you probably need neither a Radeon HD 4250 nor CrossFireX, you can focus solely on the “F”. For more info, check this article at AnandTech. The same will apply to the upcoming 10-series, which focuses on next-generation Steamroller-architecture CPUs.
As with Intel, pay close attention to the differences between the motherboard-brands.
The big disadvantage is that the chipset does not support PCIe 3.0. Opinions vary, but since most tests show a slowdown of at most a few percent, I’d suggest you focus on other things.
Central Processing Unit
Intel Desktop Processor
Currently we have the Intel Ivy Bridge, a CPU + GPU in one. There are two GPUs available: the HD 2500 and the HD 4000. The number of GFLOPS of the first is even lower than that of the CPU, so get a processor with an HD 4000. This GPU is on all the i7’s and on the i5-3570K (€250). I found that the i7-3770K (€300) is a popular choice as a successor to the 2600K. Notice that you currently can only use OpenCL on the HD 4000 when using Windows.
Intel loves variations, most notably K, S and T. K stands for easily overclockable processors, which are the best choice. S is clocked lower than the normal version, but focuses on performance. T stands for lower power usage. See this overview for all explanations.
AMD Desktop processor with embedded GPU
Currently I suggest one processor in this section: AMD A10-5800K (€125).
I already came to the conclusion that AMD doesn’t know how to present their products well and chose to mainly compete on price. Really: for OpenCL you are better off with an AMD A10-5800K processor:
- Much faster embedded GPU
- Also 4 cores.
- Cheaper: AMD thinks that this is the only reason people buy their products.
- FMA3 and XOP support on the CPU. (I’m not sure about actual support by OpenCL-drivers – I need to do tests)
The GPU is currently the fastest embedded GPU. The CPU is comparable to an Intel i3.
AMD Desktop processor
The GPU-less line of AMD is the FX-processor. It uses the extra space for more cores. Theoretical GFLOPS of the AMD FX 8350 (€180) is 256, which is higher than the i7. I’ve seen many discussions whether the i7-3xxx or the FX-8xxx is better, but for OpenCL-devs it is more important that you have 8 AVX vector-units of 256 bits instead of 4.
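The theoretical figure above follows from a simple product: cores × clock × floating-point operations per core per cycle. The sketch below reproduces the 256 GFLOPS of the FX 8350 and compares it to an i7-3770K. The clock speeds and per-cycle numbers are assumptions filled in for this illustration (base clocks, single precision); they are not taken from this article.

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle_per_core):
    """Theoretical single-precision peak in GFLOPS."""
    return cores * clock_ghz * flops_per_cycle_per_core


# Assumed figures (single precision, base clock):
# FX 8350:   8 cores at 4.0 GHz, 8 FLOPs per core per cycle
# i7-3770K:  4 cores at 3.5 GHz, 16 FLOPs per core per cycle (AVX)
fx_8350 = peak_gflops(8, 4.0, 8)
i7_3770k = peak_gflops(4, 3.5, 16)

if __name__ == "__main__":
    print(f"FX 8350:  {fx_8350:.0f} GFLOPS")
    print(f"i7-3770K: {i7_3770k:.0f} GFLOPS")
```

Such peak numbers only say what the vector units could do in the ideal case; they say nothing about caches, memory bandwidth or how well a kernel vectorises, which is why benchmarks can still favour the i7.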
Understand that the high-end i7 CPUs are faster in benchmarks.
It is slightly out of scope, but I do want to mention server processors as they are considerably more powerful than desktop processors. Below is just a quick peek at one specific advantage of server-processors – please consult your supplier when you’re interested. Notice that these processors are much more expensive and have specific usages in mind, such as virtualisation or multiple processors on one motherboard. If you’re interested in server hardware, you’ll love this article “A Dual Processor Motherboard through a Scientist’s Eyes“.
Intel Xeons have larger cache sizes and can therefore perform much better – this helps a lot with data-intensive OpenCL-applications, so they are worth the extra cost. Another notable extra is the number of processors supported on one board. The Xeon E5 and E7 processors that have an embedded HD P4000 (P stands for ECC Memory Support) are the ones with “V2” at the end (not “LV2”!). The Xeons have 8 to 10 cores.
AMD Opterons have the same advantage: much more cache. In the 6300 series you can have up to 16 cores. There are no AMD server CPUs with embedded GPUs.
With AMD finally getting their tools and drivers in better shape, Radeons are a better choice for compute than last year.
For single-GPU performance the Radeon HD 7970 GHz edition (€400) is the best choice, with 1 TFLOPS double precision and 4.3 TFLOPS single precision. For lower prices, choose between the 7870 GHz (€200), 7950 (€270) and 7970 (€330). The difference between 79xx and 78xx is the amount of memory: 3GB vs 2GB. If you find cheaper cards, be sure they have GDDR5 and not DDR3.
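The two TFLOPS figures for the 7970 GHz edition are related to each other. As a rough sketch: the single-precision peak is the number of shaders × clock × 2 (one fused multiply-add per cycle), and double precision runs at a fraction of that rate. The shader count, boost clock and 1/4 double-precision rate below are assumptions filled in for this illustration, not numbers from this article.

```python
def gpu_peak_tflops(shaders, clock_ghz, rate=1.0):
    """Theoretical peak in TFLOPS, counting one fused multiply-add
    (2 FLOPs) per shader per cycle; `rate` scales for double precision."""
    return shaders * clock_ghz * 2 * rate / 1000


# Assumed figures for the Radeon HD 7970 GHz edition:
# 2048 shaders, 1.05 GHz boost clock, double precision at 1/4 rate.
single_precision = gpu_peak_tflops(2048, 1.05)
double_precision = gpu_peak_tflops(2048, 1.05, rate=0.25)

if __name__ == "__main__":
    print(f"single precision: {single_precision:.2f} TFLOPS")
    print(f"double precision: {double_precision:.2f} TFLOPS")
```

When comparing cheaper cards, the double-precision rate is worth checking separately, since it can differ between GPU families even when the single-precision peaks look similar.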
Know that the HD 69XX series is also very powerful, but only the 6950 and 6970 (€350) support double precision and have 2GB GDDR5 – second-hand GPUs are a great way to just test GPUs, but you need to be lucky.
The AMD Radeon HD 7990 (€900) is a dual GPU, but it takes 3 (THREE!) slots and lots of power. I totally agree with this conclusion at Tom’s Hardware. Given the price, you may well be better off with two single-GPU videocards.
It is a pity, but as NVIDIA is focusing entirely on CUDA for high-end cards, I cannot suggest a good system for OpenCL-development on NVIDIA-hardware. I’ll add more info if they put back the removed tools and samples, and finally start supporting OpenCL 1.2. Last time I suggested having a system with two GPUs (one by NVIDIA and one by AMD), but now I suggest you spend your money on other things.
If you really want to use NVIDIA, get a GTX 680 (€450). If you want to try dual-GPU OpenCL, get the GTX 690 (€920). A budget choice is the GTX 660 Ti (€250). Be sure you buy a good brand and test the cards well, as it seems that GTX-cards are tested for game-performance and not for compute-precision.
Conclusion & suggestions
Before I give suggestions on what to buy, this is the order in which I would spend my money on a new system:
- SSD & RAID
- Discrete GPU
- (Embedded GPU)
- CPU (Yes, I put CPU at the last place)
Memory: fastest available, 16GB. SSD: the fastest that can hold your data (or buffer it). The motherboard is where most of your time will go, but I’m sure it’s worth it.
For the single GPU it is also an AMD, now that NVIDIA is backing out of “budget compute”. For dual-GPU I cannot make a good suggestion.
Even though I have an Intel-processor in my main machine, I suggest getting an AMD processor. You get either 4 extra CPU-cores or a powerful embedded GPU. If you need PCIe 3.0, then Intel is the only choice.
Things might change when Intel releases Haswell in Q2 this year, and NVIDIA feels pressured and gets OpenCL back up. That’s the fun of having a standard: now it’s all AMD and half a year later it might be completely different.
Important: take your time to buy a new workstation – start with defining your goals. Even though I had OpenCL compute in mind when writing this article, my secondary goals probably differ from yours.