AMD updates the FirePro S10000 to 12GB and passive cooling

Let the competition on large-memory GPUs begin!

Some algorithms and continuous batch processes will enjoy the extra memory. When inverting a large matrix or running huge simulations, for example, you need as much memory as possible. Extra memory also helps to avoid memory-bank conflicts by duplicating data objects, which only pays off when the data stays in memory long enough to earn back the time it costs to duplicate it.

Another reason for larger memories is double-precision computation (this card has a total of 1.48 TFLOPS), which doubles the memory requirements compared to single precision. With accelerators becoming a better fit for HPC (true support for the IEEE-754 double-precision storage format, ECC memory), memory size becomes one of the limits that needs to be solved.
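To get a feel for what that doubling means, here is a minimal sketch that estimates the largest dense square matrix fitting in the memory of one GPU-processor (3GB on the old card, 6GB on the new one). It assumes binary gigabytes and counts only the matrix itself; a real inversion also needs workspace, so usable sizes will be smaller. The function names are made up for this illustration.

```python
import math

GIB = 1024 ** 3  # assumption: the card's "GB" is treated as 2**30 bytes


def matrix_bytes(n, bytes_per_element):
    """Memory footprint of a dense n x n matrix."""
    return n * n * bytes_per_element


def max_square_dim(mem_bytes, bytes_per_element):
    """Largest n such that a dense n x n matrix fits in mem_bytes."""
    return math.isqrt(mem_bytes // bytes_per_element)


# Compare single (4 bytes) and double (8 bytes) precision
# for 3 GiB and 6 GiB per GPU-processor.
for mem_gib in (3, 6):
    for name, size in (("single", 4), ("double", 8)):
        n = max_square_dim(mem_gib * GIB, size)
        used = matrix_bytes(n, size) / GIB
        print(f"{mem_gib} GiB, {name} precision: n <= {n} ({used:.2f} GiB used)")
```

Going from single to double precision shrinks the maximum dimension by a factor of about 1.41 (the square root of 2), and doubling the memory wins that factor back.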

The alternatives are swapping on the GPU or using multi-core CPUs. Swapping is not an option, as it nullifies the whole speed-up. A server with 4 x 16-core CPUs is as expensive as one accelerator, but uses more energy.

AMD seems to have identified this as an important HPC market and has therefore just announced the new S10000 with 12GB of memory. It will be sent to AMD partners in January and be on the market in April. Is AMD finally taking the professional HPC market seriously? They now do have the first 12GB GPU-accelerator built for servers.

Old vs New

Still a few question-marks, unfortunately

| Functionality | FirePro S10000 6GB | FirePro S10000 12GB |
|---|---|---|
| GPU-processor count | 2 | 2 |
| Architecture | Graphics Core Next | Graphics Core Next |
| Memory per GPU-processor | 3GB GDDR5 ECC | 6GB GDDR5 ECC |
| Memory bandwidth per GPU-processor | 240 GB/s | 240 GB/s |
| Performance (single precision, per GPU-processor) | 2.95 TFLOPS | 2.95 TFLOPS |
| Performance (double precision, per GPU-processor) | 0.74 TFLOPS | 0.74 TFLOPS |
| Max power usage for whole dual-GPU card | 325 Watt | 325 Watt (?) |
| Greenness for whole dual-GPU card (SP) | 20.35 GFLOPS/Watt | 18.15 GFLOPS/Watt |
| Bus interface | PCIe 3.0 x16 | PCIe 3.0 x16 |
| Price for whole dual-GPU card | $3500 | ? |
| Price per GFLOPS (SP) | $0.60 | ? |
| Price per GFLOPS (DP) | $2.43 | ? |
| Cooling | Active (!) | Passive |

The biggest differences are the doubling of memory and the passive cooling.
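The derived rows (greenness and price per GFLOPS) are simple ratios of the other rows. Below is a small sketch of that arithmetic, using the per-GPU figures from the table as inputs; the function names are made up for this illustration. With 325 Watt it gives about 18.15 GFLOPS/Watt, which matches the 12GB column. The price ratios come out close to, but not exactly at, the listed values, so those rows may have been computed from slightly different inputs.

```python
def gflops_per_watt(tflops_per_gpu, gpu_count, watts):
    """Energy efficiency of the whole card in GFLOPS per Watt."""
    return tflops_per_gpu * 1000 * gpu_count / watts


def price_per_gflops(price_usd, tflops_per_gpu, gpu_count):
    """Price of the whole card divided by its total GFLOPS."""
    return price_usd / (tflops_per_gpu * 1000 * gpu_count)


# Inputs taken from the comparison table (dual-GPU card).
GPUS = 2
SP_TFLOPS = 2.95   # single precision, per GPU-processor
DP_TFLOPS = 0.74   # double precision, per GPU-processor
WATTS = 325        # max power usage, whole card
PRICE = 3500       # USD, 6GB version

print(f"Greenness (SP): {gflops_per_watt(SP_TFLOPS, GPUS, WATTS):.2f} GFLOPS/Watt")
print(f"Price per GFLOPS (SP): ${price_per_gflops(PRICE, SP_TFLOPS, GPUS):.2f}")
print(f"Price per GFLOPS (DP): ${price_per_gflops(PRICE, DP_TFLOPS, GPUS):.2f}")
```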


The biggest competitor is the Quadro K6000, which I haven't discussed at all. That card delivers 5.2 TFLOPS using one GPU, which can access all 12GB of memory via a 384-bit bus at 288 GB/s (when all cores are used). It is actively cooled, so it's not really fit for servers (the same goes for the 6GB version of the S10000). The S10000 has a higher total bandwidth, but each GPU can access only half of the 12GB at full speed. So the K6000 has the advantage here.

Intel is planning to have 12GB and 16GB Xeon Phis. I'm curious to see more benchmarks of the new cards, as the 5110P does not have very good results (benchmark 1, benchmark 2). It compares more to a high-end Xeon CPU than to a GPU. I am more enthusiastic about the OpenCL performance on their CPUs.

What’s next on this path?

A few questions I asked myself and tried to find answers to.

Extendible memory, like we have for CPUs? Probably not, as GDDR5 is not designed to be upgradable.

Unified memory for multi-GPUs? This would solve the disadvantage of multi-die GPU cards, as 2, 4 or more GPUs could share the same memory. It is a reason to watch the progress of HSA hUMA, which currently specifies unified memory access between GPU and CPU.

24GB of memory or more? I found the pricing table below, which gives an idea of the costs of GDDR memory, so it's an option. These prices of course exclude supplementary parts and the R&D costs of making more memory accessible to the GPU cores.

GPU-parts pricing table – Q3 2011

At least one question is going to be answered now: is the market that needs this amount of memory large enough, and thus worth serving?

Is there a need for a wider memory bus as well? Remember that GDDR6 is promised for 2014.

What do you think of a 12GB GPU? Do you think this is the path that distinguishes professional GPUs from desktop-GPUs?
