AMD vs NVIDIA – Two figures that can tell a whole story

Update September ’13: AMD launches its new “Volcanic Islands” GPUs with GCN 2.0 in October. For this reason the HD 7970’s price has dropped to €250. This shakes up some of the things described in this article.

Update June ’14: It has become clear that Titan is not a consumer device and should be categorised as a “Quadro for compute”. All consumer devices of both AMD and NVIDIA show relatively low GFLOPS for double precision.

Update July’14: Graphs updated with GTX Titan Z and R9 290X.

AMD/ATI has always had the fastest GPU out there. Yes, there were plenty of times when NVIDIA approached the throne, or even held the crown for a while (at least theoretically), but in the end it was Radeon that had the rightful claim.

Nevertheless, some things have changed:

  • AMD has focused more on the new architecture, making it easier to program while keeping the GFLOPS the same.
  • AMD bets on their A-series APU with integrated GPU.
  • NVIDIA has increased both memory bandwidth and GFLOPS at a steady pace.
  • NVIDIA has done the nitro-trick for double precision.

With the GTX Titan, NVIDIA snatched victory from the jaws of defeat.

I’m not saying you should jump to CUDA now; there’s more to it than GFLOPS. We should also think of costs and of preventing vendor lock-in. In particular, I would like to show how unpredictable the market for accelerator processors is.

Let’s take a look at the figures.

Below are the fastest consumer-targeted GPUs. As you can see, AMD has a flatline, while NVIDIA increased their performance at a fast pace.

Note that the dates are actual release dates, not the announcement dates. Also, there are a lot more differences between the dots than just the GFLOPS. There’s also architecture, memory bandwidth, memory size, PCIe-version, etc.

Update July’14: Both AMD and NVIDIA are now at 5.1 TFLOPS. AMD’s R9 290X costs $650 and the GTX Titan costs $1000+.



NVIDIA’s line for double precision makes a huge jump. No, this is not a mistake: NVIDIA decided to put double precision into a consumer GPU. Only the Titan has it; the rest still runs double precision at 1/8th of the single-precision rate.

Update July’14: AMD decided to go to 1/8th too, going to just over 600 DP GFLOPS.
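These ratios are easy to play with yourself. A minimal sketch of the arithmetic, using the article’s figures (the Titan’s ~4,500 SP GFLOPS at a 1/3 DP rate is my assumption of the usual published spec, not a number from this post):

```python
def dp_gflops(sp_gflops, dp_divisor):
    """Peak DP GFLOPS given peak SP GFLOPS and the SP:DP divisor."""
    return sp_gflops / dp_divisor

# GTX Titan: roughly 4,500 SP GFLOPS at a 1/3 DP rate
print(dp_gflops(4500, 3))   # -> 1500.0
# A 1/8-rate card at the article's 5,100 SP GFLOPS
print(dp_gflops(5100, 8))   # -> 637.5, i.e. "just over 600"
```

This is why a 1/8-rate card with a higher SP peak can still trail the Titan badly in double precision.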



For the professional accelerator market see the “answer to” series on this blog.


When it comes to cost, there is a big difference between the two competitors. This could be an effect of the vendor lock-in by CUDA, or a nice void in the market.

Radeon HD 8970 (4 GB): €550
Radeon HD 7970 Extreme edition (6 GB): €570
NVIDIA GTX Titan (6 GB): €970

… and they all have 288 GB/s memory bandwidth.
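Since the bandwidth is identical, price per GB/s is a quick way to compare value here. A rough sketch using only the prices listed above:

```python
# Price-per-bandwidth comparison, using the article's prices.
# All three cards have the same 288 GB/s theoretical memory bandwidth.
cards = {
    "Radeon HD 8970 (4 GB)": 550,
    "Radeon HD 7970 Extreme (6 GB)": 570,
    "GTX Titan (6 GB)": 970,
}
BANDWIDTH_GBS = 288

for name, price_eur in cards.items():
    ratio = price_eur / BANDWIDTH_GBS
    print(f"{name}: {ratio:.2f} EUR per GB/s")
```

By this metric the Titan costs you roughly 3.37 EUR per GB/s versus 1.91 for the HD 8970, so you pay mostly for the double-precision rate and the CUDA ecosystem, not for bandwidth.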

AMD Radeon 8970 XT

As you can read on the Guru3D forum, there is speculation about AMD taking back the crown. Below, a comparison between the 7970 and the 8970 XT (source) – the unofficial specs.

  • Higher clock speed: 1,050 MHz vs 925 MHz
  • Better floating-point performance: 5,376 GFLOPS vs 3,789 GFLOPS
  • Significantly higher pixel rate: 50.4 GPixel/s vs 29.6 GPixel/s
  • Higher texture rate: 168 GTexel/s vs 118.4 GTexel/s
  • Significantly more render output processors: 48 vs 32
  • Slightly higher effective memory clock speed: 6,000 MHz vs 5,500 MHz
  • More shading units: 2,560 vs 2,048
  • More texture mapping units: 160 vs 128
  • More compute units: 40 vs 32
  • Higher memory clock speed: 1,500 MHz vs 1,375 MHz

The source mentions the normal 8970, but that is a mistake. See the official specs of the 8970 here [PDF].
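The derived numbers in the rumoured specs are internally consistent, which you can check from the raw unit counts and clocks. A small sketch, assuming the usual GCN formulas (peak SP GFLOPS = shading units × 2 FLOPs per FMA × clock; pixel rate = ROPs × clock; texture rate = TMUs × clock):

```python
# Sanity-check the derived figures from the spec comparison above.
def peak_sp_gflops(shaders, clock_mhz):
    """Shading units x 2 (one FMA = 2 FLOPs) x clock in GHz."""
    return shaders * 2 * clock_mhz / 1000

def pixel_rate(rops, clock_mhz):
    """Render output processors x clock in GHz -> GPixel/s."""
    return rops * clock_mhz / 1000

def texture_rate(tmus, clock_mhz):
    """Texture mapping units x clock in GHz -> GTexel/s."""
    return tmus * clock_mhz / 1000

# 8970 XT (unofficial): 2,560 shaders, 48 ROPs, 160 TMUs at 1,050 MHz
print(peak_sp_gflops(2560, 1050))  # -> 5376.0 GFLOPS
print(pixel_rate(48, 1050))        # -> 50.4 GPixel/s
print(texture_rate(160, 1050))     # -> 168.0 GTexel/s
# HD 7970: 2,048 shaders at 925 MHz
print(peak_sp_gflops(2048, 925))   # -> 3788.8, listed as 3,789 GFLOPS
```

All four results match the table, so whoever leaked the specs at least did the multiplication right.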

Titan II / Ultra

NVIDIA is known for whispering specs of upcoming products long before they launch. Take the example of 3D stacked memory for products in 2016.

But when will the Titan II arrive? Nobody knows for sure. That the card will actually show up has already been planted as a controlled rumour. Under what name it will appear (2, II, Ultra), and with what specs, is also very difficult to tell. We will probably know better by late 2013 or early 2014.

What is sure is that this battle will continue until the discrete GPU market vanishes.

Share your thoughts! How long do you think NVIDIA can hold on to power?



  • pip010

    2×9750 ~ 600EUR

  • MySchizoBuddy

    for the cost of one titan I can buy two AMDs and get 2TF DP instead of 1.3 for Titan. Sounds better to me 🙂

    • StreamHPC

      True, but when going for multi-GPU it is a totally different story. You
      cannot just sum the FLOPS and bandwidth, except in some cases (like
      ray tracing). In an upcoming article I go in depth on multi-GPU and how
      to compare them.

      Just be happy you are not locked in to CUDA, or you would not have had this choice at all.

  • Alex

    NVIDIA is doing just great and the CUDA ecosystem is really mature. I’ve been working with their accelerators and I’m incredibly amazed by what they really can do when you know your work (i.e. you know how to squeeze performance out of them).

    • StreamHPC

      NVIDIA’s technology is great, but their PR-machine is even greater. Exactly the same things can be done with Radeon GPUs, but AMD sucks at bragging. Problem with CUDA is that it is not for the long term, as it does not support FPGAs, DSPs or any other non-NVIDIA processor.

      • Alex

        Not sure about the same things with AMD… perhaps in computer graphics, not in the HPC field (I’ve been working in HPC for 10 years now). By the way, a good PR machine is essential to any company out there.

  • mike

    The ATI 7990 has 8200 GFLOPS, surpassing all the consumer competition

    • StreamHPC

      Which effectively is two Radeon HD7970’s. As not all GPU-algorithms are massively parallel, I focus on single cards.

      If you do want the max and use an algorithm with very localised
      operations, build a version using HD 7990s.

  • Garet Claborn

    I think this is a pretty good article, but I do disagree with saying the Radeon has flatlined.

    At least to me, it seems (and often has been the case) that AMD comes in fairly even spurts of new, raw performance. NVIDIA tends to play catch up and grab the lead for a few months right after they release a much less common upgrade where performance was bolstered largely by more advanced engineering.

    Personally I think they will dance back and forth for years to come. AMD will tend to jump and plateau more as NVIDIA will skyrocket after long waits.

    I use both technologies depending. Radeon’s price point is great as noted =) Of course CUDA’s maturity is a huge selling point on NVidia cards and a lot of the extensions just work better. Still I think they are both in the game for now =P

    • StreamHPC

      AMD had put all their engineering efforts in developing their GCN architecture – so you do see quite some improvements, but not in theoretical GFLOPS (hence the flat line). I’m very happy they chose this path!

      After running this company for 3.5 years I can only think we’re getting more surprises in the coming years. Redefinition after redefinition of what a processor is and how software should be defined, in search of tackling the problem of efficient parallel processing.

      • Garet Claborn

        I think I see where you’re coming from on that, as you had somewhat mentioned AMD trying to improve programmer efficiency per GFLOP vs raw GFLOPS. Do you think AMD’s push signals a long-term shift in their focus?

        As a developer, I have become rather fond of the APU so I can see why performance can’t always beat interface accessibility. Right tool, right job, all that.

        If that’s the case, then I can see why it’s pretty logical to think of the flatline as more persistent than I’m used to =)