7 things NVIDIA doesn’t want you to know

What goes around, comes around. NVIDIA’s marketing cannot continue on this path forever.

For some time I’ve tried to ignore NVIDIA’s unfounded bragging, but it has become impossible not to react, because NVIDIA ranks high on the list of tech companies that send out misleading marketing messages.

Of the whole list, my personal favourite is number 6. Who would have thought that the one presentation everybody was talking about was not about an invention of their own? Which item on the list is your favourite open secret? Share with the other readers in the comments.

1. NVIDIA can’t deliver high-end Maxwells in 2014

Edit: To my surprise they did manage to release a high-end gamer GPU in September, the GTX 980. Based on my sources, I had expected it a few months later. A single-GPU, Maxwell-based Tesla accelerator could therefore arrive earlier than expected. Let’s see what hits the stores this year.

High-end Maxwell GPUs are being delayed and there is no ETA. NVIDIA is very quiet about it, while the launch date of the TFLOPS Maxwell GPUs has even been postponed to Q1 2015. Some rumours even say that Maxwell-based Tesla cards will never hit the market.

The roadmap still shows Maxwell in 2014, but it was only delivered for low-end GPUs on 28 nm. To reach the promised GFLOPS/Watt for high-end GPUs, 20 nm is needed, and 20 nm is exactly what is not available (when you’re not Intel). Read all about the GPU and 20 nm issues here.

[Image: NVIDIA roadmap slide (pascal-scaling)]

Another thing is the disappearance of features from Maxwell. For instance, Maxwell hardly carries any compute-related promises anymore. The big thing would have been unified virtual memory, but that was already put on Kepler. New technologies like GDDR6 probably become available in 2015 – adopting them would be a huge investment, as stacked memory also becomes available in 2016. In short: most features of Maxwell have either already gone to Kepler or have been delayed to Volta.

NVIDIA doesn’t want you to know that Maxwell will mostly be a variant of Kepler on 20 nm, and that for a big step forward you should wait for Volta in 2016.

2. Profit per GPU is huge

It is a secret that NVIDIA’s profit margin on Kepler is much higher than the 54% they report to investors. Investors love that – and you are the one paying for it. Guesses are that the profit margin on the TITAN is over 85%, and on the TITAN Z even over 90%. Compared to the rest of the industry, that’s a huge margin!

While a lot of money goes into the development of tools, even more goes into giving away free GPUs to researchers. It’s basic psychology that people like reciprocity, so giving away presents works really well. Those researchers then want their university or institute to buy a cluster full of the CUDA cards they got for free. You can do the math.

The main reason why NVIDIA can charge their customers so much is (and I love to say this): the vendor lock-in created by CUDA. If your software is all CUDA, it might be cheaper to upgrade to the latest Tesla cards than to buy better-priced alternatives and hire StreamHPC to port the code to well-performing OpenCL. The pain only grows when the competition can deliver faster and cheaper.

NVIDIA doesn’t want you to know what they tell investors.

3. CUDA is not performance portable

Noticed that NVIDIA doesn’t attack OpenCL on performance portability anymore? This is simply because CUDA now needs to support several generations of NVIDIA hardware itself, from the 2014 Tegra K1 to the 2011 Tesla M2090. Running CUDA on CPUs hasn’t taken off like OpenCL on CPUs has, mostly because it is hard to claim both “CUDA is performance-portable” and “CUDA runs on CPUs” at the same time. The point is that CUDA is about as performance-portable as OpenCL, if you target comparable devices.

Below is a table from the CUDA Wikipedia page showing the differences per compute capability. Optimisations for the Kepler architecture will not always have the expected effect on previous-generation GPUs.

[Table from Wikipedia: feature support per CUDA compute capability]

The funny thing is that if you need CUDA code to work well on various devices, you’re better off hiring an OpenCL developer, as they have more experience with this subject.
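To make that concrete, below is a minimal sketch (untested) of the kind of per-architecture code this leads to: the same block sum written twice, because the warp-shuffle trick that is fast on Kepler (compute capability 3.x) simply does not exist on Fermi-class cards (2.x). The kernel name and the 256-thread block size are illustrative choices, not anything NVIDIA prescribes.

```
// Minimal sketch (untested): compute-capability-specific CUDA code.
// Assumes a launch with 256 threads per block, *out initialised to zero,
// and compute capability >= 2.0 for the float atomicAdd.
__global__ void sumKernel(const float *in, float *out, int n)
{
    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + tid;
    float v = (i < n) ? in[i] : 0.0f;

#if __CUDA_ARCH__ >= 300
    // Kepler and newer: reduce within each warp using shuffles, no shared memory.
    for (int offset = 16; offset > 0; offset >>= 1)
        v += __shfl_down(v, offset);
    if ((tid & 31) == 0)          // lane 0 of each warp holds the warp sum
        atomicAdd(out, v);
#else
    // Fermi and older: classic shared-memory tree reduction.
    __shared__ float scratch[256];
    scratch[tid] = v;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) scratch[tid] += scratch[tid + s];
        __syncthreads();
    }
    if (tid == 0)
        atomicAdd(out, scratch[0]);
#endif
}
```

An OpenCL version has exactly the same problem, of course – the difference is that OpenCL developers are used to writing and benchmarking such per-device variants.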

NVIDIA doesn’t want you to know that you’ll need experts to get CUDA running on multiple devices with high performance.

4. NVIDIA field engineers have had special sales training

If you’re in the back of the room, or simply turn around, you will notice it yourself:

“How many people have had experience with CUDA?”
(15 to 20 of the 100 people raise their hand.)
“Ah, I see the vast majority. Very nice!”

I’ve seen this every time I’ve visited an NVIDIA presentation, and it bugs me a lot. NVIDIA presenters are seemingly allowed to openly lie about statistics, meaning that several “facts” told at semi-public events are simply false. You can best check for yourself by visiting one of their presentations or going to a booth. My favourite is: “Do you use CUDA, or are you locked in to OpenCL?”.

Graphics ethics are not sound ethics?

From their website:

Ethics

We believe that the integrity with which we conduct ourselves and our business is key to our ability to running a successful, innovative business and maintaining our reputation. We expect our directors, executives and employees to conduct themselves with the highest degree of integrity, ethics and honesty.
– See more at: http://www.nvidia.com/object/fy14-gcr-governance-ethics.html


For NVIDIA, “ethical” apparently doesn’t exclude telling the kind of half-truths you’ll find on this page.

NVIDIA doesn’t want you to know that they’re exaggerating “facts” more than other companies do.

5. OpenCL support is secretly still there

A few years ago NVIDIA removed the OpenCL tools from CUDA 5. Many reasons were given for the removal, even “download size”. While OpenCL is not actively promoted, you can still file bugs (which they fix, most of the time) and even get help with OpenCL if you are a big customer. There are rumours that the biggest customers even have access to tools like a profiler and a debugger, but I could not verify that.

NVIDIA doesn’t want you to know the real reasons why they removed OpenCL-support from their tools.

6. 3D stacked memory was not invented by NVIDIA, but by AMD & Hynix

AMD is the big brain behind DDR and GDDR. When you buy a Tesla card, you also pay for AMD’s memory technology. This is also a reason why AMD cards have quite high memory bandwidth: they’ve got the experience. Together with Hynix, AMD has also developed 3D stacked memory (HBM). In other words: the big thing of Volta is an AMD invention. With this in mind, watch the video below.

It’s quite funny to see NVIDIA CEO Jen-Hsun Huang bragging about AMD’s invention as if it were their own.

NVIDIA doesn’t want you to know they did not invent stacked memory, and just wants to wow you by any means necessary.

7. Unified Memory is virtual

With much fanfare the whole world had to know: they too have what AMD, Intel and ARM have – memory shared between CPU and GPU. One detail: it is virtual memory, except on Tegra. At the end of last year I completely debunked their “Unified Memory”, which was actually “Unified Virtual Memory”. Luckily they’re now more clear about it, if you ignore all the articles that keep rewriting history.
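To make the difference concrete, here is a minimal sketch (untested, assuming CUDA 6 on a discrete Tesla or GeForce card) of what their “Unified Memory” actually gives you. The single pointer is convenient, but on anything that isn’t a Tegra the data still lives in two physical memories and is migrated behind your back:

```
// Minimal sketch (CUDA 6, untested): one pointer for host and device, but on a
// discrete GPU the pages are migrated over PCIe on demand - this is managed
// (virtual) unification, not one physical memory pool as on an APU or Tegra.
#include <cstdio>

__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main()
{
    const int n = 1 << 20;
    float *data = NULL;
    cudaMallocManaged(&data, n * sizeof(float));    // "unified" allocation

    for (int i = 0; i < n; ++i) data[i] = 1.0f;     // touched by the CPU first

    scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f); // pages migrate to the GPU
    cudaDeviceSynchronize();  // on CUDA 6 the CPU may not touch managed memory
                              // again until the GPU work has finished

    printf("data[0] = %f\n", data[0]);              // pages migrate back
    cudaFree(data);
    return 0;
}
```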

NVIDIA doesn’t want you to know that they are lagging behind the competition (AMD and Intel), who have actual unified memory and over 1 TFLOPS of single-precision performance.

What do I personally think about NVIDIA?

I think NVIDIA is a company that creates great products, but has a marketing department that harms them in the long term. I think they crossed the line between marketing and lying with both feet. I think the company’s ethical standards are simply too low.

My personal goal is to get them back to OpenCL, so developers can then focus on one language for all accelerators.

I want you to know that you need to double-check everything vendors claim. Via this blog I’ll share my insights to help you better understand this market, the technologies behind it and the full potential of accelerators like GPUs, FPGAs and DSPs.

Market Positioning of Graphics and Compute solutions

When compute became possible on GPUs, it was first presented as an extra feature and did not change much about the positioning of the products by AMD/ATI and NVIDIA. NVIDIA started with positioning server compute (described as “the GPU without a monitor connector”), where AMD and Intel followed. When the expensive GeForce GTX Titan and Titan Z were introduced, it became clear that NVIDIA still thinks about positioning: the Titan is the bridge between GeForce and Tesla, a Tesla with video-out.

Why is positioning important? It is the difference between “I’d like to buy a compute card for my desktop, so I can develop algorithms that will also run on the compute server” and “I’d like to buy a graphics card for doing computations and later run that on a passively cooled graphics card”. The second version might get a “you don’t want to do that”, as graphics terminology is used to describe compute goals.

Let’s get to the overview.

|                      | AMD          | NVIDIA                              | Intel                      | ARM              |
|----------------------|--------------|-------------------------------------|----------------------------|------------------|
| Desktop User *       | A-series APU |                                     | Iris / Iris Pro            |                  |
| Laptop User *        | A-series APU |                                     | Iris / Iris Pro            |                  |
| Mobile User          |              | Tegra                               | Iris                       | Mali T720 / T4xx |
| Desktop Gamer        | Radeon       | GeForce                             |                            |                  |
| Laptop Gamer         | Radeon M     | GeForce M                           |                            |                  |
| Mobile High-end      |              | Tegra K1 (?)                        | Iris Pro                   | Mali T760 / T6xx |
| Desktop Graphics     | FirePro W    | Quadro                              |                            |                  |
| Laptop Graphics      | FirePro M    | Quadro M                            |                            |                  |
| Desktop (DP) Compute | FirePro W    | Titan (HDMI) / Tesla (no video-out) | Xeon Phi                   |                  |
| Laptop (DP) Compute  | FirePro M    | Quadro M                            | Xeon Phi                   |                  |
| Server (DP) Compute  | FirePro S    | Tesla                               | Xeon Phi (active cooling!) |                  |
| Cloud                | Sky          | Grid                                |                            |                  |

* = For people who say “I think my computer doesn’t have a GPU”.

My take is that the Titan is there to promote compute at the desktop, while the Tesla is also promoted for that. AMD has the FirePro W for that, serving both graphics professionals and compute professionals. Intel uses the Xeon Phi for anything compute, and it’s all actively cooled.

The table has some empty spots: NVIDIA doesn’t have an IGP, AMD doesn’t have mobile graphics, and Intel doesn’t have a clear message at all (J, N, X, P and K series mixed across all types of markets). Mobile GPU vendors like ARM, Imagination and Qualcomm have a clear message differentiating between high-end and low-end mobile GPUs, whereas NVIDIA and Intel don’t.

Positioning of the Titan Z

Even though I think NVIDIA made the right move in positioning a GPU for the serious compute hobbyist, they are very unclear about their proposition. AMD is very clear: “Want professional graphics and compute (and to play games after work)? Get a FirePro W for your workstation”, whereas NVIDIA says “Want compute? Get a Titan if you want video output, or a Tesla if you don’t”.

See this GeForce page, where they position it as a gamer card that competes with the Google Brain supercomputer and a Mac Pro. In other places (especially benchmarks) it is stressed that it is not meant for gamers, but for compute enthusiasts (who can afford it). See for example this review on Hardware.info:

That said, we wouldn’t recommend this product to gamers anyway: two Nvidia GeForce GTX 780 Ti or AMD Radeon R9 290X cards offer roughly similar performance for only a fraction of the money. Only two Titan-Zs in SLI offer significantly higher performance, but the required investment is incredibly high, to the point where we wouldn’t even consider these cards for our Ultimate PC Advice.

As a result, Nvidia stresses that these cards are primarily intended for GPGPU applications in workstations. However, when looking at these benchmarks, we again fail to see a convincing image that justifies the price of these cards.

So NVIDIA’s naming convention is unclear. If the TITAN is for the serious and professional compute developer, why use the GeForce brand? A “Quadro Titan” would have made much more sense, or even a “Tesla Workstation”, so developers could get a guarantee that the code would also run on the server.

Differentiating from low-end compute

Radeon and GeForce GPUs are used in low-cost compute clusters. Both AMD and NVIDIA prefer to sell their professional cards for that market and have difficulty making it clear that game cards are not designed for compute-only solutions. The one thing they have done in past years is to reserve good double-precision performance for their professional cards only. An existing difference was the driver quality between Quadro/FirePro (industry quality) and GeForce/Radeon. I think both companies have to rethink this differentiated driver strategy, as compute has changed the demands of the market.

I expect more differences in the support software for different types of users. When would I pay for professional cards?

  1. Double Precision GFLOPS
  2. Hardware differences (ECC, NVIDIA GPUDirect or AMD SDI-link/DirectGMA, faster buses, etc)
  3. Faster support
  4. (Free) Developer Tools
  5. System Configuration Software (click-click and compute works)
  6. Ease of porting algorithms to servers/clusters (up-scaling with less bugs)
  7. Ease of porting algorithms to game-cards (simulation-mode for several game-cards)

So the list starts with hardware-specific demands and then moves to developer support; the sketch below shows how to check the first two points on a card you already have. Let me know in the comments why you would (or would not) pay for professional cards.
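Since points 1 and 2 are about hardware you can verify, here is a minimal sketch (untested, CUDA runtime API) that checks what a given card actually offers before you decide whether the professional price tag is justified. The printed fields are standard cudaDeviceProp members; the DP:SP throughput ratio itself is not queryable and has to come from spec sheets or a benchmark.

```
// Minimal sketch (untested): list the "professional" hardware features of the
// installed cards - ECC state, compute capability and memory size.
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);

        printf("Device %d: %s (compute capability %d.%d)\n",
               dev, prop.name, prop.major, prop.minor);
        printf("  ECC enabled:   %s\n", prop.ECCEnabled ? "yes" : "no");
        printf("  Global memory: %zu MB\n", prop.totalGlobalMem >> 20);
        // Note: the DP:SP throughput ratio (e.g. 1:3 on a Tesla K40 versus 1:24
        // on most GeForce cards) is not exposed by the API - you have to
        // benchmark it or look it up in the spec sheets.
    }
    return 0;
}
```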

Evolving from gamer-compute to server-compute

GPU developers are not born, but made (trained or self-educated). Most of the time they start with OpenCL (or CUDA) on their own PC or laptop.

With NVIDIA it would be hobby compute on GeForce, then serious work on a Titan, then Tesla or Grid. AMD has a comparable growth path: hobby compute on Radeon, then an upgrade to FirePro W and then to FirePro S or Sky. With Intel it is Iris or Xeon Phi directly, as their positioning is not clear at all when it comes to accelerators.

Conclusion

The positioning of graphics cards and compute cards is finally getting settled at the high level, but will certainly change a few more times in the year(s) to come. Think of the growing market of home-video editors in 2015, who will probably need a compute card for video compression. NVIDIA will come up with a different solution than AMD or Intel, as it has no desktop CPU.

Do you think it will be possible to have an AMD APU with an NVIDIA accelerator? Will people need to buy an accelerator box in 2015 that can be attached to their laptop or tablet via network or USB, to do the rendering and other compute-intensive work (a “private compute cloud”)? Or will there always be a market for discrete GPUs? Time will tell.

Thanks for reading. I hope the table makes clear how things stand as of 2014. Suggestions are welcome.