NVIDIA's Bumpy Ride: A Q4 2009 Update
by Anand Lal Shimpi on October 14, 2009 12:00 AM EST, posted in GPUs
Final Words
Is NVIDIA in trouble? In the short term there is clearly cause for worry. AMD's Eric Demers often tells me that the best way to lose a fight is by not showing up. NVIDIA effectively didn't show up to the first DX11 battles, and that's going to hurt. But as I said in the "things get better next year" section, they do get better next year.
Fermi devotes a significant portion of its die to features that are designed for a market that currently isn’t generating much revenue. That needs to change in order for this strategy to make sense.
NVIDIA told me that we should see exponential growth in Tesla revenues after Fermi, but what does that mean? I don't expect the sort of customers who buy Tesla boards and servers to be lining up on day one. Best case, Tesla revenues should see a bump one to two quarters after Fermi's launch.
Nexus, ECC, and better double precision performance will all make Fermi more attractive in the HPC space than Cypress. The question is how much revenue that will generate in the short term.
Nexus enables full NVIDIA GPU debugging from within Visual Studio. Not so useful for PC gaming, but very helpful for Tesla
Then there's the mobile space. NVIDIA could do very well with Tegra. NVIDIA is an ARM licensee, and that takes care of the missing CPU piece of the puzzle. Unlike the PC space, x86 isn't the dominant player in the mobile market. NVIDIA has a head start in the ultra mobile space, much like it does in the GPU computing space, and Intel is a bit behind with its Atom strategy. NVIDIA could use this to its advantage.
The transition needs to be a smooth one. The bulk of NVIDIA's revenues today come from PC graphics cards. There's room for NVIDIA in the HPC and ultra mobile spaces, but that revenue isn't going to accumulate overnight. The changes in focus we're seeing from NVIDIA today are in line with what it would have to do to establish successful businesses outside of the PC industry.
And don't think the PC GPU battle is over yet, either. It took years for NVIDIA to be pushed out of the chipset space, even after AMD bought ATI. Even if the future of PC graphics is Intel and AMD GPUs, it's going to take a very long time to get there.
106 Comments
AnandThenMan - Wednesday, October 14, 2009 - link
Leave it to Scali to regurgitate the same old same old.

TGressus - Wednesday, October 14, 2009 - link
It's always the same, man. When ATI/AMD is down people get interested in their comeback story too.

I've always wondered why people bother to "take a side". How'd that work out with Blu-Ray? Purchased many BD-R DL recently?
Personally, I'd like to see more CPU and GPU companies. Not less.
Scali - Thursday, October 15, 2009 - link
What comeback story?

My point was that it wouldn't be the first time that the bigger, more expensive GPU was the best bang for the buck.
It isn't about taking sides or comebacks at all.
I'm interested in Fermi because I'm a technology enthusiast and developer. It sounds like an incredible architecture. It has nothing to do with the fact that it happens to have the 'nVidia' brand attached to it. If it was AMD that came up with this architecture, I'd be equally interested.
But let's just view it from a neutral, technical point of view. AMD didn't do all that much to its architecture this time, apart from extending it to support the full DX11 feature set. It will not do C++, it doesn't have a new cache hierarchy approach, it won't be able to run multiple kernels concurrently, etc. There just isn't as much to be excited about.
Intel however... now their Larrabee is also really cool. I'm excited to see what that is going to lead to. I just like companies that go off the beaten path and try new approaches, take risks. That's why I'm an enthusiast. I like new technology.
At the end of the day, if both Fermi and Larrabee fail, I'll just buy a Radeon. Boring, but safe.
Scali - Wednesday, October 14, 2009 - link
"Fermi devotes a significant portion of its die to features that are designed for a market that currently isn’t generating much revenue."The word 'devotes' is in sharp contrast with what Fermi aims to achieve: a more generic programmable processor.
In a generic processor, you don't really 'devote' anything to anything, your execution resources are just flexible and can be used for many tasks.
Even today's designs from nVidia do the same. The execution units can be used for standard D3D/OpenGL rendering, but they can also be used for PhysX (gaming market), video encoding (different market), Folding@Home (different market again), PhotoShop (another different market), HPC (yet another market), to name but a few things.
So 'devoted', and 'designed for a market'? Hardly.
Sure, the gaming market may generate the most revenue, but nVidia is starting to tap into all these other markets now. It's just added revenue, as long as the gaming performance doesn't suffer. And I don't see any reason for Fermi's gaming performance to suffer. I think nVidia's next generation is going to outperform AMD's offerings by a clear margin.
wumpus - Thursday, October 15, 2009 - link
Go back and read the white paper. Nvidia plans to produce a chip that does double precision multiplies at roughly half the rate of single precision ones. This means they have doubled the amount of transistors in the multipliers so that they can keep up with the rest of the chip in double mode (one double or two singles both produce 8 bytes that need to be routed around the chip).

There is no way to deny that this takes more transistors. Simply put, if each letter represents a 16-bit chunk, two single precision multiplies break down as:

(a0)(a1) * (b0)(b1) = 2^32*(a0*b0) + 2^16*(a0*b1 + a1*b0) + a1*b1
(c0)(c1) * (d0)(d1) = 2^32*(c0*d0) + 2^16*(c0*d1 + c1*d0) + c1*d1

That's 4 partial products each, 8 in total. But one double precision multiply gives:

(a0)(a1)(a2)(a3) * (b0)(b1)(b2)(b3) =
2^96*(a0*b0)
+ 2^80*(a0*b1 + a1*b0)
+ 2^64*(a0*b2 + a1*b1 + a2*b0)
+ 2^48*(a0*b3 + a1*b2 + a2*b1 + a3*b0)
+ 2^32*(a1*b3 + a2*b2 + a3*b1)
+ 2^16*(a2*b3 + a3*b2)
+ a3*b3

Which works out to twice the work: 16 partial products instead of 8. Of course, the entire chip isn't multipliers, but they make up a huge chunk. Somehow I don't think either ATI or nvidia is going to say exactly what percentage of the chip is made up of multipliers. I do expect that it is steadily going down, and if such arrays keep being made, they will all eventually use double precision (and possibly full IEEE 754 with all the rounding that entails).
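To make the counting concrete, here's a toy C++ sketch (my own illustration, nothing from the white paper) that splits operands into 16-bit limbs and tallies the schoolbook partial products:

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Split operands into 16-bit limbs and count the 16x16-bit partial
// products a schoolbook multiply generates. Carry propagation is
// omitted; only the multiplier work is being counted.
static int countPartialProducts(const std::vector<uint16_t>& a,
                                const std::vector<uint16_t>& b) {
    int count = 0;
    for (uint16_t ai : a)
        for (uint16_t bi : b) {
            uint32_t partial = uint32_t(ai) * uint32_t(bi); // one 16x16 -> 32-bit product
            (void)partial;                                  // value unused, we only count
            ++count;
        }
    return count;
}

int main() {
    // Two single precision sized multiplies: 2 limbs x 2 limbs each.
    int sp = 2 * countPartialProducts({1, 2}, {3, 4});
    // One double precision sized multiply: 4 limbs x 4 limbs.
    int dp = countPartialProducts({1, 2, 3, 4}, {5, 6, 7, 8});
    std::printf("two SP multiplies: %d partial products\n", sp); // prints 8
    std::printf("one DP multiply:   %d partial products\n", dp); // prints 16
    return 0;
}
```

Two 2-limb multiplies tally 8 partial products; one 4-limb multiply tallies 16, which is where the factor of two in multiplier work comes from.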
Scali - Saturday, October 17, 2009 - link
My point is that the transistors aren't 'dedicated' to DP.

They just make each individual unit capable of both SP and DP. So the same logic that is used for DP is also re-used for SP, and as such the unit isn't dedicated. It's multi-functional.

Besides, they probably didn't just double up the transistor count to get from SP to DP.

I think it's more likely that they'll use a scheme like Intel's SSE units. In Intel's case you can either process 4 packed SP floats in parallel, or 2 packed DP floats, with the same unit. This would also explain why the difference in speed is a factor of 2.

Namely, if you take the x87 unit, it can always process only one number at a time, but SP isn't twice as fast as DP. Since you always use a full DP unit, SP only benefits from early-out, which doesn't gain that much on most operations (e.g. add/sub/mul).

So I don't think that Fermi is just a bunch of full DP ALUs which will run with 'half the transistors' when doing SP math. Rather, I think they will 'split' the DP units in some clever way so that they can process two SP numbers at a time (or fuse two SP units to process one DP number, however you look at it). This only requires you to double up a relatively small part of the logic; the rest you get by splitting up the internal registers.
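For anyone who hasn't played with SSE, here's a small C++ snippet illustrating the packing I mean: the same 128-bit register does either 4 SP or 2 DP multiplies per instruction. (This just demonstrates the SSE scheme itself, not anything about Fermi's actual datapath.)

```cpp
#include <cstdio>
#include <emmintrin.h> // SSE2 intrinsics (pulls in the SSE1 ones too)

int main() {
    // One 128-bit XMM register holds 4 single precision floats...
    __m128 s = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);
    s = _mm_mul_ps(s, s);             // 4 SP multiplies in one instruction

    // ...or 2 double precision floats.
    __m128d d = _mm_set_pd(2.0, 1.0);
    d = _mm_mul_pd(d, d);             // 2 DP multiplies in one instruction

    float  sf[4]; _mm_storeu_ps(sf, s);
    double df[2]; _mm_storeu_pd(df, d);
    std::printf("SP lanes: %g %g %g %g\n", sf[0], sf[1], sf[2], sf[3]); // 1 4 9 16
    std::printf("DP lanes: %g %g\n", df[0], df[1]);                     // 1 4
    return 0;
}
```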
Zool - Wednesday, October 14, 2009 - link
Maybe, but you forget one thing. Ati could pull out a 5890 (with faster clocks and maybe a 384-bit memory bus) without problems in Q1 2010, or a whole new chip somewhere in Q2 2010.

So it doesn't change the fact that nvidia is late. In this position it will be hard for nvidia if ati can always make the first move.
Scali - Wednesday, October 14, 2009 - link
A 5890 doesn't necessarily have to be faster than Fermi. AMD's current architecture isn't THAT strong. It's the fastest GPU on the market, but then again, it's the only high-end GPU that leverages 40 nm and GDDR5, so it's not all that surprising.

Fermi will not only leverage 40 nm and GDDR5, but also aim at a scale above AMD's architecture.
AMD may make the first move, but it doesn't have to be the better move.
Assuming Fermi performance is in order, I very much believe that nVidia made the right move. Where AMD just patched up their DX10.1 architecture to support DX11 features, nVidia goes way beyond DX11 with an entirely new architecture.
The only thing that could go wrong with Fermi is that it doesn't perform well enough, but it's too early to say anything about that now. Other than that, Fermi will mark a considerable technological lead for nVidia over AMD.
tamalero - Sunday, October 18, 2009 - link
And you know this... based on what facts?

The "can of whoopass" from nvidia's marketing?
AnandThenMan - Wednesday, October 14, 2009 - link
"The only thing that could go wrong with Fermi is that it doesn't perform well enough"Really? You really believe that? So if it has a monstrous power draw, extremely expensive, 6 months late, (even longer for scaled down parts) low yields etc. that's a-okay? Not to mention a new architecture always has software challenges to make the most of it.