The GPU Advances: ATI's Stream Processing & Folding@Home
by Ryan Smith on September 30, 2006 8:00 PM EST - Posted in GPUs
In the continual progression of GPU technology, we've seen GPUs become increasingly useful at generalized tasks as they have added flexibility for game designers to implement more customized and more expansive graphical effects. What started out as a simple fixed-function rendering process, where texture and vertex data were fed into a GPU and pixels were pushed out, has evolved into a system where a great deal of processing takes place inside the GPU. The modern GPU can be used to store and manipulate data in ways that go far beyond just quickly figuring out what happens when multiple textures are mixed together.
What GPUs have evolved into today are devices that are increasingly similar to CPUs in their ability to do more things, while still specializing in only a subset of abilities. Starting with Shader Model 2.0 on cards like the Radeon 9700 and continuing with Shader Model 3.0 and today's latest cards, GPUs have become floating-point powerhouses able to do most floating-point calculations many times faster than a CPU, a necessity as 3D rendering is a very FP-intensive process. At the same time, we have seen GPUs add programming constructs like looping and branching, abilities previously found only on CPUs but crucial for programmers to make effective use of GPU resources. In short, today's GPUs have in many ways become extremely powerful floating-point processors that have been used for 3D rendering but little else.
Both ATI and NVIDIA have been looking to put the expanded capabilities of their GPUs to good use, with varying success. So far, the only programs that have effectively tapped this power, other than applications and games requiring 3D rendering, have been video related: decoders, encoders, and video effect processors. In short, the GPU has been underutilized; there are many tasks that are floating-point hungry without being visual in nature, and these programs have not used the GPU to any significant degree so far.
Meanwhile, the academic world has been designing and using custom-built floating-point hardware for years for its own research purposes. The class of hardware relevant to today's topic, stream processors, consists of extremely powerful floating-point processors able to process whole blocks of data at once, where CPUs carry out only a handful of numerical operations at a time. We've seen CPUs implement some stream processing with instruction sets like SSE and 3DNow!+, but these efforts still pale in comparison to what custom hardware has been able to do. The same progress was happening on GPUs, only in a different direction, and until recently GPUs remained untapped as anything other than a graphics tool.
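To make the contrast concrete, here is a minimal C++ sketch of CPU-style SIMD: with SSE intrinsics a CPU can operate on four floats per instruction, which is still a far cry from a stream processor churning through whole blocks of data at once. The function names and the multiply-add operation are our own illustrative choices, not anything from ATI's toolchain.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Scalar version: one multiply-add per loop iteration.
void scale_add_scalar(const float* a, const float* b, float* out, int n) {
    for (int i = 0; i < n; ++i)
        out[i] = a[i] * 2.0f + b[i];
}

// SSE version: four multiply-adds per iteration.
// Assumes n is a multiple of 4 and the pointers are 16-byte aligned.
void scale_add_sse(const float* a, const float* b, float* out, int n) {
    const __m128 two = _mm_set1_ps(2.0f);
    for (int i = 0; i < n; i += 4) {
        __m128 va = _mm_load_ps(a + i);                   // load 4 floats
        __m128 vb = _mm_load_ps(b + i);
        __m128 r  = _mm_add_ps(_mm_mul_ps(va, two), vb);  // 4 ops at once
        _mm_store_ps(out + i, r);                         // store 4 results
    }
}
```

Even here the CPU only widens its arithmetic to four lanes; dedicated stream hardware is built around this pattern from the ground up.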
Today's GPUs have evolved into their own class of stream processors, sharing much in common with researchers' customized hardware, because 3D rendering is itself a streaming task. The key difference is that while GPU designers have cut a few corners, omitting functionality a custom processor offers but 3D rendering doesn't need, they have by and large developed stream processors as fast as custom hardware that, thanks to economies of scale, are many times cheaper than a custom design.
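Conceptually, the stream model boils down to a side-effect-free "kernel" function mapped independently over every element of an input stream; the GPU's job is to run that map across hundreds of pixels in parallel. The sketch below is a toy illustration of the model in C++, not ATI's actual interface, and runs the map sequentially on the CPU.

```cpp
#include <cstddef>
#include <vector>

// A "kernel": a pure function applied independently to each element of
// an input stream. On a GPU, every pixel/fragment is processed this way,
// which is what makes the hardware a natural stream processor.
float kernel(float x) {
    return x * x + 1.0f;  // stand-in for a real per-element computation
}

// The runtime's job: apply the kernel across the whole stream. A GPU does
// this with hundreds of elements in flight at once; here it is sequential.
std::vector<float> run_stream(const std::vector<float>& in) {
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = kernel(in[i]);
    return out;
}
```

Because each element is computed independently, there is no ordering to respect, and that independence is precisely what lets GPU hardware scale the same workload across many parallel pipelines.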
It's here that ATI is looking for new ideas on what to run on its GPUs as part of its new stream computing initiative. The academic world is full of such ideas, chomping at the bit to run its experiments on more than a handful of customized hardware designs. One such application, and one of the stars of today's announcement, is Folding@Home, a Stanford research project designed to simulate protein folding in order to unlock the secrets of diseases caused by flawed protein folding.
43 Comments
BikeAR - Monday, October 2, 2006 - link
I may be living in a vacuum these days, but did anyone notice the following comment on the F@H site's "Help" page?...
Intel has been helping support our project (Stanford/Intel Alzheimer's Research Program), but has announced that it is ending their contribution to distributed computing in general and no longer supports any distributed computing clients, including F@H.
What is up with this?
Staples - Monday, October 2, 2006 - link
I'd love to know how a fully loaded X1900 would compare at folding to an E6600 Core 2 Duo. These GPUs have always been said to be 100s of times faster than CPUs at what they are designed to do, so I'd love to see if it is really true or not. If not, it looks like we may have been lied to for so, so many years.

Ryan Smith - Monday, October 2, 2006 - link
Unfortunately it looks like they'll be using larger units. We thought we'd be able to use the same units for both the normal and GPU-accelerated clients, but this appears not to be the case. There's no direct way to compare the clients, then; the closest we could get is comparing the number of points given for a work unit.

peternelson - Sunday, October 1, 2006 - link
"So far, the only types of programs that have effectively tapped this power other than applications and games requiring 3D rendering have also been video related, such as video decoders, encoders, and video effect processors. In short, the GPU has been underutilized, as there are many tasks that are floating-point hungry while not visual in nature, and these programs have not used the GPU to any large degree so far."Erm, not so! Try looking at GPGPU.org
Also see the books GPU Gems and GPU Gems 2.
mostlyprudent - Sunday, October 1, 2006 - link
BTW, a typo in the 1st paragraph of the article: "...manipulate data in ways that goes far beyond" -- should read "in ways that GO far beyond...".

I knew about SETI, but was completely unaware of F@H. Thanks. I will look to get involved!
imaheadcase - Sunday, October 1, 2006 - link
Did anandtech just post an article on the weekend, when most PC users are at home so they can read it? Amazing! :P

CupCak3 - Sunday, October 1, 2006 - link
When loading the client, our team number is 198. :) If anyone has any questions and/or would like to join the TeAm, come and visit us here: http://forums.anandtech.com/categories.aspx?catid=... We have many people who would be more than happy to answer any question.
I'll try to answer some of the questions and comments which have been posted thus far:
NVIDIA support may come later. This is a BETA right now, so only a small number of devices will be supported. The supported ATI line will expand, and when NVIDIA gets the kinks worked out on using GPGPU processing for their cards, I'm sure the Pande Group would be glad to pick them up :)
No one knows about Crossfire support yet. We'll know more about this tomorrow.
Same goes for using CPU + GPU.
Scalability: I've heard a multithreaded client is in the works, but I'm sure it's on hiatus with the GPU client coming into BETA.
It is correct that for each core, a separate Folding@Home client must be loaded. Right now the max amount of RAM one core will use is between 100 and 120 megs; other work units use around 5 or 10 megs. The client will only load the larger work units if you have the resources to spare (so not just having one 256 meg stick in your XP box and then using 110 megs of that for folding). We still do not know the RAM requirements for the vid cards.
The Pande Group has posted that 1600 and 1800 series cards will be the next ones supported :) (if all goes well of course)
Linux and Mac OS X clients are also available for those wondering.
I hope this helps!
Messire - Sunday, October 1, 2006 - link
Hi folks

It is surprising to me that this Stanford project works only on the most modern and powerful ATI GPUs, because I know of another Stanford project called BrookGPU. Here is the URL: http://graphics.stanford.edu/projects/brookgpu/
It is only in beta now and seems to have been abandoned a little bit, but they've made a very usable GENERAL PURPOSE streaming programming language which works on much older GPUs as well. And it is very fast...
Messire
lopri - Sunday, October 1, 2006 - link
I know this question is somewhat off the discussion at the moment, but I can't help but ask. Would this crank up GPU usage to 100% as it does the CPU? All the time? Then it could be a problem for average users, because the X1900 is hot as it is, even at idle.

GhandiInstinct - Sunday, October 1, 2006 - link
Don't get me wrong, I'd love to contribute, as I have with SETI, but what evidence is there that this will help anything, or even at all? I mean, we've had supercomputers working in science for a while now and I haven't heard of any major breakthroughs because of it, and if my computer is going to exhaust a little extra heat I want some numbers to crunch before I do so.
That's all.