The Quest for More Processing Power, Part Two: "Multi-core and multi-threaded gaming"
by Johan De Gelas on March 14, 2005 12:05 AM EST- Posted in
- CPUs
Unreal 3
[2] The new Unreal 3 engine is a state of the art game development framework for next-generation consoles and DirectX9 PC's, but what sparked our interest for this article was the fact that it is probably one of the first multithreaded game engines for the most popular game genre: first person shooters.AnandTech: The new Unreal Engine 3 is designed for multi-threading, and will make good use of dual core CPUs available when games on the new engine come out. What parts of the game will benefit/be improved, thanks to multiprocessing? What will be the parts that will benefit the most?
Tim Sweeney: For multithreading optimizations, we're focusing on physics, animation updates, the renderer's scene traversal loop, sound updates, and content streaming.We are not attempting to multithread systems that are highly sequential and object-oriented, such as the gameplay.
Implementing a multithreaded system requires two to three times the development and testing effort of implementing a comparable non-multithreaded system, so it's vital that developers focus on self-contained systems that offer the highest effort-to-reward ratio.
AnandTech: What kind of performance improvement (rough estimate) do you expect from a dual core CPU compared to a single core CPU with the same core? (A few percents, a bit more than 10%, tens of percents?) In other words, will a gamer "feel" the difference between a dual core and single core or between a single and dual CPU system running an Unreal 3 engine based game?
Tim Sweeney: It's too early to talk numbers, but we certainly expect Unreal Engine 3 titles to see significant gains on multi-core platforms.
AnandTech: In the past years, games have typically depended more on GPU power than on CPU power (a mid-range CPU with a high end video card was/is faster than a high end CPU with a mid-range video card even at relatively low resolutions). Is the multithreaded nature of the Unreal 3 engine a sign that CPU performance is playing again a more important role in the gaming experience?
Tim Sweeney: Unreal Engine games have always been more CPU-intensive than the norm, for two reasons. First, we're always trying to push the leading edge with physics and other CPU-based features. Second, the Unreal Engine has a much more extensive gameplay scripting interface aimed at empowering mod authors and improving developer productivity by enabling safer and higher-level gameplay development. So we're not going to have any trouble keeping up with increases in CPU power.
Multi-core will be especially valuable because CPU performance scaling due to frequency improvements has tapered off over the past few years.
Clock speed has increased slowly, and real performance hasn't increased in proportion to clocks. But two cores have approximately twice the real aggregate performance as one core, so we're about to see a nonlinear improvement.
Finally, keep in mind that the Windows XP driver model for Direct3D is quite inefficient, to such an extent that in many applications, the OS and driver overhead associated with issuing Direct3D calls approaches 50% of available CPU cycles.Hiding this overhead will be one of the major immediate uses of multi-core.
AnandTech: Did you make use of auto-parallelisation compiler technology (like the auto parallelisation found in Intel C++ compiler) to make the engine multithreaded?
Tim Sweeney: Auto-parallelization of C++ code is not a serious notion. This falls in the same category as the Intel compiler's strip-mining optimizations and other such tricks, which are designed to speed up one particular loop in one particular SpecFP benchmark. These techniques applied to C/C++ programs are completely infeasible on the scale of real applications.
AnandTech: What about OpenMP?
Tim Sweeney: There are two parts to implementing multithreading in an application. The first part is launching the threads and handing data to them; the second part is making the appropriate portions of your 500,000-line codebase thread-safe. OpenMP solves only the first problem. But that's the easy part - any idiot can launch lots of threads and hand data to them. Writing thread-safe code is the far harder engineering problem and OpenMP doesn't help with that.
AnandTech: Programming multiple threads can be complex. Wasn't it very hard to deal with the typical problems of programming multithreaded such as deadlocks, racing and synchronization?
Tim Sweeney: Yes! These are hard problems, certainly not the kind of problems every game industry programmer is going to want to tackle. This is also why it's especially important to focus multithreading efforts on the self-contained and performance-critical subsystems in an engine that offer the most potential performance gain. You definitely don't want to execute your 150,000 lines of object-oriented gameplay logic across multiple threads - the combinatorical complexity of all of the interactions is beyond what a team can economically manage. But if you're looking at handing off physics calculations or animation updates to threads, that becomes a more tractable problem.
We also see middleware as one of the major cost-saving directions for the industry as software complexity increases. It's certainly not economical for hundreds of teams to write their own multithreaded game engines and tool sets. But if a handful of company write the core engines and tools, and hundreds of developers can reuse that work, then developers can focus more of their time and money on content and design, the areas that really set games apart.
AnandTech: The current OpenGL and DirectX are - AFAIK - not very well adapted to multithreaded programming. How did you solve this problem? Or wasn't it a problem at all?
Tim Sweeney: There is only one GPU in there, and though it is highly parallel at the pixel level, its execution is still serial on the granularity of state changes and triangle submission. So it is natural that the interface to the GPU remain single-threaded, and that part of one CPU thread be dedicated to submitting rendering commands.
49 Comments
View All Comments
hzmonte - Tuesday, October 4, 2005 - link
For everyone's convenience, here is Part 3: http://www.anandtech.com/cpuchipsets/showdoc.aspx?...">http://www.anandtech.com/cpuchipsets/showdoc.aspx?...bmayer - Friday, March 25, 2005 - link
About the automatic parallization:It can be fairly easy to do. I work on some Cray X1 and X1es. A little bit about the X1. The Processors (called Multi-Streaming Processors (MSP)) are made up of 4 Single Streaming Processors (SSP). They are vector units with a lenght of 64 or 32 (can't remember, the point is they are good sized vector units). The processors are clocked at 800MHz for the X1 or 1.3GHz if you have an X1e.
Ok so what do we see? These CPUs *suck* if your code is not vectorized and running in parallel.Guess what the Cray compiler does? Automatically vectorizes, streams (takes advantage of a full MSP instead of a single SSP), and parallizes.
They lay out very clearly what the conditions are where the compiler can NOT optimize, and give you directives where you can force it to do so. You can also get a listing of why it did not do a given optimization for any given line. Actually it gives you all information by default which combined with grep is nice.
OK so there are different types of parallelizm, and the one I have just talked about is different then what they are trying to do. This has been talking about speeding up the execution of some inner loop, which is very different from doing two different things at the same time (AI module and sound module running at the same time). BUT this can still be used for great effect. When the inner loops execute for half the time as on a single core/CPU machine we now have more time to do other things, and thus see a speed improvement.
I have thought that Sony/IBM should get in touch with Cray to supply compiler tech for the Cell processor. If the Cell is as easy to write parallel code for as the X1 is we will have some very awesome games, and clusters of PS3s.
If you want to see a very nice overview of processor history and some of the crazy things people are proposing to do with the multi-cores check out ftp://ftp.cs.wisc.edu/sohi/talks/2003/pact.pdf
I agree that parallel C++ is just not happening very well. There are languages like UPC which are starting to gain hold in the HPC market, which *could* find some use in the game market. But as the state of the art stands it is Fortran which is really great for automatically generating parallel code. But who could serriously say that someone outside of engineering writes a code in Fortran?
Great article, lots to think about!
blckgrffn - Friday, March 18, 2005 - link
Loved it. All of it. Especially the interview with Sweeney - it is always nice to hear where the future *will* lie with regards to at least one major application/game. Now, just get an interview with Carmack, and I will be happy for a long time... :DNat
Caleb Jasin - Friday, March 18, 2005 - link
#41Sorry for the late reply.
And yeah pthreads are not threads really. They are processes. When you call the pthread_create() function you create a new proccess ;)
ravedave - Wednesday, March 16, 2005 - link
Sorry to double post. You could easily make a benchmark that saw 1000000000% speed increase. Take one application give it high priority and have it loop for 5 days. It would lock everything else up. Throw in a second processor and you no longer have that problem, hence a huge speedup in the other processes. I dont trust any numbers from any manufacturer.ravedave - Wednesday, March 16, 2005 - link
Excellent article. Extremely excllent. I like the fact that you mentioned GUI updates, most people forget that almost all applications are multi-thread as far as GUI/core go. I really think that Microsoft is on the right track with .NET though. I belive .NET 2 or 2.5 will really take multithreading to the next level.RockHydra11 - Tuesday, March 15, 2005 - link
My fear is that instead of creating new architctures for their processors to increase performance, they will just shove more cores on it and pass it off on people.Verdant - Tuesday, March 15, 2005 - link
#40i do see a shift to something like C# but anything that brings a "performance" hit, is likely to scare away developers, especially since on non-windows platforms the hit is pretty huge atm.
you are right that a compiler probably isn't an answer, i was merely stating that if the industry was dedicated to creating a "deserializing" compiler it would be possible, extremely complex, and probably technically more than a "compiler" but still possible...
also you are thinking of UNIX, the linux kernel has supported ever since i can remember, take a look at the pthreads and linuxthreads (glibc2) libraries
Caleb Jasin - Tuesday, March 15, 2005 - link
#35Yeah agreed, from my own experience C# threadding is much easier than threadding in C. And I would say it is the same for most code. Developing in C# is generally much faster than in C or C++. And the tests I have seen shows about a 10% performance hit between optimal C# and C++ code. So I think it is just a question of time before we see games coded primarily in C#. In the end, the time saved could be used to write more optimal code I guess, so maybe the performance hit would be negligable.
However, I don't think that we will see compilers that are smart enough to multithread code any time soon. I wrote a very simple compiler for a very simple language in university and coding compilers is extremely complicated. As Tim says, they aren't threadding gamelogics and that wouldn't make much sense either because there are too many dependencies. And even though threadding takes alot of time, there are alot of relatively easily paralizeable code in games.
Btw, there is a small error in the article. It says that Linux has thread support. It really doesn't. A thread in Linux is a process. There is no diffrence at kernel level between starting a new thread and forking a new process.
melgross - Tuesday, March 15, 2005 - link
Don't forget that the idea behind these game engines is the reusability of the code. What I mean is that they will first tackle the problems that Sweeney thought most important, and easier, and then, one by one, the harder problems will be resolved. This might take years, but performance increases are always going to be appreciated. Competing products are always going to put pressure on on each other.Ten years from now the discussion will be about how they accomplished all of this.
While dual-cored GPUs have never been used, since that is just now becoming a viable technology, dual and quad GPUs have been used for many years now on the high end boards. Not the gamer boards that we see for $500 and below.