The Quest for More Processing Power, Part Two: "Multi-core and multi-threaded gaming"
by Johan De Gelas on March 14, 2005 12:05 AM EST- Posted in
- CPUs
Unreal 3
[2] The new Unreal 3 engine is a state of the art game development framework for next-generation consoles and DirectX9 PC's, but what sparked our interest for this article was the fact that it is probably one of the first multithreaded game engines for the most popular game genre: first person shooters.AnandTech: The new Unreal Engine 3 is designed for multi-threading, and will make good use of dual core CPUs available when games on the new engine come out. What parts of the game will benefit/be improved, thanks to multiprocessing? What will be the parts that will benefit the most?
Tim Sweeney: For multithreading optimizations, we're focusing on physics, animation updates, the renderer's scene traversal loop, sound updates, and content streaming.We are not attempting to multithread systems that are highly sequential and object-oriented, such as the gameplay.
Implementing a multithreaded system requires two to three times the development and testing effort of implementing a comparable non-multithreaded system, so it's vital that developers focus on self-contained systems that offer the highest effort-to-reward ratio.
AnandTech: What kind of performance improvement (rough estimate) do you expect from a dual core CPU compared to a single core CPU with the same core? (A few percents, a bit more than 10%, tens of percents?) In other words, will a gamer "feel" the difference between a dual core and single core or between a single and dual CPU system running an Unreal 3 engine based game?
Tim Sweeney: It's too early to talk numbers, but we certainly expect Unreal Engine 3 titles to see significant gains on multi-core platforms.
AnandTech: In the past years, games have typically depended more on GPU power than on CPU power (a mid-range CPU with a high end video card was/is faster than a high end CPU with a mid-range video card even at relatively low resolutions). Is the multithreaded nature of the Unreal 3 engine a sign that CPU performance is playing again a more important role in the gaming experience?
Tim Sweeney: Unreal Engine games have always been more CPU-intensive than the norm, for two reasons. First, we're always trying to push the leading edge with physics and other CPU-based features. Second, the Unreal Engine has a much more extensive gameplay scripting interface aimed at empowering mod authors and improving developer productivity by enabling safer and higher-level gameplay development. So we're not going to have any trouble keeping up with increases in CPU power.
Multi-core will be especially valuable because CPU performance scaling due to frequency improvements has tapered off over the past few years.
Clock speed has increased slowly, and real performance hasn't increased in proportion to clocks. But two cores have approximately twice the real aggregate performance as one core, so we're about to see a nonlinear improvement.
Finally, keep in mind that the Windows XP driver model for Direct3D is quite inefficient, to such an extent that in many applications, the OS and driver overhead associated with issuing Direct3D calls approaches 50% of available CPU cycles.Hiding this overhead will be one of the major immediate uses of multi-core.
AnandTech: Did you make use of auto-parallelisation compiler technology (like the auto parallelisation found in Intel C++ compiler) to make the engine multithreaded?
Tim Sweeney: Auto-parallelization of C++ code is not a serious notion. This falls in the same category as the Intel compiler's strip-mining optimizations and other such tricks, which are designed to speed up one particular loop in one particular SpecFP benchmark. These techniques applied to C/C++ programs are completely infeasible on the scale of real applications.
AnandTech: What about OpenMP?
Tim Sweeney: There are two parts to implementing multithreading in an application. The first part is launching the threads and handing data to them; the second part is making the appropriate portions of your 500,000-line codebase thread-safe. OpenMP solves only the first problem. But that's the easy part - any idiot can launch lots of threads and hand data to them. Writing thread-safe code is the far harder engineering problem and OpenMP doesn't help with that.
AnandTech: Programming multiple threads can be complex. Wasn't it very hard to deal with the typical problems of programming multithreaded such as deadlocks, racing and synchronization?
Tim Sweeney: Yes! These are hard problems, certainly not the kind of problems every game industry programmer is going to want to tackle. This is also why it's especially important to focus multithreading efforts on the self-contained and performance-critical subsystems in an engine that offer the most potential performance gain. You definitely don't want to execute your 150,000 lines of object-oriented gameplay logic across multiple threads - the combinatorical complexity of all of the interactions is beyond what a team can economically manage. But if you're looking at handing off physics calculations or animation updates to threads, that becomes a more tractable problem.
We also see middleware as one of the major cost-saving directions for the industry as software complexity increases. It's certainly not economical for hundreds of teams to write their own multithreaded game engines and tool sets. But if a handful of company write the core engines and tools, and hundreds of developers can reuse that work, then developers can focus more of their time and money on content and design, the areas that really set games apart.
AnandTech: The current OpenGL and DirectX are - AFAIK - not very well adapted to multithreaded programming. How did you solve this problem? Or wasn't it a problem at all?
Tim Sweeney: There is only one GPU in there, and though it is highly parallel at the pixel level, its execution is still serial on the granularity of state changes and triangle submission. So it is natural that the interface to the GPU remain single-threaded, and that part of one CPU thread be dedicated to submitting rendering commands.
49 Comments
View All Comments
Pjotr - Monday, March 14, 2005 - link
"dual-cored GPUs are stupid. given the parallel nature of graphics, it makes more sense to just add another pipeline at very little design cost."Unless you hit a power and/or heat output wall.
Tell nVidia that parallell GPUs are bad, they alreay sell their SLI solution for dual-GPU computers.
silverwolf - Monday, March 14, 2005 - link
PPU, is the way to go.defter - Monday, March 14, 2005 - link
"Given the immense complexity involved, I expect dual cores taking a VERY VERY long time to catch on... even then it'll be a half assed job."Well ALL future consoles will use multi core CPUs. Thus if developers want to sell games, their games must take advantage of at least two cores :)
ceefka - Monday, March 14, 2005 - link
#12 That's how it looks, for now. "No one will ever need more than 8 cores." :-DMy dumb question, I was reading:
"Tim clearly emphasizes that only parts of the application can be economically parallelized. Increasing parallelisation, using more threads, is simply not feasible. There is a pretty hard economic limit to TLP."
Isn't a high IPC-count also a form of parallelism? If so, then beyond a certain count won't it be just as hard to take advantage of a high IPC-count.
sandorski - Monday, March 14, 2005 - link
Good article. Most of it was over my head, but the gist was most important. That being that Multicore is a big question mark for Gamers and other common users.I've always preferred Sweeney over others in the industry, he knows what he's doing without getting in everybody elses face about it. I also found it appropriate that he was interviewed on the subject since Unreal Engines have always internally manged 100's of Processes in order to work(I assume other Engines do the same, but my knowledge of hte Unreal Engines is more thorough than them).If anyone can figure out how to use Multicore in gaming my money's on Sweeney.
Calin - Monday, March 14, 2005 - link
Auto-parallelization is of limited use, and it can work only on small pieces of code. You might get a couple of percent extra speed, but no more (or no much more). Managing multi-threaded code interdependent code would be a nightmare.However, some "extra" speed can be recovered in case of multi processors (or multi core) from the reduced state (thread) change. Certainly not extra 24%, but more than a bit.
FDSatyr - Monday, March 14, 2005 - link
Good read - mainly for Tim's comments though! I really enjoy the way Tim isn't arrogant at all in the way he talks. Some fairly silly questions from A though! I still think threads are rubbish, that processes and better schedulers are the way forward. I think the next step - realistically impossible in the industry it may be - would be to create a fresh architecture, and put an x86/87 core on the same die. Ho-hum.mkruer - Monday, March 14, 2005 - link
My magic eightball says that after 4-8 cores, any other core that will be added will be near worthless.AnandThenMan - Monday, March 14, 2005 - link
Anand got owned by Tim Sweeney.AnandTech: Did you make use of auto-parallelisation compiler technology...
Tim Sweeney: Auto-parallelization of C++ code is not a serious notion.
Good article though.
overclockingoodness - Monday, March 14, 2005 - link
Oh my God - the Unreal 3 engine is beautiful. I get amazed everytime I look at it. I can't wait for games to be featured on Unreal 3 and the like engines in the future. So fine...