Drilling Down: DX11 And The Multi-Threaded Game Engine
Although multi-threaded programming has been around for decades, mainstream programmers didn't really focus on parallel programming until multi-core CPUs came along. Much general purpose code is straightforward as a single thread; extracting performance via parallel programming can be difficult and isn't always obvious. Even with talented programmers, Amdahl's Law is a bitch: your speedup from parallelization is limited by the fraction of the code that is necessarily sequential.
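For reference, here is Amdahl's Law in equation form, with s the fraction of the work that is necessarily sequential and N the number of cores:

```latex
% Amdahl's Law: the best-case speedup on N cores when a fraction s of
% the work cannot be parallelized.
S(N) = \frac{1}{s + \dfrac{1 - s}{N}}
% Worked example: with s = 0.25, even as N grows without bound the
% speedup is capped at 1 / 0.25 = 4x, no matter how many cores you add.
```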
Currently, in game development, rendering is one of those "necessarily" sequential tasks. DirectX 10 isn't set up to appropriately handle multiple threads all throwing commands at the GPU. That doesn't mean parallelization of renderers can't happen, but it does limit the speedup, because costly synchronization techniques or management threads must be implemented to make sure nothing steps out of line. All this limits the benefit of parallelization and discourages programmers from trying too hard: after all, it's a better idea to put your effort into areas where performance can be improved more significantly. (John Carmack put it really well once, but I can't remember the quote... and I'm doing too much benchmarking to go look for it now. :-P)
No matter what anyone does, some stuff in the renderer will need to be sequential. Programs, textures, and resources must be loaded before they are used; geometry processing happens before pixel processing; draw calls intended to execute while a certain state is active must have that state set first and left unchanged until they complete. Even on such a massively parallel machine, order must be maintained for many things. But order doesn't always matter.
By making more things thread-safe through an extended device interface with multiple contexts, and by making much of the synchronization overhead the responsibility of the API and/or graphics driver, Microsoft has enabled game developers to thread not only their rendering code but their game code as well, with much less effort. These features will also work on DX10 hardware running on a system with DX11, though some missing hardware optimizations will reduce the performance benefit. But the fundamental ability to write code differently will go a long way toward getting programmers more used to, and better at, parallelization. Let's take a look at the tools DX11 provides to accomplish this.
First up is free-threaded asynchronous resource loading. That's a bit of a mouthful, but this feature gives developers the ability to upload programs, textures, state objects, and other resources in a thread-safe way and, if desired, concurrently with rendering. This doesn't mean all of this data will actually be pushed to the GPU in parallel with rendering, as the driver will still manage what gets sent to the GPU and when, based on priority; but it does mean the developer no longer has to think about synchronizing or manually prioritizing resource loading. Multiple threads can start loading whatever resources they need whenever they need them. The fact that this can be done concurrently with rendering could also improve performance for games that stream in data for massive open worlds, in addition to opening up multi-threaded opportunities.
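To give a feel for what this means in code, here is a minimal sketch of a worker-thread texture load. The function name and parameters are hypothetical and error handling is omitted, but the Device's Create* methods really are the free-threaded entry points:

```cpp
#include <d3d11.h>

// Hypothetical worker-thread loader. In D3D11 the ID3D11Device's Create*
// methods are free-threaded, so resource creation no longer has to be
// funneled through the render thread.
ID3D11Texture2D* LoadTextureWorker(ID3D11Device* device,
                                   const void* pixels, UINT width, UINT height)
{
    D3D11_TEXTURE2D_DESC desc = {};
    desc.Width            = width;
    desc.Height           = height;
    desc.MipLevels        = 1;
    desc.ArraySize        = 1;
    desc.Format           = DXGI_FORMAT_R8G8B8A8_UNORM;
    desc.SampleDesc.Count = 1;
    desc.Usage            = D3D11_USAGE_IMMUTABLE;
    desc.BindFlags        = D3D11_BIND_SHADER_RESOURCE;

    D3D11_SUBRESOURCE_DATA init = {};
    init.pSysMem     = pixels;
    init.SysMemPitch = width * 4;  // 4 bytes per RGBA8 texel

    ID3D11Texture2D* texture = nullptr;
    // Safe to call from any thread, concurrently with rendering; the
    // driver decides when the data actually moves to the GPU.
    device->CreateTexture2D(&desc, &init, &texture);
    return texture;
}
```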
In order to enable this and other threading, the D3D device interface is now split into three separate interfaces: the Device, the Immediate Context, and the Deferred Context. Resource creation is done through the Device. The Immediate Context is the interface for setting device state, draw calls, and queries. There can only be one Device and one Immediate Context. The Deferred Context is another interface for state and draw calls, but many can exist in one program and can be used as the per-thread interface (each Deferred Context is itself not thread-safe, though). Deferred Contexts and free-threaded resource creation through the Device are where DX11 gets its multi-threaded benefit.
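A short sketch of what that split looks like at creation time (error handling omitted; variable names are illustrative):

```cpp
#include <d3d11.h>

ID3D11Device*        device       = nullptr;
ID3D11DeviceContext* immediateCtx = nullptr;

// One Device and one Immediate Context come back from device creation.
D3D11CreateDevice(nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr, 0,
                  nullptr, 0, D3D11_SDK_VERSION,
                  &device, nullptr, &immediateCtx);

// Any number of Deferred Contexts can be created, typically one per
// worker thread. Each one is single-threaded: only the thread that
// owns it should record into it.
ID3D11DeviceContext* deferredCtx = nullptr;
device->CreateDeferredContext(0 /* flags, must be 0 */, &deferredCtx);
```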
Multiple threads submit state and draw calls to their own Deferred Context, which compiles a display list that is eventually executed by the Immediate Context. Games will still need a render thread, and this thread will use the Immediate Context to execute state and draw calls and to consume the display lists generated by the Deferred Contexts. In this way, the ultimate destination of all state and draw calls is the Immediate Context, but fine-grained synchronization is handled by the API and the display driver, so parallel threads can more readily contribute to the rendering process. Deferred Contexts do have some limitations: they cannot query the device, and they can't download or read back anything from the GPU. Deferred Contexts can, however, consume the display lists generated by other Deferred Contexts.
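In API terms, the display list is an ID3D11CommandList, recorded with FinishCommandList and consumed with ExecuteCommandList. A rough sketch, continuing from the creation example above (the draw state and vertexCount are placeholders; a real frame would bind shaders, buffers, and so on first):

```cpp
#include <d3d11.h>

// Hypothetical per-frame flow. In a real engine the recording and the
// execution would happen on different threads.
void RecordAndSubmit(ID3D11DeviceContext* deferredCtx,
                     ID3D11DeviceContext* immediateCtx, UINT vertexCount)
{
    // Worker thread: record state and draw calls into its deferred context.
    deferredCtx->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
    deferredCtx->Draw(vertexCount, 0);

    // Close out recording. FALSE = don't restore the deferred context's
    // state afterward (it resets to defaults).
    ID3D11CommandList* commandList = nullptr;
    deferredCtx->FinishCommandList(FALSE, &commandList);

    // Render thread: only the immediate context actually reaches the GPU;
    // it consumes command lists produced on other threads.
    immediateCtx->ExecuteCommandList(commandList, TRUE);
    commandList->Release();
}
```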
The end result of all this is that the future will be more parallel-friendly. As two- and four-core CPUs become more and more popular, and 8- and 16-(logical-)core CPUs appear on the horizon, we need all the help we can get when trying to extract performance from parallelism. This is a good move for DirectX, and we hope it will help push game engines to more fully utilize more than two or even four cores when the time comes.
Comments
DerekWilson - Saturday, January 31, 2009 - link
Hi, thanks for the feedback ... I've already talked the Vista issue to death elsewhere in these comments, so I'll skip that, but ...

3) You are right that to get the most out of DX10 you need a renderer designed for DX10, not ported from DX9. At the same time, there are things that can be done to make DX9 code faster by using DX10 capabilities that don't require an engine rewrite. Yes, this also requires development time, but it seems developers have opted to put time into adding effects with DX10 rather than increasing framerate. I'm not disappointed with that direction, but performance was an option.
I agree with what you said about OGL.
4) I know it's not going to happen, but it didn't need to not be possible. MS could have designed the API to expose new hardware features without requiring the driver model change. DX9 can run on XP or on Vista's new driver model, for example. They chose to tie DX10 (and subsequent versions) to the new driver model, making a back-port impossible, rather than designing it to be independent. OpenGL exposes most of the features of DX10 on WinXP (and all of the ones that are interesting to graphics programmers).
5) You are right -- it's not required to support multithreading, but it is required if you want a performance benefit from that multithreading.
bobvodka - Saturday, January 31, 2009 - link
Well, to be fair, many of my comments were directed at others in the thread :)

4) The thing is, OpenGL and DX9, specifically D3D9, live in different places. D3D9 had a lot more kernel-side code, which caused expensive switches when issuing certain commands; this is why D3D9 got all that instanced draw stuff and OpenGL didn't, because on small batch sizes OpenGL could be between 1.4x and 2.3x quicker at executing a draw call than D3D9. OpenGL sits on the other side of the kernel calls and gives the implementers more control over when that switch occurs. D3D10 also sits on the other side, again not wasting that time.
There were also changes in the resource model, the driver model, and various other areas, some of which were to make the Vista windowing system possible. This resource model and everything about it was very much tied to D3D10 and how it does things. OpenGL's resource model was also fundamentally different to the D3D9 model, again with the implementer having a lot more control; you never suffered a 'lost device' in OpenGL, for example, and the runtime automatically managed allocated memory, unlike in D3D9. There are also some things OpenGL could do which D3D9 couldn't, such as on-card async memory copies and render-to-vertex-buffer (well, ATI had a hack for it, but the OpenGL method was cross-hardware).
Could they have done it? Of course they could have, but unlike previous DX updates it would have taken a lot more effort, time, and money. All for an OS that was, at the time, five years old and that they were hoping to begin phasing out.
Personally, I think this was a good move all in all; the problem was with how it was presented to the general masses, who suddenly saw they had to pay a few hundred USD for a new DX version.
Dribble - Saturday, January 31, 2009 - link
The reason all our games are DX9c is that that's what the consoles support (yes, I know the PS3 uses OpenGL, but it has 9c feature support - it basically has a 7800GTX in it).

Most games are made for consoles as well as the PC; the majority use some cross-platform renderer (e.g. the Unreal 3 engine) and that will support DX9c. Cross-platform DX support won't change until the Xbox 720 and PS4 arrive.
I don't see how DX11 will change this?
DerekWilson - Saturday, January 31, 2009 - link
That's a really good point and something I should have considered.

I think the gap in performance between the PC and consoles will so heavily favor PCs that it will inspire developers to once again shift their focus to the PC. I could be wrong, though.
ssj4Gogeta - Sunday, February 1, 2009 - link
That will change if Microsoft chooses Larrabee for the Xbox 720, because then it would support all DirectX versions (even unreleased ones) and all OpenGL versions. If MS chooses Larrabee, and it also becomes popular for PC gaming, we PC gamers may see a huge benefit, because then games will be built for the latest DX version.

bobvodka - Sunday, February 1, 2009 - link
It's all well and good saying that, but Larrabee is currently utterly unproven technology.

Don't get me wrong, I'd like to see it do well, if only because 3 players in the GPU race would be better than 2 from a technology and consumer standpoint. However, everyone seems to be pinning their hopes on this technology when there hasn't even been a working demo of DX9 running at the same speed as NV/AMD, never mind DX10 or DX11.
ssj4Gogeta - Sunday, February 1, 2009 - link
You're right. But I'm really excited. It would be so nice if Intel can really pull off such a feat. We'll have 2 TFLOPS of general purpose parallel processing power!

bobvodka - Saturday, January 31, 2009 - link
The problem is, the consoles are where the money is. Combine that with a fixed hardware platform (well, hard drives and different screen sizes notwithstanding) and it makes for a much easier time for devs.

piroroadkill - Saturday, January 31, 2009 - link
That's definitely a good point. Any game engine these days has to support the major consoles for it to be successful - and even if that only means the 360, you're still hamstrung by DirectX9, regardless of whether XP has it or not.

From a personal point of view, I'm still running XP because I run a clusterfuck of graphics cards that Vista would shit the bed thinking about.
William Gaatjes - Saturday, January 31, 2009 - link
"Many under-the-hood enhancements mean higher performance for features available but less used under DX10. "I must be remembering it wrong but the same thing was sad about dx10. However it turned out to be nothing more then getting people to buy vista. I wonder how much will come true this time.