Parsing Input in Software and the CPU Limit
Before we get into software, for the sake of sanity, we are going to ignore context switching and pretend that only the operating system kernel and the game are running, and that each always gets processor time exactly when it needs it for as long as it's needed (and that they never need it at the same time). In real-life desktop operating systems, especially on single core processors, there will be added delay due to process scheduling between our game and other tasks (which is handled by the operating system) and OS background tasks. These delays (in extreme cases called starvation) can range from a handful of nanoseconds to the microsecond level on modern systems, depending on process prioritization, what else is happening, and how the scheduler is implemented.
Once the mouse has sent its report over USB to the PC and the USB root hub receives the data, it is up to the OS (for our purposes, MS Windows) to handle the data next. Our report travels from the USB root hub over the system bus (southbridge through the northbridge to the CPU takes +/- some nanoseconds depending on load), is put on an input stack (in this case the HID (Human Interface Device) stack), and a Windows OS message (WM_INPUT) is generated to let any user space software monitoring raw mouse input know that new data has arrived. Software written to take full advantage of hardware will handle the WM_INPUT message by reading the appropriate data directly from the HID stack once it is told that data is waiting.
This particular part of the process (checking Windows messages and handling the WM_INPUT message) happens pretty fast and should be on the order of microseconds at worst. This is a hard delay to track down, as the real time this takes depends on what the programmer actually does. Latencies here are not guaranteed by either the motherboard chipset or Windows.
Once the software has the data (after at least 1ms plus a few microseconds), it needs to do something with it. This is hugely variable, as developers can choose to handle input at any of a number of points in the process of updating the game state for the next frame. The thing that makes the most sense to me would be to run your AI based on the previous input data, step through any scripted actions, update physics per object based on last state and AI decisions, then get user data and update player state/physics based on previous state and current input.
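The ordering described above can be sketched roughly as follows. This is only an illustration of sampling input as late as possible in the update, not how any particular engine does it; `GameState`, `update_frame`, `poll_input`, and the helper functions are all hypothetical names, and the "AI" and "physics" here are placeholders for what would be the expensive parts of a real game.

```python
from dataclasses import dataclass, field


@dataclass
class GameState:
    """Hypothetical, minimal game state used only for this illustration."""
    player_x: float = 0.0
    npc_x: float = 10.0
    log: list = field(default_factory=list)  # records the order of the steps


def run_ai(state, previous_input):
    # AI reacts to what the player did on the *previous* frame.
    state.log.append("ai")
    state.npc_x += 0.5 * previous_input


def run_scripts(state):
    # Placeholder for scripted actions.
    state.log.append("scripts")


def step_world_physics(state, dt):
    # Placeholder for per-object physics based on last state and AI decisions.
    state.log.append("physics")


def update_player(state, current_input, dt):
    # The player update uses the freshest input available.
    state.log.append("player")
    state.player_x += current_input * dt


def update_frame(state, previous_input, poll_input, dt):
    """One simulation step that reads user input as late as possible."""
    run_ai(state, previous_input)
    run_scripts(state)
    step_world_physics(state, dt)
    current_input = poll_input()  # sampled only after the heavy work is done
    update_player(state, current_input, dt)
    return current_input  # becomes previous_input for the next frame
```

The value returned by `update_frame` would be passed back in as `previous_input` on the next call, so the AI always works from data that is one frame old while the player update sees the newest report.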
There are cases or design decisions that may require getting user input before doing some of these other tasks, so the way I would want to do it might not be practical. This whole part of the pipeline can be quite long, as highly intelligent AI and immersive physics (along with other game scripting and state updates) can require massive amounts of work. At the least we have lots of sorting, branching, and necessarily serial computations to worry about.
Depending on when input is collected and the depth and breadth of the simulation, we could see input lag increase up to several milliseconds. This is highly game dependent, but it isn't something the end user has any control over outside of getting the fastest possible CPU (and this still won't likely change things in a perceivable way as there are memory and system latencies to consider and the GPU is largely the bottleneck in modern games). Some games are designed to be highly responsive and some games are designed to be highly accurate. While always having both cranked up to 11 would be great, there are trade offs to be made.
Unfortunately, that leaves us with a highly variable situation. The only way to really determine the input lag caused by game code itself is to profile the code (which requires access to the source to be done right) or ask a developer. But knowing the specifics isn't as necessary as knowing that there's not much that can be done by the gamer to mitigate this issue. For the purposes of this article, we will consider game logic to typically add somewhere between 1ms and 10ms of input lag in modern games. This takes into account things like decoupling simulation and AI threads from rendering and having work done in parallel, among other things. If everything were done linearly, things would very likely take longer.
When we've got our game state updated, we then set up graphics for rendering. This will involve using our game state to update geometry and display lists on the CPU side before the GPU can start work on the next frame. The speed of this step is again dependent on the implementation and can take up a good bit of time. It also depends on the complexity of the scene and the number of triangles required. Again, while this is highly dependent on the game and what's going on, we can typically expect something between 1ms and 10ms for this part of the process as well if we include the time it takes to upload geometry and other data to the GPU.
Now, all the issues we've covered on this page go into making up a key element of game performance: CPU time. The total latency from front to back in this stage of a game engine creates a CPU limit on performance. When what comes after this (rendering on the GPU) takes less time than everything up to this point, we have hit the CPU limit. We can typically see the CPU limit when we drop resolution down to something ridiculously low on a high end card without seeing any real performance gain between that and the next highest resolution.
From the examples I've given here, if both the game logic and the graphics/geometry setup come in at the minimum latencies I've suggested should be typical, we could be CPU limited at as much as 500 frames per second. On the flip side, if both portions of this process push up to the 10ms level, we would never see a frame rate over 50 FPS no matter how fast the GPU rendered anything.
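The arithmetic behind those two figures can be checked with a short sketch. The function names are hypothetical, and two assumptions are baked in: the game logic and graphics setup run back to back on the CPU, and CPU and GPU work overlap so that the slower side sets the frame rate.

```python
def cpu_limited_fps(logic_ms, setup_ms):
    """Upper bound on frame rate when both CPU stages run back to back."""
    return 1000.0 / (logic_ms + setup_ms)


def frame_rate(cpu_ms, gpu_ms):
    """Assumes CPU and GPU work overlap, so the slower side sets the pace."""
    return 1000.0 / max(cpu_ms, gpu_ms)


def is_cpu_limited(cpu_ms, gpu_ms):
    """True when the GPU finishes its frame faster than the CPU can feed it."""
    return gpu_ms < cpu_ms


# Best case from the text: 1ms of game logic plus 1ms of graphics setup.
best = cpu_limited_fps(1.0, 1.0)     # 500.0 FPS

# Worst case from the text: 10ms of each.
worst = cpu_limited_fps(10.0, 10.0)  # 50.0 FPS
```

Under these assumptions, a GPU that needs only 5ms per frame still cannot push past 50 FPS in the worst case, since `frame_rate(20.0, 5.0)` is bounded by the 20ms of CPU work.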
Obviously there is variability in games, and sometimes we see a CPU limit at less than 60 FPS even at the lowest resolution on the highest end hardware. Likewise, we can see framerates hit over 2000 FPS when drawing a static image (where game logic and display lists don't need to be updated) with a menu in front of it (like when a user hits escape in Oblivion with vsync off). And, again, multi-threaded software design on multi-core CPUs really muddies up the situation. But this is near enough to illustrate the point.
And now it's on to the portion of realtime 3D graphics that typically incurs the most input lag before we leave the computer: the graphics hardware.
85 Comments
Zolcos - Thursday, July 16, 2009 - link
The article is logically inconsistent. On page 1 it states "input lag is defined as the delay between the when a user does something with an input device and when that action is reflected on the monitor" and on page two it has "Input lag starts from before we even react".
DerekWilson - Thursday, July 16, 2009 - link
i'll fix that..."The impact of input lag is compounded by what goes on before we even react."
yacoub - Thursday, July 16, 2009 - link
The input lag everyone's most concerned with is the amount the display adds, because while all the rest is consistent, displays add a variable amount depending on which one you get. The ones that add more than ~20 ms add a NOTICEABLE amount (for most people) which takes input lag to the point that it becomes frustrating.
DerekWilson - Thursday, July 16, 2009 - link
Part of the point was to explain that there is a lot at the end of the chain that can significantly impact performance and it's all about the display.
If we do consider a 100ms threshold as valid, then based on our numbers from TF2 it is clear that we would end up in the >100ms input lag range with a monitor that adds more than 20ms of lag.
And if we can't expect a twitch shooter to come in under the mark, how is everything else going to do? Not well I would imagine.
I did think about looking at a wide array of monitors, but I feel like that might be better suited to a more focused review of monitor performance rather than an exploration of input lag in general.
yacoub - Thursday, July 16, 2009 - link
Sure but for whatever reason, all of the lag prior to the display's lag is essentially transparent because it doesn't add up to be enough to be perceptible. This would equate to your threshold.
When using a display with little or no noticeable display lag, any FPS game will feel very responsive and without discernible latency (assuming your GPU hardware is up to the task of rendering the frames quickly enough and you're not using one of the early optical mice from a decade ago that had terrible tracking refresh rates, etc etc).
Yet simply switching to a display with higher latency is enough to make input latency noticeable and frustrating for FPS gamers. So the key issue is finding a TN or IPS display since those panel technologies have the least input lag. Of course most panels out there are -VA based panels because they are cheaper to produce than IPS, and TN may be snappy in display response but they have a number of other downsides.
What matters most is getting panel makers focused on IPS-based displays (or new panel technologies that significantly reduce the input lag most non-TN displays presently suffer). And hey, the more they produce and sell, the lower the production cost per unit so the better the pricing can be and the more opportunities for improved technology to be added to the IPS design.
ocyl - Friday, July 17, 2009 - link
@ yacoub
Did you read the article at all?
yacoub - Friday, July 17, 2009 - link
Yes. I must not be explaining myself well, so forget it.
DDuckMan - Saturday, December 18, 2010 - link
While this article was great, I'm still not sure if I am better off disabling SLI to eliminate the synchronization lag or having the higher framerates with SLI enabled in twitch games. It seems to me that with 120Hz monitors, vsync (which I need for 3D) and SLI lag would not be as important as keeping the framerate above the monitor refresh rate. I don't have the equipment to properly test, so I am looking forward to the next article.
http://hardforum.com/showthread.php?t=1569281
burner1980 - Thursday, March 10, 2011 - link
Quote: "Input lag with multiGPU systems is something we will want to explore at a later time."
I'm still waiting patiently and looking forward to a follow up investigation. The topic of input lag is VERY important to gamers who play FPS. I do notice it in racing games, too.
I suggest using true 120Hz monitors in the follow up article. They of course won't reduce input lag, but help to reduce screen tearing, thus allowing one to optimize one's settings to reduce input lag while keeping screen tearing at a low enough level.
I'm also curious if using a 3 screen setup a la Eyefinity or Vision Surround using two GPUs will have an impact.
dmnwlv - Thursday, April 28, 2011 - link
Impressive report.
Regarding mouse polling rate (I may have missed it out):
1) I believe the actual mouse input into the CPU is already calculated and the end result (of that action) already registered before you get to see it on screen. It does not wait for the GPU/monitor to finish processing before determining the end result. Hence the influence of mouse response is even more substantial if we take out the whole chunk of lag times that were included in the total lag calculation here - Derek Wilson, pls correct me if I am wrong.
Coupled with the predictive ability of human (also reported here) to react accordingly in advance from the existing state of game situation, it seems to match and explain why it is hard to imagine a few milliseconds of difference in mouse lag can have an impact to the overall gaming experience. The brain and reaction is (trying its best) interpolating and working in tandem with the CPU than the monitor.
2) And another scenario where the user already intended to do a series of / continuous / extended action (eg, drawing a long curve line), does the response rate of the mouse play a part in drawing the most accurate curve that the person input/intended? - Maybe Derek can help on this as well.
Thanks for the great report.