Scanout and the Display
Alright. So depending on the game, we are up to somewhere between 13ms and 58ms after our mouse was moved. The GPU just finished rendering and swapped the finished frame to the front buffer. What happens next is called scanout: the frame is sent out the DVI-I port over the cable and to the monitor.
If our monitor's refresh rate is 60Hz (as is typical these days), it will take something like 16ms to send the full frame to the monitor, plus about half a millisecond of "blanking" between frames, giving us 16.67ms of transmission delay. Here we are limited by the bandwidth capabilities of DVI, HDMI and DisplayPort and the timing standards put forth by VESA. So sending a full frame of anything to the display will add 16.67ms of input lag. Some monitors will display this data as it is received, but others will latch the input, meaning the full frame must be sent before it can be displayed (but let's not get too far ahead of ourselves). In both cases, we will consider the latency of this step to be at least one frame, as the monitor still takes 16ms to draw the image.
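To put a number on that transmission delay, it is simply the refresh period. A quick sketch of the arithmetic (the 0.5ms blanking figure is the rough estimate used above, not a spec value):

```python
# Back-of-the-envelope scanout timing for a 60Hz display.
refresh_hz = 60
frame_period_ms = 1000.0 / refresh_hz        # ~16.67ms per refresh
blanking_ms = 0.5                            # rough blanking estimate
active_ms = frame_period_ms - blanking_ms    # ~16.17ms of actual pixel data
print(f"{frame_period_ms:.2f}ms total ({active_ms:.2f}ms active + {blanking_ms}ms blanking)")
```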
So now we need to talk about vsync. Let's pretend we aren't using it. Let's pretend our game runs at a rock solid 60 FPS and our refresh rate is 60Hz, but the buffer swap happens halfway between each vertical sync. This means every frame being scanned out would be split down the middle. The top half of the frame will be an additional 16.67ms behind (for a total of 33.3ms of lag). Of course, the bottom half, while 16.67ms newer than the top, won't have its own top half sent until the next frame 16.67ms later.
In this particular case, the math works out such that if we average the latency of all the pixels on a split frame, we get the same average latency as if we had enabled vsync. Unfortunately, when framerate is either higher or lower than refresh rate, vsync has the potential to cause tons of problems, and this equivalence no longer holds.
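To make the averaging concrete, here is the arithmetic from the scenario above (a sketch using only the numbers already quoted, not a general proof):

```python
# Average pixel latency of the torn frame described above.
top_half_ms = 33.33      # top half: one extra refresh behind
bottom_half_ms = 16.67   # bottom half: one refresh behind
print((top_half_ms + bottom_half_ms) / 2)   # 25.0ms -- the same average
# latency attributed to vsync in this exact 60 FPS / 60Hz case
```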
If our frametime is just longer than 16.67ms with vsync enabled, we will add nearly a full additional frame of latency (with no work being done on the GPU) before we are able to swap the finished buffer to the front for scanout. This wasted time can cause our next frame to miss the following vsync as well, giving us up to two frames of latency (one because we wait to swap and one because of the delay in starting the next frame). If our framerate is higher than 60 FPS, our GPU will have to sit idle after rendering until the next vsync. This is a waste of resources and decreases overall performance, but definitely not by as much as running vsync at less than the monitor's refresh rate: the upper limit of additional delay is 16.67ms minus frametime (less than one frame) rather than two full frames.
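A small sketch of the wait-for-vsync penalty described above (approximate, assuming a fixed 60Hz refresh):

```python
# Extra delay added by waiting for the next vsync before the buffer swap.
period_ms = 1000.0 / 60    # ~16.67ms between vertical syncs

def added_latency_ms(frametime_ms):
    # Time remaining until the next vsync when the frame finishes rendering.
    return (-frametime_ms) % period_ms

print(added_latency_ms(17.0))   # just over one refresh: ~16.3ms wasted
print(added_latency_ms(10.0))   # above 60 FPS: 16.67 - 10 = ~6.67ms wait
```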
When framerate is lower than refresh rate, using either a one-frame flip queue with vsync or triple buffering allows the graphics hardware to continue rendering while adding between 0 and 16.67ms of additional latency (the average falls between the two extremes). So we get the potential benefits of vsync (no tearing and synchronization) without the additional performance penalty of an idle GPU. At framerates higher than refresh rate, however, a render queue adds one additional frame of latency per frame we render ahead, so this solution isn't a very good one for mitigating input latency in high framerate games (especially in twitch shooters).
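To illustrate how the render-ahead cost scales at high framerates (a sketch of the scaling stated above, not a driver-specific figure):

```python
# Each pre-rendered frame in the queue waits one full refresh for scanout.
period_ms = 1000.0 / 60
for frames_ahead in (1, 2, 3):
    print(f"{frames_ahead} frame(s) ahead: ~{frames_ahead * period_ms:.2f}ms added")
```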
Once the data is sent to the monitor, we've got more delay in store.
We've already mentioned that some LCDs latch the entire frame before display. Beyond this delay, some displays will perform image processing on the input (including scaling if this is not done on the graphics hardware). In some cases, monitors will buffer one or two frames in order to overdrive LCD cells and get them to respond faster. While this can improve the speed at which the picture on the monitor changes, it can add another 16.67ms to 33.3ms of latency to the input (depending on whether one frame is processed or two). Monitors with a game mode or true 120Hz monitors should definitely add less input lag than monitors that require this sort of processing.
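To give a feel for what overdrive does with those buffered frames, here is a hypothetical sketch (the gain value and clamping are made up for illustration; real panels use per-transition lookup tables tuned by the manufacturer):

```python
# Hypothetical LCD overdrive: compare the previous pixel level with the
# target level and drive past the target to speed up the transition.
def overdrive(prev_level, target_level, gain=0.35):
    boosted = target_level + gain * (target_level - prev_level)
    return min(max(round(boosted), 0), 255)    # clamp to 8-bit panel range

print(overdrive(64, 192))   # rising transition: drive above 192
print(overdrive(192, 64))   # falling transition: drive below 64
```

The need to hold on to the previous frame (or two) to compute corrections like this is exactly where the extra 16.67ms to 33.3ms comes from.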
Add, on top of all this, the fact that it will take between 2ms and 16ms for the pixels on the LCD to actually switch (response time varies between panels and with the levels the transition is between) and we are done: the image is now on the screen.
So what do we have total after the image is flipped to the front buffer?
One frame of lag for transmission (to display a full frame); up to one frame of lag if we enable triple buffering (or one frame of render ahead when running at less than refresh rate); up to two frames of lag if we just turn on vsync; at framerates higher than the refresh rate, an additional frame of lag for every frame we render ahead with vsync on; and zero to two frames of lag for the monitor to display the image (if it does extensive image processing).
So after crazy speed from the mouse to the front buffer, here we are waiting ridiculous amounts of time to get the image to appear on the screen. We add, at the very least, 16.67ms of lag in this stage. At worst we're taking on between 66.67ms and 83.3ms, which is totally unacceptable. And that's after the computer is completely done working on the image.
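Adding up that worst case (a sketch of the arithmetic, counting each "frame" of lag as one 16.67ms refresh period):

```python
period_ms = 1000.0 / 60
transmission = 1 * period_ms        # full-frame scanout
vsync_worst = 2 * period_ms         # worst case with plain vsync
monitor_worst = 2 * period_ms       # worst case latching + overdrive processing
print(transmission + vsync_worst + monitor_worst)   # ~83.33ms
```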
This brings our totals up to about 33ms to 80ms input lag for typical cases. Our worst case for what we've outlined, however, is about 135ms of latency between mouse movement and final display which could be discernible and might start to feel mushy. Sometimes game developers stray a bit and incur a little more input lag than is reasonable. Oblivion and Fallout 3 come to mind.
But don't worry, we'll take a look at some specific cases next.
Comments
aguilpa1 - Thursday, July 16, 2009
Lots of variables that we never consider when trying to do fast gaming. I would be curious how much lag is in a racing sim like GRID or Colin McRae DiRT. Those are intense graphics games and demand the fastest everything to keep you from going in the ditch. I have noticed I compensate at times by estimating when to start turning before the turn arrives.

crimson117 - Thursday, July 16, 2009
Isn't part of that just intentional skidding / drift in racing games, to mimic the "lag" of rubber catching asphalt at high speeds?

hechacker1 - Thursday, July 16, 2009
You say TF2 is GPU limited, but with my 4850 I find the first core is pegged at 100%. The same applies to my older 3850.

With a Core i7 920 @ 166x20 = 3320MHz (+166 for Turbo mode) and hyperthreading on, I see TF2 using six cores: the first is pegged at 100%, the second and third vary from 50-100% depending on the action (32 player server), and the other three hover around 25%.
1920x1080. Benq 2400G (bought for its low input-lag)
All highest settings, 4xMSAA, Aniso 8x, Disable vsync, FOV 90
My framerate hovers around 100FPS for most Valve maps.
I use this autoexec to get more threading and higher quality textures:
r_lod 0
mat_picmip -10
cl_new_impact_effects 1
mp_usehwmmodels 1
mp_usehwmvcds 1
cl_burninggibs 1
mat_specular 1
mat_parallaxmap 1
r_threaded_particles 1
r_threaded_renderables 1
cl_ragdoll_collide "1"
jpeg_quality 100
r_threaded_client_shadow_manager 1
Most people say TF2 is a CPU limited game. Perhaps that only applies to ATI?
Even without the autoexec.cfg, I see the game use 100% on the first core.
Very good article though. I hope this shuts up the false info that 60fps is too fast for humans to notice.
DerekWilson - Thursday, July 16, 2009
Even if a core is pegged at 100%, that doesn't mean the game's performance is CPU limited.

At 2560x1600 we were hovering around 110fps, but at 1152x864 we were constantly well over 200 fps. As lowering resolution doesn't change the load on the CPU, this clearly indicates that we were GPU bound -- at least at 25x16.
For our 1152x864@120Hz test, we might have been CPU bound, but I don't have the data to know for sure here (I didn't test any near resolutions).
hechacker1 - Thursday, July 16, 2009
Oh yeah. Flip queue to 0.
ATI A.I. at Low or "standard" (I've read "high" mode can use more CPU?)
Latest driver. Windows 7 x64 7201.
Qiasfah - Thursday, July 16, 2009
In the article you stated that TF2 was GPU limited (and it was in the situations you were testing), however you should find that in battle situations with other active characters present it becomes heavily CPU limited. It would be interesting to see if there was a difference in input lag due to this in the midst of battles rather than sitting idle.

I run an i7 920, and even with multicore enabled (an option which will very commonly double a person's TF2 framerate) I get the same dips in FPS regardless of graphical settings. It would be interesting to see how overclocking affects the performance of this game.
DerekWilson - Thursday, July 16, 2009
More than likely, in TF2, you'll be bottlenecked at the network when it comes to performance...

But the way Valve does things is with local prediction (running code on the client) and then checking predictions on the server. This should mean that our test shows what you can expect to actually /see/, whether or not what actually /happens/ is the same (if you are very laggy on the network or if there are lots of players or whatever).
codestrong - Thursday, July 16, 2009
"Beyond that, GPU is the next most important faster (factor?), and a mouse that can do at least 500 reports per second is a good idea." Nice work by the way. I've been interested in this since Carmack mentioned input lag during his work on quake live.DerekWilson - Thursday, July 16, 2009 - link
yeah, i meant factor. thanks.

SiliconDoc - Tuesday, July 21, 2009
Yes, nice article and nice work on getting the job done without a super expensive camera, on an interesting subject for gamers.