Scanout and the Display
Alright. So depending on the game, we are up to somewhere between 13ms and 58ms after our mouse was moved. The GPU just finished rendering and swapped the finished frame to the front buffer. What happens next is called scanout: the frame is sent out the DVI-I port over the cable and to the monitor.
If our monitor's refresh rate is 60Hz (as is typical these days), it will actually take something like 16ms to send the full frame to the monitor (plus there's about half a millisecond of "blanking" between frames being sent) giving us 16.67ms of transmission delay. In this case we are limited by the bandwidth capabilities of DVI, HDMI and DisplayPort and the timing standards put forth by VESA. So to send a full frame of anything to the display we will have 16.67ms of input lag added. Some monitors will display this data as it is received, but others will latch input meaning the full frame must be sent before it can be displayed (but let's not get too far ahead of ourselves). Either way, we will consider the latency of this step to be at least one frame (as the monitor will still take 16ms to draw the image either way).
So now we need to talk about vsync. Let's pretend we aren't using it. Let's pretend our game runs at a rock solid exact 60 FPS and our refresh rate is 60Hz, but the buffer swap happens half way between each vertical sync. This means every frame being scanned out would be split down the middle. The top half of the frame will be an additional 16.67ms behind (for a total of 33.3ms of lag). Of course, the bottom half, while 16.67ms newer than the top, won't have it's own top half sent until the next frame 16.67ms later.
In this particular case, the way the math works out if we average the latency of all the pixels on a split frame we would get the same average latency as if we enabled vsync. Unfortunately, when framerate is either higher or lower than refresh rate, vsync has the potential to cause tons of problems and this equivalence doesn't carry in the least.
If our frametime is just longer than 16.67ms with vsync enabled, we will add a full additional frame of latency (with no work being done on the GPU) before we are able to swap the finished buffer to the front for scanout. The wasted work can cause our next frame not to come in before the next vsync, giving us up to two frames of latency (one because we wait to swap and one because of the delay in starting the next frame). If our framerate is higher than 60 FPS, our GPU will have to stop working after rendering until the next vsync. This is a waste of resources and decreases overall performance, but definitely not by as much as if we use vsync at less than the monitor refresh. The upper limit of additional delay is 16.67ms minus frametime (less than one frame) rather than two full frames.
When framerate is lower than refresh rate, using either a 1 frame flip queue with vsync or triple buffering will allow the graphics hardware to continue doing rendering work while adding between 0 and 16.67ms of additional latency (the average will be between the two extremes). So you get the potential benefits of vsync (no tearing and synchronization) without the additional decrease in performance that occurs when no work gets done on the GPU. At framerates higher than refresh rate, when using a render queue, we do end up adding an additional frame of latency per number of frames we render ahead, so this solution isn't a very good one for mitigating input latency (especially in twitch shooters) in high framerate games.
Once the data is sent to the monitor, we've got more delay in store.
We've already mentioned that some LCDs latch the entire frame before display. Beyond this delay, some displays will perform image processing on the input (including scaling if this is not done on the graphics hardware). In some cases, monitors will save two frames to overdrive LCD cells to get them to respond faster. While this can improve the speed at which the picture on the monitor changes, it can add another 16.67ms to 33.3ms of latency to the input (depending on whether one frame is processed or two). Monitors with a game mode or true 120Hz monitors should definitely add less input lag than monitors that require this sort of processing.
Add, on top of all this, the fact that it will take between 2ms and 16ms for the pixels on the LCD to actually switch (response time varies between panels and depending on what levels the transition is between) and we are done: the image is now on the screen.
So what do we have total after the image is flipped to the front buffer?
One frame of lag for transmission (to display a full frame), up to 1 frame of lag if we enable triple buffering (or 1 frame render ahead and we run at less than refresh rate), up to two frames of lag if we just turn on vsync, at framerates higher than the refresh rate we we'll add an additional frame of lag for every frame we render ahead with vsync on, and zero to 2 frames of lag for the monitor to display the image (if it does extensive image processing).
So after crazy speed from the mouse to the front buffer, here we are waiting ridiculous amounts of time to get the image to appear on the screen. We add at the very very least 16.67ms of lag in this stage. At worst we're taking on between 66.67ms and 83.3ms which is totally unacceptable. And that's after the computer is completely done working on the image.
This brings our totals up to about 33ms to 80ms input lag for typical cases. Our worst case for what we've outlined, however, is about 135ms of latency between mouse movement and final display which could be discernible and might start to feel mushy. Sometimes game developers stray a bit and incur a little more input lag than is reasonable. Oblivion and Fallout 3 come to mind.
But don't worry, we'll take a look at some specific cases next.
85 Comments
View All Comments
PrinceGaz - Friday, July 17, 2009 - link
Just to add"For input lag reduction in the general case, we recommend disabling vsync"
It is rather ironic that you used that phrase, when in the your previous article you were strongly stating the case for v-sync always being used (preferably with triple-buffering).
Unless you are certain that nVidia's OpenGL implementation of triple-buffering works how you think it does (and not how most people think it does), posting articles may be unwise.
DerekWilson - Sunday, July 19, 2009 - link
In the "general case" we mean /when triple buffering is not available/ (real triple buffering is not available in the majority of games), in order to reduce input lag, vsync should be disabled.Here's a better break down of what we recommend:
Above 60FPS:
triple buffering > no vsync > vsync > flip queue (any at all)
Always EXACTLY 60 FPS (very unlikely to be perfectly consistent naturally)
triple buffering == vsync == flip queue (1 frame) > no vsync
Below 60FPS (non-twitch shooter):
triple buffering == flip queue (1 frame) > no vsync > vsync
Below 60FPS (twitch shooter or where lag is a big issue):
no vsync >= triple buffering == flip queue (1 frame) > vsync
... to us the fact that at less than 60 FPS you start on the exact same frame with either triple buffering, no vsync or with a 1 frame flip queue is enough -- you will get less than 1 frame of difference in the lag and the difference you get with no vsync is only on part of the frame anyway.
...
And ...
NVIDIA has absolutely CONFIRMED that they do OpenGL triple buffering in the way that I described it in the previous article. This confirmation is directly from their OGL driver team.
I was just able to sit down with AMD two days ago and, while they don't do things in exactly the way I describe them, their OpenGL triple buffering does do something quite interesting at > 60FPS ... which I'll talk about in an upcoming article. They also said they are looking into what it would take to do what I was talking about -- but that they haven't yet because their workstation customers tend to care more about what happens at < 60fps (which is a case where a 1 frame flip queue == triple buffering).
The situation with DirectX is a little more restrictive with both NVIDIA and AMD... I will also address this issue in an upcoming article.
BoFox - Friday, July 17, 2009 - link
Anandtech does yet another ground-breaking article, thanks to Derek Wilson and the use of a high-speed camera this time!It is rather interesting how you mention the human reaction time! To detect and process the image signal in your brain, and to react to it (usually faster if through reflex), are all part of the entire reaction time of 200-300 ms, correct? Well, we should be testing it on the professional gamers like Jonathan 'Fatal1ty' Wendel, to see how much he has tuned his reaction time in making it so reflexive that it's as low as 150ms or even lower. Why not try interviewing him and use your high-speed camera to test that on him, and on an average gamer?!? That would be a SUPER-awesome article here to read!!! :D
About the rendering ahead, some games are designed to render ahead by as much as 3 frames, and changing that to 0 could incur a large performance hit--sometimes by as much as 50%. I have experienced this with some games a while back, cannot remember which ones exactly, and that was on a single video card (not when I was using SLI). The performance penalty would usually bring down the frame rates to below the refresh rate, thus making it very undesirable. There are some other articles cautioning against it, but as long as it does not make a difference in the game that you are playing, then great. At least, as long as it does not drop the frame rates to below the refresh rate, then great!
DerekWilson - Sunday, July 19, 2009 - link
If your framerate is above refresh rate (except for the multiGPU case) you should definitely not use anything more than 0 frames of render ahead. Triple buffering as I described in my previous article (and as NVIDIA does in OpenGL) will be a good option, but double buffered w/ vsync or no vsync will definitely be better than any sort of render ahead.If your performance is below refresh rate, you'll want to either use no vsync, exactly 1 frame render ahead, or triple buffering. double buffered vsync (0 flip queue/pre-rendered frames with vsync) will give you a significant drop in performance and increase in input lag when performance is below refresh. This is as you say -- by as much as 50%. But only when performance is already below refresh.
nvmarino - Friday, July 17, 2009 - link
Hey Derek, thanks for the great article. Nice job addressing many of the BS theories and comments about lag going on in the threads of your triple buffering article...And now I've got even more fuel for the fire of my lust for someone to release a 1920x1080 resolution display with support for 120Hz refresh rate at the INPUT and good black levels. If only I could set my PC to spit out 120Hz to my display, my 24Fps Blu-ray and 60Fps TV content would play without judder, AND my games would have zero tearing (well as long as they don't run above 120Fps) and minimal lag. Not only that, but I'd also have an ideal setup for shutter-based steroscopic 3d, and it would open the door for some of the new stuff like smoothing video via frame interpolation of 24P content to 120P currently going on in the PC videophile community. Can you say HTPC nirvana! Where's my 120Hz!!!!
DerekWilson - Friday, July 17, 2009 - link
yeah, i want a true 120hz 1920x1080 display too :-) mmmm that would be tasty.Belinda fox - Friday, July 17, 2009 - link
If someones reaction times are up to 220ms and your results are around the 100ms mark surely this says something about the validity of results and conclusion.Sorry i never read all the article just first and last one, due to reaction times being mentioned as part of lag.
BoFox - Friday, July 17, 2009 - link
Just read the article first.If you had better common sense, you would figure out that the human response time of >200ms comes AFTER seeing what is on the monitor, which takes 100ms to display in the first place.
Let's say that I appear just around the corner right in front of you, in a FPS game, like Unreal Tournament 3, for example. First, it takes like 50ms ping time for your computer to receive that information. Second, it takes your monitor 100ms to display the information. Third, it takes you 200ms to react to what you're seeing.
lyeoh - Sunday, July 19, 2009 - link
And AFTER you react to what you are seeing you press a key/button or move your mouse and the relevant portion of the input lag takes into effect (input device, game, network).With a 100+ms input lag handicap even on the LAN you could be "dead" before you even see anything or be able to do anything about it.
So would be good if have some figures on wireless keyboards and mice vs wired (usb and ps2).
I personally think wireless stuff will add a lot more lag due to the modulation/demodulation, and other "tricks" they have to do.
DerekWilson - Friday, July 17, 2009 - link
That is definitely an issue ...i didn't get into network play as it gets really sticky though ... the local client does do limited prediction based on last known info about other players -- if this prediction data is accurate enough at small intervals and if ping isn't bad then it isn't a huge issue ... but at the same time, servers may resolve hits (as apparent to the client) as hits even if they are misses (in reality as per what the player did) or it may not (the difference results in either increased or reduced effectiveness of techniques like bunny hopping). but it is way more complex and depends on how the client predicts things between server updates and how the server resolves differences between clients and all sorts of funky stuff.
that's why i wanted to stick with mouse-to-display issues here ...
but you are right that there are other factors (including human reaction time) that add up to make it so input lag can be an important slice of a whole chain of events and can affect the gaming experience.