Digging Deeper: Galloping Horses Example
Rather than pull out a bunch of math and traditional timing diagrams, we've decided to put together a more straightforward presentation. The diagrams we will use show the frames of an actual animation that would be generated over time, as well as what would be seen on the monitor for each method. Hopefully this will help illustrate the quantitative and qualitative differences between the approaches.
Our example is a fabricated one (based on an animation example courtesy of Wikipedia) of a "game" rendering a horse galloping across the screen. The basic premise of this timeline is that our game can render at 5 times our refresh rate (it can render 5 different frames before a new one gets swapped to the front buffer). Rendering at exactly 5x the refresh rate is a simplification, and so is the perfectly consistent frame rate, since in practice some frames take longer than others. We cut out these and other variables for simplicity's sake. We'll talk about timing and lag in more detail based on a 60Hz refresh rate and 300 FPS performance, but we didn't want to clutter the diagram too much with times and labels. Obviously this is a theoretical example, but it does a good job of showing the idea of what is happening.
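The timing figures above reduce to simple arithmetic. Here is a minimal sketch under the same idealized assumption that every frame takes exactly the same time to render (the variable names are ours, not from any tool):

```python
# Idealized timing for the galloping-horse example: every frame is
# assumed to take exactly the same time to render.
REFRESH_HZ = 60    # monitor refresh rate
RENDER_FPS = 300   # rate at which the game finishes frames

refresh_interval_ms = 1000 / REFRESH_HZ          # time between refreshes
frame_time_ms = 1000 / RENDER_FPS                # time to render one frame
frames_per_refresh = RENDER_FPS // REFRESH_HZ    # frames finished per refresh

print(f"refresh interval: {refresh_interval_ms:.2f} ms")  # 16.67 ms
print(f"frame time:       {frame_time_ms:.2f} ms")        # 3.33 ms
print(f"frames/refresh:   {frames_per_refresh}")          # 5
```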
First up, we'll look at double buffering without vsync. In this case, the buffers are swapped as soon as the game is done drawing a frame. This immediately preempts what is being sent to the display at the time. Here's what it looks like in this case:
Good performance but with quality issues.
The timeline is labeled 0 to 15, and for those keeping count, each step is 3 1/3 milliseconds. Each buffer's timeline shows a picture in the 3.3 ms interval during which a frame is completed, corresponding to the position of the horse and rider at that point in real time. The large pictures at the bottom of the image represent the image displayed at each vertical refresh on the monitor. The only images we actually see are the frames that get sent to the display. In this case, the benefit of all the other frames is to minimize input lag.
We can certainly see, in this extreme case, what bad tearing could look like. For this quick and dirty example, I chose to composite only three frames of animation, but in reality there could be more or fewer tears. The number of different frames drawn to the screen corresponds to the length of time it takes the graphics hardware to send the frame to the monitor. This takes less than the entire interval between refreshes, but I'm not well versed enough in monitor technology to know exactly how long. For the purposes of this illustration, I threw my dart at about half the interval being spent sending the frame (and thus parts of three completed frames are displayed). If I had to guess, I think I overestimated the time it takes to send a frame to the display.
For the above, the framerate reported by FRAPS would be 300 FPS, but the number of full images that actually get flashed up on the screen can never exceed the refresh rate (in this example, 60 frames every second). The latency between when a frame finishes rendering and when it starts to appear on screen (this is input latency) is less than 3.3ms.
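The tearing pattern in the diagram can be reproduced with a small simulation. This is a sketch, not a model of real display hardware: it assumes perfectly constant 3.33 ms frames and, following the dart-throw guess above, that sending a frame to the monitor takes half the refresh interval (`SCANOUT_MS` is that assumption). `frames_in_refresh` is a hypothetical helper name.

```python
from fractions import Fraction

REFRESH_MS = Fraction(1000, 60)  # 16.67 ms between vertical refreshes
FRAME_MS = Fraction(1000, 300)   # 3.33 ms to render each frame
SCANOUT_MS = REFRESH_MS / 2      # ASSUMPTION: sending a frame takes half the interval

def frames_in_refresh(r):
    """Frames visible during refresh r with double buffering and no vsync.

    Frame n finishes at n * FRAME_MS and is swapped to the front buffer
    immediately, even while the monitor is still receiving the old one.
    Returns the frame numbers seen from the top of the screen down.
    """
    start = r * REFRESH_MS
    end = start + SCANOUT_MS
    newest = start // FRAME_MS       # last frame finished when scanout begins
    shown = [newest]
    n = newest + 1
    while n * FRAME_MS < end:        # every swap during scanout adds a tear
        shown.append(n)
        n += 1
    return shown

for r in (1, 2, 3):
    frames = frames_in_refresh(r)
    print(f"refresh {r}: frames {frames}, tears: {len(frames) - 1}")
```

Each refresh shows parts of three frames (two tears), and the top slice is never more than one frame time (3.33 ms) old. Shrinking `SCANOUT_MS` never increases the number of tears.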
When we turn on vsync, the tearing goes away, but our real performance goes down and input latency goes up. Here's what we see.
Good quality, but bad performance and input lag.
If we consider each of these diagrams to be systems rendering the exact same thing starting at the exact same time, we can see how far "behind" this rendering is. There is none of the tearing that was evident in our first example, but we pay for that with outdated information. In addition, both the actual and the reported framerate are 60 FPS. The computer ends up doing a lot less work, of course, but at the expense of realized performance, even though we cannot actually see more than the 60 images the monitor displays every second.
Here, the price we pay for eliminating tearing is an increase in latency from a maximum of 3.3ms to a maximum of 13.3ms. With vsync on a 60Hz monitor, the maximum latency between when a rendering is finished and when it is displayed is a full 1/60 of a second (16.67ms), but the effective latency can be higher. Since no more drawing can happen after the next frame to be displayed is finished until it is swapped to the front buffer, the real effect of latency when using vsync will be more than a full vertical refresh when rendering takes longer than one refresh to complete.
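The stall described above can be sketched as follows, assuming constant render times and exactly two buffers. Treating the moment a frame starts rendering as the moment its input is sampled is our simplifying assumption, and `vsync_double_buffer` is a hypothetical name.

```python
import math
from fractions import Fraction

REFRESH_MS = Fraction(1000, 60)  # 60 Hz monitor

def vsync_double_buffer(render_ms, frames):
    """Double buffering with vsync: the finished back buffer must wait for
    the next vertical refresh, and the GPU idles until that flip frees a buffer.

    Returns (render_start, render_done, flip_time) for each frame.
    """
    timeline = []
    start = Fraction(0)
    for _ in range(frames):
        done = start + render_ms
        # Swap at the first refresh at or after the frame is finished.
        flip = math.ceil(done / REFRESH_MS) * REFRESH_MS
        timeline.append((start, done, flip))
        start = flip  # only two buffers, so rendering resumes at the flip
    return timeline

for render_ms in (Fraction(1000, 300), Fraction(20)):
    for start, done, flip in vsync_double_buffer(render_ms, 3):
        print(f"render {float(render_ms):5.2f} ms: "
              f"done->display {float(flip - done):5.2f} ms, "
              f"start->display {float(flip - start):5.2f} ms")
```

With 3.33 ms frames every frame waits 13.33 ms after completion, matching the figure above. With a hypothetical 20 ms render time (longer than one refresh), the wait after completion is still 13.33 ms, but the time from starting a frame to seeing it grows to two refreshes (33.33 ms) and only 30 frames per second reach the screen.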
Moving on to triple buffering, we can see how it combines the advantages of the two double buffering approaches.
The best of both worlds.
And here we are. We are back down to a maximum of 3.3ms of input latency, but with no tearing. Our actual performance is back up to 300 FPS, but this may not be reported correctly by a frame counter that only monitors front buffer flips. Again, only 60 frames actually get pasted up to the monitor every second, but in this case, those 60 frames are the most recent frames fully rendered before the next refresh.
While there may be parts of the frames in double buffering without vsync that are "newer" than corresponding parts of the triple buffered frame, the price paid for that is potential visual corruption. The real kicker is that, if you don't actually see tearing in the double buffered case, then those partial updates are not different enough from the previous frame(s) to have really mattered visually anyway. In other words, only when you see the tear are you really getting any useful new information. But how useful is that new information if it only comes with tearing?
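Page-flipping triple buffering can be sketched the same way. This assumes the renderer never waits (it always has a spare back buffer), constant 3.33 ms frames, and a hypothetical 1 ms phase offset (`PHASE_MS`) so that frame completions do not line up exactly with refreshes; none of these names come from a real API.

```python
from fractions import Fraction

REFRESH_MS = Fraction(1000, 60)  # 16.67 ms between refreshes
FRAME_MS = Fraction(1000, 300)   # 3.33 ms per frame
PHASE_MS = Fraction(1)           # hypothetical offset so frames don't land exactly on refreshes

def triple_buffer(refreshes):
    """Page-flipping triple buffering.

    The renderer alternates between two back buffers and never waits, so
    frame n finishes at PHASE_MS + n * FRAME_MS. At each refresh the newest
    finished back buffer becomes the front buffer; older ones are discarded.
    Returns (frame_shown, completion_to_display_ms) per refresh.
    """
    shown = []
    for r in range(1, refreshes + 1):
        flip = r * REFRESH_MS
        newest = (flip - PHASE_MS) // FRAME_MS  # last frame finished by the refresh
        shown.append((newest, flip - (PHASE_MS + newest * FRAME_MS)))
    return shown

for frame, latency in triple_buffer(3):
    print(f"frame {frame}, latency {float(latency):.2f} ms")
```

Only every fifth frame reaches the screen (frames 4, 9, 14, ...), so a counter that watches front-buffer flips would see 60 per second even though the game renders 300, while latency stays under one frame time.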
DerekWilson - Wednesday, July 1, 2009
"Skipping" doesn't occur in games like it does with video -- there is not a set number of frames that must be rendered in a set amount of time. The action happens independently of the frames rendered in a game, while for a video there is an exact framerate that needs to be maintained in order to see smooth motion as it was captured.
In the old days, console games would tie the game timer to framerate, which was always set to vsync. If frame rate dropped from 60 FPS to 30, the game would actually slow down (when too much action was going on on the screen). Modern PC games do not rely on framerate to time their game; instead, framerate is a snapshot of the game at a certain time.
If you drop all but the most recently completed frame, then you are just doing triple buffering the way this article describes.
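The difference between a frame-locked game and a clock-driven one, as described in the comment above, can be shown with a toy example. The sprite speed and the per-frame step below are hypothetical values chosen so both approaches agree at 60 FPS:

```python
SPEED_PX_PER_S = 120     # hypothetical: how fast the sprite should move
STEP_PX_PER_FRAME = 2    # hypothetical: per-frame step tuned for a locked 60 FPS

def frame_locked_x(frames_drawn):
    """Old console style: game logic advances one fixed step per frame."""
    return STEP_PX_PER_FRAME * frames_drawn

def snapshot_x(t_seconds):
    """Clock-driven style: a frame just samples where the sprite is at time t."""
    return SPEED_PX_PER_S * t_seconds

for fps in (60, 30):
    # After one second of play, `fps` frames have been drawn.
    print(fps, frame_locked_x(fps), snapshot_x(1))
# 60 120 120
# 30 60 120
```

Halving the frame rate halves how far the frame-locked sprite gets in a second, which is the console-era slowdown; the clock-driven position is unchanged.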
ufon68 - Monday, March 28, 2016
This is not completely accurate. While physical simulation usually (and for good reason) "ticks" independently of FPS, the actual gameplay logic is usually tied to the frames being displayed.
You don't see the game slowing down or speeding up because the simulation takes the FPS into account, i.e., it multiplies everything by the delta time, which is the time it took to tick the last frame.
For instance, to translate an object along a vector at a certain speed, you move it every frame by Direction * Speed * DeltaTime, where DeltaTime is the time in seconds it took the game to tick the last frame. Given a constant 10 fps (constant just for simplification purposes; the fps can move up and down and this will still work), DeltaTime is 0.1 for each frame/tick.
So it's not correct that what you see are snapshots of what's going on in the game world; it just gives you that impression by doing this neat trick.
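The Direction * Speed * DeltaTime update from the comment above can be sketched in a few lines. The speed, direction, and frame-time lists are hypothetical; the direction is assumed to be a unit vector, and the helper names are ours.

```python
def step(position, direction, speed, delta_time):
    """One tick: move by Direction * Speed * DeltaTime."""
    x, y = position
    dx, dy = direction  # assumed to be a unit vector
    return (x + dx * speed * delta_time, y + dy * speed * delta_time)

def run(frame_times, speed=5.0, direction=(1.0, 0.0)):
    """Advance an object through a sequence of per-frame DeltaTimes (seconds)."""
    position = (0.0, 0.0)
    for delta_time in frame_times:
        position = step(position, direction, speed, delta_time)
    return position

steady = [0.1] * 10                            # constant 10 FPS: DeltaTime = 0.1 s
uneven = [0.05, 0.15, 0.2, 0.1, 0.25, 0.25]    # same second, varying frame times
print(run(steady), run(uneven))
```

Both runs cover one second of game time, so the object ends at x ≈ 5 in each (up to floating-point rounding), whether the second was split into 10 even frames or 6 uneven ones.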
VinnyV - Monday, June 29, 2009
I just wanted to say that I really appreciate this article. I think I just went from understanding about 10% of what is usually discussed on this site to about 11 or 12%. Thanks! Please post more articles like this!
iwodo - Monday, June 29, 2009
I see this as a DirectX problem only? Maybe we should call Microsoft to improve on it... (Too late for DirectX 11??)
Dospac - Sunday, June 28, 2009
Derek, it would be interesting to get to the bottom of the multi-GPU input delay issue as well as devise a quantitative way to test the delay with various setups. It's confounding that this hasn't been investigated sooner and been sorted out. The potential PQ improvement is well worth your efforts. Thank you!
DerekWilson - Sunday, June 28, 2009
Getting to the bottom of why delay happens conceptually isn't that complex -- there are a lot of issues in inter-GPU communication and synchronization that can cause it.
Quantitative testing is possible but pretty expensive ... I'll see if I can convince Anand to invest in the equipment :-)
DerekWilson - Sunday, June 28, 2009
Added a note at the end of the article to try and help clear the air about the confusion over triple buffering as a page flipping method and flip queues (render ahead) with three buffers.
I also wanted to note that this topic is not just confusing for gamers -- game developers do not always get their labeling right and sometimes incorrectly refer to flip queues as "triple buffering".
I do apologize for not addressing this issue at publication, but I hope this helps to clear the air.
Touche - Sunday, June 28, 2009
Have you seen this?
http://msdn.microsoft.com/en-us/library/ms796537.a...
http://msdn.microsoft.com/en-us/library/ms893104.a...
DerekWilson - Wednesday, July 1, 2009
What they are showing is 1 frame render ahead with vsync. In MS DX terms, this is a flip chain with 2 back buffers and a present interval of one.
They call it triple buffering because it uses three buffers in total. This is still a flip queue and should be referred to as such to avoid confusion.
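To make the distinction in the comment above concrete, here is a sketch of a vsynced flip queue under the article's idealized timing (3.33 ms frames, 60 Hz). It models buffers abstractly; it is not the Direct3D swap-chain API, and the function name is hypothetical. `back_buffers=2` corresponds to the three-buffer chain described above; `back_buffers=1` is ordinary double buffering with vsync.

```python
from collections import deque
from fractions import Fraction

REFRESH_MS = Fraction(1000, 60)  # 60 Hz, present interval of one
FRAME_MS = Fraction(1000, 300)   # renderer could do 300 FPS if never blocked

def flip_queue_latencies(back_buffers, refreshes):
    """Vsynced flip queue: finished frames are shown strictly in order.

    One front buffer plus `back_buffers` back buffers. Nothing is dropped,
    so when the GPU is faster than the monitor it fills the queue and then
    stalls until a flip releases the old front buffer.
    Returns the completion-to-display latency of each frame shown.
    """
    queue = deque()          # completion times of frames waiting to be shown
    free = back_buffers      # buffers the renderer may draw into
    t = Fraction(0)          # time the renderer finished its last frame
    latencies = []
    for r in range(1, refreshes + 1):
        flip = r * REFRESH_MS
        while free > 0 and t + FRAME_MS <= flip:  # render ahead while allowed
            t += FRAME_MS
            queue.append(t)
            free -= 1
        stalled = free == 0
        if queue:
            latencies.append(flip - queue.popleft())  # oldest queued frame is shown
            free += 1                                 # old front buffer is released
        if stalled:
            t = max(t, flip)  # a stalled renderer resumes only at the flip
    return latencies

print(f"{float(flip_queue_latencies(1, 10)[-1]):.2f} ms")  # 13.33 ms, double buffering + vsync
print(f"{float(flip_queue_latencies(2, 10)[-1]):.2f} ms")  # 30.00 ms, three-buffer flip queue
```

In this idealized model the extra queued buffer raises steady-state latency from 13.33 ms to 30 ms instead of lowering it, whereas page-flipping triple buffering as described in the article keeps it under one frame time (3.33 ms).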
mikeev - Sunday, June 28, 2009
Add me to the list of people who tried triple buffering but had to turn it OFF due to the input lag.
I ran the test in L4D anyway. It was unbearable. I couldn't hit a thing. The input lag was actually noticeably less with double buffering + vsync ON.
Maybe I'm doing something wrong, but my results do not jibe with this article at all.