Digging Deeper: Galloping Horses Example
Rather than pull out a bunch of math and traditional timing diagrams, we've decided to put together a more straightforward presentation. The diagrams we will use show the frames of an actual animation that would be generated over time, as well as what would be seen on the monitor for each method. Hopefully this will help illustrate the quantitative and qualitative differences between the approaches.
Our example is a fabricated scenario (based on an animation example courtesy of Wikipedia) of a "game" rendering a horse galloping across the screen. The basics of this timeline are that our game is capable of rendering at 5 times our refresh rate (it can render 5 different frames before a new one gets swapped to the front buffer). The consistency of the frame rate is not realistic either, as in a real game some frames will take longer to render than others. We cut down on these and other variables for simplicity's sake. We'll talk about timing and lag in more detail based on a 60Hz refresh rate and 300 FPS performance, but we didn't want to clutter the diagram too much with times and labels. Obviously this is a theoretical example, but it does a good job of showing the idea of what is happening.
First up, we'll look at double buffering without vsync. In this case, the buffers are swapped as soon as the game is done drawing a frame. This immediately preempts what is being sent to the display at the time. Here's what it looks like in this case:
Good performance but with quality issues.
The timeline is labeled 0 to 15, and for those keeping count, each step is 3 and 1/3 milliseconds. The timeline for each buffer has a picture on it in the 3.3 ms interval during which a frame is completed, corresponding to the position of the horse and rider at that moment in real time. The large pictures at the bottom of the image represent the image displayed at each vertical refresh on the monitor. The only images we actually see are the frames that get sent to the display. The benefit of all the other frames, in this case, is to minimize input lag.
We can certainly see, in this extreme case, what bad tearing could look like. For this quick and dirty example, I chose to composite only three frames of animation, but in reality there could be more or fewer tears. The number of different frames drawn to the screen corresponds to the length of time it takes for the graphics hardware to send the frame to the monitor. This will happen in less time than the entire interval between refreshes, but I'm not well versed enough in monitor technology to know how long that is. For the purposes of this illustration, I threw my dart at about half the interval being spent sending the frame (and thus parts of three completed frames are displayed). If I had to guess, I think I overestimated the time it takes to send a frame to the display.
For the above, the FRAPS-reported framerate would be 300 FPS, but the actual number of full images that get flashed up on the screen is always capped at the refresh rate (in this example, 60 frames every second). The latency between when a frame is finished rendering and when it starts to appear on screen (this is input latency) is less than 3.3ms.
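To put rough numbers on this, here's a minimal sketch of our simplified model (illustrative Python, not anything a real driver actually does); the half-refresh scanout time is the same guess we made above:

```python
# Double buffering WITHOUT vsync in our simplified model: a frame is rendered
# every 3.33 ms (300 FPS), the monitor refreshes every 16.67 ms (60 Hz), and we
# guess that scanning an image out to the panel takes about half a refresh.

RENDER = 1000 / 300      # ms to render one frame
REFRESH = 1000 / 60      # ms between vertical refreshes
SCANOUT = REFRESH / 2    # assumed time spent sending the image to the monitor

def last_finished_frame(t):
    """Index of the most recently completed frame at time t."""
    return int(t // RENDER)

for r in range(1, 4):
    start = r * REFRESH
    # With immediate swaps, any frame that finishes during scanout replaces the
    # front buffer mid-scan, so parts of several rendered frames reach the screen.
    parts = sorted({last_finished_frame(start + dt) for dt in (0, SCANOUT / 2, SCANOUT)})
    print(f"refresh {r}: the displayed image contains parts of frames {parts}")
    # The newest contributing part is never more than one render time (3.33 ms)
    # old when it goes out, which is the input latency bound discussed above.
```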
When we turn on vsync, the tearing goes away, but our real performance goes down and input latency goes up. Here's what we see.
Good quality, but bad performance and input lag.
If we consider each of these diagrams to be systems rendering the exact same thing starting at the exact same time, we can see how far "behind" this rendering is. There is none of the tearing that was evident in our first example, but we pay for that with outdated information. In addition, both the actual framerate and the reported framerate are now 60 FPS. The computer ends up doing a lot less work, of course, but it is at the expense of realized performance, despite the fact that we cannot actually see more than the 60 images the monitor displays every second.
Here, the price we pay for eliminating tearing is an increase in latency from a maximum of 3.3ms to a maximum of 13.3ms. With vsync on a 60Hz monitor, the maximum latency between when a render is finished and when it is displayed is a full 1/60 of a second (16.67ms), but the effective latency that can be incurred will be higher. Since no more drawing can happen after the next frame to be displayed is finished until it is swapped to the front buffer, the real effect of latency when using vsync will be more than a full vertical refresh when rendering takes longer than one refresh to complete.
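Here's the same sort of sketch with vsync enabled (again just our simplified model; the helper function and the frametimes are invented for illustration):

```python
import math

# Double buffering WITH vsync on a 60 Hz monitor: a finished frame waits for
# the next refresh, and with only two buffers no new frame can be started
# until that swap happens.

REFRESH = 1000 / 60   # ms between vertical refreshes

def vsync_timeline(frametimes):
    t = 0.0                                             # when the current frame starts rendering
    for ft in frametimes:
        finished = t + ft
        swap = math.ceil(finished / REFRESH) * REFRESH  # first refresh at or after completion
        print(f"rendered in {ft:5.2f} ms, waited {swap - finished:5.2f} ms for vsync, "
              f"shown at t = {swap:6.2f} ms")
        t = swap                                        # rendering stalls until the swap

# Fast frames sit idle for most of a refresh; frames just over 16.67 ms miss
# every other refresh, which is the 60-to-30 FPS drop quoted in the comments below.
vsync_timeline([3.3, 3.3, 17.0, 17.0])
```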
Moving on to triple buffering, we can see how it combines the best advantages of the two double buffering approaches.
The best of both worlds.
And here we are. We are back down to a maximum of 3.3ms of input latency, but with no tearing. Our actual performance is back up to 300 FPS, but this may not be reported correctly by a frame counter that only monitors front buffer flips. Again, only 60 frames actually get pasted up to the monitor every second, but in this case, those 60 frames are the most recent frames fully rendered before the next refresh.
While there may be parts of the frames in double buffering without vsync that are "newer" than corresponding parts of the triple buffered frame, the price that is paid for that is potential visual corruption. The real kicker is that, if you don't actually see tearing in the double buffered case, then those partial updates are not different enough from the previous frame(s) to have really mattered visually anyway. In other words, only when you see the tear are you really getting any useful new information. But how useful is that new information if it only comes with tearing?
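And a final sketch of the page flipping behavior described above (again our simplified model, not an actual driver implementation):

```python
# Page-flip triple buffering: the game renders continuously at 300 FPS into
# the two back buffers, and at each 60 Hz refresh the display flips to
# whichever back buffer holds the newest finished frame.

RENDER = 1000 / 300    # ms per rendered frame
REFRESH = 1000 / 60    # ms per vertical refresh
OFFSET = 1.0           # ms; start rendering slightly after t = 0 so frame
                       # completions don't line up exactly with refreshes

for r in range(1, 5):
    now = r * REFRESH
    newest = int((now - OFFSET) // RENDER)       # newest fully rendered frame
    age = now - (OFFSET + newest * RENDER)       # its age when the flip happens
    print(f"refresh {r}: flips to whole frame {newest}, finished {age:.2f} ms earlier")

# All 300 frames of work per second still happen, only 60 reach the screen,
# and each one is shown whole and at most 3.33 ms old: no tearing, and input
# lag stays at the level of double buffering without vsync.
```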
Comments
Schmide - Friday, June 26, 2009
Nice analysis. "In my explanation I'm going to refer to any buffer swapping as copying from one buffer to another; how it is implemented by the hardware is irrelevant."
I think you have to elaborate on what's a copy and what's a swap as they are very different operations. Copying locks both surfaces preventing the use of both and takes processing power, while swapping locks nothing and takes no processing power.
In your explanation, you would have very different results if it was a copy or a swap.
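As an illustrative aside (Python lists standing in for GPU surfaces here, which is nothing like what the hardware actually does), the cost difference is easy to picture:

```python
# Two pretend framebuffers; each element stands in for a pixel.
front = [0] * (1920 * 1080)   # buffer currently being scanned out
back  = [1] * (1920 * 1080)   # buffer the game just finished rendering

# "Swap": only the references are exchanged, no pixel data is touched.
front, back = back, front

# "Copy": every pixel is written into the destination, and while that runs
# neither buffer can safely be used for anything else.
back[:] = front
```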
PrinceGaz - Friday, June 26, 2009
I understand if the card is actually copying blocks of memory between fixed buffers that it would have a performance impact. Whether it is copying or swapping is irrelevant as to what I was trying to explain, which is why I said how it is implemented by the hardware isn't relevant. It's best just to assume that in reality it is swapping buffers, which is equivalent to an instantaneous copy/move.

The only thing is I've now realised there is a third way triple-buffering could work, which is roughly halfway between the two methods I proposed; if both B and C have been filled at the vertical-refresh, meaning the card has been stalled, B is discarded, C is moved to A to be displayed, and the card now starts rendering to B, then to C. Again that makes no difference when the framerate is below the refresh-rate, but it allows the card to render at up to double the refresh-rate to reduce the lag to a minimum of one instead of two refreshes when conditions allow.
Schmide - Friday, June 26, 2009
Ok, I get it. 3 buffers A, B and C, all fully renderable. The rendering goes as so:
A is rendered and queued to become the primary on the refresh after its completion.
B begins rendering right after A is completed, then is queued to become primary on the refresh after A, or the next refresh after B completes.
C begins rendering after B finishes rendering and queues as B was queued to A.
repeat.
The advantage being, when vsync is on, instead of starting the rendering of the next surface after the swap, you start rendering immediately on the 3rd surface rather than waiting for the current primary to become available after the swap.
Makes sense.
DerekWilson - Saturday, June 27, 2009
This is a good explanation of render ahead ... it's different than using triple buffering for page flipping.

PrinceGaz - Friday, June 26, 2009
Ooops, made a slight error in my lag calculations when framerate can exceed refresh-rate. Two frames ago (my method) is correct - 0.033 seconds always, because that is how long two refreshes take at 60Hz.
Your method (constant back-buffer updating) will be between one and two frames ago, not zero to one frames like I said, so at 100fps the lag would actually be between 0.010 and 0.020 seconds, not 0 and 0.010 seconds like I said earlier.
Sorry about that.
DerekWilson - Friday, June 26, 2009
Actually, this is precisely why I wanted to write this article. The technique you describe here:
"The way I understand triple-buffering works is that once B and C both have frames rendered to them waiting to be displayed, the graphics-card then pauses until the vertical-refresh, at which point B is copied to A to be displayed, C is moved to B, and the card is free to start work on rendering a new frame to fill the now empty C. No frames are thrown away, and the card is not constantly churning out frames which won't be displayed."
While it uses 3 buffers, it is actually called render ahead. I'm talking about a page flipping method (which can actually be combined with render-ahead, but that's beyond the scope of this piece).
The benefit of the DirectX render ahead approach (which can be up to 8 frames, iirc) is that it accepts higher potential latency in order to achieve much smoother action.
If my framerate fluctuates a lot between high and low rates, it's possible that the game will feel "jerky." But using render ahead, I can essentially cache up to X (the default is 3) quickly rendered frames, so that the next long frametime I hit doesn't cause the monitor to keep showing the exact same image for multiple refreshes while it waits -- instead, if I have my 3 frames rendered ahead and the frametime of my next frame is anything up to 50ms, my rendered-ahead frames can be spat out, one after the other, sequentially until my 4th frame is ready. If frame rate goes back up after this, then I've successfully smoothed things out.
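A toy sketch of that behavior (a simplified model with invented frametimes, not DirectX code):

```python
from collections import deque

REFRESH = 1000 / 60    # ms between 60 Hz refreshes
DEPTH = 3              # render-ahead queue depth (the default mentioned above)

frametimes = deque([3.3, 3.3, 3.3, 50.0, 3.3, 3.3, 3.3])  # one slow frame in the middle
queue = deque()        # completion times of finished, queued frames
clock = 0.0            # when the GPU finished its last frame

for r in range(1, 8):
    refresh = r * REFRESH
    # render ahead while the queue has room and the frame can finish in time
    while frametimes and len(queue) < DEPTH and clock + frametimes[0] <= refresh:
        clock += frametimes.popleft()
        queue.append(clock)
    stalled = bool(frametimes) and len(queue) == DEPTH   # queue full, GPU waiting
    if queue:
        shown = queue.popleft()
        print(f"refresh {r}: shows a frame finished at {shown:6.2f} ms "
              f"(lag {refresh - shown:5.2f} ms)")
    else:
        print(f"refresh {r}: nothing queued, the previous image repeats")
    if stalled:
        clock = refresh            # a buffer just freed up, rendering resumes now
```

Run it and the display keeps getting a new image every refresh even while the 50 ms frame is rendering, but the lag on the queued frames climbs steadily -- smoothness bought with latency.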
The price is, of course, that at high framerates we will see lag because no frames are dropped.

But this is not triple buffering.
DirectX does not support actual triple buffering out of the box -- it has to be programmed by the developer. OpenGL does actually support triple buffering inherently as I described.
I promise that the description I gave in the article is what triple buffering is supposed to be. Calling render ahead triple buffering because the default uses 3 buffers has definitely caused a lot of confusion (especially when the wikipedia page cites this as a reason that render ahead is "an implementation" of triple buffering ... which is disingenuous in my mind).
We don't, after all, call anything that uses 2 buffers double buffering -- like if I use 2 MRTs while I'm rendering something, am I double buffering then? In the same sense that 3 frame render ahead is triple buffering, then sure ... but not if we are talking about page flipping.
... so ...
Also, the lag between 0 and 1 frames that I was talking about is the lag between the END of rendering the frame and when it is displayed. If frametime is taken into account, the input that generated that frame will have happened a maximum of (frametime + 16.67ms) earlier ... so it could be longer ago than just one frame, but not /because/ of triple buffering.
GreyMulkin - Friday, June 26, 2009
VSYNC for life! This article has convinced me that I only want to use double buffering + vsync. I despise tearing - your example of "only seeing it on fast turns" is disingenuous. When vsync is off, even games with simple graphics tear, everywhere. I find it distracting and undesirable. But I also don't want the input lag associated with being 1 more frame away from the action.
"While enabling vsync does fix tearing, it also sets the internal framerate of the game to, at most, the refresh rate of the monitor (typically 60Hz for most LCD panels)."
I see no problem with this. I'd much rather have the game running near a constant framerate than to have it bounce back and forth between high and low performance. Consistency is what I want.
"This can hurt performance even if the game doesn't run at 60 frames per second as there will still be artificial delays added to effect synchronization. Performance can be cut nearly in half cases where every frame takes just a little longer than 16.67 ms (1/60th of a second). In such a case, frame rate would drop to 30 FPS despite the fact that the game should run at just under 60 FPS."
Well, if your frames are taking longer than 16.67ms to render, then you're not actually rendering 60 fps. Duh! Longer rendering times mean lower framerates. If you don't like the framerate you're seeing, turn down some settings (resolution, quality, AA, AF, etc). Triple buffering won't fix the problem of render times which are too long for the desired framerate.
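The arithmetic behind that quoted 30 FPS figure, using a 17 ms frametime as an example value:

```python
import math

REFRESH = 1000 / 60     # 16.67 ms per refresh at 60 Hz
frametime = 17.0        # just a little longer than one refresh (example value)

# With vsync, a finished frame waits for the next refresh, and the next frame
# can't start until then, so every frame ends up occupying two refreshes.
refreshes_per_frame = math.ceil(frametime / REFRESH)   # 2
print(60 / refreshes_per_frame)                        # 30.0 FPS effective
```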
GreyMulkin - Friday, June 26, 2009
To elaborate on the "while enabling vsync..." thing. Vsync off usually makes the game try to run as fast (high fps) as possible, which is generally recognized as a good thing. But because scenes differ in complexity, the frames rendered per second will vary wildly, and that will affect input processing, which is usually tied directly to fps. So what I don't want is the *feel* of the game, the smoothness of mouse movements, etc, to be affected.

MadMan007 - Friday, June 26, 2009
Depends upon the game. For online competitive FPS I disable vsync, for single player games or less twitchy ones I enable it. There's no poll option for this.

TonyB - Friday, June 26, 2009
I still use my 21" CRT, I run at 100Hz refresh rate with vsync on w/ triple buffering. At 100Hz, your input lag is only 10ms (1/100) compared to 16.6ms (1/60), not to mention I'm capped at 100 fps instead of 60fps.

This is why I'm still using a CRT instead of an LCD.