267: FPS quality, part 1: Texture streaming
matso
Master of Patches Join Date: 2002-11-05 Member: 7000Members, Forum Moderators, NS2 Developer, Constellation, NS2 Playtester, Squad Five Blue, Squad Five Silver, Squad Five Gold, Reinforced - Shadow, NS2 Community Developer
One focus for 267 has been finding out why NS2 has such low frame-time quality, or why "90 fps on NS2 feels like 30".
Turns out that there are multiple parts to that problem ... and we'll take one part at a time.
Part 1: Texture streaming
First, a word about the NS2 texture streamer. It works by creating a very low-res version of every texture to be used, one that uses so little VRAM that it can be kept around forever. Then, when you want to load a texture, the low-res version is used until the background streamer has loaded the hi-res version from the filesystem.
Thus, no GPU stalling, and at worst a momentarily faded-out texture visible.
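To make the idea concrete, here is a minimal sketch of a placeholder-then-swap streamer. It is not the NS2 code: the names StreamedTexture, streamer_worker and fake_load are made up for illustration, and strings stand in for real texture data.

```python
import queue
import threading

class StreamedTexture:
    """A tiny always-resident placeholder plus, once loaded, the full-res data."""
    def __init__(self, name, low_res):
        self.name = name
        self.low_res = low_res   # cheap enough to keep around forever
        self.hi_res = None       # filled in later by the background streamer

    def current(self):
        # The renderer never waits: it uses whatever is available right now.
        return self.hi_res if self.hi_res is not None else self.low_res


def streamer_worker(requests, load_hi_res):
    """Background thread: load hi-res data away from the render thread."""
    while True:
        tex = requests.get()
        if tex is None:          # sentinel: shut down
            break
        tex.hi_res = load_hi_res(tex.name)
        requests.task_done()


def fake_load(name):
    # Hypothetical stand-in for reading the full-size texture from disk.
    return f"hi-res:{name}"


requests = queue.Queue()
worker = threading.Thread(target=streamer_worker, args=(requests, fake_load), daemon=True)
worker.start()

tex = StreamedTexture("wall_01", low_res="low-res:wall_01")
before = tex.current()   # placeholder is used immediately, no stall
requests.put(tex)
requests.join()          # demo only; a renderer would keep drawing instead
after = tex.current()    # hi-res version has been swapped in
requests.put(None)
worker.join()
```

The point of the design is that current() always returns something, so the worst case is a blurry texture for a moment rather than a stalled frame.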
However, it turned out that there were a few problems.
First, video cards these days report "virtual video ram" - which is to say, they lie and pretend they have much more texture ram than they actually have. The driver will stream textures back and forth between main memory and the video card, thus making it look like the card has more ram than it actually has.
The main problem here is that the GPU will usually stall while waiting for textures that are not in the vram. Which means you will get a spike in the frametime while the texture is streamed in to the card (and another texture also needs to be streamed out first to make room).
But if you have a graphics card that lies about its VRAM (check r_stats; if it reports about 3500 MB of "virtual video memory free" on the starting screen, you've got a liar inside your computer), then the (stall-free) NS2 texture manager will never trigger.
To avoid this problem, we have added a Texture Management option to the Graphics Options. That will allow you to set the texture ram to something rather more like your actual video ram (and it also allows you to exercise the texture manager - the 0.5 GB option will force it to load/unload textures pretty much the whole time - use r_stats to track the activity).
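A rough sketch of why a small budget keeps the texture manager busy. This assumes a least-recently-used eviction policy, which the post does not state, and the TextureBudget class and the texture sizes are invented for the example.

```python
from collections import OrderedDict

MB = 1024 * 1024

class TextureBudget:
    """Tracks resident hi-res textures against a fixed byte budget (LRU assumed)."""
    def __init__(self, budget_bytes):
        self.budget_bytes = budget_bytes
        self.used_bytes = 0
        self.resident = OrderedDict()   # name -> size, least recently used first
        self.loads = 0
        self.unloads = 0

    def touch(self, name, size):
        if name in self.resident:
            self.resident.move_to_end(name)   # already loaded, mark as recently used
            return
        # Unload the least recently used textures until the new one fits.
        while self.resident and self.used_bytes + size > self.budget_bytes:
            _, evicted_size = self.resident.popitem(last=False)
            self.used_bytes -= evicted_size
            self.unloads += 1
        self.resident[name] = size
        self.used_bytes += size
        self.loads += 1


def walk_map_twice(budget):
    # Eight hypothetical 128 MB texture sets, visited in order, two laps.
    for _ in range(2):
        for i in range(8):
            budget.touch(f"area_{i}", 128 * MB)
    return budget

small = walk_map_twice(TextureBudget(512 * MB))    # like the 0.5 GB option
large = walk_map_twice(TextureBudget(2048 * MB))   # everything fits
```

With the small budget every visit is a load (16 loads, 12 unloads), while the large budget loads each set once and never unloads. That is the kind of activity you would expect to see reflected in r_stats.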
The second problem was that the texture manager was using the device driver while the main render thread was trying to render. As the render thread is often the bottleneck, this meant that the texture manager was slowing down rendering while active.
Easy fix, just a matter of adding a flag so the render thread could tell the texture manager to step back and wait until the render was through. Does mean that the texture manager will not load textures as fast, so you will get to see more of those low-res textures sometimes, but ...
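A minimal sketch of that kind of flag, using a threading.Event. RenderGate and upload_pending are hypothetical names, and a real implementation would also have to deal with the render thread starting a frame right after the check, which this sketch ignores.

```python
import threading

class RenderGate:
    """Flag the render thread uses to tell the texture manager to step back."""
    def __init__(self):
        self._idle = threading.Event()
        self._idle.set()              # nothing is rendering at start

    def begin_frame(self):
        self._idle.clear()            # render thread owns the driver now

    def end_frame(self):
        self._idle.set()              # texture manager may proceed

    def wait_until_idle(self, timeout=None):
        return self._idle.wait(timeout)


def upload_pending(gate, pending, uploaded, timeout):
    """Upload queued textures, but only while the render thread is not rendering."""
    while pending:
        if not gate.wait_until_idle(timeout):
            return False              # renderer still busy, try again later
        uploaded.append(pending.pop(0))   # stand-in for the device driver call
    return True


gate = RenderGate()
pending = ["wall_01", "floor_02"]
uploaded = []

gate.begin_frame()
blocked = upload_pending(gate, pending, uploaded, timeout=0.01)   # steps back
uploaded_during_frame = list(uploaded)

gate.end_frame()
finished = upload_pending(gate, pending, uploaded, timeout=0.01)  # goes ahead
```

The trade-off is the one described above: uploads only happen in the gaps between frames, so textures arrive later but rendering is not slowed down.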
The third problem turned out to be a bit more serious. NS2 uses memory-mapped files to load files efficiently. Works very well if the file you are looking for is already in the filesystem buffers (then reading a 5 MB file takes pretty much no time at all - just a matter of switching around some internal registers and presto - you have access to the file data).
The texture streamer basically takes the memory-mapped file, finds out where the texture data starts and hands that address to the graphics card's device driver. Extremely fast and efficient - unless the file isn't in the filesystem buffers.
At which time you are suddenly going to be loading a texture about 10 000 times slower than expected - 100+ ms instead of 0.01 ms. And you are locking up the graphics driver while doing so... so those 100+ ms get added to your frame time.
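To put those numbers in perspective, a quick back-of-the-envelope calculation. It uses the 100 ms and 0.01 ms figures from above and the 90 fps from the start of the post; the variable names are just for illustration.

```python
cached_ms = 0.01      # texture already in the filesystem buffers
uncached_ms = 100.0   # texture has to come from disk

slowdown = uncached_ms / cached_ms          # how many times slower

target_fps = 90
normal_frame_ms = 1000.0 / target_fps       # time budget for one frame at 90 fps
stalled_frame_ms = normal_frame_ms + uncached_ms
stalled_fps = 1000.0 / stalled_frame_ms     # what that one frame "feels" like
```

A frame that normally takes about 11 ms takes about 111 ms when it hits one of these stalls, so for that moment you are effectively running at around 9 fps even though the counter still averages out near 90.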
Not a good thing. Fortunately, the fix is easy - just step through the memory mapped area while you are in the background loader to ensure the whole area is loaded (as this is a background thread, it does not affect frametime), and THEN call the device driver.
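The fix can be sketched roughly like this, using Python's mmap module. The prefault function is a hypothetical name, and touching one byte per page is an assumption about what "stepping through" means; the post does not say how NS2 does it exactly.

```python
import mmap
import os
import tempfile

def prefault(mapped, offset, length, page_size=mmap.PAGESIZE):
    """Read one byte from every page overlapping [offset, offset + length).

    Meant to run on the background loader thread, so any disk wait happens
    there and not while the graphics driver is locked.
    """
    start = offset - (offset % page_size)       # align down to a page boundary
    end = min(offset + length, len(mapped))
    touched = 0
    for pos in range(start, end, page_size):
        _ = mapped[pos]                         # forces the OS to load this page
        touched += 1
    return touched


# Demo with a throwaway file standing in for a texture file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x7f" * (3 * mmap.PAGESIZE + 100))
    path = tmp.name

with open(path, "rb") as f:
    mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    pages_touched = prefault(mapped, 0, len(mapped))
    first_byte = mapped[0]    # by now this read should not have to wait on disk
    mapped.close()
os.remove(path)
```

Only after prefault returns would the address be handed to the device driver, so the driver call itself stays in the fast case.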
In case you are wondering why this wasn't caught earlier, you will only notice it if the files you need are not in the filesystem buffers ... and if you are developing NS2 they will pretty much always be loaded.
To be continued...
Comments
This is why the NS2 vets walk / skulk the whole map before the round begins ... gotta load dem textures
Don't worry, not every hitch will be fixed for 267, need something to fix in 268, but that will be easier now that the main reasons have been identified, along with how to address them.
Yep and again if you alt tab
I understood about half of that (I think). But doesn't mean I can't awesome you for such a detailed explanation!
Thanks.
One question about the texture streamer... it sounds like the low-res textures are created on-the-fly whenever a texture is loaded. I would imagine that these could be precomputed (at least the vanilla NS2 textures), and just loaded as a single blob on startup. Wouldn't this save CPU time, and ensure that all low-res textures are available right from the start, and with zero hitching?
Nevertheless I understand some bugs are hard to near impossible to catch earlier on, so don't see any of it as me complaining. I am not!
I would say, well done to all involved.
That's actually what the precaching phase is all about. See my next post ... in a day or two.
9 months of living with these issues... I must love this game.
You see, if you believe in matso, he believes in you.
Thanks again for spending the past few weeks hunting down every single one of these
So you have even more ways to prove you're right next time to uhm... speed it up.
Amazing to see that all forms of hitching are being looked into and from all areas. I greatly look forward to the smoother NS2 that will be coming about in the near future.
Also a nice public thanks to Ironhorse for bringing this matter up so often.
Which graphics card do you have? I can add that the AMD Radeon HD 6950 does not "lie".
But that was maybe 15% of the issue. The largest impact was the device driver having to tell the OS to access textures from the HD instead of memory.
This is why I've had this problem since reinforced, when texture streaming was forced on for everyone.
The rest of these hitches, like file openings, are icing on the cake and affect everyone (whereas device locking and other planned solutions mostly affect GPU-bottlenecked and low-end GPU systems). Not all of the hitches will be fixed in time for this next patch. But considering that 90% of them have been addressed, it is definitely something to be excited for!
So if I'm in an area where I have 160 fps, it will feel much more like how it feels in other engines? That'd be really nice, if I understood all of this correctly.
The new option that you are talking about, will it automatically detect how much VRAM you have and select the best option for you, or is this something you have to find out yourself? I have a 7870 and I think it has 2 GB of VRAM, so I'm guessing I would have to switch it to 1.5.
Would you see any difference between selecting 1 and 1.5 for this option as well?