Natural Selection 2 News Update - Occlusion culling interview


Comments

  • kingmob Join Date: 2002-11-01 Member: 3650 Members, Constellation
    Awesome video post.

    I love these technical posts.
    We got a good synopsis of the culling system (I believe they investigated 2)
    We also got clarification on the new animation system.

    Max has a great attitude about the need for iteration and good tools.
    Go Max go.
  • Align Remain Calm Join Date: 2002-11-02 Member: 5216 Forum Moderators, Constellation
    QUOTE (IeptBarakat @ Oct 26 2011, 04:55 PM): So, will this new system have players load the marine/alien start geometry while in the ready room prior to joining a team? I'm sure that would help reduce the hitching when joining a team.
    I'd say that one is relatively harmless, it's a one-time effect and happens at a point where you weren't doing anything anyway.
  • IeptBarakat The most difficult name to speak ingame. Join Date: 2009-07-10 Member: 68107 Members, Constellation, NS2 Playtester, Squad Five Blue, NS2 Map Tester, Reinforced - Diamond, Reinforced - Shadow
    QUOTE (Align @ Oct 26 2011, 02:27 PM): I'd say that one is relatively harmless, it's a one-time effect and happens at a point where you weren't doing anything anyway.

    It takes me 5-15 seconds to load the assets when I join a team.
  • JamesK89 Join Date: 2011-10-25 Member: 129381 Members
    The way the Spark engine is set up sounds a lot like the Unreal engine. You could try combining PVS with deferred shading to get the best of both worlds, but then you lose the ability to alter the map in real time on account of using a PVS.
  • Align Remain Calm Join Date: 2002-11-02 Member: 5216 Forum Moderators, Constellation
    QUOTE (IeptBarakat @ Oct 26 2011, 07:34 PM): It takes me 5-15 seconds to load the assets when I join a team.
    I certainly wouldn't mind if it could be cut down, but at least it happens at the "least bad" time...
  • TomD Join Date: 2010-07-31 Member: 73393 Members
    QUOTE (slime @ Oct 25 2011, 02:31 PM): It might not exactly "improve CPU usage", but from what I understand with the current system the CPU has to spend ages just waiting for the GPU, doing nothing at all and preventing anything else from happening, so I think it will definitely improve CPU utilization.

    Yeah, at the moment the CPU is sitting doing nothing while the GPU is working stuff out, so although the new method will (possibly) increase the amount of work the CPU has to do, it will be working solidly, so it will be doing more work but in less time, i.e. improving utilization! I believe the new method lends itself better to parallelization too.
  • JamesK89 Join Date: 2011-10-25 Member: 129381 Members
    QUOTE (TomD @ Oct 27 2011, 06:27 PM): Yeah, at the moment the CPU is sitting doing nothing while the GPU is working stuff out, so although the new method will (possibly) increase the amount of work the CPU has to do, it will be working solidly, so it will be doing more work but in less time, i.e. improving utilization! I believe the new method lends itself better to parallelization too.

    This also benefits from less information having to travel across the GPU-CPU bus. Even though PCI Express is faster than graphics buses have ever been, it is still good to avoid transferring information across that bus like the plague.
  • pSyk0mAn Nerdish by Nature Germany Join Date: 2003-08-07 Member: 19166 Members, NS2 Playtester, Squad Five Silver, NS2 Community Developer
    Very interesting, thanks for the video!
  • Max Technical Director, Unknown Worlds Entertainment Join Date: 2002-03-15 Member: 318 Super Administrators, Retired Developer, NS1 Playtester, Forum Moderators, NS2 Developer, Constellation, Subnautica Developer, Pistachionauts, Future Perfect Developer
    QUOTE (TomD @ Oct 27 2011, 04:27 PM): Yeah, at the moment the CPU is sitting doing nothing while the GPU is working stuff out, so although the new method will (possibly) increase the amount of work the CPU has to do, it will be working solidly, so it will be doing more work but in less time, i.e. improving utilization! I believe the new method lends itself better to parallelization too.
    This is true.
  • stickyboot Join Date: 2004-01-29 Member: 25711 Members, Constellation
    QUOTE (MOOtant @ Oct 24 2011, 06:16 PM): This PDF might explain some of it I think: (I haven't finished reading it yet though) http://cmpmedia.vo.llnwd.net/o1/vault/gdc2011/slides/Daniel_Collin_Programming_Culling_The_Battlefield.pdf.pdf
    I love these technical posts, and I love them even more when I am linked to even more background on what is going on. It's fun to hear what is going on at a high level, but then it's also fun to dive in and hear about the technicals. Thanks for the link, and thanks Hugh and Max for the interview.
  • Soylent_green Join Date: 2002-12-20 Member: 11220 Members, Reinforced - Shadow
    Would you want a software Z-buffer to be hierarchical or is it simpler and faster just to test at most a few hundred pixels?

    A trick GPUs use is to divide the z-buffer into a hierarchy of tiles, and for each tile store the max z value and min z value occurring in that tile. If the z-range of the entire triangle is entirely in front of an appropriately sized tile, you just render the whole thing; if the triangle is entirely behind the tile, you just reject the whole thing. If the z-range of the triangle overlaps the z-range of the tile, you recursively use smaller tiles. The remaining tiles that cannot be entirely confirmed or rejected are resolved by testing pixel by pixel.
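
    The recursion described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not how any particular GPU or the Spark engine does it: the names `Tile`, `classify` and `resolve` are made up here, smaller z is assumed to be closer to the viewer, and the triangle is assumed to cover every tile it is tested against (screen coverage is ignored).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Tile:
    z_min: float  # nearest depth stored anywhere in this tile
    z_max: float  # farthest depth stored anywhere in this tile
    children: Optional[List["Tile"]] = None  # None for the smallest tiles


def classify(tile, tri_z_min, tri_z_max):
    """Compare a triangle's depth range with a tile's depth range."""
    if tri_z_max < tile.z_min:
        return "visible"  # entirely in front of everything in the tile
    if tri_z_min > tile.z_max:
        return "hidden"   # entirely behind everything in the tile
    return "unknown"      # ranges overlap, look closer


def resolve(tile, tri_z_min, tri_z_max, pending):
    """Return True if some tile is confirmed visible.

    Smallest tiles that can be neither confirmed nor rejected are
    appended to `pending` for a pixel-by-pixel test.
    """
    verdict = classify(tile, tri_z_min, tri_z_max)
    if verdict == "visible":
        return True
    if verdict == "hidden":
        return False
    if tile.children is None:
        pending.append(tile)
        return False
    found = False
    for child in tile.children:
        if resolve(child, tri_z_min, tri_z_max, pending):
            found = True
    return found
```

    Only the tiles left in `pending` would need the per-pixel comparison; everything else is decided from two numbers per tile.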

    Instead of storing z-data row by row for the entire screen, you can perhaps improve prefetching and cache locality if you store it tile by tile, as a bunch of small images the size of the biggest tile (say, 32x32).
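
    To make the layout difference concrete, here is a small sketch of the two index calculations. The function names are made up for illustration, and it assumes the buffer width is a multiple of the tile size.

```python
TILE = 32  # tile edge in pixels, the size suggested above


def row_major_index(x, y, width):
    """Index of pixel (x, y) when the buffer is stored row by row."""
    return y * width + x


def tiled_index(x, y, width):
    """Index of pixel (x, y) when the buffer is stored tile by tile.

    Assumes `width` is a multiple of TILE. Each tile occupies
    TILE * TILE consecutive entries.
    """
    tiles_per_row = width // TILE
    tile_id = (y // TILE) * tiles_per_row + (x // TILE)
    within_tile = (y % TILE) * TILE + (x % TILE)
    return tile_id * TILE * TILE + within_tile
```

    With row-by-row storage, two vertically adjacent pixels are a full screen width apart in memory; with the tiled layout they are only 32 entries apart, and all pixels of one tile sit in one contiguous block.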

    When you do the 'software occlusion queries' you can sort the bounding box primitives from left to right and top to bottom so that you keep hitting the same tiles as many times in a row as possible. That way, if you are forced to go pixel by pixel and compare, you keep reusing data that fits comfortably in L1 cache (32 kB data cache on a Core 2 architecture CPU and 64 kB L1 cache on an AMD all the way back to the original Athlon 64; a 32x32 tile is 4 kB of data if 32-bit float).
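
    A rough sketch of that ordering, plus the tile-size arithmetic. Sorting by the tile that contains each box's top-left corner is just one possible reading of "left to right and top to bottom"; the names and the `(x_min, y_min, x_max, y_max)` box format are assumptions made for this example.

```python
TILE = 32            # tile edge in pixels
BYTES_PER_DEPTH = 4  # one 32-bit float per pixel

# 32 * 32 * 4 = 4096 bytes, i.e. 4 kB per tile
TILE_BYTES = TILE * TILE * BYTES_PER_DEPTH


def tile_key(box, tiles_per_row):
    """Row-by-row number of the tile holding the box's top-left corner.

    `box` is (x_min, y_min, x_max, y_max) in screen pixels.
    """
    x_min, y_min, _, _ = box
    return (y_min // TILE) * tiles_per_row + (x_min // TILE)


def sort_queries(boxes, screen_width):
    """Order boxes so that consecutive queries tend to start in the same tile."""
    tiles_per_row = screen_width // TILE
    return sorted(boxes, key=lambda box: tile_key(box, tiles_per_row))
```

    At 4 kB per tile, a 32 kB L1 data cache has room for eight such tiles, so a run of queries that touch the same one or two tiles can stay in cache.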
  • MOOtant Join Date: 2010-06-25 Member: 72158 Members
    QUOTE (Soylent_green @ Oct 29 2011, 12:53 PM): When you do the 'software occlusion queries' you can sort the bounding box primitives from left to right and up to down such that you keep hitting the same tiles as many times in a row as possible, so that if you are forced to go pixel by pixel and compare, you keep reusing data that fits comfortably in L1 cache.
    Has that ever worked for you? I tried similar tiling (not hierarchical) for a raytracer and haven't noticed anything major. This heuristic belongs to a big class of things that sound nice but sometimes only introduce a slowdown. (Yes, I know that raytracers are far worse for cache.)
  • Soylent_green Join Date: 2002-12-20 Member: 11220 Members, Reinforced - Shadow
    edited October 2011
    QUOTE (MOOtant @ Oct 29 2011, 06:29 PM): Has that ever worked for you? I tried similar tiling (not hierarchical) for a raytracer and haven't noticed anything major. This heuristic belongs to a big class of things that sound nice but sometimes only introduce a slowdown. (Yes, I know that raytracers are far worse for cache.)

    I've never dealt with occlusion culling so I don't know if the underlying data is suitable. I've seen at best a factor of a few improvement and at worst a degradation of performance by a factor of a few from loop tiling and strip mining type techniques on other problems.

    If the data is unsuitable, if tiles are rarely used repeatedly, or if it is very costly to make things fit into tiles and to try to hit the same ones repeatedly, then you just end up with a pure loss.

    Of course you try the straightforward approach first, and if it's blazing fast, don't bother. But if you need the performance, and you try tiling and loose sorting and it turns out to be a successful optimization, you can try something even more complex. You can make a first pass where you try to reject or affirm as many 'software occlusion queries' as you can using only the z-max and z-min of each tile, and simply record the test object (I'd presume OBBs, AABBs or some combination, like OBBs for regions of the map and AABBs aligned to the viewer for making occlusion testing of their contents faster) and which tiles it needs further evaluation against. Now you don't even do detailed evaluation of one occlusion query at a time, but you split them into piecewise evaluation against each tile. The complexity (and large overhead!) is in the data structure needed to attach each 'job' to a specific tile and maintain links between the same occlusion test split over a number of tiles, so that if one tile is affirmed to be visible you don't need to evaluate all the other tiles this query may be attached to.
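
    A very rough sketch of that two-pass idea, only to show the shape of the bookkeeping. Everything here is an assumption for illustration: the dictionary formats, the names `first_pass` and `second_pass`, and the `pixel_test` callback that stands in for the detailed per-pixel comparison. Smaller z is assumed to be closer to the viewer.

```python
from collections import defaultdict


def first_pass(queries, tiles):
    """Decide what we can from each tile's z-min and z-max alone.

    queries: {name: ((z_near, z_far), [tile ids the object overlaps])}
    tiles:   {tile id: (z_min, z_max)}
    Returns the set of queries already affirmed visible, and the
    per-tile job lists for everything still undecided.
    """
    visible = set()
    jobs = defaultdict(list)
    for name, ((z_near, z_far), tile_ids) in queries.items():
        undecided = []
        for tile_id in tile_ids:
            z_min, z_max = tiles[tile_id]
            if z_far < z_min:
                visible.add(name)  # in front of the whole tile
                break
            if z_near > z_max:
                continue           # behind the whole tile
            undecided.append(tile_id)
        else:
            # Not affirmed anywhere: attach one job per undecided tile.
            for tile_id in undecided:
                jobs[tile_id].append(name)
    return visible, jobs


def second_pass(jobs, visible, pixel_test):
    """Evaluate the remaining jobs tile by tile.

    The shared `visible` set is the link between the pieces of one
    query: once any piece is affirmed, its other jobs are skipped.
    """
    for tile_id in sorted(jobs):
        for name in jobs[tile_id]:
            if name in visible:
                continue
            if pixel_test(name, tile_id):
                visible.add(name)
    return visible
```

    Any query that never ends up in `visible` after the second pass is occluded. A plain set is the simplest possible link between jobs; whether this bookkeeping pays for itself is exactly the overhead question raised above.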
  • Hugh Cameraman San Francisco, CA Join Date: 2010-04-18 Member: 71444 NS2 Developer, NS2 Playtester, Reinforced - Silver, Reinforced - Onos, WC 2013 - Shadow, Subnautica Developer, Pistachionauts
    Just reading all these technical posts makes me feel like my brain is getting bigger. Love it!
  • aliasWarlord Join Date: 2010-06-09 Member: 71999 Members
  • SpaZ Join Date: 2003-06-11 Member: 17256 Members
    QUOTE (NS2HD @ Oct 31 2011, 04:21 AM): Just reading all these technical posts makes me feel like my brain is getting bigger. Love it!

    I feel the same way!

    While reading these posts I'm like "Yes, yes, that sounds logical, my fellow scientist, let's go that route with this triangle" in my head. :D

    Sorry I'm a bit tired.