Balance Analysis (with statistics)

ScardyBobScardyBob ScardyBob Join Date: 2009-11-25 Member: 69528Forum Admins, Forum Moderators, NS2 Playtester, Squad Five Blue, Reinforced - Shadow, WC 2013 - Shadow
Since more people (including UWE) have been talking about how the win stats relate to balance I figured I'd give it a try. The first problem is what does 'balance' mean with regards to win stats? In my view, a 'balanced' NS2 would be one in which each team wins 50% of the time after controlling for the following factors:
- Map
- Playercount
- Match length
- Skill of team
To give an example, aliens should win 50% of the time on an 8v8 summit match lasting 1 hour with teams of equal skill. We could see if the current build achieves this 'balance' by only looking at win stats data for summit with an average playercount of 16 lasting between 55 and 65 min and using some type of team skill measure (I don't know of any existing or planned method to measure this so we might just have to accept some bias due to team skill). If the win stats were 'close' to 50% we could say that NS2 is balanced (at least for the example specified above). We could then check other scenarios (4v4, rockdown, 20 min, equal skill; 6v6, tram, 40 min, equal skill; etc.) to also test their 'balance', eventually leading to enough 'close' to 50% scenarios that we could call NS2 'balanced' (as a practical matter, UWE might not want to balance every potential scenario and instead just accept that some map/playercount/length/skill combos are going to be unbalanced; <4 playercount or <5min matches are good examples).

So how do we determine if the win stats are 'close' enough to 50%? This is where the statistics come into play. Since match outcomes in NS2 are either win or lose, they can be described by a binomial distribution (each match outcome has to be independent, which is why we want to control for the factors above). Here, 'closeness' depends on the number of matches played: the range of win rates (e.g. for aliens) that is consistent with a true 50% win chance narrows as more matches are played, as shown below (with a 95% confidence interval). For example, if, after controlling for the factors above, aliens won between 46.9% and 53.1% of 1000 matches, NS2 would be 'balanced' by the definition above.

Matches - [Range of alien wins, 95% CI]
10 - [20%-80%]
100 - [40%-60%]
500 - [45.6%-54.4%]
1000 - [46.9%-53.1%]
10,000 - [49.02%-50.98%]
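
The ranges above can be roughly reproduced with the normal approximation to the binomial distribution. This is only a sketch: the function name `balanced_range` is made up for illustration, and the approximation is an assumption rather than the method used for the table. It gets loose for small match counts, which is why it gives about 19%-81% for 10 matches rather than 20%-80%.

```python
import math

def balanced_range(n_matches, p=0.5, z=1.96):
    """Win-rate range expected ~95% of the time if the true win chance is p.

    Uses the normal approximation to the binomial (an assumption; an exact
    binomial interval differs slightly for small n_matches).
    """
    half_width = z * math.sqrt(p * (1 - p) / n_matches)
    return max(0.0, p - half_width), min(1.0, p + half_width)

for n in (10, 100, 500, 1000, 10000):
    low, high = balanced_range(n)
    print(f"{n:>6} matches: [{low:.2%} - {high:.2%}]")
```

The half-width shrinks with the square root of the match count, so getting a range ten times tighter takes about a hundred times as many matches.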

This is more or less impossible to do with the current win stats available (afaik only wins, map, and length are available). However, I can show you how this analysis would look with the (probably poor) assumptions that map, playercount, match length, and team skill are evenly distributed for each side (and therefore do not affect the 50% win 'balance'). Basically, this first chart is saying what everyone who has looked at the overall win stats knows: NS2 has favored aliens for all builds since B162, spiked at B178, and has returned to slightly favoring aliens in B180, with a 95% CI. The green range indicates how 'close' the blue line would need to be to 50% to be considered statistically 'balanced'.

Alien wins for all maps, by build:
<img src="http://i.imgur.com/0rPbl.png" border="0" class="linked-image" />

Although I can't control for all the factors above, I do have enough data to control for the map. As the second chart below shows, it follows a similar trend as the alien wins for all maps (with B178-B180 in particular). In fact, some builds could be considered statistically 'balanced' for some of the maps (B166/168/180 for tram and B174/180 for rockdown), although this is partially due to the low match count for each map per build (all of them are <300 matches and some are as low as 11). However, there are some statistically significant differences in alien wins per map that indicate the need to control for the above factors.

Alien wins by map, by build:
<img src="http://i.imgur.com/YgdZS.png" border="0" class="linked-image" />
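
To illustrate why a low match count makes a map look 'balanced' almost regardless of the result, here is a minimal sketch of an exact two-sided binomial test against a 50% win chance. The function name and the win counts in the usage lines are made up for illustration; they are not taken from the actual win stats.

```python
from math import comb

def binomial_p_value(wins, matches, p=0.5):
    """Exact two-sided binomial test.

    Sums the probability of every outcome that is no more likely than the
    observed one, assuming each match is an independent trial with win
    chance p.
    """
    pmf = [comb(matches, k) * p**k * (1 - p) ** (matches - k)
           for k in range(matches + 1)]
    observed = pmf[wins]
    # Small tolerance so outcomes tied with the observed one are included.
    return min(1.0, sum(x for x in pmf if x <= observed * (1 + 1e-9)))

# Hypothetical counts: 7 alien wins in 11 matches vs. 180 in 300.
print(binomial_p_value(7, 11))     # well above 0.05: cannot rule out 50/50
print(binomial_p_value(180, 300))  # well below 0.05: unlikely to be 50/50
```

With only 11 matches, even a 64% alien win rate is compatible with a 50/50 game, while a 60% rate over 300 matches is not.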

So what can we conclude about NS2 balance from this? Sadly, not much. At best, I'd say that current NS2 balance probably, but not conclusively, favors aliens. To be more certain, a more thorough analysis is needed that would include a time-weighted average of playercount, match length, and a measure of team skill (something like a time-weighted average of playerscore per team might work). More specifically, it would be useful if the JSON stats recorded the following per min (or every 5 min if 1 min is too often):
- Unique match ID (to help link the data point to the match)
- Current match length
- Playercount per side
- All individual playerscores per side
in addition to the map and winning team on completion.
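
As a rough sketch of how such per-minute records could be turned into a time-weighted average of playercount, the helper below weights each sampled value by how long it was in effect. The function name, the sample layout, and the numbers are all hypothetical; no such records exist in the current stats.

```python
def time_weighted_average(times, values, end_time):
    """Average of sampled values, each weighted by how long it was in effect.

    times    -- sample times in minutes, ascending
    values   -- value recorded at each sample time (e.g. playercount)
    end_time -- match length in minutes; the last sample holds until then
    """
    total = 0.0
    for i, (t, v) in enumerate(zip(times, values)):
        next_t = times[i + 1] if i + 1 < len(times) else end_time
        total += v * (next_t - t)
    duration = end_time - times[0]
    # Fall back to the first sample if the match has no measurable length.
    return total / duration if duration > 0 else float(values[0])

# Hypothetical 20 min match sampled at 0, 5 and 10 min.
marine_count = time_weighted_average([0, 5, 10], [4, 8, 6], end_time=20)
print(marine_count)
```

The same helper would work for a per-team sum of playerscores, which is one way to get the time-weighted skill measure suggested above.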

Comments

  • HakujinHakujin Join Date: 2003-05-09 Member: 16157Members, Constellation
    edited July 2011
    I've always felt that new players tend to choose the marine side leaving more experienced players on the alien side. This potential confounding factor could render this kind of aggregate analysis moot if not controlled for.
  • WilsonWilson Join Date: 2010-07-26 Member: 72867Members
    When people complain about balance I don't think it's as much who wins and loses the entire game. I think it's more to do with individual balance between the players.

    You could make aliens all die in 1-shot but enable them to kill the IPs and CC very easily and it might turn out with 50% wins for both sides. While you could say that the game is balanced. The individual fights between players wouldn't be. It's not very fun to feel like you can't do anything because the other player is too powerful. It doesn't matter if you win the game.

    Both players should feel like they could have killed their opponent if they had done something different or played better somehow. You should never feel that the game limited you and there was nothing you could do.
  • NolSinklerNolSinkler On the Clorf Join Date: 2004-02-15 Member: 26560Members, Constellation
    You worked hard on this. Good job. :)
  • ScardyBobScardyBob ScardyBob Join Date: 2009-11-25 Member: 69528Forum Admins, Forum Moderators, NS2 Playtester, Squad Five Blue, Reinforced - Shadow, WC 2013 - Shadow
    QUOTE (Hakujin @ Jul 13 2011, 08:03 PM): I've always felt that new players tend to choose the marine side leaving more experienced players on the alien side. This potential confounding factor could render this kind of aggregate analysis moot if not controlled for.

    That's why I think you need a measure of team skill, which would more or less cover this issue.

    QUOTE (Wilson @ Jul 13 2011, 08:11 PM): When people complain about balance I don't think it's as much who wins and loses the entire game. I think it's more to do with individual balance between the players.

    You could make aliens all die in 1-shot but enable them to kill the IPs and CC very easily and it might turn out with 50% wins for both sides. While you could say that the game is balanced. The individual fights between players wouldn't be. It's not very fun to feel like you can't do anything because the other player is too powerful. It doesn't matter if you win the game.

    Both players should feel like they could have killed their opponent if they had done something different or played better somehow. You should never feel that the game limited you and there was nothing you could do.

    Good point. I don't think that is happening too much with NS2 right now.

    I suppose you could do something similar with alien vs marine interactions (taking into consideration tech, number, map, skill, etc.) and come to some consideration of balance (e.g. 1 lmg marine vs 1 skulk, marine should win 50% of the time, 2 lmg marine vs 1 skulk, marines should win 75% of the time, etc). Though, it'd be more difficult and I don't think the correct stats are being tracked right now.
  • SiniStarRSiniStarR Join Date: 2010-04-13 Member: 71380Members
    Great, thorough analysis. Though I wouldn't be too worried about it until exo suit, onos, etc. come into play. Extreme balancing doesn't seem to be a step in the right direction at this stage. Even gameplay elements are changing all the time. I appreciate UWE's dedication to bring some order to the chaos that is game balancing but I am not gonna sweat over it. If I die a thousand times as marine, I'll just work my way to figure out how to win, at least until the next build comes by.
  • hf_hf_ Join Date: 2011-06-10 Member: 103639Members
    Great work (as others have mentioned)!

    I think that the recent trend towards fewer alien wins and more marine wins is an encouraging sign that UWE are having some success with tinkering with features that have already been implemented.

    However, my main concern is that marines have most of their tech already in the game -- aliens are missing two upgrade chambers and some other skills / abilities that have yet to be implemented. Thus, if you leave the marine team with the same tech, but add the new features to the alien team that will eventually be included when the game goes gold, I think you'll again see a much higher win percentage for aliens.

    While I agree with UWE that it's important to add some balancing to each build to make the beta competitive / fun, I'd like to see a stronger drive to include more alien abilities. This is because I think the marine tech tree is better developed than the alien side, and it doesn't make sense to me to keep adjusting and playing with minor mechanics since they will undoubtedly be changed again in the future when more content is implemented.

    In summary, I think time is being wasted balancing an unfinished product, and that the focus should be on more new content even if it creates further balance issues because those issues will inevitably be changed in the future. It's cool to look at statistics now, and I really like the graphs, but they're relatively meaningless because of the lack of exo suit, onos, jetpack, shade, shift, and other alien abilities.
  • WilsonWilson Join Date: 2010-07-26 Member: 72867Members
    QUOTE (hf_ @ Jul 14 2011, 05:28 AM): In summary, I think time is being wasted balancing an unfinished product, and that the focus should be on more new content even if it creates further balance issues because those issues will inevitably be changed in the future. It's cool to look at statistics now, and I really like the graphs, but they're relatively meaningless because of the lack of exo suit, onos, jetpack, shade, shift, and other alien abilities.

    I don't think time's being wasted on it. It's not like they delayed working on everything else to make balance changes. The changes they make are part of the development process, not something extra.

    I keep seeing people talk about the onos, exo suit etc. IMO I don't think we will be seeing those things for several months at least. I get the impression that they want to try and get the early and mid game stuff all in and polished a bit before adding in any of that late game stuff. Even if they added the onos in tomorrow it wouldn't make any difference to the early game anyway. They would still have to go back and look at skulk vs marine play and sentry guns and all that stuff. The way I see it, they may as well do it now and get a game with a good early and mid game. Then at least we can have fun playing it. Adding in late game units will be much easier when the rest of the game is playing well.


    As far as balancing the game is concerned, I don't think they're trying to get it perfectly balanced right now. They are just making changes to try and stop dominant tactics with no counters like spamming sentries. It would just become frustrating for players if they left things like that in for months. Also, the last few patches have changed the aliens in some big ways. Everything is subject to change at this point and there's still lots of things to be added. There will be many things that unbalance the game that are fixed in later patches. That's just the way development goes.
  • HughHugh Cameraman San Francisco, CA Join Date: 2010-04-18 Member: 71444NS2 Developer, NS2 Playtester, Reinforced - Silver, Reinforced - Onos, WC 2013 - Shadow, Subnautica Developer, Pistachionauts
    Great effort ScardyBob. It's great to see some hardcore objective analysis.
  • RuntehRunteh Join Date: 2010-06-26 Member: 72163Members, Reinforced - Shadow
    I was thinking about this the other day. The win stats provided by UWE don't account for even matches.

    For all we know 30% of wins could be from skulk rushes before a server becomes populated enough to have a decent game.

    There are possibly a lot of discrepancies that make up those stats.
  • HarimauHarimau Join Date: 2007-12-24 Member: 63250Members
    Those are beautiful graphs.
  • Shrike3OShrike3O Join Date: 2002-11-03 Member: 6678Members, Constellation
    edited July 2011
    I'd like to see the stats for games that last longer than five minutes :) If they significantly differ from the short game numbers, we'll know one side has a distinct rushing advantage.

    It'd also be nice to see some larger servers, and see if there's a noticeable shift in which team wins when there's more players. There's a *ton* of 4v4, 5v5 servers close to me in Seattle, but if I want to get on a 9v9, I'm usually connecting to Australia :P
  • ScardyBobScardyBob ScardyBob Join Date: 2009-11-25 Member: 69528Forum Admins, Forum Moderators, NS2 Playtester, Squad Five Blue, Reinforced - Shadow, WC 2013 - Shadow
    edited July 2011
    Thanks for all the positive feedback!

    I think I might want to take a closer look at the more recent stat tracking data (that I think came out with B178 or B180) to try to separate out the factors better. Anyone know an easy way to get the JSON data into a spreadsheet?
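
    One possible approach, sketched with Python's standard library: flatten the JSON into CSV text, which any spreadsheet program can open. This assumes the stats are a JSON list of flat objects; the field names in the sample (`map`, `winner`, `length`) are hypothetical and may not match the real stat tracking data.

```python
import csv
import io
import json

def json_records_to_csv(json_text):
    """Convert a JSON list of flat objects into CSV text.

    Columns are the union of all keys, sorted by name; a record that lacks
    a key gets an empty cell in that column.
    """
    records = json.loads(json_text)
    fieldnames = sorted({key for record in records for key in record})
    out = io.StringIO(newline="")
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()

# Hypothetical sample records, not real NS2 stats.
sample = ('[{"map": "tram", "winner": "alien", "length": 1200},'
          ' {"map": "summit", "winner": "marine"}]')
print(json_records_to_csv(sample))
```

    Writing the returned text to a `.csv` file is enough to load it into a spreadsheet; nested JSON would need to be flattened first.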
  • wulf 21wulf 21 Join Date: 2011-05-03 Member: 96875Members
    QUOTE (hf_ @ Jul 14 2011, 06:28 AM): In summary, I think time is being wasted balancing an unfinished product...

    No, it isn't. All the numbers are now even in the same file called "Balance.lua". Changing them would barely cost 5 minutes.

    As Scardybob has shown, the real effort in balancing would be analyzing the win/loss data to see the effect the changes had. I guess UWE is currently not doing this at all. They just listen to what the community tells them and maybe decide in their internal discussions what they want to change.

    So all the development time that has been "wasted" on balancing is 5 minutes in a week.
  • twilitebluetwiliteblue bug stalker Join Date: 2003-02-04 Member: 13116Members, NS2 Playtester, Squad Five Blue
    edited July 2011
    QUOTE (wulf 21 @ Jul 15 2011, 03:03 AM): No, it isn't. All the numbers are now even in the same file called "Balance.lua". Changing them would barely cost 5 minutes.

    I'd just like to point out that some data that should be in Balance.lua are scattered: e.g. Hydra.kDamage in Hydra.lua, and Whip.kDamage in Whip.lua.

    I agree that the JSON data have too many variables, and cannot be used to accurately gauge game balance. Game balance needs to be tested in more controlled environments with controlled variables (i.e. both teams with equal numbers of players, similar skills, similar latency, etc).