Skill System is Good

nemo · March 2015

The skill system feels really good right now, to the point where the only things left to optimise are edge cases, and they seem to be in different parts of the algorithm.

1. Measure of current team strength at the current moment of time.

This looks like it's currently the average of all players on a side. It has issues with a couple of scenarios.

It falls down when the skill level of the commander is vastly different to the average skill of the field players.

- Rookie jumps in the comm chair, the team is far less likely to win than the skill system thinks.
- The best player is in the comm chair: the team is far less likely to win if force even puts a pro player (commander) plus rookies against a team of average-level vets. The expected carrying ability of the pro player is not there.

It also suffers if you have players AFK right at the start. This is easy to detect, and the skill system shouldn't consider them present in the team at that moment. This can be safely ignored if you consider that this should on average impact everyone equally. The only problem is it can easily result in these massive 40 point shifts.

As others have also mentioned, the standard deviation of the team's skill level seems to be important, but I'm not smart enough to know what impact it's having without some data.

2. Calculation of hive score metric.

As most people have pointed out, this could really do with being at least 2 scores, Alien skill and Marine skill. The two are essentially combined right now, resulting in a figure you can't use to predict a game outcome. I, for example, have a hive skill of about 1700; the reality is I'm about a 1900 level alien and about a 1500 level marine. As such my skill level bounces between 1600 and 1800, never accurately portraying my ability to help a team win.

Other anomalies

ShellMo asked me to post about her hive profile score (http://hive.naturalselection2.com/profile/83013717). We mostly play on a single server (Hellarious Basterds), and her skill level right now is at 407. We cannot work out why it is so low; she is most often in the top 2 players in a team (when the team's average tends to be about 1400). It's obviously based on winning rounds, but the feedback seems to be really weak. It could even be to do with what I call skill level inbreeding: the same regular players play on the same server, so we almost have our own skill level scores, which don't seem to be comparable to the randoms that sometimes come in. Hence perhaps a 1000 skill level player on wooza might not be comparable to a 1000 skill player on HB...

UncleCrunch · March 2015

Therius wrote: »

Statistics are a thousand times more accurate than personal anecdotes. You're pitting tens of thousands of games worth of data against a one-time personal story of one individual player. You are arguing against mathematics. And when you argue against mathematics, you lose.

I was polite not to say it happens all the time for me. If the math is wrong, it just gives wrong answers given the need for balance. One has to prove to me it works every time. I'm afraid it doesn't. You're gonna have a hard time, I guess, but you can try. Facts are here. It stacks up rookies when somebody is clearly above the rest.

There is no point in balancing games with players having the same score, is there?

Many undeniable points have been raised by others, and every time we get back to the "well, you know, it's an indicator", with a big math explanation stuck on top... It doesn't take a genius to say "it's not ready", staying polite.

Like @Carnage pointed out, you can't carry a game on your own when this happens. I may add, you can't take it seriously anymore after 2 or 3 times. And when it's every time... guess what?

Therius wrote: »

You are missing the entire point. Every single thing you mentioned is already captured by the player's win-lose ratio, those and every single other determinant of game victory. And it's unbiased, because every determinant affects the win-to-lose ratio by the amount it actually affects the underlying probability of winning and not by the amount some 'expert' says it does (if we went by your route, SOMEONE would have to decide WHICH factors are taken into account and HOW MUCH weight is given to those factors, and there is so much room for error there). By changing the algorithm to forget about the player's victories and concentrate on the measurements you suggest, you're not taking more things into account, you're making the model take less things into account. See numerous posts above (mine included) and the original proposition by moultano in the original thread.

The problem does not lie in the statistic used to calculate a player's skill rating. The problem lies in how to use these individual ratings to a) create equal teams and b) calculate a team rating actually representative of the combined competency of individual players. The problem, again, is NOT in the way the individual player's skill is calculated. Under the assumption that the only thing to strive for in a game round is to win, which is an assumption I don't think anyone disagrees with, the only sensible statistic to use in the measurement of player skill is the win-lose ratio.

These problems are difficult, if not impossible, to address with only discussion. I proposed earlier that the standard deviation of individual skill ratings within a team might have an effect on the outcome that's not captured by the team's average skill rating. I'm not suggesting any actions, however, since all I have is anecdotes, opinions and educated guesses.

What we need is DATA. Data on whether, given equal average skills, the team with LESS standard deviation in skill ratings consistently has a larger probability of winning that match or not. If we find that this is the case, standard deviations should probably be taken into account when creating balanced teams. How? Hell, if I know.
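
As a minimal sketch of the comparison asked for above: two teams can share the same average skill while having very different standard deviations. The rosters below are made up for illustration and are not hive data; `team_summary` is a hypothetical helper name.

```python
from statistics import mean, pstdev


def team_summary(skills):
    """Return (mean, population standard deviation) of a team's skill ratings."""
    return mean(skills), pstdev(skills)


# Hypothetical rosters: same average, very different spread.
even_team = [1000, 1000, 1000, 1000]
stacked_team = [2500, 500, 500, 500]

even_mean, even_sd = team_summary(even_team)
stacked_mean, stacked_sd = team_summary(stacked_team)

print(f"even team:    mean={even_mean}, sd={even_sd:.1f}")
print(f"stacked team: mean={stacked_mean}, sd={stacked_sd:.1f}")
```

With real round records, one could group rounds where the two team means are close and check whether the lower-deviation side wins more often; that analysis is not attempted here.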

I hope that data is available. The whole skill system relies on data, so I hope this was thought out.

You should argue that with competitive players. Many times you lose by an inch, a missed bullet or a second. They fight for millimeters!

Some (if not all) of them are truly good and execute operations and synchronization to such a level that even if you lose you should be rewarded nonetheless. But there it is; for an inch, a missed bullet or a second (& millimeters + that stupid lag... right guys), they get a 0 or a 1. And it goes directly into hive stats and scores. And most of them finally go away. Who can blame them? When you get a score that falls under some rookies, I can understand the frustration.

If FET was sooooo good, explain to everybody why a truly good competitive player has a Hive score under some who don't do competitive stuff, while they can be of equal force or even better; the comp is better? You can't argue that it is taken into consideration during FET. The score is the base number as you said. Win/lose ratio...

Ho! And the ShellMo case... Seen this player many times. Clearly NOT a ~400. At least 1600 minimum.

krOoze · March 2015

@UncleCrunch You kinda did it again. You said FET is completely broken, and then listed some minor mathematical things that are bad in it. So I assume you are content with FET, if the math at work gets better. Then you used the word "tools" as a silver bullet again. WHAT TOOLS SPECIFICALLY??? I assume you don't mean FET, or some improvement or extension of it.

But let me present my proposition instead then: I think the concept of FET is perfect (I said concept, not necessarily how it is actually implemented). I think FET should be automatic, and manual team picking should be voted on (it is the opposite now; manual picking is the default, FET is voted on). No F4, no whining, and a disconnect before game start triggers a rebalance. People who go peeing go Spectator or get banned for a week. People joining and disconnecting during the game get assigned automagically, perhaps, if need be, rebalancing using people who joined a team recently (and are not yet too loyal to their team) or people who are doing badly in their current team (and possibly not enjoying themselves). Or the system can ask for volunteers, or not mess with it at all (I think imbalance because of joins and disconnects won't be that much of a problem in most cases).
Then a similar FET system should be done for commanders. I think something like: people who don't mind doing comm will have that set in settings. Then the system will pick some interesting comm pairs and people will vote on which pair they want (that is, what kind of game they want), like it's done for maps. ASSUMING the FET math works well, this should get us an enjoyable game most of the time, and cut the ridiculously long pregame time to a tenth of the duration.

Now about the (imperfect) FET implementation: I think I know exactly what went wrong. The math was designed by a programmer. You actually need a mathematician for that. No, a programmer can never be a mathematician. Programmers are trained to make things simple or fast. Mathematicians are trained to do things correctly and abstractly. Ps do know math but tend to do it simple or fast. Ms know programming, but for Ms all math is equal in simplicity and speed, so they should not in turn do pure programming jobs. Sorry for the psychological evaluation, could not help myself. Now be warned that, below, I come from the programmer training and can make the same kind of mistake.

Sorry for the detour, now really what is wrong with the FET math (not necessarily in this order):
1) Skill number does not perfectly match the actual skill of a player.
This can't be helped. But it is, I think, adequate now and can be improved over time to acknowledge more kinds of good player behavior. I mean it is not necessary to be extremely precise. Sometimes you are just tired, sometimes you make bad tactical decisions regardless of your skill. The system just needs some rough estimate of how you are doing in the long term.

2) Skill number is not linear (one 2000 SP player does not equal two 1000 SP players, and those do not equal infinitely many rookie 0 SP players). It is bad, because for a nonlinear measure the average (arithmetic mean) does not represent anything useful. It is a meaningless number. So some effort should be made to linearize Skill Points.

3) The FET splits the teams according to the average (arithmetic mean). That is bad primarily for reason 2) and because it does not care about the actual team demographic. As @UncleCrunch said, you get one 3000 guy and a bunch of rookies and the average stays the same.
Maybe some other measure should be used. I am not sure which is the perfect one. Now the need comes for a learned mathematician. Perhaps a sum of Skill Points (you get two 1500 guys for one 3000 guy in the other team). Or the system must take care that the Skill Point demographics of both teams are as close as possible. A simple fix (you see, already doing it sloppily as a programmer) would be to give the best player to one team, the second and third best to the second team, then the fourth and fifth to the first again, and so on. In such a system you don't care much about point 2).
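
The "simple fix" described above (best player to one team, second and third to the other, fourth and fifth back to the first, and so on) could be sketched roughly as follows. This is only an illustration of that alternating pattern, not how FET is actually implemented; `snake_draft` and the sample ratings are made up for the example.

```python
def snake_draft(skills):
    """Split players into two teams using the A, B, B, A, A, B, B, A... pattern.

    Players are sorted from strongest to weakest first, so each team
    receives a similar mix of strong and weak players.
    """
    ordered = sorted(skills, reverse=True)
    team_a, team_b = [], []
    for index, skill in enumerate(ordered):
        # index 0 -> A; 1, 2 -> B; 3, 4 -> A; 5, 6 -> B; ...
        if ((index + 1) // 2) % 2 == 0:
            team_a.append(skill)
        else:
            team_b.append(skill)
    return team_a, team_b


# Hypothetical 8-player lobby.
lobby = [2000, 1800, 1600, 1400, 1200, 1000, 800, 600]
team_a, team_b = snake_draft(lobby)
print(team_a, sum(team_a))
print(team_b, sum(team_b))
```

For evenly spaced ratings like these the two sums come out equal. With one extreme outlier at the top the split would be less even, so this is a rough heuristic rather than a guarantee of balance.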

4) The FET is deterministic. When FET gets the balance wrong, it makes exactly the same teams over and over and over, until someone drops or joins the game. Even if FET gets it right, it is boring to play the same team every game.
The algorithm simply needs to be randomized. Which can be hard to do correctly. Again you need a mathematician.
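
One naive way to randomize, sketched here purely as an assumption about what "randomized" could mean: try many random splits, keep every split whose team averages are within some tolerance, and pick one of those at random. The function name, the tolerance of 50 points and the number of attempts are all invented for the example.

```python
import random


def random_balanced_split(skills, tolerance=50.0, attempts=1000, rng=None):
    """Return a random two-team split whose average skills differ by <= tolerance.

    Falls back to the closest split seen if none is within tolerance.
    Expects at least two players.
    """
    rng = rng or random.Random()
    players = list(skills)
    half = len(players) // 2
    acceptable = []
    closest = None
    for _ in range(attempts):
        rng.shuffle(players)
        team_a, team_b = players[:half], players[half:]
        gap = abs(sum(team_a) / len(team_a) - sum(team_b) / len(team_b))
        if gap <= tolerance:
            acceptable.append((list(team_a), list(team_b)))
        if closest is None or gap < closest[0]:
            closest = (gap, list(team_a), list(team_b))
    if acceptable:
        return rng.choice(acceptable)
    return closest[1], closest[2]


lobby = [2000, 1800, 1600, 1400, 1200, 1000, 800, 600]
team_a, team_b = random_balanced_split(lobby, rng=random.Random(1))
print(team_a, team_b)
```

Calling it again with a different seed will usually give different but still acceptable teams, which addresses the "same teams over and over" complaint. It still balances on the average, so it does nothing about point 2).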

5) Now only minor things: the FET does not care about inherent map imbalance. Also it does not care that you are handicapped by your ping on that particular server.
The skill point balancing could be skewed to give more skilled players to a particular side, based on the hive win rate of the sides on a particular map. If a player has bad ping, it could count him as less skilled (or more, but personally I play worse with high ping). The system would maybe need to track whether someone is cheating with their ping, but I think such a cheat is not worth it for this.

Sorry if I have stolen some idea from someone. What do you think about those propositions? Can it be done? Would that be good?

Therius · March 2015

UncleCrunch wrote: »

If the math is wrong, it just gives wrong answers given the need for balance. One has to prove to me it works every time. I'm afraid it doesn't. You're gonna have a hard time, I guess, but you can try.

I'm afraid it does not work that way. No one needs to prove to you that the math works; you have to prove that the math does not work. Which, so far, you haven't done, since all you do is state personal experiences instead of pointing out anything specific within the system. And it doesn't matter even if you were "polite enough to not say that these things happen all the time to you", it's still only you, arguing personal experiences against data from thousands of players with hundreds of rounds played each. You're arguing against maths that, I assume, you do not fully understand. Feel free to prove me wrong.

UncleCrunch wrote: »

If FET was sooooo good, explain to everybody why a truly good competitive player has a Hive score under some who don't do competitive stuff, while they can be of equal force or even better; the comp is better? You can't argue that it is taken into consideration during FET. The score is the base number as you said. Win/lose ratio...

Ho! And the ShellMo case... Seen this player many times. Clearly NOT a ~400. At least 1600 minimum.

There are some things that might skew the number one way or the other. The number is only ever valid as a reference to others, so if, for example, two players play in completely different populations (on different servers on opposite sides of the world), their skill ratings might not be comparable. With that said, the ShellMo case: again, you're arguing that your opinion is more valid than the myriad of data on the subject. You're saying that the skill rating of this player is wrong because IN YOUR OPINION he is much more skilled and the skill system must be wrong because it DISAGREES WITH YOU.

I'm sorry to say this, but you're to this discussion what people saying "but it was pretty warm and sunny today so it doesn't exist" are to global warming.

meatmachine · March 2015

@UncleCrunch the errors you're pointing out have been explained at length in previous posts within this thread. They're known issues, and currently unavoidable given the general state of the game's population. Do some reading, and if you can propose a solution to any given problem people will be happy to hear you out.
FYI the problems, and the solutions to them, are more logistical than mathematical.
I have a solution I'd like to propose for the "skill rating population pocket problem" (whatever you want to call it). The solution is FAR from what I would consider ideal, but it's something. It's complicated to write out and I'm currently at work SSSHHH so I'll have to type it up later / on my lunch.

Wob · March 2015

krOoze wrote: »

2) Skill number is not linear (one 2000 SP player does not equal two 1000 SP players, and those do not equal infinitely many rookie 0 SP players). It is bad, because for a nonlinear measure the average (arithmetic mean) does not represent anything useful. It is a meaningless number. So some effort should be made to linearize Skill Points.

One 2000 SP does not equal 2 1000SP players. One 2000 SP and one 0 SP player = 2x 1000 SP according to FET.

I don't see the problem in that tbh. The 2000 SP probably won't win, so doesn't deserve that skill level because he obviously can't compete at that level. When you look at it in terms of a total of 4 players you might not be able to see that clearly but in an 8v8 situation, the 3000 SP player should be able to balance it out with his team.

I get put with absolute rookies these days and when I lose I think to myself "I'm not good enough to deserve this skill level clearly".

krOoze · March 2015

nachos wrote: »

One 2000 SP does not equal 2 1000SP players.

Exactly. That's the problem. The SP measure is not linear (a 2000 SP player is not two times better than a 1000 SP player; a 1000 SP player is not infinitely better than a 0 SP player. If SP were linear and those players did have those SPs, they would have to be). If you take the average of a non-linear measure, you get unbalanced teams.
Let me explain with school grading. Imagine you would get A (or 1 somewhere in the world) for being good, C (3) for being OK, G (7) for being average, O (15) for being below average and Z (26) for meeting minimum requirements. Now the teacher at the end of a term takes an average of your grades. In this case, if you take the average (= arithmetic mean), you get pretty much something around the worst grade you got during the semester. You would not be satisfied with that. It happens because this grading is not linear.
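
To put numbers on the grading analogy, here is a small sketch using the grade values given above (A=1, C=3, G=7, O=15, Z=26). The report card itself, three A's and one Z, is a made-up example.

```python
from statistics import mean, median

# Grade values from the analogy: lower is better, and the steps are not evenly spaced.
GRADE_VALUE = {"A": 1, "C": 3, "G": 7, "O": 15, "Z": 26}

# Hypothetical report card: three top grades and one bare pass.
report = ["A", "A", "A", "Z"]
values = [GRADE_VALUE[g] for g in report]

report_mean = mean(values)      # (1 + 1 + 1 + 26) / 4
report_median = median(values)  # middle of the sorted values

print(report_mean, report_median)
```

The mean comes out at 7.25, just past G ("average"), even though three of the four grades are the best possible, while the median stays at 1. How close the mean lands to the worst grade depends on how many grades there are; the point is only that one bad grade on a stretched scale dominates the arithmetic mean.
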
Also look at this graph:

You see the average (= mean) works for one distribution of players and is really bad on another. On that graph the team with lots of less skilled players actually has a significantly higher average (= mean) skill.

UncleCrunch · March 2015

Therius wrote: »

I'm afraid it does not work that way. No one needs to prove to you that the math works; you have to prove that the math does not work. Which, so far, you haven't done, since all you do is state personal experiences instead of pointing out anything specific within the system. And it doesn't matter even if you were "polite enough to not say that these things happen all the time to you", it's still only you, arguing personal experiences against data from thousands of players with hundreds of rounds played each. You're arguing against maths that, I assume, you do not fully understand. Feel free to prove me wrong.

Mathematics, like any serious technical or scientific field, has to try to refute the theory in order to see what's left. Then what's left is probably the closest to reality. We should start that way instead. Right?

There is one FET algorithm in NS2, not 10, not 5; only 1. If there's a problem... it's not with a part of it. It's the algorithm implementation and by extension the math behind it. Otherwise, if the theory is perfect, the programmers did a terrible job and it shouldn't be enabled in the game in the first place. I trust they did it under supervision (see corresponding threads) and that the work has been through a cycle of validation.

So to speak. If any above-average player gets stacked with rookies not 1 time but every time, it means that something in the data was not correctly evaluated and/or not taken care of. The result is the same. It's broken. Period. The only way to progress isn't in denial of this.

And no; it's not only me, as @meatmachine pointed out. It has been explained many times (and/or with different words, I may add).

So again: what's the point of using something in an environment where it cannot be used, when "a little something" (that ruins the fun of everybody) goes wrong???

Therius wrote: »

There are some things that might skew the number one way or the other. The number is only ever valid as a reference to others, so if, for example, two players play in completely different populations (on different servers on opposite sides of the world), their skill ratings might not be comparable. With that said, the ShellMo case: again, you're arguing that your opinion is more valid than the myriad of data on the subject. You're saying that the skill rating of this player is wrong because IN YOUR OPINION he is much more skilled and the skill system must be wrong because it DISAGREES WITH YOU.

I'm sorry to say this, but you're to this discussion what people saying "but it was pretty warm and sunny today so it doesn't exist" are to global warming.

Ahhh the "some things that might skew", come on be honest.

If I saw ShellMo on several crowded servers, which aren't that many lately (and this is a fact), it means that we play in the same areas. Let me state my opinion as a question, as you don't like opinions: how come a player that is clearly above others, considering facts like scoreboard, won games and such, has a progression that is so slow (even for 500hrs = not as many games as a 1500)?

What I see right now as a fact: more people vote against FET. Because even the rookies see this. Being stacked together doesn't work, and it's worse if the +2K skill goes commander (or gorge). Oh! Right, I almost forgot, it's usually the only ONE who can do commander stuff. It's just ridiculous, as it looks like a knife through butter for the other team.

As @nachos pointed out, and as I do too, it's not because you are +2000 that a clone instantly appears beside you when you meet a 1000 on the map. There are things like speed, bullet damage, upgrades, configuration (location) that will set a limit you absolutely can't break. BTW @nachos, you should think like that: no one expects to win a game with 4 rookies on his side and none on the other, while FET says otherwise, like "well... it's ok". It's not you.

But beside the topic, which shouldn't be on the top (for answering to @krOoze).

We still don't have anything to kill the gap between 0 (or 1000) and 2000 skill players. If the FET algorithm cannot cope with that gap, let's reduce it (not artificially). The only thing a rookie can do right now to try to get better is play and play and play. Let me rephrase that. Get rolled over, punished, spanked and obliterated (did I say stomped?). We know the rest of the story. They quit not long after that "purge".

Imagine you have a NS2-class system directly in game. Would it be ok, wonderful, crap? I mean, if the usual rookie doesn't read, go to YouTube and try to learn before launching the game, how would you adapt the pedagogy to get them INTO NS2? There are volunteers to teach, and some to learn. Why should they be limited by the game? OK, there are website solutions (ENSL and others) and Steam friends, but it's far from being ideal.

It's better when it's in-game. We're here on the server; we train the rookies and that's it. Fast, one step procedure, reliable and efficient. Implementation of a status (willing to teach) and a vote to set the server in sandbox mode. 1 hour with a teacher is, in my opinion, worth 10 hours playing and wandering without getting a single clue in the end.

That is a tool! And I suggest you read Ideas & Suggestion if you're not already doing it. Many ideas in there.

Instead, as new implementations we had the "matchmaking system", which doesn't look to be a success, but that's my opinion. And FET, which stacks rookies with 1 +2000 and finally ruins the fun for everybody (even the winners). Oh and badges too... They're nice.

Killing that gap: no FET will ever solve this. In fact it cannot as the very logic behind is to take the gap into account. In fact it can be considered more as part of the problem than the solution, I'm afraid. It may evolve for the better, but it won't solve anything of the sort.

And I encourage skilled players to use their talents to try to teach others. It's long and hard, we're not really helped, but it's better than letting this community become a wasteland full of elites that will in the end get bored with this and leave as well. NS2 has so much to give to young minds when they get in. It would be a shame to leave them missing the point. Even if it may look "too late".

krOoze · March 2015

@UncleCrunch I was addressing the team balancing issue, not rookie player retention directly. Was that answer really to me? As to the "tool", I wanted your best shot; I'm not really looking for your bibliography or the whole of the I&S forum.
How do voluntary (self-)education and a class system solve the reason you always F4 (the post of yours that I responded to originally)? Skilled players stack one team, unskilled join the empty team = unbalanced. Either a crappy game happens, or everybody just waits for half an hour for the FET vote to pass. Nothing changes.

How it works now: in the server browser you choose how much you want to get stomped (server skill level). Unfortunately there are no low skill level servers available most of the time (apart from siege and combat modes). The FET ensures a balanced = enjoyable game (that is, IF players vote for it, and if the math was better). The concept is good, no? Just in need of polishing (fixing the horrible FET mathematical solution and making FET automatic, without a vote).

Therius · March 2015

UncleCrunch wrote: »

What i see right now as a fact:

This condenses the problem with all of your arguments. None of them are fact. All of them are either opinions or observations from your personal, very small, probably psychologically biased sample.

UncleCrunch wrote: »

So to speak. If any above-average player gets stacked with rookies not 1 time but every time, it means that something in the data was not correctly evaluated and/or not taken care of. The result is the same. It's broken. Period. The only way to progress isn't in denial of this.

See? You're actually stating that something that YOU perceive as happening "every time" must be universal fact and that we mustn't deny this. Well, here's a rebuttal: it NEVER happens to ME. Oh, won't you look at that. Now we have two 'facts'. It's your word against mine. Who wins?

The data wins. Our own perceptions do not matter if they disagree with the data. The data does not lie.

UncleCrunch wrote: »

If I saw ShellMo on several crowded servers, which aren't that many lately (and this is a fact), it means that we play in the same areas. Let me state my opinion as a question, as you don't like opinions: how come a player that is clearly above others, considering facts like scoreboard, won games and such, has a progression that is so slow (even for 500hrs = not as many games as a 1500)?

If you understood how the skill system works, you would be able to answer this very quickly. I'll help you. If a player has high win-lose ratio but low skill, that means that his victories are against weak teams, not earning him points, and his losses, too, are against those weaker teams, which will lose him a lot of points. The system works.
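
The mechanism described above can be illustrated with a generic Elo-style update. To be clear, this is an assumption for illustration only: the thread does not spell out the real hive formula, and the constants here (a 400-point scale and K = 32, borrowed from chess Elo conventions) and the function names are not taken from NS2.

```python
def expected_win(own_avg, opp_avg, scale=400.0):
    """Elo-style probability that the team rated own_avg beats the team rated opp_avg."""
    return 1.0 / (1.0 + 10 ** ((opp_avg - own_avg) / scale))


def rating_change(own_avg, opp_avg, won, k=32.0):
    """Points gained (positive) or lost (negative) after one round."""
    actual = 1.0 if won else 0.0
    return k * (actual - expected_win(own_avg, opp_avg))


# Hypothetical round: a 1600-average team against a 1200-average team.
gain_if_favourite_wins = rating_change(1600, 1200, won=True)
loss_if_favourite_loses = rating_change(1600, 1200, won=False)

print(round(gain_if_favourite_wins, 2), round(loss_if_favourite_loses, 2))
```

Under these assumptions the favoured team gains about 3 points for a win but drops about 29 for a loss, which is the asymmetry the paragraph above relies on: beating weaker teams earns little, losing to them costs a lot.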

Besides, I checked this person's hive statistics. He has both a win-lose and kill-death ratio very close to one from almost 2000 observations (rounds). Where does your argument of a player "clearly above others considering facts like scoreboard, won games and such" come from? I think I know. You played a few rounds with him in which he got lucky. Again, you state as FACT (see quote) something that you PERSONALLY saw in a few rounds, when even the most basic and most easily searchable data statistics disagree with you. See the problem in relying on a few observations?

I hope you finally see why my analogy of global warming was spot on.

nemo · March 2015

I've listened to all arguments thus far, and the logical side of me knows what you are saying, but really I hope someone can just suspend their belief in the skill system for a moment just to be open to the idea that there might be something not quite right.

ShellMo is considering leaving the game because it's totally demoralising to be playing in a server with an average skill level of about 1500, when she feels like she is contributing just as much as everyone else, everyone on the server believes she is, the skill board suggests she is as well, yet the skill system is saying she is barely above a complete novice.

Here is an example of a practise match we played last week

hitbox.tv/wiwter

Here is her profile again, http://hive.naturalselection2.com/profile/83013717 down to 304 now. This cannot be correct, I am in no way 5 times better at the game... The skill system at the moment literally thinks it would take her combined with another skill level 3100 player to balance out against 2 players of my skill level... I play sat right next to her, I watch her play the game while I'm waiting for a slot on the server. Something is definitely not right.
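
A quick check of the arithmetic behind that claim, assuming teams are compared by their plain average and taking "my skill level" as the roughly 1700 mentioned in the opening post:

```python
def team_average(skills):
    """Plain arithmetic mean of a team's skill ratings."""
    return sum(skills) / len(skills)


shellmo_side = [304, 3100]  # 304 from the profile; 3100 is the partner rating from the post
nemo_side = [1700, 1700]    # two players at about 1700, per the opening post

avg_shellmo_side = team_average(shellmo_side)
avg_nemo_side = team_average(nemo_side)
ratio = 1700 / 304

print(avg_shellmo_side, avg_nemo_side, round(ratio, 1))
```

The two averages come out at 1702 and 1700, and 1700 is about 5.6 times 304, which matches the "5 times better" reading if ratings are treated as linear.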

Nordic · March 2015

nemo wrote: »

I've listened to all arguments thus far, and the logical side of me knows what you are saying, but really I hope someone can just suspend their belief in the skill system for a moment just to be open to the idea that there might be something not quite right.

ShellMo is considering leaving the game because it's totally demoralising to be playing in a server with an average skill level of about 1500, when she feels like she is contributing just as much as everyone else, everyone on the server believes she is, the skill board suggests she is as well, yet the skill system is saying she is barely above a complete novice.

Here is an example of a practise match we played last week

hitbox.tv/wiwter

Here is her profile again, http://hive.naturalselection2.com/profile/83013717 down to 304 now. This cannot be correct, I am in no way 5 times better at the game... The skill system at the moment literally thinks it would take her combined with another skill level 3100 player to balance out against 2 players of my skill level... I play sat right next to her, I watch her play the game while I'm waiting for a slot on the server. Something is definitely not right.

I was looking at ShellMo on the dashboard and she is mostly surrounded by people who only played 1-5. Having not played with her, going by your account and the sheer hours she has put in, it does seem lower than one would expect.

What is the average player's skill on Hellarious Basterds? Her score could be a result of solely playing on one server if people with her skill also have a similarly low skill score. I often feel my skill score is overinflated when I go to servers I don't usually play on. But probably the same as her, I only have 1 or 2 servers I actually play on.

On Hellarious Basterds do force even votes happen often? Since she has a low score, assuming she is far more skilled than her score suggests, FET should skew team balance in her favor causing her score to go up. Judging by how she loses more games than she wins, this appears not to be the case.

Therius · March 2015

@nemo

Everything I have to say about that is in the posts above. Unless you can point out a specific error or inconsistency within the model, it is of little use to bombard the discussion with individual examples and links to individual matches. The key words in your post are still 'feel' and 'believe'.

I'm always open for the idea that there's something wrong with the model, as everyone should be. But when the only argument is "because I think so", and basing that intuition on individual examples instead of the bigger picture, there's absolutely no reason to think that.

There are problems with the model, as have been extensively discussed above, but none of them are related to the statistic with which the skill rating is measured. UncleCrunch's suggestion of having k/d, score etc. directly affect the skill rating is a fundamentally, objectively flawed idea, because these measures are ALREADY indirectly incorporated within the w/l statistic, as is every single measure that has an impact on a player's tendency to win. On one hand, if you ignore the w/l statistic and base the skill rating on these measures instead, you will ignore a myriad of other features that determine a player's probability of winning. On the other hand, if you include these measures along with the w/l statistic, you will actually include them TWICE, since they have an indirect effect on the player's win-lose ratio and a direct effect through the measures you just introduced. Furthermore, if these measures are directly included, they can be abused. Players can and will find ways to exploit these features, farming the statistics in a way that might not help the team win or might even be detrimental to their goal. All this would lead to the skill rating being biased and no longer representing the player's tendency to win a match, instead approximating the player's capability of accruing these specific numbers.

Besides, who's to say which measures to include and how much weight to place on them? You would need an expert or experts to decide on these, and no expert in the world could determine all the determinants of a player's tendency to win. In addition, different experts would disagree on which measures are the most important; one would say that aim percentage is the most important measure while another places more weight on teamwork. Who's to say who's right? Who watches the watchmen?

Data, however, have no opinions, no bias, no agenda. If no one (accidentally or deliberately) meddles with the raw data and the model is correctly specified, then the win-lose ratio is the only unbiased estimator of a player's tendency to win. Unless you prove that someone is deliberately manipulating the data or point out specific errors in the specification of the model, individual examples have no weight, no matter how strange they may seem.

Luchs · March 2015

nemo wrote: »

Here is her profile again, http://hive.naturalselection2.com/profile/83013717 down to 304 now. This cannot be correct, I am in no way 5 times better at the game... The skill system at the moment literally thinks it would take her combined with another skill level 3100 player to balance out against 2 players of my skill level..... I play sat right next to her, I watch her play the game while I'm waiting for a slot on the server. Something is definitely not right.

Question to the people here who understood the math behind the calculations (I'm afraid I only have a vague understanding):

The last game on ShellMo's hive profile (March 9, Hellarious Basterds, ns2_veil) cost her 54 points in a 14 minute game with a starting score of 358 (ending on 304). For a 14 minute game to cost her that much, wouldn't that mean the average skill level on the opposite side would have to be significantly lower than 358?

Nordic · March 2015

Luchs wrote: »

nemo wrote: »

Here is her profile again, http://hive.naturalselection2.com/profile/83013717 down to 304 now. This cannot be correct, I am in no way 5 times better at the game... The skill system at the moment literally thinks it would take her combined with another skill level 3100 player to balance out against 2 players of my skill level..... I play sat right next to her, I watch her play the game while I'm waiting for a slot on the server. Something is definitely not right.

Question to the people here who understood the math behind the calculations (I'm afraid I only have a vague understanding):

The last game on ShellMo's hive profile (March 9, Hellarious Basterds, ns2_veil) cost her 54 points in a 14 minute game with a starting score of 358 (ending on 304). For a 14 minute game to cost her that much, wouldn't that mean the average skill level on the opposite side would have to be significantly lower than 358?

I only have a conceptual understanding myself, but it does not mean the other team had to have a lower score. AFAIK, it just means that it was an unexpected loss. Her team could have had an average score of 1200 and the opposing team an average of 800. Someone better with the math would have to confirm though.
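To make the "unexpected loss" idea concrete, here is a minimal sketch. The thread never spells out the actual update rule, so this assumes an Elo-style logistic expectation computed from the two team averages; the function names, the `scale` of 400 and the `k` of 50 are all hypothetical, and the real system also seems to take game length into account, which is ignored here.

```python
def expected_win(team_avg: float, enemy_avg: float, scale: float = 400.0) -> float:
    """Elo-style win probability for a team, from team averages only.

    ASSUMPTION: logistic curve with a hypothetical scale of 400;
    the real hive formula is not given in the thread.
    """
    return 1.0 / (1.0 + 10 ** ((enemy_avg - team_avg) / scale))


def rating_change(team_avg: float, enemy_avg: float, won: bool, k: float = 50.0) -> float:
    """Points gained (positive) or lost (negative) by each player on the team.

    ASSUMPTION: k is a made-up maximum swing per game.
    """
    outcome = 1.0 if won else 0.0
    return k * (outcome - expected_win(team_avg, enemy_avg))


# Loss in an even game versus loss as the heavily favoured team.
even_loss = rating_change(1000, 1000, won=False)
upset_loss = rating_change(1200, 800, won=False)

print(f"loss in an even game:       {even_loss:.1f}")
print(f"loss as the favoured team:  {upset_loss:.1f}")
```

Under these assumptions the size of the loss depends only on how strongly the player's team was favoured, not on the player's own rating, so a 358-rated player on a favoured team can still take a large hit.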

nemo · March 2015

@Therius I totally agree that the only metric worth considering is win/loss. You keep repeating this over and over, I am not suggesting anything different.

Just because it's maths doesn't make it perfect. For example, nobody has proven that skill levels are a linear scale. Yet that assumption is made when we average the skill levels of all players on a team in order to balance them, and it is ALSO used as the core part of the feedback mechanism for modifying skills at round end. So yes, it is based on win/loss as it should be, but it's still wrong.

In addition, depending on the time of day my own personal skill level varies between 1600 and 1800. Which is it? Am I getting better and worse throughout the course of a day? ShellMo's skill has at one point gone from 400 to 450 to 300 in the space of an evening. It's clearly an approximation, and it oscillates aggressively. That wildly varying number is then incorrectly averaged to get the team's skill, which then causes the error to compound when it's used to make a prediction of the game outcome.

Our own personal experiences should not be discounted so easily. I for one have been part of this community for over 12 years now. When captains mod is used, it does often result in well balanced games, which demonstrates that personal judgement at the very least correlates with reality. So when there are two models and they completely disagree, it's worth additional consideration instead of one being discarded entirely.

Therius · March 2015

@nemo

I'm repeating my point because UncleCrunch keeps arguing against it. I agree with you on almost every point. The problems you present are exactly the same ones I have suggested, especially the problem that the average is not necessarily representative of team skill.

I'm not discarding personal experience completely, but instead of basing actions on opinions stemming from experience, these opinions should only be used to justify further study. The results of this study should then be used to justify actions. For instance, UncleCrunch isn't saying that in his opinion the win-lose ratio might be a bad metric and that we should check, either experimentally or analytically, whether we can find a better one. He is saying that in his opinion the win-lose ratio IS a bad metric and it SHOULD be changed. Opinions and educated guesses should only be used as a map to find the facts that are out there. These facts are the objective entities then to be used in practice.

Obviously, we cannot do extensive study and research into every little thing in our lives, but the subject at hand is a perfect example of something that can be analytically and empirically solved.

krOoze · March 2015

Therius wrote: »

UncleCrunch wrote: »

What i see right now as a fact:

This condenses the problem with all of your arguments. None of them are fact. All of them are either opinions or observations from your personal, very small, probably psychologically biased sample.

Well, that's how every person perceives things. Ideally the game should create the illusion that things work well (that may be even more important than things actually working well in reality). If things are bad for one person every time and good every time for 9 others, the game may be statistically speaking okay, but it is unfair to that one person.

Therius wrote: »

@nemo
Players can and will find ways to exploit these features, farming the statistics in a way that might not help the team win or might even be detrimental to their goal.

Why would anyone do that? They would only be assigned to the harder team by FET and effectively lose their Skill Points in the long run. If anything, people would try to downplay their statistics to be able to stack. IMHO, what kind of person needs to do something like that? The gameplay is no longer sufficient for him, so he must collect a meaningless number? Is it important to him that he is 1237th on hive instead of 1250th???

Therius · March 2015

@krOoze

If you're saying that people would not sit in the alien hive farming skulks with jetpack-shotgun combos if the skill system gave you more points for doing that, then you're living in a dream world. Okay, maybe that's a bit too harsh since I'm the one advocating for data and not opinions here, but given we do not have data for that kind of thing, my hypothesis is that it would happen. Yes, this would lead to those players losing more often, as the skill system would continuously overestimate their competence, but would they care? My hypothesis is that they would not, instead feeling rewarded by the unjustifiably large number that they have recorded in the database, even though they lose more games than they should.

nemo · March 2015

I've had a bit more of a think about this. I have an idea what might be going wrong with ShellMo's profile but I need someone to check my maths.

I think it's simply because she plays marines more than aliens: 370 hours Marines vs 166 hours Aliens.

According to http://ns2stats.com/ marines are winning 46% of games to aliens' 54%. I'm going to assume that game length averages out over this many hours so we can use hours in place of actual wins/losses (so this won't be completely accurate).

Assuming the teams are always fair I'd expect her to win 170 hours of those 370 hours of marines, and win 90 hours of those 166 hours of aliens.
This results in 260 hours of winning to 276 hours of losing, a win/loss ratio of 0.94. Her actual W/L ratio is 0.98, which is above this!

However the skill system assumes that with even teams marines and aliens should have an equal chance of winning. So since she prefers Marines her skill score plummets into the ground.

To demonstrate the point further I will invert it and pretend Marines win 54% of the time instead. In which case she would end up with a Win/Loss ratio of 1.06.
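The arithmetic above can be checked with a few lines of Python. This is only a sketch of the same back-of-the-envelope estimate: it uses hours as a stand-in for games, as the post does, and the function name `expected_wl_ratio` is made up for illustration.

```python
def expected_wl_ratio(marine_hours: float, alien_hours: float, marine_win_rate: float) -> float:
    """Expected win/loss ratio for an average player on fair teams.

    Uses hours played as a proxy for games played, and assumes the
    alien win rate is simply 1 - marine_win_rate (no draws).
    """
    alien_win_rate = 1.0 - marine_win_rate
    wins = marine_hours * marine_win_rate + alien_hours * alien_win_rate
    losses = (marine_hours + alien_hours) - wins
    return wins / losses


# ShellMo's split with the ns2stats figure (marines win 46%).
actual = expected_wl_ratio(370, 166, 0.46)

# The inverted scenario: pretend marines win 54% instead.
inverted = expected_wl_ratio(370, 166, 0.54)

print(f"expected W/L with marines at 46%: {actual:.2f}")
print(f"expected W/L with marines at 54%: {inverted:.2f}")
```

The unrounded inputs give roughly 259.8 winning hours to 276.2 losing hours, which matches the 0.94 and 1.06 figures quoted in the post.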

I'm not convinced I am right though, because if you can artificially inflate your skill score by playing aliens and artificially lower it by playing marines, then I would expect to see the top hive stats players all playing a lot more aliens than marines, but this is not the case. Hmmm, perhaps the rule inverts at very high skill levels? I just don't know.

Nordic · March 2015

At upper levels of skill marines are stronger than aliens.

krOoze · March 2015

@Therius I would JP shotgun poor skulks, but for a completely different reason

It's their own fault, they should have conceded. Using "concede" seems to be a hard skill to learn these days....
You get fewer/no points if you lose, I think. Let them have their slightly higher KDRs, as long as the score that is used for FET stays objective.
There is a worse way to cheat, and that is to make a mod that simply sends bad data to hive.

As for ShellMo, I have never seen him/her play, but maybe it's in the playstyle. If I play marine defensively, I am usually at the bottom of the rating table, even if I get a good number of kills. Sometimes I get to the top even with something like a 3/10 K/D. I guess you must actively be destroying structures and advanced lifeforms too, and build/weld. I see that she/he is losing more Skill points on losses than she/he can get back in wins. Maybe concede if you have lost and be more aggressive while winning. EDIT: Or better, don't bother about the stats. On her/his last log there are 7 wins and 13 losses (and the wins are suspiciously short games), so maybe it's just a bad streak of choosing the bad (nega-stacked) side, then choosing a really stacked side in anger that wins too quickly. The system correctly says that he/she is currently a bad luck charm.

BTW there is one bug in Hive that bothers me. No matter what my search text is, the hive always returns sideways among them

UncleCrunch · March 2015

Therius wrote: »

UncleCrunch's suggestions of having k/d, score etc.

Hmm no.

While I think K/D is a component of the player skill (basically aim), it is not my proposal and I wouldn't do such a thing. I always voted for teamwork actions to have a greater weight than K/D. I'm afraid it's someone else.

NotPaLaGi · March 2015

nemo wrote: »

I think its simply because she plays marines more than aliens. 370hours Marines, vs 166 hours Aliens.

He shoots, he scores! Although I'm surprised this is a revelation to you.

The win conditions for marines are pretty steep. You need competent players watching lanes, at least a couple of decent shots to help frag lifeforms, and a pretty good commander. If you're missing just 1 out of 3 of those, you will probably lose. You can be held back by your teammates much more on marines.

On aliens, I can have a team full of rookie distraction skulks/gorges and a potato in the hive that manages to drop a harvester here or there and some upgrades, and I am still able to frag as skulk effectively and lerk/fade as needed and probably still win the game. As long as some of my alien teammates can occasionally find extractors to bite, they won't hold me back from playing effectively.

However, I play almost 75% of my games on marines: http://hive.naturalselection2.com/profile/26266134
If I evened out my games and played more aliens, I would no doubt win more and increase my "skill" close to the 3k range.

Here is a good example of a decently skilled player that plays a shit ton of aliens and wins at a 2:1 ratio because of it (http://hive.naturalselection2.com/profile/29762509). He is neither an exceptional marine nor an exceptional alien, yet he is "ranked #6" on the skill leaderboard.

If ShellMo is really concerned about her skill rating, she should just play more alien so she can win more, because W/L is weighted so heavily. But personally, I would tell her not to worry about it. If she has fun playing marines a lot, keep at it.

Therius · March 2015

UncleCrunch wrote: »

Therius wrote: »

UncleCrunch's suggestions of having k/d, score etc.

Hmm no.

While I think K/D is a component of the player skill (basically aim), it is not my proposal and I wouldn't do such a thing. I always voted for teamwork actions to have a greater weight than K/D. I'm afraid it's someone else.

You are right, I am sorry. But my point still stands, there is no fundamental difference between individual metrics like k/d and teamwork metrics like assists when it comes to the skill system.

NovoRei · March 2015

@Therius

The current rank system is not a good representation of the likelihood of winning, much less player skill.
It is a strongly biased system because it depends on human decision (team selection); therefore you don't have control over the population (FET is not default).

Yes, the current rank system encompasses (through w/l point awarding) most game variables, but you have two problems:
1) They are a consequence of a biased system.
2) If most of them are not significant then you have the risk of the final parameter being insignificant.

I prefer to assess the likelihood of winning by looking at the potential player contribution (in game score) and at the contribution boost factor of certain players together (past experience). Does this work for pub games? Yes, look at my hive.

The first (from the previous paragraph) is something we have available. The more score a player is bringing compared to the rest of the team, the more likely he is contributing to victory. And this happens on the winning side and on the losing side. The current rank system doesn't consider the individual effort (represented by in game score) put in by a player. It only looks at the team level and gives points to a player. Worse, it says the higher score players were as useful as the lower score players by awarding/taking away the same amount of points.

SD FET correction would still be needed though.

Zefram · March 2015

NovoRei wrote: »

I prefer to assess the likelihood of winning by looking at the potential player contribution (in game score) and at the contribution boost factor of certain players together (past experience). Does this work for pub games? Yes, look at my hive.

I might agree with your post, but looking at your hive profile will tell you no such thing. You're one of the players who heavily game the system by teamswitching when your team is losing so you don't lose hive "skill" points.

Therius · March 2015

@NovoRei

Your point is moot. The skill rating does not get skewed and is not biased because players can select their teammates (by choosing which team to join). The system takes into account the rating of your team and the opponent team and assigns a probability of which team (yours or your opponent's) wins the game. This probability is then taken into account on how much your score changes at the end of the round. So by choosing your team, you implicitly choose your probability of winning that match, but no matter whether you always choose to play with 70-30 odds or 40-60 odds, your skill rating will converge on the same number. Your w/l statistic will be different, but you will gain/lose a different number of points from each victory/defeat according to your odds.

Let's go through a simple scenario. I'm not going to present any specific numbers since I'm not 100% familiar with the exact parameter values of the model, but take it only as an example to get the idea across.

Let's assume you have two choices. Either you stack and join the stronger team, or you anti-stack and join the weaker team. Let's say that the skill system already in place tells you that stacking and joining the stronger team gives you a 70% chance of winning and a 30% chance of losing. Likewise, joining the weaker team will give you a 30% chance of winning and a 70% chance of losing. For simplicity's sake, let's assume that this exact scenario repeats over 100 rounds, and the player always has a choice on which team to choose.

Now, let's assume we have two players of EQUAL skill both in actual competence and the system skill rating, player A and player B. Player A is a stacker who likes winning, and so he always joins the stronger team. Over the 100 games he plays, he will have a win-lose ratio of 70/30 = 2.33. Player B, on the other hand, is an anti-stacker who enjoys playing the underdog. Over his similar set of 100 games, his win-lose ratio will be 30/70 = 0.43. The win-lose ratios of these two players are completely different, and it seems that player A can farm his skill up by just stacking every time, while player B gets left behind, even though they are of similar skill.

However, the skill system takes the probabilities of winning into account. Let's say the skill algorithm awards player A 6 points for every victory. This means that the same algorithm takes 14 points away from him for every loss. Intuitively, losing with those odds is a bigger screw-up than winning is an achievement. Likewise, player B is playing against the odds, and thus he GAINS 14 points for every victory, but only loses 6 points for every defeat. Over 100 games player A nets 70 × 6 − 30 × 14 = 0 points and player B nets 30 × 14 − 70 × 6 = 0 points. Both players will retain their previous skill ratings and be of equal skill in the eyes of the skill system, even though their w/l ratios are completely different. You could think of player A and B as the same player who simply changes his team selection priorities at some point in time, getting tired of stomping rookie skulks and wanting to fight uphill battles. This example shows that it would not affect his skill rating at all. You cannot affect your skill rating with team selection.
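The stacker versus anti-stacker scenario can be tallied in a few lines. This is a sketch of the example's own numbers only: the 6 and 14 point values come from the post (they happen to match a swing of 20 points split 30/70, but that constant is not confirmed anywhere in the thread), and `net_points` is a made-up helper name.

```python
def net_points(wins: int, losses: int, gain_per_win: int, loss_per_defeat: int) -> int:
    """Total rating change over a run of games with fixed per-game stakes."""
    return wins * gain_per_win - losses * loss_per_defeat


# Player A always stacks: 70% win chance, small reward, big penalty.
player_a = net_points(wins=70, losses=30, gain_per_win=6, loss_per_defeat=14)

# Player B always anti-stacks: 30% win chance, big reward, small penalty.
player_b = net_points(wins=30, losses=70, gain_per_win=14, loss_per_defeat=6)

print(f"player A: W/L = {70 / 30:.2f}, net rating change = {player_a}")
print(f"player B: W/L = {30 / 70:.2f}, net rating change = {player_b}")
```

Both players end the 100 games exactly where they started, which is the point of the example: the win-lose ratios differ, the ratings do not.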

This is an exceedingly simplified scenario, and it does not take into account variables that might affect the probability of an outcome that are not inside the model. These problems have been discussed in this thread, and they include but are not limited to:

- Are marines and aliens balanced, i.e. can you skew your skill by preferring one faction over the other?
- Does the standard deviation of skill ratings affect the probability of an outcome controlling for the average skill of a team, i.e. does one pro and a bunch of rookies equal a team of mediocre players even if the average skill ratings of the teams are the same?
- How much does the player count affect gameplay, so if you switch from playing exclusively on 42-player servers to playing exclusively on 20-player servers, is your initial skill rating at all comparable to people who have already played on these servers for a long period of time?

None of these problems are due to the metric used to define the skill rating, however. None of these problems will be solved by switching from w/l to other metrics like score, k/d, building time or shoe size. The same problems will arise, but in addition, even in a controlled system, the skill rating would no longer unbiasedly represent the player's chances to win, but instead would represent the player's capability of accumulating these metrics even in situations where they might be completely insignificant in helping the player's team to win.

meatmachine · March 2015

^^^^^^ That is the kind of post the thread needed. Valid points on those bullets (and I am of the persuasion that these ARE issues with the current system). I would like to see an implementation that solves these. In an ideal world there would be some kind of separate rating for alien/marine, but I suppose a band-aid for the marine/alien WR skill reward disparity would be to weight the points for winning/losing as a marine or alien dependent on the overall win% for either team?
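One way to read that band-aid is to use the overall faction win rate as the baseline expectation when teams are otherwise even. The sketch below is only an illustration under that assumption: the 46%/54% split is the ns2stats figure quoted earlier in the thread, while the names and the swing `k` of 20 are hypothetical and not taken from the real system.

```python
# Overall win rates quoted earlier in the thread (ns2stats); treated as fixed here.
FACTION_WIN_RATE = {"marines": 0.46, "aliens": 0.54}


def weighted_change(faction: str, won: bool, k: float = 20.0) -> float:
    """Rating change for an evenly matched game, corrected for faction win rate.

    ASSUMPTION: k is a made-up maximum swing; team skill differences are ignored.
    """
    expected = FACTION_WIN_RATE[faction]
    outcome = 1.0 if won else 0.0
    return k * (outcome - expected)


def expected_drift(faction: str, k: float = 20.0) -> float:
    """Average rating change per game for an average player on that faction."""
    p = FACTION_WIN_RATE[faction]
    return p * weighted_change(faction, True, k) + (1.0 - p) * weighted_change(faction, False, k)


for faction in ("marines", "aliens"):
    print(
        f"{faction}: win {weighted_change(faction, True):+.1f}, "
        f"loss {weighted_change(faction, False):+.1f}, "
        f"drift {expected_drift(faction):+.2f}"
    )
```

With this weighting a marine win is worth more than an alien win and the expected drift is zero on either side, so preferring one faction would no longer move an average player's rating.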

krOoze · March 2015

@Therius By the same reasoning (skill points = probability of winning), the system is handling the alien-marine imbalance correctly. When you play the easier faction exclusively you get a higher win chance (and thus more Skill Points). If you play the harder faction you get a lower probability of winning = fewer SP. If you play half and half you get something in between. But yes, I think it probably needs separate skill points for each side, for the purposes of the FET feature. Again, the average of your marine and alien skill used now is not good enough.
To add to your list:
- Mods that create imbalance
- What team win probability is used at the end of the game, when people come and go and change teams? BTW, what SP does an individual player get when they join a game late or leave early? What SP do players get when they play 10 vs 5 people (which 5 of the 10 players are used to calculate the probability)?

I think enforcing FET and thus assuming games are always more or less balanced would greatly reduce the complexity and nonsense that is (needed) in the algorithm. I myself go Random most of the time anyway... The silent majority would be happy that they don't have to wait half an hour in pregame. There would be a loud minority of course... Let the game support both, no? But make FET the default and do not count occasional unbalanced games in the system.
