I don't say it doesn't work at all. I say there is no way to tell if a player is good or not at NS2 with this skill number. That is why i say that it would be ok for CS or COD (a drone (or rabbit) game) but not NS2.
The outcome of a NS2 game is unpredictable by definition. Why ? There are events in the game provided by strategic options like tunnel, power nodes etc, that can turn upside down a game. I still vote for more options like that, because commanders start to scan much more when games are lasting (predictability blah blah)... but that's another topic.
The more complicated a game is, the harder it is to quantify with stand-alone metrics like K/D, score etc. You could easily quantify a player's skill in noughts and crosses by measuring whether the player makes the only optimal move or not, since there is nothing else involved in the game. Measuring a player's tendency to win is the best metric to capture every good move a player makes in a more complicated game. You say that NS2 is so complicated that we should use individual metrics instead. You're arguing against your own point.
What are the limits (min max) of P & Q ? I assume that 'real probability' is actually the outcome of the game.
Probabilities only ever get values between 0 (an impossible case) and 1 (a certain case). The 'real probability' is not the outcome of the game, but the outcome's intrinsic probability, which the system is trying to estimate.
The more complicated a game is, the harder it is to quantify with stand-alone metrics like K/D, score etc. You could easily quantify a player's skill in noughts and crosses by measuring whether the player makes the only optimal move or not, since there is nothing else involved in the game. Measuring a player's tendency to win is the best metric to capture every good move a player makes in a more complicated game. You say that NS2 is so complicated that we should use individual metrics instead. You're arguing against your own point.
Nope, so far the bonus (or else) is received by the entire team that won or lost (minored by the time passed in the game). So it's the TEAM tendency that is rewarded (or not). It is quite different on MANY aspects and quite different on how an individual player score evolves against how it actually should evolve. In fact it isn't a measure on a single player at all.
Nope again : "capture" (sampling) cannot be interpreted based on a metaphoric number (win/lose * ratio) that is actually quite distant (the meaning) from the actual sample. How could you tell from this point if a player was a good shot (accuracy) ? And how much time he was building structures (teamwork?) ? Tendencies aren't samples. Hive is considering Win/lose. It's not the same.
I don't say it's useless, i say it's the wrong assumption. It's like saying : We can trace back the initial butterfly wing beats that are eventually responsible for the typhoon 1000km away. And we can say how many butterflies did it at what precise time and altitude. If you can do that; maybe quantum physics is one of the field you should try, the world needs it. Thumb up.
It looks like you say the win/lose is better at measuring all the different player traits because we don't have the data or we don't need it. Actually, there is plenty and maybe more (a "maybe" i can like). This data would allow a faster convergence as a player can't fake these precise things. At some point the players play...
Take a second key and play NS2. You'll see the accuracy you get won't really change but your hive score is at 0 on the first game. Other behaviors that are precisely key elements in playing NS2 properly, won't change either. Still... You will have to climb that "Skill cliff" bit by bit before FET can actually be able to process the closest picture possible of your skill.
I prefer to play NS2 well and with people who play well. Meaning i don't care about how many times they (or i) won a game. Win or lose is actually completely secondary. Maybe you never got in this kind of game in which no one cares about the Win/lose that happened. Thus it cannot be a good interpretation of their potential. Or should i say their many potentialS.
I'm getting extremely bored of stating the same things over and over again, but I'm going to do it one final time, with as few fancy words as possible.
If we want to measure a player's contribution to the team's probability of winning (a goal I don't think anyone argues against), we have two options. The first one is to think about all the INFINITE amount of things that affect that contribution (the player's aim, reaction time, communication skills, teamwork abilities, map awareness, hardware, internet connection stability, amount of pets, bladder control...), decide which ones to include and how to weigh each one of them via a scoring system through an endless discussion where no one agrees with anyone else and the end product being AT BEST a biased estimate that has an INFINITE amount of things that haven't been taken into account and that needs to be rebalanced EVERY TIME something, however small, is changed within the game rules.
The second one is to go around this problem by basing the skill rating on the only thing that actually captures every single one of these infinite determinants of victory: how much the player has won in the past and against what kind of odds. This captures every determinant of victory BY DEFINITION. It's not a matter of opinion. If something a player does helps the team towards victory even in the slightest bit, it's reflected in that player's past victories ON AVERAGE. If a player is good at calling out base rushes and saves many games with those calls, he will win more games ON AVERAGE. If the game rules change in a way that make base rushes less effective, that player's contribution will be smaller in the future, since his skill of calling out the base rushes is not as useful anymore, he loses his advantage and loses more games in the future ON AVERAGE, leading to his score going down. The key word here is ON AVERAGE. You can play a perfect game and still lose, but that does not happen ON AVERAGE. Likewise, you can play a horrible game and still win, but that does not happen ON AVERAGE. ON AVERAGE these things balance each other in a way that the player's score represents his chance to win ON AVERAGE. If the player changes his way of playing, for example, starts commanding more or starts learning new lifeforms, his skill rating will still be based on his past victories that were got from his past performance where he used his past strategies, and thus it will be wrong. But this means that the player will start losing more than the system predicts ON AVERAGE, bringing his skill down to a level where he no longer loses or wins more than predicted ON AVERAGE.
Nope again : "capture" (sampling) cannot be interpreted based on a metaphoric number (win/lose * ratio) that is actually quite distant (the meaning) from the actual sample. How could you tell from this point if a player was a good shot (accuracy) ? And how much time he was building structures (teamwork?) ? Tendencies aren't samples. Hive is considering Win/lose. It's not the same.
I do not understand why I bother. You do not even know what the word sample means. And you call the win-to-lose ratio metaphoric. I've lost hope.
I didn't really look into the skill system in detail, but I have one specific question.
Imagine the following situation. There is a 20 slot server filled with the following player skills:
1 x 3000
3 x 2000
16 x 1000
scenario 1
1 x 3000 + 9 x 1000 -> mean skill 1200
vs.
3 x 2000 + 7 x 1000 -> mean skill 1300
leading to a skill difference of 100.
scenario 2
1 x 3000 + 1 x 2000 + 8 x 1000 -> mean skill 1300
vs.
2 x 2000 + 8 x 1000 -> mean skill 1200
leading to a skill difference of 100.
From my experience scenario 2 would be the more fair one. I played a lot of games where I (skill ~3000) was matched against some pretty decent players (skills >2000) leading to similar overall skills. Although one very good player can (especially as marine) achieve a lot in NS2, he can't be everywhere at once.
Now my question is: Does the skill system take into account any sort of standard deviation (to take into account outliers in the skill distribution) or just the pure mean skill values?
This has been brought up before, and no, the skill system does not take into account the standard deviation of skills within a team. Only the average. It is problematic. It's hard to come up with any analytical solution to it, especially if we don't have any data on how big an effect the std has on the error in prediction. A soft solution could be to adjust the FET to favour compositions with minimal std, but teams built without FET would still be left stranded.
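For what it's worth, here is a quick sketch of the two scenarios from the question above, computing each team's mean and population standard deviation. The helper name `team_stats` is made up for illustration, and this is not how Hive or FET computes anything. It only shows that the mean difference is 100 in both scenarios while the spread inside each team differs.

```python
from statistics import mean, pstdev

def team_stats(skills):
    """Return (mean, population standard deviation) of a team's skill values."""
    return mean(skills), pstdev(skills)

# Team compositions taken from the two scenarios in the question (10 players per team).
scenarios = {
    "scenario 1": ([3000] + [1000] * 9, [2000] * 3 + [1000] * 7),
    "scenario 2": ([3000, 2000] + [1000] * 8, [2000] * 2 + [1000] * 8),
}

for name, (team_a, team_b) in scenarios.items():
    mean_a, std_a = team_stats(team_a)
    mean_b, std_b = team_stats(team_b)
    print(f"{name}: A mean={mean_a} std={std_a:.0f} | "
          f"B mean={mean_b} std={std_b:.0f} | mean diff={abs(mean_a - mean_b)}")
```

A balancer that only looks at the mean cannot tell these two line-ups apart. One that also looked at the std could, but how much weight the std should get is exactly the open question.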
I'm getting extremely bored of stating the same things over and over again, but I'm going to do it one final time, with as few fancy words as possible.
If we want to measure a player's contribution to the team's probability of winning (a goal I don't think anyone argues against), we have two options. The first one is to think about all the INFINITE amount of things that affect that contribution (the player's aim, reaction time, communication skills, teamwork abilities, map awareness, hardware, internet connection stability, amount of pets, bladder control...), decide which ones to include and how to weigh each one of them via a scoring system through an endless discussion where no one agrees with anyone else and the end product being AT BEST a biased estimate that has an INFINITE amount of things that haven't been taken into account and that needs to be rebalanced EVERY TIME something, however small, is changed within the game rules.
The second one is to go around this problem by basing the skill rating on the only thing that actually captures every single one of these infinite determinants of victory: how much the player has won in the past and against what kind of odds. This captures every determinant of victory BY DEFINITION.
There is no need for a "infinite amount" of things. As i said winnings aren't useless. But it cannot be dumbed down to only one value. Especially one of the most unpredictable as soon as you get to higher levels.
Please seriously; you should really play NS. You assume that the NS games played are a perfect world in which every good action is rewarded on average (in the end). I'm afraid i have to say "Far from it". You should really tell that to the D1 comp. I would like to see their faces hearing that. Like "dude... what he said?". Just that; will make my day.
This system rewards the team who won, not the individuals for any specific action. This is exactly where the flaw is. The system rewards the entire team (good or bad player in it) and then FET tries to sort and assign them with their personal scores. Personal / team... Personal / team... Personal / team... Personal / team... see ?
It's not a matter of opinion. If something a player does helps the team towards victory even in the slightest bit, it's reflected in that player's past victories ON AVERAGE. If a player is good at calling out base rushes and saves many games with those calls, he will win more games ON AVERAGE. If the game rules change in a way that make base rushes less effective, that player's contribution will be smaller in the future, since his skill of calling out the base rushes is not as useful anymore, he loses his advantage and loses more games in the future ON AVERAGE, leading to his score going down. The key word here is ON AVERAGE. You can play a perfect game and still lose, but that does not happen ON AVERAGE. Likewise, you can play a horrible game and still win, but that does not happen ON AVERAGE. ON AVERAGE these things balance each other in a way that the player's score represents his chance to win ON AVERAGE. If the player changes his way of playing, for example, starts commanding more or starts learning new lifeforms, his skill rating will still be based on his past victories that were got from his past performance where he used his past strategies, and thus it will be wrong. But this means that the player will start losing more than the system predicts ON AVERAGE, bringing his skill down to a level where he no longer loses or wins more than predicted ON AVERAGE.
You still avoid the big questions.
How many games do the rookies with 0 and those with 1000 have to play in order to make their Skill score go towards something close to reality ? Meaning 1/ their score will be stable (convergence); 2/ their score will be accurate so FET can use it. Considering a game will last from 10 min to 40 min (maybe more), let's say 20 minutes on average. How much time until they get there ? ... before they quit NS2 ? 100hrs? 500 ? hmmm That is something to look into isn't it.
Also you still don't answer to: What if a rookie is doing nothing (even making bad things that probably annoy the rest of the team) but still IS in the winning team. Whatever the circumstances or prediction. Do you really think this guy is rewarded to a proper value ? rewarded in the first place ? Do you really think by sticking with great players his Skill Score will go towards something accurate and useful for FET ? I mean gamers are always exploiting everything they can, right ? Just because they're rookies doesn't mean they're dumb. Even if this behavior is dumb it WILL happen as it is already.
Yet nobody including you did answer to the Gorge/Fade problem. Say FET was voted and the data is accurate in "converged wonderland" : 2x2000 + 3x1000 on both teams except one player at 500. In the case the prediction says that team A should win. In order to be in harmony with it; the Team A 2Ks should play the most aggressive role. If the 2K evolve Gorge right at the beginning they leave an empty space the opponent will surely take. leading to a lost game for sure. What happens if they decide to play gorge for several months ? eventually ruining the fun for all ?
Will they be less "skilled" ? I don't think so. What if after that, they suddenly decide to return to the fade "ripper" mode ? Will the prediction fail all the time ? One thing is for sure: FET won't have the right data about them, as it can be manipulated. Because they didn't change at all. They just made the choice to do something different. How do you solve that ?
The answer isn't : they should play like drones as anyone can do in other dumbed down games. You know "My NS is rich!".
Every question you say I haven't answered I actually have. You just cannot understand the answers. You read my posts but do not understand what I'm saying. I have said what happens if someone sits AFK and does nothing, but you do not understand. I have said what happens if someone does not play the role they are best at, but you do not understand. Many people have told you how the skill system works and answered your questions directly, but you keep asking the same questions and making the same false claims.
You do not understand what statistics, probability or 'on average' means. You think we live in a world where probabilities mean nothing, and every statistical result can be rebutted by saying "but I saw this happen once so you're wrong". You probably think that scientific studies and gathering data for statistics is a useless waste of money and every statistic is a lie. This conversation is fruitless.
I completely understand the frustration of, playing the game of your life, be at the top of the leader board with 50-0 K/D, and still get punished for not being on the right team.
However, NS2 is simply far too complex to quantify skill level through individual accomplishments. The field commanding example definitely applies here - There is no way to quantify someone being really good at organizing and/or leading a team - Likewise you can't punish someone for being completely uncooperative (recommendations simply do not cut it).
This approach evaluates you based on your ability to win - regardless if your ability is based on being a great fragger or being a great leader or something entirely different.
If there is an abuse in the system, the solution is not to throw out the baby with the bathwater, but to fix that gap that enables the abuse.
@Therius Yet he's right on one thing. "Team" does not equal "player". Let me illustrate with SIMPLIFIED example:
1) Assume all players in the system have correctly calculated SP value. And assume all games are made with FET with exactly calculated 50/50 probability.
2) New player comes into the system and is assigned 0 SP value (he is better than that in reality).
3) Round is made with calculated 50/50 probability. The real probability is let's say 49/51 (newbie in team2 gets better team by FET)
4) Let's say nobody's real skill changes. Over time the newbie's SP will be pushed toward the real value, while everybody else's SP will be skewed from their previously perfect value (the newbie's team gets more than it deserves, the opposing team is unfairly penalized). That continues until the newbie's SP value converges and the real probability gets to 50/50 (well, never in the perfect case, because it closes the gap somewhat asymptotically); go to step 3). It does not change the probabilities, because players are picked at random to reach a 50/50 calculated probability. The signed error sum of the randomly picked players will be 0, so it won't change the "real" probability.
You see, the error of one player's SP value creeps to all players. In the end the error propagates as a normal distribution (a small number of players get an overinflated SP value, some get an unfairly low SP value, most get slightly more or less than they deserve), making the system a "lottery". You can get any value given enough luck, and no cheating or stat hoarding (which is absolutely possible) is even necessary.
Given that every SP value is wrong, a second problem creeps in. The system won't correct the errors even when the newbie is gone - The FET will assign them with 50/50 probability. The real probability will be 50/50 (no SP will change on ave ... I mean ON AVERAGE ), because the signed average of the errors in the system is 0.
Third problem creeps in and that is the player that does more than average in the team will be rewarded less than he deserved. The person that does less than average will get more. Your SPs are now dependent on where you stand in the demographic on the given server.
tl;dr
Those with the extreme trust in the math of the system should get their confidence shattered a bit.
Those hating the system should note that even in the bleak example above, even when everyone has inaccurate SPs, the FET balanced game is still even on average (half the time??), if you ignore the number as your manhood measure.
@unclecrunch, the first part of your last point seems to be about semantics. Am I correct in that you dislike that the number is called "skill point"? It is correct that the number does not equate to their skill. Really it is just a number showing that a 2000 player is more likely to win on average than a 1500 player.
The rest of your post talks about convergence which is a known issue. I have spoken about it before, so have a few others. It is fairly obvious that convergence is too slow if you look at the "skill" distribution.
Would I be correct in saying you think the "skill system" does not work most of the time because of convergence being too slow?
@unclecrunch, the first part of your last point seems to be about semantics. Am I correct in that you dislike that the number is called "skill point"? It is correct that the number does not equate to their skill. Really it is just a number showing that a 2000 player is more likely to win on average than a 1500 player.
I think a lot of people dislike the system because they feel it doesn't represent their skill in tracking, aiming or whatever. Since it's not designed to do that, how about renaming it to "balance points"?
@unclecrunch, the first part of your last point seems to be about semantics. Am I correct in that you dislike that the number is called "skill point"? It is correct that the number does not equate to their skill. Really it is just a number showing that a 2000 player is more likely to win on average than a 1500 player.
I think a lot of people dislike the system because they feel it doesn't represent their skill in tracking, aiming or whatever. Since it's not designed to do that, how about renaming it to "balance points"?
If we can't hide the number, that may be a positive alternative.
@Therius Yet he's right on one thing. "Team" does not equal "player". Let me illustrate with SIMPLIFIED example:
1) Assume all players in the system have correctly calculated SP value. And assume all games are made with FET with exactly calculated 50/50 probability.
2) New player comes into the system and is assigned 0 SP value (he is better than that in reality).
3) Round is made with calculated 50/50 probability. The real probability is let's say 49/51 (newbie in team2 gets better team by FET)
4) Let's say nobody's real skill changes. Over time the newbie's SP will be pushed toward the real value, while everybody else's SP will be skewed from their previously perfect value (the newbie's team gets more than it deserves, the opposing team is unfairly penalized). That continues until the newbie's SP value converges and the real probability gets to 50/50 (well, never in the perfect case, because it closes the gap somewhat asymptotically); go to step 3). It does not change the probabilities, because players are picked at random to reach a 50/50 calculated probability. The signed error sum of the randomly picked players will be 0, so it won't change the "real" probability.
You see, the error of one player's SP value creeps to all players. In the end the error propagates as a normal distribution (a small number of players get an overinflated SP value, some get an unfairly low SP value, most get slightly more or less than they deserve), making the system a "lottery". You can get any value given enough luck, and no cheating or stat hoarding (which is absolutely possible) is even necessary.
Given that every SP value is wrong, a second problem creeps in. The system won't correct the errors even when the newbie is gone - The FET will assign them with 50/50 probability. The real probability will be 50/50 (no SP will change on ave ... I mean ON AVERAGE ), because the signed average of the errors in the system is 0.
Third problem creeps in and that is the player that does more than average in the team will be rewarded less than he deserved. The person that does less than average will get more. Your SPs are now dependent on where you stand in the demographic on the given server.
tl;dr
Those with the extreme trust in the math of the system should get their confidence shattered a bit.
Those hating the system should note that even in the bleak example above, even when everyone has inaccurate SPs, the FET balanced game is still even on average (half the time??), if you ignore the number as your manhood measure.
When you say "correct" sp value you are assuming there is a number that directly correlates with that player. There is no correct number. "Skill Points" is a fluid value that changes relative to who you play with. I may have about 1800 "Skill points" now but if I started playing with only premier skilled players I would be far less likely to win on average. My sp value would decrease substantially over time and become relative to whom I am playing with. If I were to start playing with only rookies my sp value would adjust to show my relative skill compared to those specific rookies.
This is not a lottery, but instead a problem of slow convergence.
^ "correct" I mean that "Skill Points"="real skill" exactly for every player, for the purposes of the example. What word would you use (I am inclined to edit that for clarity)?
If you started playing with skilled players, you would have some on your team too, not really changing your personal likeliness of win.
The "lottery" is the part where you are assigned an SP value based on pseudo-randomness (the players you play with at the particular moment and the respective errors in their SP values). That in turn starts to have statistical properties of its own, which mess with the system. Of course it IS a simplified example and I can't know (does anyone?) to what degree and in what shape it manifests in the real system (it is beyond a mere thought experiment and I would have to write a complex simulation).
Slow convergence is a different problem. But if you think the SP are not supposed to represent a player's skill, what then is it supposed to converge to in your opinion?
If I was in an 8v8 match with 15 premier level players, the team with me on it would on average lose more often. Someone may want to point out that the entire team would lose points, not just me. With enough games of random mixing all 15 premier level players would be more likely to win, and have an increase in their sp value. I would on average lose more often and have a decreasing sp value.
I don't think skill points directly represent real skill. It does certainly correlate with skill relative to the players played with, which is useful for team balance purposes.
There are problems with not enough mixing in the ns2 community. One premier level player I know of has an sp value of about 1200 which in no way is useful for balance if he were in a public game. The more public games he plays the more his sp value converges to something useful for balance.
Someone may want to point out that the entire team would lose points, not just me. With enough games of random mixing all 15 premier level players would be more likely to win, and have an increase in their sp value. I would on average lose more often and have a decreasing sp value.
Ok, lets say your team by having you has lower chance of winning. That is almost the same as the example in my previous post. In every round your team would lose more SPs and opposite would gain more on average (despite no change to their "real skill").
I have thought about exactly that today. After many random mixing rounds it will lead to normal distribution of the error among all the players ( few will get way less SP, most will get milder error, few will get way more SP - but average error of them all will be 0, so the system won't detect that and won't correct that.)
Of course that still leads on average to "balanced games". But is average enough? (That is maybe half games adequate and half terribly unbalanced for one or the other team. Then add people bickering about and circumventing FET and you get maybe 1/4 of good games)
I find that if you read UncleCrunch's posts in a Russian accent, they are still pretty bad but at least it makes his sentence structure and grammar a bit more bearable.
Lol. I typed that out on mobile. I guess the mobile browser decided to post 3 times at once. What is really funny is I did not understand what you were talking about at first. I thought you were trying to make a point about my most recent post, which was partially right.
Someone may want to point out that the entire team would lose points, not just me. With enough games of random mixing all 15 premier level players would be more likely to win, and have an increase in their sp value. I would on average lose more often and have a decreasing sp value.
Ok, lets say your team by having you has lower chance of winning. That is almost the same as the example in my previous post. In every round your team would lose more SPs and opposite would gain more on average (despite no change to their "real skill").
I have thought about exactly that today. After many random mixing rounds it will lead to normal distribution of the error among all the players ( few will get way less SP, most will get milder error, few will get way more SP - but average error of them all will be 0, so the system won't detect that and won't correct that.)
Of course that still leads on average to "balanced games". But is average enough? (That is maybe half games adequate and half terribly unbalanced for one or the other team. Then add people bickering about and circumventing FET and you get maybe 1/4 of good games)
In my overly simplified example, there are 16 players. 15 of which are premier level players which is the highest division in ns2 competitive. It does not go higher. Div 1 teams typically have the same mechanical skill in aim and movement as premier players to the point that what makes a premier team a premier team is strategy. Basically it does not get any better. I chose them for this example because they are essentially god players compared to us average pub players.
Team A:
4000 + 4000 + 4000 + 4000 + 4000 + 4000 + 4000 + 4000 = Average team sp value of 4000
Team B:
4000 + 4000 + 4000 + 4000 + 4000 + 4000 + 4000 + 1800 (Nordic) = Average of 3725
Those numbers don't really matter, except that they are significantly higher than my sp value. So in an 8v8 match with only 16 players where would the other team get more players to counter balance?
I think at this point we are saying the same thing. I admit there is an "error" as you call it. I do not expect the "skill points" to equate to a person's real skill. I only expect it to be accurate relative to other players for balance purposes. I think this is totally fine as long as it balances games well most of the time, and in my experience it does a great job at that.
I do not think there is a way to turn a player's "real skill" into a numeric value. I don't think it is possible with unclecrunch's color system either. I don't think it is possible at all. At the risk of parroting therius, there are just too many variables that make up ns2's skill.
To me, the question is not does the skill system accurately measure skill? It is does the skill system balance games well most of the time?
There is one fundamental misunderstanding in your example; the skill number can only ever be 'perfect' within a given system. When you change the system, the value of a perfect skill score changes too. If you have a pool of players playing against each other who all have perfectly calculated skill scores (and by this I mean that their scores have converged to oscillate around the intrinsic value) and introduce a new player to the system, the system changes. After time, the scores will have adjusted so that the system predicts outcomes correctly again. All the original players will have their values changed, yes, but they are now perfect from the point of view of the new system. The score is only ever applicable in relation to others, it's not a stand-alone measure. There will be no 'error' that 'creeps' into the scores of the players. If the new player leaves and the old system emerges again, the values will start gravitating towards the values we saw before. Yes, every time the system changes, there is a period of time where the predictions are off, but this is a necessary evil that, in the big picture, has an extremely small effect due to the system being immensely large. One could argue, though, that the systems/populations in NS2 are so small that a statistical model is inconsistent, but I think they're large enough.
Edit: One more thing, you say that the skill values converge towards the real value 'somewhat asymptotically'. While this is in a way correct, it poses no problem. The skill values converge towards the intrinsic values asymptotically with oscillation. This is important. Because the skill value is just as likely to be a bit above as a bit below the intrinsic value at any given point in time, there is no preference one way or the other. The skill value is correct, on average, and with time, the oscillations become extremely small.
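To make the "converges with oscillation" claim concrete, here is a toy sketch. It is NOT the Hive formula: it assumes a 1v1 Elo-style logistic win probability and a fixed step size `k`, and it pits one player (real value 1500, starting score 0) against opponents whose scores are already correct. All names and numbers are made up for illustration.

```python
import random

def expected_win(rating, opponent):
    """Elo-style win probability (an assumed model, not Hive's actual formula)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))

def simulate(true_skill=1500.0, start=0.0, opponent=1500.0, k=32.0,
             games=5000, seed=1):
    """Track one player's score while outcomes follow their real win chance."""
    rng = random.Random(seed)
    rating = start
    history = []
    p_real = expected_win(true_skill, opponent)  # the intrinsic probability
    for _ in range(games):
        won = 1.0 if rng.random() < p_real else 0.0
        # Move the score by the gap between the outcome and the prediction.
        rating += k * (won - expected_win(rating, opponent))
        history.append(rating)
    return history

history = simulate()
tail = history[-2000:]
print(f"after {len(history)} games: last={history[-1]:.0f}, "
      f"tail mean={sum(tail) / len(tail):.0f}, "
      f"tail min={min(tail):.0f}, tail max={max(tail):.0f}")
```

In this toy model the score climbs from 0 and then wobbles on both sides of 1500. A smaller `k` gives smaller wobbles but a slower climb, which is the convergence-speed trade-off discussed earlier in the thread. A real team game adds more noise per game, so the actual number of games needed would be larger than this sketch suggests.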
the skill number can only ever be 'perfect' within a given system. When you change the system, the value of a perfect skill score changes too.
Just to clarify the nomenclature: By "system" I mean the algorithm, equations, hive server and user-interfaces used, which do not change (as in the "Skill grading and balancing system"). By "system" I understood you mean "observed system state", but you tell me...
@Therius Generally, you are saying what would happen in your opinion. You are omitting why it would happen.
Also, what is this "intrinsic value" to converge to, then? Why should the "intrinsic value" change with the introduction of a stranger? Does something physically flip inside the regular players' minds or something? If @Nordic is right that individual SPs are meaningless and only the whole set has meaning, then words like "converge" and "intrinsic value" are undefined for it.
Anyway, the point of my (admittedly not so easily understandable) example in my previous post was to demonstrate that if I give the system valid initial conditions, it will spuriously introduce more error into the system than it had previously, and the errors would have not a uniform (fair, from the players' POV) but a normal distribution. Of course I simplified it a lot so my head would not explode. I tried to omit only things that I think either do not influence it (e.g. unbalanced games, which the system is claimed to handle) OR morally should not influence it (e.g. players joining and leaving a game in progress), but hey, it's just a thought experiment done for fun, not a rigorous study. The unrealistic initial condition was chosen for the same simplicity reason, but why not - it is a valid state.
I am just posting it, because why not. This thread is not getting shorter anyway... But by all means, it is not a priority to mess with the current system until we get (or are bound to get) way more players, so do not get defensive just yet. No reason to invent Skynet just yet for the three or so parallel full servers running - unless someone wants to as a hobby (again). I don't know why it became a theme lately (again)... maybe through the association with the desired game revival and getting it to the top ten, where it deserves to be. :P Or maybe everyone is a passionate nit-picker like me. :)
The reason I'm still discussing this is that I've (admittedly very briefly) studied these things, and find it very frustrating to see some people advocating for its removal mainly because they do not understand the inner workings of the system.
Things that I'm saying (most of them at least) are not a matter of opinion, but a matter of statistics. Reading any book about statistics would lead to the same conclusions. I'm not going to start reciting textbooks here, however. And I'm not saying I'm perfect, I could be wrong, and moultano, for example, knows this stuff much better than I do. But I'm trying to keep the discussion as simple as possible on my part, only talking about the very fundamentals of probability, since there are a lot of people denying very obvious statistical truths. If a player has a 60% chance of winning every match he plays, he will have a 60/40 win-to-loss ratio after a large number of matches. Saying "but what if he doesn't" really carries no value. I'm not saying the system is perfect in every sense, but people turning down the very basics behind the system I will take to court.
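A minimal sketch of the 60% claim, assuming every match is an independent event with the same fixed win probability (a simplification; the function name and the seed are illustrative only). It shows the observed win fraction settling near 0.6 as the number of matches grows, while short runs can still be far off.

```python
import random

def simulate_win_rate(p_win, n_matches, seed=0):
    """Play n_matches independent matches, each won with probability p_win,
    and return the observed fraction of wins."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    wins = sum(1 for _ in range(n_matches) if rng.random() < p_win)
    return wins / n_matches

if __name__ == "__main__":
    # Few matches: the fraction can be far from 0.6. Many matches: it settles.
    for n in (10, 1_000, 100_000):
        print(n, round(simulate_win_rate(0.6, n), 3))
```

The single lucky or unlucky game is the small-sample row; the "on average" statement is about the large-sample rows.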
By intrinsic value I mean the number of the skill rating that, when used to predict match outcomes, predicts match outcomes correctly. It can change depending on who you play against, and has no absolute right value. It's the value that makes the system's prediction of match outcomes be correct on average.
For example, imagine that NS2 was launched today, and only two players started playing it. They both begin at a skill level of 1000 and only play against each other. Neither one of them is better than the other, and after a large number of matches, they both have a W/L of 1. Since they both had the same skill level already at the beginning, the skill system already predicted their outcomes correctly (50-50), and there has been no need to adjust the values. Of course they oscillate around 1000, maybe even making their difference quite big during some periods when one of them got lucky many times in a row, but if their intrinsic probability (the actual probability that we cannot observe directly) of winning (50%) stays the same all the time, the values will always gravitate towards the same number (in this case 1000, because there is nothing making either player's average score go up or down).
Now, introduce a third player into the system. This player starts at 1000 as well, but is much weaker than the two other players. The three players start playing 1vs1 matches. The original players now have a greater than 50% chance of winning a match they're given, since against the newcomer they are more likely to win. The system predicts the outcomes wrong, however, since it still thinks that every match is a 50-50 situation. The skill values of the two original players will start climbing, since they win more than half of their games and no longer offset this climb with their losses. The third player will start declining in skill value because he loses more. This will continue until a balance is found, a situation where the two original players share the same value somewhere higher, say, 1100, while the third player has a lower value, say, 900. When the skill values are such that the prediction of the system is correct, the wins start offsetting losses and vice versa, leading to the players' skill ratings oscillating around the new values. This situation could be, for example, such that when one of the original players plays against player number three, they have a 75% chance of winning, while playing against each other they still have the same 50% chance. This means that after a gazillion matches, the original players will have won 75% of games played against player number three and 50% of games against each other, and the values will not go up or down in the long run. NOTHING has changed in the ACTUAL GAMING ABILITY of the two original players, but their scores have climbed. They haven't improved in the game, but the system is still correct.
And this is not something that I'm saying SHOULD happen, this is what WILL happen according to very simple mathematical and statistical laws that you can read about in any text book or wikipedia article.
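The three-player example can be sketched as a small simulation. This is only an illustration under stated assumptions: the thread does not give the actual Hive formula, so an Elo-style logistic update is swapped in, and `K`, `SCALE`, the 75% figure and the random 1v1 pairing are all assumed values. The ratings are averaged over the second half of the run to smooth out the oscillation.

```python
import random

K = 16          # assumed update step size per match
SCALE = 400.0   # assumed scale of the logistic prediction curve

def expected(r_a, r_b):
    """Predicted probability that a player rated r_a beats one rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / SCALE))

def simulate(n_matches=50_000, p_strong_beats_weak=0.75, seed=1):
    """Players 0 and 1 are equally strong, player 2 is weaker.
    Returns each player's rating averaged over the second half of the run."""
    rng = random.Random(seed)
    ratings = [1000.0, 1000.0, 1000.0]
    history = []
    for step in range(n_matches):
        a, b = rng.sample(range(3), 2)  # random 1v1 pairing
        # True (hidden) probability that a beats b; the system never sees this.
        if a == 2:
            p_true = 1.0 - p_strong_beats_weak
        elif b == 2:
            p_true = p_strong_beats_weak
        else:
            p_true = 0.5
        score = 1.0 if rng.random() < p_true else 0.0
        delta = K * (score - expected(ratings[a], ratings[b]))
        ratings[a] += delta  # zero-sum: what one player gains, the other loses
        ratings[b] -= delta
        if step >= n_matches // 2:
            history.append(tuple(ratings))
    n = len(history)
    return [sum(h[i] for h in history) / n for i in range(3)]

if __name__ == "__main__":
    print([round(r) for r in simulate()])
```

Under these assumptions the two equal players end up with similar ratings above the newcomer's, although nothing about their actual ability was changed in the code. The exact values depend on the assumed curve, so the 1100 and 900 in the post should be read as placeholders.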
If @Nordic is right that individual SPs are meaningless and only the whole set has meaning, then words like "converge" and "intrinsic value" are undefined for it.
Meaningless in which sense? I think it has been poorly defined, which has led to some mass misconceptions about the skill system. "Balance points", although not a perfect definition, is closer to what the sp value represents.
The sp value does correlate with a player's skill. A player with a 3000 is more than likely really, really skilled at ns2. A person with a 19 and 500 hours recorded in hive is probably incredibly unskilled at ns2. I honestly cannot comprehend how that player has such a low score after so many hours, but such players do exist.
A 2000 player is more likely to be skilled than a 1900 player. At the same time, that 2000 player may play in a different population of players and in actuality be less skilled than the 1900 player.
since there are a lot of people denying very obvious statistical truths.
Ugh, I always hate seeing the "adjective" truth written... :P
Well, not everyone is mathematically inclined, and not everyone can form arguments carefully and then have the tenacity to support them. Yet sometimes, when you correct the mistakes, their thoughts have some worth. Of course when they say "I lost ONCE in a FET game, and I am strongg", well that's just beyond common reasoning, and you won't fix that in one forum post...
By intrinsic value I mean the number of the skill rating that, when used to predict match outcomes, predicts match outcomes correctly. It can change depending on who you play against, and has no absolute right value. It's the value that makes the system's prediction of match outcomes be correct on average.
So many problems I see in that answer:
It is a circular definition: "The value is the one that predicts the outcome correctly. Predicting the outcome correctly is what happens when you use the intrinsic value". Nothing can be inferred from such a non-definition.
Is being "correct on average" when SP = "intrinsic value" enough? Even a random guess is correct on average for binary decisions (BTW, what does "correct on average" even mean?)...
Besides, if it is not tied to any real property of the individual, then "intrinsic" is a misnomer. If it is tied, what real property of the individual changes, then, depending on the other players you play with? (E.g. an experienced player might slack a bit on a newbie server, or mods change a player's skill on a given server - but those are the only two I can think of, and they should not influence things that much in the bulk of cases.)
If such a value changes depending on the players, is it useful for predicting the balance of newly created teams (= playing with different players)? "Change" to me sounds like a guarantee of a bad result.
If it has no absolute value at any given time, how can it "converge" to it?
Your example is more clear.
You are pretty much describing how inflation works. You are changing the value of one unit of SP, but you are not changing the unit-less intrinsic value. We know the SPs of the old players are alright, and we know the SP of the new player is most likely horribly wrong. We could have chosen to correct only the new guy's value and prevent the inflation, while not invalidating the system. Imagine 100 old players and one new one. You will be messing with their balance just to correct the value of the single new guy. And it will all depend on whom the new guy chooses to play with at the moment. Imagine some complex realistic example: can you even say what will happen? But NVM. My original example was that even when the old guys have the same "intrinsic" value, they would get wildly different SP values from each other because of the new guy. Of course I too could be wrong, because the reasoning there is not that trivial. I really should write a simulator to be sure - it's starting to be worth it, to stop all these threads. :P
And this is not something that I'm saying SHOULD happen, this is what WILL happen according to very simple mathematical and statistical laws that you can read about in any text book or wikipedia article.
I have no idea which laws you are citing specifically, but what happens to the SP value is determined solely by the algorithm used and the data it chooses to collect, laws or no laws. We could have chosen a different algorithm, and something else would happen with them.
Meaningless in which sense? I think it has been poorly defined, which has led to some mass misconceptions about the skill system. "Balance points", although not a perfect definition, is closer to what the sp value represents.
"Meaningless", for example, as in "it correlates with real skill but does not represent real skill and is chosen partly arbitrarily. Only its average over multiple players gains some reasonable semantics - team skill".
"Balance points"? What would that mean? How much I am one with nature? :) What about "The points that shall not be named (and don't you dare compare them to your skill)"? :P
There is very little to grab hold onto in your post. I'm not going to start arguing semantics or definitions of 'inflation', 'on average' or 'intrinsic'. Some of my words were poorly chosen, but the point is still there, and arguing about what intrinsic means is fruitless.
Yes, the skill rating is forever changing and can never be 100% correct when you introduce new players to the system or take old ones away. In a large enough system, however, these problems are so minuscule that you can't tell them apart from the normal oscillation due to lucky and unlucky days. I don't see the problem here.
And the 'law' I'm citing is that if x has a larger probability of happening than y, then x will happen more often. That is the most basic principle the system leans on. I have no problem with people criticising nuances and details of the system, but there are A LOT of people not understanding this very basic statistical fundamental (not you), trying to argue the system away with points like "but imagine a player who is AFK in every single one of his games but still his team wins every single one of those games, see how the system can be abused!"
^ If we do not agree on semantics, we can never understand each other. If you use non-standard semantics for English words, I can never understand what you mean unless you explicitly state your new semantics (e.g. as a programmer I name my variables so that their semantics match the name I use as closely as possible, even though I could have named my variables something like "a", "b", "c", "d", "e", or even the opposite of how I use them in the end, making the job harder for anyone who wants to prove I made a mistake).
Of course my skepticism should not be used to dismantle the system (but perhaps improve it later on), because the alternative is close to 100 % chance of unbalanced games on pub (well it actually is how that works, because F4 undoes any benefit the FET does).
I've rechecked my posts and I didn't find a single word that I feel requires explanation. You blame me of a circular definition, when in truth there is none. You blame me for saying that red is green when my posts say that green is green. How do I defend myself in a situation where the only way I can explain my claims is to repeat them?
Comments
The more complicated a game is, the harder it is to quantify with stand-alone metrics like K/D, score etc. You could easily quantify a player's skill in noughts and crosses by measuring whether the player makes the only optimal move or not, since there is nothing else involved in the game. Measuring a player's tendency to win is the best metric to capture every good move a player makes in a more complicated game. You say that NS2 is so complicated that we should use individual metrics instead. You're arguing against your own point.
Probabilities only ever get values between 0 (an impossible case) and 1 (a certain case). The 'real probability' is not the outcome of the game, but the outcome's intrinsic probability, which the system is trying to estimate.
Nope, so far the bonus (or else) is received by the entire team that won or lost (minored by the time passed in the game). So it's the TEAM tendency that is rewarded (or not). It is quite different on MANY aspects and quite different on how an individual player score evolves against how it actually should evolve. In fact it isn't a measure on a single player at all.
Nope again: "capture" (sampling) cannot be interpreted from a metaphoric number (the win/lose ratio) whose meaning is actually quite distant from the actual sample. How could you tell from it whether a player was a good shot (accuracy)? Or how much time he spent building structures (teamwork?)? Tendencies aren't samples. Hive considers win/lose. It's not the same.
I don't say it's useless, i say it's the wrong assumption. It's like saying : We can trace back the initial butterfly wing beats that are eventually responsible for the typhoon 1000km away. And we can say how many butterflies did it at what precise time and altitude. If you can do that; maybe quantum physics is one of the field you should try, the world needs it. Thumb up.
It looks like you are saying that win/lose is better at measuring all the different player traits because we don't have the data or we don't need it. Actually, there is plenty and maybe more (a "maybe" I can like). This data would allow a faster convergence, as a player can't fake these precise things. At some point the players play...
Take a second key and play NS2. You'll see the accuracy you get won't really change, but your hive score is at 0 in the first game. Other behaviors that are precisely the key elements of playing NS2 properly won't change either. Still... you will have to climb that "skill cliff" bit by bit before FET is actually able to form the closest possible picture of your skill.
I prefer to play NS2 well and with people who play well. Meaning I don't care about how many times they (or I) won a game. Win or lose is actually completely secondary. Maybe you never got into the kind of game in which no one cares about the win/lose that happened. Thus it cannot be a good interpretation of their potential. Or should I say their many potentialS.
If we want to measure a player's contribution to the team's probability of winning (a goal I don't think anyone argues against), we have two options. The first one is to think about all the INFINITE amount of things that affect that contribution (the player's aim, reaction time, communication skills, teamwork abilities, map awareness, hardware, internet connection stability, amount of pets, bladder control...), decide which ones to include and how to weigh each one of them via a scoring system through an endless discussion where no one agrees with anyone else and the end product being AT BEST a biased estimate that has an INFINITE amount of things that haven't been taken into account and that needs to be rebalanced EVERY TIME something, however small, is changed within the game rules.
The second one is to go around this problem by basing the skill rating on the only thing that actually captures every single one of these infinite determinants of victory: how much the player has won in the past and against what kind of odds. This captures every determinant of victory BY DEFINITION. It's not a matter of opinion. If something a player does helps the team towards victory even in the slightest bit, it's reflected in that player's past victories ON AVERAGE. If a player is good at calling out base rushes and saves many games with those calls, he will win more games ON AVERAGE. If the game rules change in a way that make base rushes less effective, that player's contribution will be smaller in the future, since his skill of calling out the base rushes is not as useful anymore, he loses his advantage and loses more games in the future ON AVERAGE, leading to his score going down. The key word here is ON AVERAGE. You can play a perfect game and still lose, but that does not happen ON AVERAGE. Likewise, you can play a horrible game and still win, but that does not happen ON AVERAGE. ON AVERAGE these things balance each other in a way that the player's score represents his chance to win ON AVERAGE. If the player changes his way of playing, for example, starts commanding more or starts learning new lifeforms, his skill rating will still be based on his past victories that were got from his past performance where he used his past strategies, and thus it will be wrong. But this means that the player will start losing more than the system predicts ON AVERAGE, bringing his skill down to a level where he no longer loses or wins more than predicted ON AVERAGE.
I do not understand why I bother. You do not even know what the word sample means. And you call the win-to-lose ratio metaphoric. I've lost hope.
Imagine the following situation. There is a 20 slot server filled with the following player skills:
1 x 3000
3 x 2000
16 x 1000
scenario 1
1 x 3000 + 9 x 1000 -> mean skill 1200
vs.
3 x 2000 + 7 x 1000 -> mean skill 1300
leading to a skill difference of 100.
scenario 2
1 x 3000 + 1 x 2000 + 8 x 1000 -> mean skill 1300
vs.
2 x 2000 + 8 x 1000 -> mean skill 1200
leading to a skill difference of 100.
From my experience scenario 2 would be the more fair one. I played a lot of games where I (skill ~3000) was matched against some pretty decent players (skills >2000) leading to similar overall skills. Although one very good player can (especially as marine) achieve a lot in NS2, he can't be everywhere at once.
Now my question is: Does the skill system take into account any sort of standard deviation (to account for outliers in the skill distribution) or just the pure mean skill values?
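To make the question concrete, here is a small sketch that computes both the mean and the population standard deviation for the two scenarios. It says nothing about what the skill system actually does (that is the open question); it only shows what a spread measure would add on top of the mean. The helper name `describe` is illustrative.

```python
from statistics import mean, pstdev

def describe(team):
    """Return (mean, population standard deviation) of a team's skill values."""
    return mean(team), pstdev(team)

# Teams exactly as listed in the two scenarios (10 players each).
scenario_1 = ([3000] + [1000] * 9, [2000] * 3 + [1000] * 7)
scenario_2 = ([3000, 2000] + [1000] * 8, [2000] * 2 + [1000] * 8)

for name, (team_a, team_b) in (("scenario 1", scenario_1), ("scenario 2", scenario_2)):
    (mean_a, sd_a), (mean_b, sd_b) = describe(team_a), describe(team_b)
    print(f"{name}: mean {mean_a} vs {mean_b}, spread {sd_a:.0f} vs {sd_b:.0f}")
```

Both scenarios have the same difference in means (100), but the spreads within the teams differ, which is the information a pure mean comparison cannot see.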
There is no need for a "infinite amount" of things. As i said winnings aren't useless. But it cannot be dumbed down to only one value. Especially one of the most unpredictable as soon as you get to higher levels.
Please seriously; you should really play NS. You assume that the NS games played are a perfect world in which every good action is rewarded on average (in the end). I'm afraid i have to say "Far from it". You should really tell that to the D1 comp. I would like to see their faces hearing that. Like "dude... what he said?". Just that; will make my day.
This system rewards the team who won, not the individuals for any specific action. This is exactly where the flaw is. The system rewards the entire team (good or bad player in it) and then FET tries to sort and assign them with their personal scores. Personal / team... Personal / team... Personal / team... Personal / team... see ?
You still avoid the big questions.
How many games do the rookies with 0 and those with 1000 have to play in order to make their Skill score go towards something close to reality ? Meaning 1/ their score will be stable (convergence); 2/ their score will be accurate so FET can use it. Considering a game will last from 10 min to 40 min (maybe more), let's say 20 minutes on average. How much time until they get there ? ... before they quit NS2 ? 100hrs? 500 ? hmmm That is something to look into isn't it.
Also, you still haven't answered this: what if a rookie is doing nothing (or even doing bad things that probably annoy the rest of the team) but still IS on the winning team, whatever the circumstances or prediction? Do you really think this guy is rewarded at a proper value? Rewarded in the first place? Do you really think that by sticking with great players his Skill Score will go towards something accurate and useful for FET? I mean, gamers are always exploiting everything they can, right? Just because they're rookies doesn't mean they're dumb. Even if this behavior is dumb, it WILL happen, as it already does.
And nobody, including you, has answered the Gorge/Fade problem. Say FET was voted and the data is accurate in "converged wonderland": 2x2000 + 3x1000 on both teams, except for one player at 500. In that case the prediction says that team A should win. To be in harmony with it, the Team A 2Ks should play the most aggressive role. If the 2Ks evolve Gorge right at the beginning, they leave an empty space the opponent will surely take, leading to a lost game for sure. What happens if they decide to play Gorge for several months, eventually ruining the fun for all?
Will they be less "skilled"? I don't think so. What if, after that, they suddenly decide to return to the Fade "ripper" mode? Will the prediction fail all the time? One thing is for sure: FET won't have the right data about them, as it can be manipulated. Because they didn't change at all. They just made the choice to do something different. How do you solve that?
The answer isn't : they should play like drones as anyone can do in other dumbed down games. You know "My NS is rich!".
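The question above about how many games a rookie needs can at least be framed with a rough sketch. Everything here is an assumption, since the thread does not give the real update rule: an Elo-style update with step `K` and scale `SCALE`, opponents who are already correctly rated at the rookie's true level, and the average (expected) update applied each game instead of random wins and losses. Only the 20 minutes per game comes from the post.

```python
K = 32                 # assumed update step per game
SCALE = 400.0          # assumed scale of the logistic prediction curve
MINUTES_PER_GAME = 20  # average game length suggested in the post

def expected(r_a, r_b):
    """Predicted probability that a player rated r_a beats one rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / SCALE))

def games_to_converge(start, true_rating, tolerance=50.0, max_games=100_000):
    """Count games until the rating is within `tolerance` of `true_rating`,
    applying the expected update each game. Returns None if never reached."""
    rating = float(start)
    for games in range(max_games + 1):
        if abs(rating - true_rating) <= tolerance:
            return games
        # Opponents are assumed to sit at the player's true level, so the
        # player really wins half of the games, whatever the system predicts.
        rating += K * (0.5 - expected(rating, true_rating))
    return None

if __name__ == "__main__":
    games = games_to_converge(start=0, true_rating=1000)
    print(games, "games, about", round(games * MINUTES_PER_GAME / 60, 1), "hours")
```

This is a 1v1 idealisation. In team games, where the whole team shares the result, each game carries less information about one player, so the real figure could be quite different.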
You do not understand what statistics, probability or 'on average' means. You think we live in a world where probabilities mean nothing, and every statistical result can be rebutted by saying "but I saw this happen once so you're wrong". You probably think that scientific studies and gathering data for statistics is a useless waste of money and every statistic is a lie. This conversation is fruitless.
However, NS2 is simply far too complex to quantify skill level through individual accomplishments. The field comming example definitely applies here - There is no way to quantify someone being really good at organizing and/or leading a team - Likewise you can't punish someone for being completely uncooperative (recommendations simply do not cut it).
This approach evaluates you based on your ability to win - regardless if your ability is based on being a great fragger or being a great leader or something entirely different.
If there is an abuse in the system, the solution is not to throw out the baby with the bathwater, but to fix that gap that enables the abuse.
1) Assume all players in the system have correctly calculated SP value. And assume all games are made with FET with exactly calculated 50/50 probability.
2) New player comes into the system and is assigned 0 SP value (he is better than that in reality).
3) Round is made with calculated 50/50 probability. The real probability is let's say 49/51 (newbie in team2 gets better team by FET)
4) Let's say nobody's real skill changes. Over time the newbie's SP will be pushed toward the real value, and everybody else's SP will be skewed from their previously perfect value (the newbie's team gets more than it deserves, the opposing team is unfairly penalized). That is until the newbie's SP value converges and the real probability gets to 50/50 (well, never in the perfect case, because it closes the gap somewhat asymptotically); go to step 3). It does not change the probabilities, because players are picked at random to reach a 50/50 calculated probability. The signed error sum of the randomly picked players will be 0, so it won't change the "real" probability.
You see, the error of one player's SP value creeps to all players. In the end the error propagates as a normal distribution (a small number of players get an overinflated SP value, some get an unfairly low SP value, most get slightly more or less than they deserve), making the system a "lottery". You can get any value, given enough luck, and no cheating or stat hoarding (which is absolutely possible) is even necessary.
Given that every SP value is wrong, a second problem creeps in. The system won't correct the errors even when the newbie is gone - the FET will assign them with 50/50 probability. The real probability will be 50/50 (no SP will change on ave ... I mean ON AVERAGE), because the signed average of the errors in the system is 0.
A third problem creeps in: the player who does more than average in the team will be rewarded less than he deserves. The person who does less than average will get more. Your SPs are now dependent on where you stand in the demographic on the given server.
tl;dr
Those with the extreme trust in the math of the system should get their confidence shattered a bit.
Those hating the system should note that even in the bleak example above, even when everyone has inaccurate SPs, the FET balanced game is still even on average (half the time??), if you ignore the number as your manhood measure.
The rest of your post talks about convergence which is a known issue. I have spoken about it before, so have a few others. It is fairly obvious that convergence is too slow if you look at the "skill" distribution.
Would I be correct in saying you think the "skill system" does not work most of the time because of convergence being too slow?
I think a lot of people dislike the system because they feel it doesn't represent their skill in tracking, aiming or whatever. Since it's not designed to do that, how about renaming it to "balance points"?
When you say "correct" sp value, you are assuming there is a number that directly correlates with that player. There is no correct number. "Skill Points" is a fluid value that changes relative to who you play with. I may have about 1800 "Skill points" now, but if I started playing with only premier skilled players I would be far less likely to win on average. My sp value would decrease substantially over time and become relative to whom I am playing with. If I were to start playing with only rookies, my sp value would adjust to show my relative skill compared to those specific rookies.
This is not a lottery, but instead a problem of slow convergence.
If you started playing with skilled players, you would have some on your team too, not really changing your personal likeliness of win.
The "lottery" is the part where you are assigned an SP value based on pseudo-randomness (the players you play with at the particular moment and the respective errors in their SP values). This in turn starts to have statistical properties of its own that mess with the system. Of course it IS a simplified example, and I can't know (does anyone?) to what degree and in what shape it manifests in the real system (it is beyond a mere thought experiment, and I would have to write a complex simulation).
Slow convergence is a different problem. But if you think the SPs are not supposed to represent a player's skill, what are they supposed to converge to, in your opinion?
I don't think skill points directly represent real skill. It does certainly correlate with skill relative to the players played with, which is useful for team balance purposes.
There are problems with not enough mixing in the ns2 community. One premier level player I know of has an sp value of about 1200 which in no way is useful for balance if he were in a public game. The more public games he plays the more his sp value converges to something useful for balance.
Ok, lets say your team by having you has lower chance of winning. That is almost the same as the example in my previous post. In every round your team would lose more SPs and opposite would gain more on average (despite no change to their "real skill").
I have thought about exactly that today. After many random mixing rounds it will lead to normal distribution of the error among all the players ( few will get way less SP, most will get milder error, few will get way more SP - but average error of them all will be 0, so the system won't detect that and won't correct that.)
Of course that still leads on average to "balanced games". But is average enough? (That is maybe half games adequate and half terribly unbalanced for one or the other team. Then add people bickering about and circumventing FET and you get maybe 1/4 of good games)
Lol. I typed that out on mobile. I guess the mobile browser decided to post 3 times at once. What is really funny is I did not understand what you were talking about at first. I thought you were trying to make a point about my most recent post, which was partially right.
In my overly simplified example, there are 16 players. 15 of which are premier level players which is the highest division in ns2 competitive. It does not go higher. Div 1 teams typically have the same mechanical skill in aim and movement as premier players to the point that what makes a premier team a premier team is strategy. Basically it does not get any better. I chose them for this example because they are essentially god players compared to us average pub players.
Team A:
4000 + 4000 + 4000 + 4000 + 4000 + 4000 + 4000 + 4000 = Average team sp value of 4000
Team B:
4000 + 4000 + 4000 + 4000 + 4000 + 4000 + 4000 + 1800 (Nordic) = Average of 3725
Those numbers don't really matter, except that they are significantly higher than my sp value. So in an 8v8 match with only 16 players where would the other team get more players to counter balance?
I think at this point we are saying the same thing. I admit there is an "error" as you call it. I do not expect the "skill points" to equate to a persons real skill. I only expect it to be accurate relative to other players for balance purposes. I think this is totally fine as long as it balances games well most of the time, and in my experience it does a great job at that.
I do not think there is a way to express a player's "real skill" as a numeric value. I don't think it is possible with unclecrunch's color system either. I don't think it is possible at all. At the risk of parroting Therius, there are just too many variables that make up skill in ns2.
To me, the question is not "does the skill system accurately measure skill?" It is "does the skill system balance games well most of the time?"
cool.
Nifty, it always just made my score fall precipitously
Just to clarify the nomenclature: By "system" I mean the algorithm, equations, hive server and user-interfaces used, which do not change (as in the "Skill grading and balancing system"). I understand that by "system" you mean the "observed system state", but you tell me...
@Therius Generally you are saying what would happen in your opinion. You are omitting why it would happen.
Also, what is this "intrinsic value" to converge to, then? Why should the "intrinsic value" change with the introduction of a stranger? Does something physically flip inside the regular players' minds or something? If @Nordic is right that individual SPs are meaningless and only the whole set has meaning, then words like "converge" and "intrinsic value" are undefined for it.
Anyway, the point of my (admittedly not so easily understandable) example in the previous post was to demonstrate that if I give the system valid initial conditions, it will spuriously introduce more error into the system than it had previously, and the errors would not have a uniform (fair, from the POV of players) distribution, but a normal one. Of course I simplified it a lot so my head would not explode. I tried to omit only things that I think either do not influence it (e.g. unbalanced games, which the system is claimed to handle), OR morally should not influence it (e.g. players joining and leaving a game in progress), but hey, it's just a thought experiment done for fun, not a rigorous study. The unrealistic initial condition was chosen for the same reason of simplicity, but why not - it is a valid state.
I am just posting it, because why not. This thread is not getting shorter anyway... But by all means, messing with the current system is not a priority until we get (or are bound to get) way more players, so do not get defensive just yet. No reason to invent Skynet just yet for the three or so parallel full servers running - unless someone wants to as a hobby (again). I don't know why it became a theme lately (again)... maybe by association with the desired game revival and getting it into the top ten, where it deserves to be. :P Or maybe everyone is a passionate nit-picker like me. )
Things that I'm saying (most of them at least) are not a matter of opinion, but a matter of statistics. Reading any book about statistics would lead to the same conclusions. I'm not going to start reciting textbooks here, however. And I'm not saying I'm perfect, I could be wrong, and moultano, for example, knows this stuff much better than I do. But I'm trying to keep the discussion as simple as possible on my part, only talking about the very fundamentals of probability, since there are a lot of people denying very obvious statistical truths. If a player has a 60% chance of winning every match he plays, he will have a 60/40 win-to-loss ratio after a large number of matches. Saying "but what if he doesn't" really carries no value. I'm not saying the system is perfect in every sense, but people turning down the very basics behind the system I will take to court.
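To make the 60/40 point concrete, here is a minimal sketch (not anything from the actual skill system) that simulates independent matches with a fixed, assumed 60% win chance and prints the observed win rate as the number of matches grows. The function name `observed_win_rate` and the seed are made up for the illustration.

```python
import random

def observed_win_rate(p_win, n_matches, seed=0):
    """Fraction of wins over n_matches independent matches,
    each won with the same assumed probability p_win."""
    rng = random.Random(seed)
    wins = sum(1 for _ in range(n_matches) if rng.random() < p_win)
    return wins / n_matches

# The observed rate wanders a lot for few matches and settles near 0.6 for many.
for n in (10, 100, 1_000, 100_000):
    print(n, observed_win_rate(0.6, n))
```

With few matches the number can be well off 0.6 (lucky and unlucky streaks); with many matches it stays close to it. That is all the "60/40 after a large number of matches" claim says.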
By intrinsic value I mean the number of the skill rating that, when used to predict match outcomes, predicts match outcomes correctly. It can change depending on who you play against, and has no absolute right value. It's the value that makes the system's prediction of match outcomes be correct on average.
For example, imagine that NS2 was launched today, and only two players started playing it. They both begin at a skill level of 1000 and only play against each other. Neither one of them is better than the other, and after a large number of matches, they both have a W/L of 1. Since they both had the same skill level already at the beginning, the skill system already predicted their outcomes correctly (50-50), and there has been no need to adjust the values. Of course they oscillate around 1000, maybe even making their difference quite big during some periods when one of them got lucky many times in a row, but if their intrinsic probability (the actual probability that we cannot observe directly) of winning (50%) stays the same all the time, the values will always gravitate towards the same number (in this case 1000, because there is nothing making either player's average score go up or down).
Now, introduce a third player into the system. This player starts at 1000 as well, but is much weaker than the two other players. The three players start playing 1vs1 matches. The original players now have a greater than 50% chance of winning a match they're given, since against the newcomer they are more likely to win. The system predicts the outcomes wrong, however, since it still thinks that every match is a 50-50 situation. The skill values of the two original players will start climbing, since they win more than half of their games and no longer offset this climb with their losses. The third player will start declining in skill value because he loses more. This will continue until a balance is found, a situation where the two original players share the same value somewhere higher, say, 1100, while the third player has a lower value, say, 900. When the skill values are such that the prediction of the system is correct, the wins start offsetting losses and vice versa, leading to the players' skill ratings oscillating around the new values. This situation could be, for example, such that when one of the original players plays against player number three, they have a 75% chance of winning, while playing against each other they still have the same 50% chance. This means that after a gazillion matches, the original players will have won 75% of games played against player number three and 50% of games against each other, and the values will not go up or down in the long run. NOTHING has changed in the ACTUAL GAMING ABILITY of the two original players, but their scores have climbed. They haven't improved in the game, but the system is still correct.
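The three-player example can be sketched in a few lines. This assumes a plain 1v1 Elo-style update (logistic prediction on a 400-point scale, constant step `k`), which is an assumption for illustration and not necessarily what the hive server does; the names `expected_score` and `simulate` are made up. The true win chances are the ones from the example: 50% between the two original players (A, B), 75% for either of them against the newcomer (C).

```python
import math
import random

def expected_score(r_a, r_b):
    """Elo-style predicted win probability of a player rated r_a against r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def simulate(n_matches=200_000, k=8.0, seed=0):
    """Play random 1v1 matches and return each player's rating
    averaged over the second half of the run."""
    rng = random.Random(seed)
    ratings = {"A": 1000.0, "B": 1000.0, "C": 1000.0}
    # Assumed true ("intrinsic") chance that the first player of each pair wins.
    true_p = {("A", "B"): 0.5, ("A", "C"): 0.75, ("B", "C"): 0.75}
    pairs = list(true_p)
    tail_start = n_matches // 2
    sums = dict.fromkeys(ratings, 0.0)
    for i in range(n_matches):
        x, y = rng.choice(pairs)
        score = 1.0 if rng.random() < true_p[(x, y)] else 0.0
        # Move ratings by how wrong the prediction was; zero-sum between x and y.
        delta = k * (score - expected_score(ratings[x], ratings[y]))
        ratings[x] += delta
        ratings[y] -= delta
        if i >= tail_start:
            for name in ratings:
                sums[name] += ratings[name]
    n_tail = n_matches - tail_start
    return {name: total / n_tail for name, total in sums.items()}

avg = simulate()
print(avg)
print("gap at which this formula predicts 75%:", 400 * math.log10(3))
```

Under these assumptions A and B drift up together and C drifts down until the predicted win chance against C matches the true 75%, which for this particular formula is a gap of 400*log10(3), roughly 191 points. The 1100 and 900 in the text are "say" values; the exact numbers depend on the rating scale, but the mechanism is the same: nobody got better, the numbers just moved to where the predictions are right.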
And this is not something that I'm saying SHOULD happen, this is what WILL happen according to very simple mathematical and statistical laws that you can read about in any textbook or Wikipedia article.
Meaningless in which sense? I think it has been poorly defined, which has led to some mass misconceptions about the skill system. "Balance points", although not a perfect definition, is closer to what the SP value represents.
The SP value does correlate with a player's skill. A player with 3000 is more than likely really, really skilled at NS2. A person with 19 and 500 hours recorded in hive is probably incredibly unskilled at NS2. I honestly cannot comprehend how that player has such a low score after so many hours, but such players do exist.
A 2000 player is more likely to be skilled than a 1900 player. At the same time, that 2000 player may play in a different population of players, and in actuality be less skilled than the 1900 player.
Well, not everyone is mathematically inclined and not everyone can form arguments carefully and then have the tenacity to support them. Yet sometimes when you correct the mistakes, their thoughts have some worth. Of course when they say "I lost ONCE in a FET game, and I am strong", well, that's just beyond common reasoning, and you won't fix that in one forum post...
So many problems I see in that answer:
It is a circular definition. "The value is the one which determines the outcome correctly. Predicting the outcome correctly is when using the intrinsic value". Nothing can be inferred from such a non-definition.
Is being "correct on average" when the SPs = "intrinsic value" enough? Even a random guess is correct on average for binary decisions (BTW what does "correct on average" even mean?)...
Besides, if it is not tied to any real property of the individual, then "intrinsic" is a misnomer. If it is tied, what real property of the individual changes then, depending on the other players you play with? (e.g. An experienced player might slack a bit on a newbie server. Mods change a player's skill on that given server. - but those are the only two I can think of, and they should not influence things that much in the bulk of cases)
If such a value changes depending on the players, is it useful for predicting the balance of newly created teams (= playing with different players)? "Change" to me sounds like a guarantee of a bad result.
If it has no absolute value at any given time, how can anything "converge" to it?
Your example is clearer.
You are pretty much describing how inflation works. You are changing the value of one unit of SP, but you are not changing the unit-less intrinsic value. We know the SPs of the old players are all right and we know the SP of the new player is most likely horribly wrong. We could have chosen to correct only the new guy's value and prevent the inflation, while not invalidating the system. Imagine 100 old players and one new one. You will be messing with their balance just to correct the value of the single new guy. And it will all depend on whom the new guy chooses to play with at the moment. Imagine some complex realistic example; can you even say what will happen? But never mind. My original example was that even when the old guys have the same "intrinsic" value, they would get wildly different SP values from each other because of the new guy. Of course I too could be wrong, because the reasoning there is not that trivial. I really should write a simulator to be sure - it's starting to be worth it, to stop all these threads. :P
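A very rough sketch of what such a simulator could look like, for anyone who wants to poke at it. Everything here is an assumption made for illustration: 1v1 matches instead of teams, a plain Elo-style update instead of the real hive algorithm, random pairings, and a hypothetical "true strength" number per player (100 equal old players at 1000, one newcomer at 600 who also starts at 1000 SP). It only measures the average and spread of the old players' SP with and without the newcomer; it does not settle the argument by itself.

```python
import random
import statistics

def expected_score(r_a, r_b):
    """Elo-style predicted win probability of a player rated r_a against r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def run(n_old=100, with_newcomer=True, n_matches=50_000, k=16.0, seed=0):
    """Return (mean SP of old players, spread of their SP, newcomer SP as a list)."""
    rng = random.Random(seed)
    # Hypothetical true strengths: all old players equal, newcomer much weaker.
    strength = [1000.0] * n_old + ([600.0] if with_newcomer else [])
    rating = [1000.0] * len(strength)  # everyone starts at 1000 SP
    for _ in range(n_matches):
        a, b = rng.sample(range(len(rating)), 2)  # random pairing (assumption)
        p_true = expected_score(strength[a], strength[b])
        score = 1.0 if rng.random() < p_true else 0.0
        delta = k * (score - expected_score(rating[a], rating[b]))
        rating[a] += delta
        rating[b] -= delta
    old = rating[:n_old]
    return statistics.mean(old), statistics.pstdev(old), rating[n_old:]

for flag in (False, True):
    mean_old, spread_old, newcomer = run(with_newcomer=flag)
    print("newcomer" if flag else "no newcomer", mean_old, spread_old, newcomer)
```

Things worth varying: `k`, the number of matches, how often the newcomer plays, and whether the newcomer picks opponents at random or sticks to a few of them. The last one is exactly the "with whom the new guy chooses to play" question.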
I have no idea which laws you are citing specifically, but what happens to the SP value is determined solely by the algorithm used and the data it chooses to collect, laws or no laws. We could have chosen a different algorithm and something else would happen with them.
"meaningless" for example as in "it correlates with real skill but does not represent real skill and is chosen partly arbitrarily. Only its average over multiple players gains some reasonable semantics - team skill".
"Balance points"? What would that mean? How much I am one with nature? ) What about "The points that shall not be named (and don't you dare compare them to your skill)"? :P
Yes, the skill rating is forever changing and can never be 100% correct when you introduce new players to the system or take old ones away. In a large enough system, however, these problems are so minuscule that you can't tell them apart from the normal oscillation due to lucky and unlucky days. I don't see the problem here.
And the 'law' I'm citing is that if x has a larger probability of happening than y, then x will happen more often. That is the most basic principle the system leans on. I have no problem with people criticising nuances and details of the system, but there are A LOT of people not understanding this very basic statistical fundamental (not you), trying to argue the system away with points like "but imagine a player who is AFK in every single one of his games but still his team wins every single one of those games, see how the system can be abused!"
Of course my skepticism should not be used to dismantle the system (but perhaps improve it later on), because the alternative is close to 100 % chance of unbalanced games on pub (well it actually is how that works, because F4 undoes any benefit the FET does).