lol that one guy with hive skill 1000 and a kdr of 28:1, i wonder if he played a 1v1 round and set up turrets for the whole game
Hive does not count games below a certain player count. I don't know if it is 5v5 or 6v6 though. That guy definitely has something weird going on in the data.
The fact that he has exactly 1000 skill makes me think there is something buggy going on with hive, as if he had his skill reset but retained all his stats. His 27.51 K/D doesn't mean much when he has only a 1.23 W/L though. He just doesn't make sense and that is that. I can't explain it.
"but they tend to also play with similarly skilled players, which deflates their kill and win statistics." Interesting. I don't like sticking around on servers I am overskilled for, nor do I like playing on servers with a much higher skill average than mine. Some pretty interesting psychology going on there I guess.
I guess this might be because the player count dropped as it did? I'm guessing either my personal "Geographical Density of Games Played at Location" has shifted from Oceania, to North America, to Europe, Russia and Sweden recently as player counts dropped, or I have looked further afield for challenges that I feel are personally achievable.
A lot of guessing there, maybe 1/56000 isn't a good statistic
Since there is a little more interest I have made more graphs on data from players who have less than 5 hours recorded in hive.
For thoroughness I should note again that any reference to hours recorded, or hive hours, is hours in a live game in a hive enabled server. These are not the same hours as steam. These hours are only from when a live game's timer starts until the timer stops with a team winning. Hive only counts games if there are enough players, so low player count games are not recorded.
As mentioned I started out with 59,587 players. When I cut it down to just the players with less than 5 hours recorded that left 20,378. Do note that I have no way of knowing if those players are rookies or veterans.
Most of those 20,378 players started at a 1000 hive skill when the hive system was made. Players who started after the hive skill system was made started at a hive skill value of 0 and can be assumed to be rookies. What I do know is that since hive started recording data these players have only had less than 5 hours of gameplay recorded.
This divide is quite visible in the following graphs.
I decided to go a little further just to make better graphs. I removed all the players who had over a 900 hive skill value. The following graphs are players with less than 5 hours recorded and less than 900 hive skill. This left me with only 266 players. These players could be considered true rookies.
The graphs look much cleaner now, but they are not what is interesting here. The median K/D is 1.02, which is a lot higher than before. These players also have a much better median W/L. Just looking at these metrics, the rookie experience does not seem as bad as I expected it to be.
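For anyone who wants to repeat the rookie filter described above, here is a minimal sketch. The rows and field layout are invented for illustration and are not the real hive export; only the two cutoffs (less than 5 hours, less than 900 skill) come from the post.

```python
from statistics import median

# Hypothetical rows: (hive_skill, hive_hours, kills, deaths). Invented values.
players = [
    (1000, 3.0, 40, 80),
    (250, 2.5, 30, 30),
    (120, 4.0, 55, 50),
    (1000, 4.5, 10, 40),
    (60, 2.2, 12, 15),
    (1450, 120.0, 9000, 6000),
]

def likely_rookies(rows, max_hours=5.0, max_skill=900):
    """Keep rows with less than max_hours recorded and skill below max_skill."""
    return [r for r in rows if r[1] < max_hours and r[0] < max_skill]

def kd(row):
    """Kill/death ratio; zero deaths is treated as one to avoid dividing by zero."""
    return row[2] / max(row[3], 1)

rookies = likely_rookies(players)
median_kd = median(kd(r) for r in rookies)
print(len(rookies), median_kd)
```

The median is used rather than the mean because a single extreme K/D would otherwise dominate such a small group.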
Overall there is only one take home point that I see. There is a 3 hour window of recorded gameplay to prove to new players that ns2 is worth sticking with.
Why is there 0% of players with 2 h? Especially in the first set of graphs that are not filtered by 900 SP.
Why are people with 1000 SP considered non-rookies, if that is the starting value? Besides, 900 is no guarantee of being a rookie (I guess I am a rookie in your books).
EDIT: Hive seems to give steamid and steamAPI gives minutes played. So true game-time should be retrievable by anyone with some effort....
Definitely not. A lot of people quit and abandon the game before 2 hive hours because they never play on a hive server during a sale.
While it is true that a lot of players leave before 2 hours, and never play on a hive enabled server, this is a statistic that represents the population. Out of the data I have for over 59,000 players, not one has less than 2 hours of hive recorded gameplay. I am missing data. Those 59,000 players were only 77% of the players recorded in hive at the time. Even if I had all 100%, that would only be almost 1/10th of the million or more players who own ns2.
So what I do know is that these are players who definitely played the game and actually tried it. These are not players who have 10 minutes recorded in steam hours. These are players who have genuinely tried ns2.
To answer your edits.
They are not considered rookies in the second graph because I have no way of knowing if they are or not.
There are thousands of players who have 1000 SP. They make up 98% of all the players with less than 5 hours recorded in hive. All I can tell by the data I have is that these players have less than 5 hours of gameplay and exactly 1000 skill. I lack the context to know if they are rookies or not. In reality most of them are probably rookies but I have no way of knowing. Hive has not recorded enough data to be able to guess what they really are.
When hive was first created players started at 1000 SP. Later on it was changed so that new players started at 0 SP. Because of that there is a divide in the data. It is very visible in the following old graph from the OP.
I can know for a fact that players who started from 0 and have less than 5 hours recorded in hive are indeed true rookies, at least all the non smurfs. It can reasonably be assumed that there are not many smurfs.
Anyway, the original full % / hours graph says it all. With every doubling of the hours, there are half as many such players. The sooner you impress a new player enough to stay, the better.
Got it. You take the 0 SP group and the losers from the 1000 SP group (could be resets, but well, they lost 100 SP in 5 hours).
Actually the graphs end at 300 SP, not 900 SP. So I assume that filter applies to the four other graphs, or did you mistype the 900?
So I calculated the correlation coefficient of multiple variables in ns2. First read this little primer on what that means.
The main result of a correlation is called the correlation coefficient (or "r"). It ranges from -1.0 to +1.0. The closer r is to +1 or -1, the more closely the two variables are related.
If r is close to 0, it means there is no linear relationship between the variables. If r is positive, it means that as one variable gets larger the other gets larger. If r is negative it means that as one gets larger, the other gets smaller (often called an "inverse" correlation).
The following statistics are from players with 100 or more hours recorded in hive. The data is from March 2015. I chose to do the correlation on only players with 100 or more hours recorded in hive because each player has plenty of data.
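To make the primer concrete, here is a small sketch of the Pearson correlation coefficient written out by hand. The sample numbers are invented and are not taken from the hive data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)

# Invented example values, only to show the sign of r.
kd = [0.5, 1.0, 1.5, 2.0]
spm_rising = [4.0, 6.0, 8.0, 10.0]   # rises with kd, so r is +1
spm_falling = [10.0, 8.0, 6.0, 4.0]  # falls as kd rises, so r is -1

print(pearson_r(kd, spm_rising), pearson_r(kd, spm_falling))
```

Note that r only measures how well a straight line describes the pair of variables, so a strong curved relationship can still give an r near 0.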
In order from highest correlation to lowest:
W/L - Win-Loss-Ratio
K/D - Kill-Death-Ratio
Skill - Hive Skill
SPM - Score per minute
time - hive time on record
A <-> B : C - corr(A,B)=C
K/D <-> SPM : 0.718
Skill <-> W/L : 0.580
W/L <-> K/D : 0.546
Skill <-> K/D : 0.521
Skill <-> SPM : 0.494
Skill <-> time : 0.476
W/L <-> SPM : 0.434
K/D <-> time : 0.193
SPM <-> time : 0.155
W/L <-> time : 0.107
For reference for this same subset of data.
Average skill = 1297
Median Skill = 1206
Average SPM = 8.94
Median SPM = 8.86
Average WL = 1.27
Median WL = 1.19
Average KD = 1.29
Median KD = 1.17
For the graphs on the left I tried to remove a lot of the noise and outliers. I removed the top and bottom 1% of players for Win/Loss, Kill/Death, and Score/minute. Most of those players were ridiculous outliers such as 27 KD or 0.1 KD. Then I removed all players with exactly 1000 skill who had less than 10 hours recorded in hive.
That left 6181 players with exactly 1000 skill.
The graph on the right is a subset of the culled data on the left. It is limited to players who had 100 or more hours recorded by hive. There are only 166 players with 1000 skill in that subset.
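The culling described above could be sketched roughly like this. It is one plausible reading of "top and bottom 1%" (sort by the metric and slice both ends), not necessarily how the original graphs were made, and the rows are invented. Only K/D is trimmed here; the same call would be repeated for W/L and score per minute.

```python
def trim_extremes(rows, key, fraction=0.01):
    """Drop the lowest and highest `fraction` of rows when ranked by `key`."""
    ordered = sorted(rows, key=key)
    k = int(len(ordered) * fraction)
    return ordered[k:len(ordered) - k]

def drop_untouched_1000(rows, min_hours=10.0):
    """Drop players sitting at exactly 1000 skill with less than min_hours recorded."""
    return [r for r in rows if not (r["skill"] == 1000 and r["hours"] < min_hours)]

# Invented data: 197 ordinary players, one short-time 1000-skill player, two outliers.
players = [{"skill": 1200, "hours": 50.0, "kd": 1.0 + i / 1000} for i in range(197)]
players.append({"skill": 1000, "hours": 3.0, "kd": 1.05})
players.append({"skill": 1000, "hours": 100.43, "kd": 27.51})
players.append({"skill": 1100, "hours": 20.0, "kd": 0.1})

trimmed = trim_extremes(players, key=lambda r: r["kd"])
cleaned = drop_untouched_1000(trimmed)
print(len(players), len(trimmed), len(cleaned))
```

Slicing a fixed share from each end also removes a few ordinary players next to the outliers, which is the price of not hand-picking who counts as ridiculous.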
I see a lot of numbers/guesses/feelings about how much one faction wins. Some say 55%/45%, some say 95% marines at 2000 skill, others again say a ratio of 2:1 for the aliens. Do you have any actual data on that? A graph of W/L vs marine/alien playtime? I mean, if rookies prefer to play marines and aliens win at a ratio of 2:1 (on whatever server they prefer to play on), then it may not be the best way of keeping them interested.
@pebl The data I have right now cannot answer that. What I do have is a CSV file with ~59,000 player rows with the following columns: Hive Skill, Score/minute, Win/loss, kill/death, assists/minute, Time(h), Time (m), AlienTime(h), MarineTime(h), Hive Level, Total Score, Wins, Losses, Kills, Deaths, TotalAssists, and MostRecentMatchPlayed from March 2015.
@Brute, the creator of Wonitor server statistics mod, might be able to answer those questions better.
Hmm, you could perhaps approximate it. Do win/(losses+wins) vs marineTime/(marineTime+AlienTime). If there is no bias it should not matter how much you play marines.
I forgot to mention that this data is from March 2015, so it is not exactly relevant to today. This was mentioned in the OP along with what data I have. I also only have 77% of all players so there is some discrepancy in Win - Losses.
Still though, as requested:
win/(losses+wins) = 0.524
marineTime/(marineTime+AlienTime) = 0.491
That was done for all data I have, not for any particular segment such as 2000 or greater skill.
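For reference, here is a sketch of how two such overall ratios can be computed. It assumes they are pooled over all players (sum everything first, then divide) rather than averaged per player; the post does not say which was used. The rows and key names are invented.

```python
def pooled_fraction(rows, part, other):
    """Share of `part` in `part + other`, summed over all rows."""
    top = sum(r[part] for r in rows)
    total = top + sum(r[other] for r in rows)
    return top / total if total else float("nan")

# Invented rows; the key names are hypothetical, not the real CSV headers.
players = [
    {"wins": 6, "losses": 4, "marine_h": 3.0, "alien_h": 7.0},
    {"wins": 5, "losses": 5, "marine_h": 5.0, "alien_h": 5.0},
    {"wins": 10, "losses": 10, "marine_h": 12.0, "alien_h": 8.0},
]

win_share = pooled_fraction(players, "wins", "losses")
marine_share = pooled_fraction(players, "marine_h", "alien_h")
print(round(win_share, 3), round(marine_share, 3))
```

Pooling weights players by how much they played, while a per-player average would count a 3-game player the same as a 3000-game player, so the two approaches can give noticeably different numbers.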
The scatter plot is too clouded to tell anything. The %win spread seems much larger than I expected.
I assume that the percentage of time a player plays marine has a Gaussian distribution with a mean of 50%.
I.e. most players will play marines approx 40-60% of the time, and a few will play only one of the factions.
Secondly, I assume that if there is a win bias for one faction over the other, then there is a correlation between the %marine and %wins.
I would expect a linear correlation.
These are a lot of assumptions that may not hold:
Does it take longer on average for marines to win, as aliens rush more often in the beginning, etc.?
The missing players are not missing at random, but I hoped that they were represented by others that look like them.
I.e. people disconnect before losing, but there is no bias in whether they are on aliens or marines.
The plot is way too clouded to tell anything in the middle, and it seems strange there are 2 very clear gaps before 0% and 100% marines.
Judging from the 0% and 100% it does look like a 40/60 split to aliens.
Could you swap x and y and make the dots a lot smaller, just a dot per observation?
And perhaps just calculate the correlation between the two?
What is the best linear fit? Could you overlay it?
It would be interesting to see the graph with those who have played fewer than, say, 10 games removed, as they are likely to have an extreme distribution.
It might also be interesting to regress games won on games played as marines and hours/number of games played - or to look at the linear fit separately by number of games played (beginners/intermediate/experienced players)
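The requests above (drop players with few games, then find the best linear fit of win share against marine share) could be sketched like this. It is an ordinary least-squares line on invented numbers, with a made-up 10-game cutoff taken from the suggestion above; nothing here comes from the real data.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical, slope is undefined")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Invented rows: (marine_share_of_playtime, wins, losses).
players = [
    (0.00, 12, 8),
    (0.25, 11, 9),
    (0.50, 10, 10),
    (1.00, 8, 12),
    (1.00, 3, 0),   # only 3 games, removed by the cutoff below
]

MIN_GAMES = 10  # hypothetical cutoff
kept = [p for p in players if p[1] + p[2] >= MIN_GAMES]
marine_share = [p[0] for p in kept]
win_share = [p[1] / (p[1] + p[2]) for p in kept]

slope, intercept = linear_fit(marine_share, win_share)
print(round(slope, 3), round(intercept, 3))
```

With a fit like this, the intercept is the predicted win share for someone who never plays marine and intercept + slope is the prediction for someone who always does, so a slope near 0 would suggest no faction bias.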
You calculated the correlation of:
W/L <-> time : 0.107
Which is at first a bit funny: you would expect to win more the more you play (i.e. getting better),
but if there is no influx of new players, everyone else also gets better.
So this probably just tells us there is no influx of new players to win against.
Yeah... Too much mathematics in this thread *backs out the door slowly*
A lot of this is really above my head. I have never liked math. I prefer dealing with qualitative issues than quantitative ones. So I am like the bug that keeps flying into the bug zapper(math) and just won't stop. This thread started out with interesting and easy graphs but now I am trying to do more complicated things like linear regression and it is way over my head. I sure am learning some things really quick given this applied context.
There is actually quite a strong stream of new players trying ns2. It is just that most leave very quickly after trying ns2.
I have now found out that my marine time and alien time columns are not even marine time and alien time. So anything done with that data, and only that data, is completely useless. When I get around to fixing this, I will get back to you about it.
I think @Brute and @moultano should collaborate on Wonitor and Hive 2.0. Combining the two might solve the equation that no one else has been able to solve so far. Just think about it.
Comments
1000 skill, 100.43 hours, level 18, 75.16 alien hours, 25.29 marine hours, 52551 score, 1.23 W/L, 264 Wins, 215 Losses, 27.51 K/D, 3879 kills, 141 deaths, 4641 assists, last played 17-Jun-14
Nordic, can you e-mail me? I have some gifts for you.
charlie@unknownworlds.com
Because there are only 3 instead of thousands. It is 0.01%. I should have made that more obvious.
For the lols, how much do those change when you exclude the entries with exactly 1000 hive skill?
@Brute, the creator of Wonitor server statistics mod, might be able to answer those questions better.
For example you can use any of these Wonitor pages to try and answer those questions.
Wooza's servers http://apheriox.com/wonitor/
The Thirsty Onos's servers http://apheriox.com/tto/
Both of those Wonitor pages have fairly ample data. Take a look at this page from The Thirsty Onos 24p server.
http://apheriox.com/tto/configurator.html#x=winner&y=count&numPlayers_gt=12&serverId_is=crszHBfcbK
Play with the constraints and you might be able to answer some of those questions.