GREAT NEWS! I now have hive data as recent as February 12, 2016.
Here is the first of many sets of graphs. Note that these use all ~90,000 profiles, so some of the graphs will look a bit odd.
You may notice that the graphs look different from my previous ones. This is because I am mostly using SPSS statistics software instead of Excel. I am becoming more comfortable with it and I can work much quicker with it.
Here is the second set of graphs. This time I limited the data to just players who had played from January 1, 2016 until February 12, 2016. I also did a few other things to the data, as described in the picture.
Is the graph of %win vs %marine created with the cleaned set?
There seem to be very few observations in the graph: 4000-5000 and not 12951?
Also, there seem to be a lot more observations to the left of .5 %marines, so it seems alien players are missing.
What are the total sum of marine hours and the total sum of alien hours in the cleaned set?
It still seems aliens win more. Could you make a linear fit in the cleaned set?
Ideally I would like to know how that line changes with
a) the total skill for a game.
b) the team size of a game.
I don't see how we can approximate a), but b) perhaps by assuming most players prefer some team size and most of their games are of that size.
Larger teams may mean more confrontation, leading to more kills+deaths per playtime. But then again, larger teams may just play larger maps.
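For anyone who wants to try the linear fit asked about above outside of SPSS or Excel, here is a minimal sketch of an ordinary least-squares fit in plain Python. The function name `linear_fit` and the sample values are made up for illustration; they are not hive data.

```python
from typing import Sequence, Tuple


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and of the x/y cross products.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up illustration values (NOT hive data): one point per player.
marine_share = [0.2, 0.4, 0.5, 0.6, 0.8]   # fraction of play time spent as marine
win_rate = [0.56, 0.52, 0.50, 0.48, 0.44]  # fraction of rounds won

slope, intercept = linear_fit(marine_share, win_rate)
print(f"win_rate = {slope:.3f} * marine_share + {intercept:.3f}")
```

With real data, a line that does not pass near (0.5, 0.5) would be one hint that the two teams do not win equally often.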
First of all, when I made that graph for you, I did not have the new data yet. It was made with the older data. In both the new and old data there is a lot of junk data from players who had played before the hive reset but not after. To make the graph readable and have no oddball things going on, I used a clean dataset of players with 100 or more hours. That left a little over 3700 players. This is the same subset of data I have been using for most of my recent graphs.
Since you asked, and I now have more recent, nearly complete data, how about I use that? This is from the same cleaned subselection of the new data that I just used for my recent graphs. This time I limited the data to just players who had played from January 1, 2016 until February 12, 2016. I also did a few other things to the data, as described in the picture. That will put N at 12951.
For a), I could try to limit the 12,951 further to a certain skill range.
Here is the graph. You can see that even in this subset of data there are those lines at about .5 and .33. Those are caused by players with very little data in the system, such as 1 win and 1 loss, who then quit NS2.
For the same subset of data, the sum of all marine time is 415,368.56 hours, and the sum of alien hours is 440,029.83.
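To put those two totals on the same scale, here is a quick sketch that turns them into shares of all recorded play time. The variable names are my own; only the two hour totals come from the post above.

```python
# Hour totals quoted above for the cleaned subset (N = 12951).
marine_hours = 415_368.56
alien_hours = 440_029.83

total_hours = marine_hours + alien_hours
marine_share = marine_hours / total_hours
alien_share = alien_hours / total_hours

print(f"marine share of hours: {marine_share:.1%}")
print(f"alien share of hours:  {alien_share:.1%}")
```

That works out to roughly 48.6% marine and 51.4% alien, so the subset as a whole is close to evenly split between the two teams.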
@Nordic - Love that you are using statistics software, but can you output the graphs to Excel? They are a million times more readable.... :P
I will do some Excel graphs. I just wanted to get those out because they put almost everything anyone could possibly want to know about hive stats on one page, and I can make them quickly.
If anyone is ever wondering why it is not good to use all of the hive data, it is because there are players like this. http://hive.naturalselection2.com/profile/112430050
It shows 10 minutes played, a 100% winrate, with a single game played that was 19 minutes long. Players like this have not played, or have not played much since the hive was reset. They are essentially garbage data.
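Here is a rough sketch of how that kind of junk profile could be filtered out before graphing. The `Profile` fields, the function name, and the `min_games` cut-off are assumptions for illustration and not the real hive schema; the 100-hour cut-off is the one mentioned earlier for the older data. Apart from the profile linked above, the sample rows are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Profile:
    # Hypothetical field names, not the real hive schema.
    player_id: int
    hours_played: float
    games_played: int
    wins: int


def clean_profiles(profiles: List[Profile], min_hours: float, min_games: int) -> List[Profile]:
    """Keep only profiles with enough play time and enough recorded games."""
    return [
        p for p in profiles
        if p.hours_played >= min_hours and p.games_played >= min_games
    ]


profiles = [
    # The profile described above: 10 minutes played, 1 game, 100% winrate.
    Profile(player_id=112430050, hours_played=10 / 60, games_played=1, wins=1),
    # Made-up rows for illustration only.
    Profile(player_id=1, hours_played=250.0, games_played=600, wins=310),
    Profile(player_id=2, hours_played=3.5, games_played=9, wins=3),
]

cleaned = clean_profiles(profiles, min_hours=100.0, min_games=20)
print([p.player_id for p in cleaned])
```

Only the made-up player with 250 hours survives the filter; the 10-minute profile and the 3.5-hour profile are dropped.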
@NovoRei, I am going to pull this conversation over here. I have new data to play with. Look up 2 posts and in the second spoiler you will see an updated correlation table.
Here is a correlation table I got out of SPSS today with all the data, no subsets. I was playing with killrates and winrates when I made this instead of W/L and K/D.
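For reference, each cell in a correlation table like that is typically a Pearson correlation coefficient between two columns. Below is a minimal sketch of that calculation in plain Python, assuming Pearson is what the table reports. The function name and the sample numbers are made up; they are not taken from the SPSS output.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


# Made-up illustration values (NOT hive data): one entry per player.
kill_rate = [0.30, 0.45, 0.50, 0.55, 0.70]
win_rate = [0.40, 0.48, 0.47, 0.55, 0.60]

r = pearson_r(kill_rate, win_rate)
print(f"r(kill_rate, win_rate) = {r:.3f}")
```

Values near +1 or -1 mean the two columns move together almost linearly, and values near 0 mean there is little linear relationship. On Python 3.10 or newer, `statistics.correlation` in the standard library computes the same quantity.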
Random thought after lunch, based on your data:
KDrate is slightly more related to WLrate than SPM is. But SPM and KDrate are similarly related to Skill.
You can infer a high KDR player is more likely to win a match, but that player will not necessarily have a high skill.
So, hypotheses:
1. Are those matches netting skill points? Likely not. Is it the result of bad balance or deliberate stacking? We should look at how often, and by how much, a shuffled match nets points versus a non-shuffled one.
2. Or something else?
3. Score points for killing and destroying are well balanced, because they correlate similarly with Skill (wins that net skill).
That is a good question. I don't think I have a method of testing that though. I don't have individual round data.
@Nordic:
Maybe you should try to give a color to each dot (assuming it's a player) based on a scale.
Black would be a player who played in the last week, and white one who played ages ago. You may find something. I mean, 90,000 profiles may not be representative when we look at NS2 Steamcharts.
Then try another one with regularity. Like players who played every 2 days versus once a month (even if it was ages ago).
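The black-to-white scale suggested above could be sketched like this: each player's last match date is mapped to a gray level between 0.0 (black, most recent) and 1.0 (white, oldest). The function name `recency_gray` is made up for illustration, and the date window is just the January 1 to February 12, 2016 range used elsewhere in this thread.

```python
from datetime import date


def recency_gray(last_match: date, newest: date, oldest: date) -> float:
    """Gray level for a dot: 0.0 (black) = newest match, 1.0 (white) = oldest."""
    span_days = (newest - oldest).days
    if span_days <= 0:
        return 0.0  # degenerate window: treat everyone as recent
    age_days = (newest - last_match).days
    # Clamp so dates outside the window still give a valid gray level.
    return min(1.0, max(0.0, age_days / span_days))


newest = date(2016, 2, 12)
oldest = date(2016, 1, 1)

for last_match in (date(2016, 2, 12), date(2016, 1, 22), date(2016, 1, 1)):
    print(last_match, recency_gray(last_match, newest, oldest))
```

A plotting library that accepts per-point gray levels (for example matplotlib's `scatter` with `c=` and `cmap="gray"`) could then color each dot with these values.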
The graph I think you are referring to is only from players whose last recorded match was between January 1st, 2016 and February 12th, 2016. Although it is an interesting idea, I don't see the point in making players' dots different colors. I would not even know how to do that in Excel.
I only have the last match they played, so I can't do players who played every 2 days versus once a month.
I should not even use all 90,000 profiles, because half of them are junk. There are players like this: http://hive.naturalselection2.com/profile/112430050
It shows 10 minutes played, a 100% winrate, with a single game played that was 19 minutes long. Players like this have not played, or have not played much since the hive was reset. They are essentially garbage data.
You would be able to see how skill spreads with time played. You may be able to see something emerge.
(if you can read this)
Hmm, thank you for the graph. I don't think it is really useful, though.
a) It is way steeper than I expected,
b) it does not go through 0.5x0.5, but close,
c) as you mentioned, there are a lot of vertical lines,
d) no marine-only players?
But taken as is, it would suggest that the average player sees aliens win more than 70% of the time.
I don't believe that.
http://imgur.com/a/p8rky