224 tech changes, part 1
matso (Master of Patches) - Members, Forum Moderators, NS2 Developer, Constellation, NS2 Playtester, Squad Five Blue, Squad Five Silver, Squad Five Gold, Reinforced - Shadow, NS2 Community Developer
Join Date: 2002-11-05, Member: 7000, Posts: 1,554
"You don't have to be insane. But it helps".
About 3 weeks ago, I had finished my last batch of performance improvements and was scanning through the latest playtest performance logs, looking for something to improve. And it was all dross - 0.5% here, another possible 0.3% there - so I was looking at spending days at the 0.5% improvement level, twiddling with minor tweaks here and there.
So I decided to go insane instead.
Now, that sounds a bit worse than what it actually means - it simply means picking something from my list of "stuff that would be insane to do before 1.0 release". Insane because they would introduce new architectural concepts in the engine, so it's hard to figure out just how much they would destabilize everything.
However, there was this thing about movement prediction on the client that had been itching at the back of my head for a long time.
Some background info here ... the Spark Engine samples input before rendering every frame, generating a Move data structure (i.e., a "move"). It adds that move to the list of moves-not-yet-part-of-the-latest-server-update, then resets the world back to the latest server update and executes all the moves, using the final state of the world to render from.
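The loop described above can be sketched roughly as follows. This is an illustration in Python, not Spark's Lua code: `Move`, `WorldState`, `apply_move` and `PredictingClient` are made-up names, and the "world" is shrunk to a single 1-D position so the replay logic is easy to follow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Move:
    sequence: int  # client-assigned sequence number (assumption for this sketch)
    dx: float      # hypothetical 1-D movement input


@dataclass(frozen=True)
class WorldState:
    last_move: int  # sequence number of the last move included in this state
    x: float


def apply_move(state, move):
    """Stand-in for the costly per-move simulation step."""
    return WorldState(last_move=move.sequence, x=state.x + move.dx)


class PredictingClient:
    def __init__(self, snapshot):
        self.snapshot = snapshot  # latest server update
        self.pending = []         # moves not yet part of the latest server update

    def on_server_update(self, snapshot):
        # Keep only the moves the server has not processed yet.
        self.snapshot = snapshot
        self.pending = [m for m in self.pending if m.sequence > snapshot.last_move]

    def frame(self, move):
        # Every frame: add the new move, reset to the snapshot, replay everything.
        self.pending.append(move)
        state = self.snapshot
        for m in self.pending:
            state = apply_move(state, m)
        return state, len(self.pending)  # state to render from, moves executed
```

The second return value is the number of moves executed for that frame, which is the quantity that grows with latency.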
Each move is quite costly to run, at about 0.5-0.7ms or so, and the length of the queue grows with effective server latency. Typically, you have maybe 100ms net lag and 100ms interpolation lag for an effective lag of about 200ms. At 50 fps, you are looking at running a minimum of 10 prediction frames every frame (this is the "Prediction" line at the bottom of the net_stats display). If you wanted to run at 100fps, you would need to run 20 prediction frames instead - every frame. Yea, that would be 10-15ms every 10ms. Kinda hard to do.
And that's the reason why fps goes down with latency. And why fps goes down when the server drops below 20 ticks per second - the queue gets longer. And why it's so hard to increase fps on the client - faster fps means you need to predict more moves, more times.
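The arithmetic behind those numbers, as a small sketch. The function names are invented for illustration; the inputs (200ms effective lag, 0.5-0.7ms per move) are the rough figures quoted above.

```python
def moves_per_frame(effective_lag_ms, fps):
    """Number of queued moves that must be replayed for each rendered frame."""
    frame_ms = 1000.0 / fps           # one move is generated per frame
    return effective_lag_ms / frame_ms


def prediction_cost_ms(effective_lag_ms, fps, move_cost_ms):
    """Time spent on prediction per rendered frame."""
    return moves_per_frame(effective_lag_ms, fps) * move_cost_ms


# 200ms effective lag: about 10 moves per frame at 50 fps, about 20 at 100 fps.
low = prediction_cost_ms(200, 100, 0.5)
high = prediction_cost_ms(200, 100, 0.7)
print(f"100 fps: {low:.0f}-{high:.0f} ms of prediction per 10 ms frame")
```

At 100 fps the prediction cost alone is at or above the whole frame budget, which is the point of the paragraph above.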
Now, the client doesn't strictly need to do it this way - it could just take the world it has already predicted to the previous frame, add only the latest move to the world and use it to render right away. Unfortunately, 20 times per second the server sends a new update, and you would need to run all the moves from that in order to get in sync - which would cause 20 frames every second to be MUCH longer than all the other frames, resulting in some really hitchy experience. Not a good thing.
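To make the hitching concrete, here is a toy model of how many moves each frame would have to run under that incremental scheme. It is only an illustration: the function name and parameters are hypothetical, and it assumes a server update lands on every Nth frame and forces a full replay of the queue.

```python
def moves_run_per_frame(frames, frames_per_update, queue_length):
    """Moves executed on each frame when only update frames replay the full queue."""
    counts = []
    for frame in range(frames):
        if frame % frames_per_update == 0:
            counts.append(queue_length)  # new snapshot arrived: replay everything
        else:
            counts.append(1)             # otherwise just add the latest move
    return counts


# 100 fps with 20 updates per second means an update every 5th frame;
# with about 200ms effective lag the queue holds about 20 moves.
print(moves_run_per_frame(10, 5, 20))
```

Most frames run a single move, but every fifth frame runs the whole queue, so those frames take many times longer than their neighbours.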
As to why the Spark Engine runs this way? Well, to quote Max: "It was not supposed to be that slow". In other words, other engines avoid similar problems by running moves really fast in hardcoded C++. In Spark, it's run in Lua, allowing awesome flexibility (skulk wallwalking, jetpacks, sprinting, lerk flying - they are all coded in Lua) - at greater cost than was foreseen when the choice was made.
ENOUGH BACKGROUND .... back to the insanity.
The idea is actually quite simple. Instead of delivering a raw server snapshot to the main thread, which then has to run all the moves on it, why not deliver an already predicted world to the main thread? I.e., give the snapshot to another Lua VM with almost identical code to the Client world, have it run all the moves in its own thread, and only deliver an updated world to the main thread.
That allows the main thread to just keep adding moves to its world, and now and then whenever the Prediction thread is finished preparing the server snapshot, it can just swap its state with it, at pretty much zero cost.
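A structural sketch of that hand-off, under some loud assumptions: it is Python, not Spark's Lua/C++; a plain thread stands in for the second Lua VM (so it shows the shape of the design, not a real speedup); and `Move`, `WorldState`, `apply_move` and `ThreadedPredictor` are invented names with a 1-D position as the whole "world".

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Move:
    sequence: int  # client-assigned sequence number (assumption)
    dx: float      # hypothetical 1-D movement input


@dataclass(frozen=True)
class WorldState:
    last_move: int  # last move already included in this state
    x: float


def apply_move(state, move):
    """Stand-in for the costly per-move simulation step."""
    return WorldState(last_move=move.sequence, x=state.x + move.dx)


class ThreadedPredictor:
    def __init__(self, initial):
        self.world = initial   # main thread's already-predicted world
        self.pending = []      # moves the server has not processed yet
        self._ready = None     # predicted world handed over by the worker
        self._lock = threading.Lock()

    def on_server_update(self, snapshot):
        # Main thread: hand the snapshot plus a copy of the queue to a worker.
        self.pending = [m for m in self.pending if m.sequence > snapshot.last_move]
        worker = threading.Thread(target=self._predict,
                                  args=(snapshot, list(self.pending)))
        worker.start()
        return worker

    def _predict(self, snapshot, moves):
        # Worker thread: replay the queue on top of the snapshot.
        state = snapshot
        for m in moves:
            state = apply_move(state, m)
        with self._lock:
            self._ready = state

    def frame(self, move):
        # Main thread: swap in a finished prediction if there is one,
        # then apply only the moves that world has not seen yet.
        self.pending.append(move)
        with self._lock:
            ready, self._ready = self._ready, None
        if ready is not None:
            self.world = ready
        applied = 0
        for m in self.pending:
            if m.sequence > self.world.last_move:
                self.world = apply_move(self.world, m)
                applied += 1
        return self.world, applied
```

In the common case `frame` applies exactly one move; after a swap it also catches up on any moves sampled while the worker was busy, which is normally a handful rather than the whole queue.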
Nice idea. And faced with < 0.5%/day improvements, I figured I might as well give it a try - if I spent 3 days on it and it turned out to not work, three days less wasn't going to make much difference to performance anyhow.
After three intense days of hacking, the prototype was finished and worked beyond expectation. Depending on latency and how built-up the area was, the FPS increase was 30-50% extra. Some minor bugs here and there, but it was good enough to present it to UWE. When I pitched it to Brian C, I could sense his "Is this guy insane? Introducing multithreading and multiple Lua VMs less than a month before release?" - but after testing it and tasting the FPS increase, it was pretty much ... "Yea, we have to do this".
This was Monday of the 223 release. Right after the 223 release, UWE switched to ironing out the bugs and unforeseen weirdness to be expected when doing something like that. It went pretty well, all things considered, and the new version was built and presented to the playtesters the following Monday.
At which time the ###### hit the server fan.
To be continued in part 2.
Member of CDT, Senior Spark Engine Hacker