Multithreading?
Floodinator
[HBZ], joined 2005-02-22
Will NS2 Gold support it?
My question is, first: can Lua handle multithreading? And if yes, well enough to gain performance?
And if both are answered with yes, will NS2 Gold support it then?
I really wonder why I bought a quad-core... almost all games stick to 1 core and DX9... damn consoles...
Comments
The remark about consoles was about DX9.
I don't have a game that needs more than 1 core to run at a minimum of 50 FPS at Full HD. Not even Crysis.
Even running an NS2 dedicated server and playing on it while recording with Fraps and listening to music streamed over Bluetooth doesn't use more than 60% of my CPU.
But that's not an answer to my question.
Yes, you can run several Lua and/or C++ threads. Yes, you can gain performance from it. Will that automagically make NS2 faster? No. Are the NS2 devs smart enough to optimize? Experience suggests yes. Can they pull it off given the time constraints, tech constraints (Lua), and resource constraints (2-3 coders)? I have no idea, and you'll probably have to wait and see.
The PS3 has 8 simple floating-point cores (the SPEs) and 1 PowerPC core. The Xbox 360 has 3 cores, each split into 2 hardware threads, giving 6 "virtual cores". Being made for consoles (which NS2 isn't) in no way means having an architecture for only 1 core/thread.
So you mean better performance could be achieved with MT, but it might cost too many resources, so that improving the single-threaded code would be the better route for UWE?
Imagine you have a bank account and 2 transactions are being made.
At the start the account has $100 in it. The 1st transaction adds $10 and the 2nd adds $20. What does a transaction do?
The bank account is a memory cell that contains a number. A transaction looks at the value in the cell, writes it down, adds its amount to it, then writes the result back into the cell.
The 1st transaction, executed alone, would do:
1. Write down that $100 is in the cell.
2. Add $10 to the written-down value; we have $110.
3. Write $110 into the cell.
The 2nd transaction executed alone would do the same, except ending with $120 in the bank account.
The fun starts when the 2 transactions are interleaved: you can lose money. After executing the 1st and 2nd transactions you will have less than $130. How? Imagine this:
1. The 1st transaction (A) writes down that the account has $100.
2. The 2nd transaction (B) interrupts and also writes down that the account has $100.
3. A writes $110 into the memory cell.
4. B resumes and writes back the amount it noted plus the amount transferred, i.e. $100 + $20 = $120.
5. The bank account should have $130 in it but only has $120.
That's the root of the issue. Several threads that access the same data have to execute one after another (without interleaving), resulting in speed similar to a single thread doing all the work. Games have lots of points where data is read and written. If you let 2 threads do their work using code written for single-threaded execution, the data gets corrupted (like $120 vs $130): values end up wrong and/or the program eventually crashes. As you can see, running several copies (threads) is easy; making sure they don't get in each other's way is far harder.
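A minimal C++ sketch of that lost-update problem (illustration only, not NS2 code; the amounts and iteration counts are made up, and the data race is deliberate):
[code]
#include <iostream>
#include <thread>

int account = 100; // the shared "memory cell" from the example above

// One transaction: repeat the unsafe read-modify-write many times so the
// interleaving actually shows up.
void transaction(int amount, int times) {
    for (int i = 0; i < times; ++i) {
        int noted = account;      // 1. write down the current balance
        account = noted + amount; // 2.+3. add and write back; another thread
                                  // may have changed 'account' in between
    }
}

int main() {
    std::thread a(transaction, 10, 100000); // the $10 transaction, repeated
    std::thread b(transaction, 20, 100000); // the $20 transaction, repeated
    a.join();
    b.join();
    // Without interleaving the result would be 100 + 100000*10 + 100000*20
    // = 3000100; with the race, updates get lost and the total is smaller.
    std::cout << "balance: " << account << '\n';
}
[/code]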
Why are graphics cards so fast? Because every pixel is independent from other pixels and everything can be done in parallel.
That was really informative. Too bad this forum doesn't have a reputation feature...
Tyvm!
You can cache stuff, you can predict branches etc.
Multithreading works great in certain applications, does not in others. I trust that UWE will make use of it if they consider it worth the effort/coding/potential benefits.
QUOTE: You can cache stuff, you can predict branches etc. ...
Agreed. The explanation given above is oversimplified to the point of being wrong. Those are issues you learn to deal with in the first week of university.
Just let 2 threads add +1 to the same variable and then write the current value to the console. You will notice that the output goes something like:
[code]
1 2 3 4 5 6 7 9 10 11 12 13 14 8 15 16 ...
              ^ 8 missing      ^ there it is!
[/code]
Why? One thread adds +1 to seven and stores the value in a temporary cell before writing it out, but it gets interrupted by the other thread before it can actually print the value. In reality today's processors are so fast that they usually write 100+ numbers before that happens, but the problem remains.
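A minimal C++ version of that experiment (counts made up; the lack of locking is deliberate and formally a data race, i.e. undefined behavior, which is exactly the point):
[code]
#include <iostream>
#include <thread>

int counter = 0; // shared by both threads, deliberately unprotected

void count_to(int limit) {
    for (int i = 0; i < limit; ++i) {
        int value = ++counter; // add +1, keep the value in a temporary
        // A thread switch between the increment and the print makes
        // 'value' show up late in the output, like the 8 above.
        std::cout << value << ' ';
    }
}

int main() {
    std::thread a(count_to, 1000);
    std::thread b(count_to, 1000);
    a.join();
    b.join();
}
[/code]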
I'm confident that there is some way in Lua to define code blocks that are not allowed to be interrupted (there is in Java). But as already pointed out, that may eat part of the performance increase you hope to gain, and defining those blocks would be a lot of extra work for the programmers.
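For comparison, here is roughly what such an "uninterruptible" block looks like in C++; Java's synchronized blocks and any Lua mechanism would differ in syntax, not in idea. A mutex turns the increment-and-print into one critical section, at the price of the threads waiting for each other:
[code]
#include <iostream>
#include <mutex>
#include <thread>

int counter = 0;
std::mutex counter_mutex;

void count_to(int limit) {
    for (int i = 0; i < limit; ++i) {
        // Critical section: the lock makes increment-and-print one
        // uninterruptible unit, so the numbers come out in order.
        std::lock_guard<std::mutex> lock(counter_mutex);
        ++counter;
        std::cout << counter << ' ';
    }
}

int main() {
    std::thread a(count_to, 8);
    std::thread b(count_to, 8);
    a.join();
    b.join(); // prints 1..16, each exactly once and in order
}
[/code]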
I think a better approach would be to strictly divide functionality between threads and not allow different threads to write to the same part of memory, only to read what others have written. Actually, I'm pretty sure that preparing the data to be rendered on the graphics card is already its own thread, since it only needs to read data from the other threads.
For example, if hydras are still CPU-heavy (even if they are greatly improved now), wouldn't it be a great idea to move all the AI calculations (including pathing) into their own thread? I'd imagine a hydra reading the state of the other entities (simulated by the main thread), then calculating and writing the decision "shoot spike in direction (<pitch>, <yaw>)" into another part of memory. The main thread would then read out the decision and simulate the results of the action. Decisions of undeployed ARCs would be something like "move forward" or "turn to direction <yaw>". If someone wanted to code a more complex AI that used more than a quarter or half of the CPU time, there could even be multiple AI threads, because the AI of each entity is independent of the others.
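A rough C++ sketch of that split (all names here, like hydra_ai, GameState and Decision, are hypothetical; this is not how NS2 is actually structured): the main thread publishes the world state, the AI thread reads it and queues decisions, and the main thread executes them.
[code]
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical snapshot of the world; only the main thread writes it.
struct GameState { float enemy_pitch, enemy_yaw; };

// Hypothetical decision the AI writes and the main thread executes.
struct Decision { float pitch, yaw; };

std::mutex state_mutex; // guards both the snapshot and the queue
GameState snapshot{0.0f, 0.0f};
std::queue<Decision> decisions;

// AI thread: read the latest state, decide, hand the decision over.
void hydra_ai(int num_decisions) {
    for (int i = 0; i < num_decisions; ++i) {
        std::lock_guard<std::mutex> lock(state_mutex);
        decisions.push({snapshot.enemy_pitch, snapshot.enemy_yaw});
    }
}

int main() {
    std::thread ai(hydra_ai, 5);
    for (int tick = 1; tick <= 5; ++tick) {
        std::lock_guard<std::mutex> lock(state_mutex);
        snapshot = {10.0f * tick, 20.0f * tick}; // simulate the world
        while (!decisions.empty()) {             // execute AI decisions
            std::cout << "shoot spike at (" << decisions.front().pitch
                      << ", " << decisions.front().yaw << ")\n";
            decisions.pop();
        }
    }
    ai.join();
}
[/code]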
What is still a little confusing to me is that people only talk about multi*threading* to improve performance, not ~~multi*tasking*~~ multi*processing*. So on a modern OS, ~~multithreading == multitasking?~~ multithreading == multiprocessing?
To explain the difference: you can do multithreading even on a single-core processor. The OS will give a thread some time to run, then interrupt it and run another one, switching between threads so fast that on a human timescale it looks like all of them run simultaneously. As a matter of fact it does the same for different processes; ever wondered why you can run more processes than you have processor cores?
~~Real multitasking~~ Symmetric multiprocessing, on the other hand, means that you actually run code simultaneously on different processor cores. So does a modern OS automatically detect that a process is divided into multiple threads and run them on different cores?
I suspect not, because what use would it be to define "don't interrupt this" code blocks if they *aren't* interrupted, but the interfering thread is simply running simultaneously on another core? If you interpreted "don't interrupt" as "don't run *any* other code", you would effectively be back to single-core performance. You would somehow need to define code blocks that are mutually exclusive to each other, so that when a core reaches the beginning of a block that must not run while an exclusive block is running on another core, it could continue with another thread in the meantime.
http://www.gamasutra.com/view/feature/1830/multithreaded_game_engine_.php
Quite accessible and gives some overview of the issue.
From a less technical perspective: for me NS2 seems to scale quite nicely with my CPU. I overclock it between 2 GHz and 3.2 GHz with a very noticeable difference in framerate. With a low-end graphics card it's pretty much a model testbed for CPU scaling. If it's that CPU-dependent, it seems logical that tapping a second core could further increase performance. It all depends on the engine, the gameplay (multithreading could introduce additional lag in certain situations), and cost-effectiveness.
Also, wulf 21, IMHO most people don't even think about how multitasking works, at least since the early Windows days. Since Windows NT there has been quite advanced process and thread management in Windows: you just create a child process and let Windows worry about whether to give it its own core or run it alongside other processes. Btw, that's one of the big improvements of NT 6.0.
Multithreading can't give you those kinds of performance increases, and can actually make it harder to optimize the program well due to its added complexity.
Do note that NS2 is already MT in some respects. Type "profile" in the console and you'll see that the client is running 2 threads, one for rendering and one for updating the game state. That doesn't make the game go twice as fast, but every little bit helps.
Some people have interesting ideas when it comes to solving the problem:
http://www.st.cs.uni-sb.de/edu/seminare/2005/advanced-fp/docs/sweeny.pdf
http://research.microsoft.com/apps/pubs/default.aspx?id=103220
http://research.microsoft.com/en-us/projects/revisions/
Exactly as I expected. The rendering thread on the CPU doesn't have much work to do: it just preprocesses the data for the graphics card and then tells it "render it!". The rest of the time it's waiting for the graphics card to finish and/or for a new game state to be calculated.
But if the game comes close to being fully optimized, wouldn't it make sense then? I mean, imagine the AI using about 50% of the CPU time at this point. Then moving it to another thread could almost double the performance.
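As a back-of-the-envelope check (assuming the two halves overlap perfectly and never wait on each other): if the AI takes a fraction a of a frame and everything else takes 1 - a, running them on two cores cuts the frame time to max(a, 1 - a). With a = 0.5 that's 0.5, i.e. nearly a 2x speedup; with a = 0.3 it's max(0.3, 0.7) = 0.7, only about 1.4x. The closer the split is to 50/50, the bigger the gain.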
QUOTE (IronHalik @ Jun 26 2011, 08:25 PM): Also, wulf 21, IMHO most people don't even think how multitasking works ...
Arrgh, I mixed something up, shame on me! Instead of multitasking (http://en.wikipedia.org/wiki/Computer_multitasking), what I really meant was symmetric multiprocessing (http://en.wikipedia.org/wiki/Symmetric_multiprocessing). I'll edit my post but leave the wrong terms in there (struck through), so people can still follow the discussion.
Some IT guy I know told me that. Either he was referring to running multiple tasks on multiple cores and I misunderstood him, or he wasn't that sure about the terms either. (Multi-core CPUs were new at the time.)
QUOTE: But if the game comes close to being fully optimized, wouldn't it make sense then? ...
Yeah... but the problem is resources and time. Usually, for MT to scale even close to linearly, you need to design the whole architecture for it right from the beginning... and that takes time.
And the architecture will be more complex, meaning that all coding done in it will take more time.
In addition, unless your MT architecture is foolproof (hah!), you will likely encounter MT-induced Heisenbugs (i.e. non-repeatable bugs caused by threads running things in different orders), which will cause premature baldness from tearing your hair out, and take lots and lots of time to understand and fix.
Basically, if UWE had gone for a heavily MT architecture, we wouldn't be playing this game now; it would have been too buggy, slow and unreliable. An extra year or three might have been necessary to get it to this point.
Matso is a professional and has been hacking and profiling NS2's Lua code for quite some time now. Like he said, improving the algorithms and the Lua VM goes much further than adding multithreading.
Multithreading is expensive to code, expensive to maintain, and eats more CPU on its own. Adding MT to Lua isn't a magic trick; it takes time, not to mention all the Lua refactoring that would have to be done. I wouldn't mind it on the client, but there are equally big problems on the server side, which would be expensive for server hosts if it were MT.
Not to mention all the crazy bugs you can get with critical sections. Race-condition bugs cause massive headaches to fix; I wouldn't wish debugging them on my worst enemy. Anyone who has read a bit of computer science knows what kind of rigorous programming is needed to write quality MT code.
There are probably some things you can easily MT, as NS2 does now, but multithreading the Lua game logic is a whole other story.
Physics calculations and matrix math can be MT'ed easily (as in, we don't have to worry about it); that's what the GPU does anyway.
MOOtant, your links were interesting, though I need to read them properly first. Not really relevant for NS2, but still interesting.
http://msdn.microsoft.com/en-us/library/ms644904(v=vs.85).aspx
MT'ing Lua would be a PITA to manage; the simplest solution I can think of would be to change the parser to deal with the Lua scripting, rather than attempting to MT the Lua itself.