Multithreading?
Floodinator
[HBZ], joined 2005-02-22
Will NS2 Gold support it?
My question is, first: can Lua handle multithreading? And if yes, well enough to gain performance?
And if both are answered with yes, will NS2 Gold support it then?
I really wonder why I bought a quad-core... almost all games stick to 1 core and DX9... damn consoles...
Comments
The remark about consoles was about DX9.
I don't have a game that needs more than 1 core to run at a minimum of 50 FPS at Full HD. Not even Crysis.
Even running an NS2 dedicated server and playing on it while recording with Fraps and listening to music streamed over Bluetooth doesn't use more than 60% of my CPU.
But that's not an answer to my question.
Yes, you can run several Lua and/or C++ threads. Yes, you can gain performance from it. Will that automagically make NS2 faster? No. Are the NS2 devs smart enough to optimize? Experience suggests yes. Can they pull it off given the time constraints, tech constraints (Lua), and resource constraints (2-3 coders)? I have no idea, and you'll probably have to wait and see.
The PS3 has 8 simple floating-point cores (the SPEs) and 1 PowerPC core. The Xbox 360 has 3 cores, each split into 2 hardware threads, giving 6 "virtual cores". Being made for consoles (which NS2 isn't) in no way means having an architecture for only 1 core/thread.
So you mean better performance could be achieved with MT, but it might cost too many resources, so that improving the single-threaded code would be the better route for UWE?
Imagine you have a bank account and 2 transactions are being made.
At the start the account has $100 in it. The 1st transaction adds $10 and the 2nd adds $20. What does a transaction do?
The bank account is a memory cell that contains a number. A transaction looks at the value in the cell, writes it down, adds its amount to it, then writes the result back into the cell.
The 1st transaction, executed alone, would do:
1. Write down that $100 is in the cell.
2. Add $10 to the written-down value; we have $110.
3. Write $110 into the cell.
The 2nd transaction executed alone would do the same, except ending with $120 in the bank account.
The fun starts when the 2 transactions are interleaved: you can lose money. After executing the 1st and 2nd transactions you will have less than $130. How? Imagine this:
1. The 1st transaction (A) writes down that the account has $100.
2. The 2nd transaction (B) interrupts and also writes down that the account has $100.
3. A writes $110 into the memory cell.
4. B resumes and writes back the amount it noted plus the amount transferred, i.e. $100 + $20 = $120.
5. The bank account should have $130 in it but only has $120.
That's the root of the issue. Several threads that access the same data have to execute one after another (without interleaving), resulting in speed similar to a single thread doing all the work. Games have lots of points where data is read and written. If you let 2 threads do their work using code written for single-threaded execution, the data gets corrupted (like $120 vs $130): values end up wrong and/or the program eventually crashes. As you can see, running several copies (threads) is easy; making sure they don't get in each other's way is far harder.
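A minimal C++ sketch of that lost-update problem (illustration only, not NS2 code; the amounts and iteration counts are made up, and the data race is deliberate):
[code]
#include <iostream>
#include <thread>

int account = 100; // the shared "memory cell" from the example above

// One transaction: repeat the unsafe read-modify-write many times so the
// interleaving actually shows up.
void transaction(int amount, int times) {
    for (int i = 0; i < times; ++i) {
        int noted = account;      // 1. write down the current balance
        account = noted + amount; // 2.+3. add and write back; another thread
                                  // may have changed 'account' in between
    }
}

int main() {
    std::thread a(transaction, 10, 100000); // the $10 transaction, repeated
    std::thread b(transaction, 20, 100000); // the $20 transaction, repeated
    a.join();
    b.join();
    // Without interleaving the result would be 100 + 100000*10 + 100000*20
    // = 3000100; with the race, updates get lost and the total is smaller.
    std::cout << "balance: " << account << '\n';
}
[/code]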
Why are graphics cards so fast? Because every pixel is independent from other pixels and everything can be done in parallel.
That was really informative. Too bad this forum doesn't have a reputation feature...
Tyvm!
You can cache stuff, you can predict branches etc.
Multithreading works great in certain applications, does not in others. I trust that UWE will make use of it if they consider it worth the effort/coding/potential benefits.
QUOTE: You can cache stuff, you can predict branches etc. ...
Agreed. The explanation given above is oversimplified to the point of being wrong. Those are issues you learn to deal with in the first week of university.
Just let 2 threads add +1 to the same variable and then write the current value to the console. You will notice that the output goes something like:
[code]
1 2 3 4 5 6 7 9 10 11 12 13 14 8 15 16 ...
              ^ 8 missing      ^ there it is!
[/code]
Why? One thread adds +1 to seven and stores the value in a temporary cell before writing it out, but it gets interrupted by the other thread before it can actually print the value. In reality today's processors are so fast that they usually write 100+ numbers before that happens, but the problem remains.
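A minimal C++ version of that experiment (counts made up; the lack of locking is deliberate and formally a data race, i.e. undefined behavior, which is exactly the point):
[code]
#include <iostream>
#include <thread>

int counter = 0; // shared by both threads, deliberately unprotected

void count_to(int limit) {
    for (int i = 0; i < limit; ++i) {
        int value = ++counter; // add +1, keep the value in a temporary
        // A thread switch between the increment and the print makes
        // 'value' show up late in the output, like the 8 above.
        std::cout << value << ' ';
    }
}

int main() {
    std::thread a(count_to, 1000);
    std::thread b(count_to, 1000);
    a.join();
    b.join();
}
[/code]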
I'm confident that there is some way in Lua to define code blocks that are not allowed to be interrupted (there is in Java). But as already pointed out, that may eat part of the performance increase you hope to gain, and defining those blocks would be a lot of extra work for the programmers.
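For comparison, here is roughly what such an "uninterruptible" block looks like in C++; Java's synchronized blocks and any Lua mechanism would differ in syntax, not in idea. A mutex turns the increment-and-print into one critical section, at the price of the threads waiting for each other:
[code]
#include <iostream>
#include <mutex>
#include <thread>

int counter = 0;
std::mutex counter_mutex;

void count_to(int limit) {
    for (int i = 0; i < limit; ++i) {
        // Critical section: the lock makes increment-and-print one
        // uninterruptible unit, so the numbers come out in order.
        std::lock_guard<std::mutex> lock(counter_mutex);
        ++counter;
        std::cout << counter << ' ';
    }
}

int main() {
    std::thread a(count_to, 8);
    std::thread b(count_to, 8);
    a.join();
    b.join(); // prints 1..16, each exactly once and in order
}
[/code]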
I think a better approach would be to strictly divide functionality between threads and not allow different threads to write to the same part of memory, only to read what others have written. Actually, I'm pretty sure that preparing the data to be rendered on the graphics card is already its own thread, since it only needs to read data from the other threads.
For example, if hydras are still CPU-heavy (even if they are greatly improved now), wouldn't it be a great idea to move all the AI calculations (including pathing) into their own thread? I'd imagine a hydra reading the state of the other entities (simulated by the main thread), then calculating and writing the decision "shoot spike in direction (<pitch>, <yaw>)" into another part of memory. The main thread would then read out the decision and simulate the results of the action. Decisions of undeployed ARCs would be something like "move forward" or "turn to direction <yaw>". If someone wanted to code a more complex AI that used more than a quarter or half of the CPU time, there could even be multiple AI threads, because the AI of each entity is independent of the others.
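A rough C++ sketch of that split (all names here, like hydra_ai, GameState and Decision, are hypothetical; this is not how NS2 is actually structured): the main thread publishes the world state, the AI thread reads it and queues decisions, and the main thread executes them.
[code]
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical snapshot of the world; only the main thread writes it.
struct GameState { float enemy_pitch, enemy_yaw; };

// Hypothetical decision the AI writes and the main thread executes.
struct Decision { float pitch, yaw; };

std::mutex state_mutex; // guards both the snapshot and the queue
GameState snapshot{0.0f, 0.0f};
std::queue<Decision> decisions;

// AI thread: read the latest state, decide, hand the decision over.
void hydra_ai(int num_decisions) {
    for (int i = 0; i < num_decisions; ++i) {
        std::lock_guard<std::mutex> lock(state_mutex);
        decisions.push({snapshot.enemy_pitch, snapshot.enemy_yaw});
    }
}

int main() {
    std::thread ai(hydra_ai, 5);
    for (int tick = 1; tick <= 5; ++tick) {
        std::lock_guard<std::mutex> lock(state_mutex);
        snapshot = {10.0f * tick, 20.0f * tick}; // simulate the world
        while (!decisions.empty()) {             // execute AI decisions
            std::cout << "shoot spike at (" << decisions.front().pitch
                      << ", " << decisions.front().yaw << ")\n";
            decisions.pop();
        }
    }
    ai.join();
}
[/code]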
What is still a little confusing to me is that people only talk about multi*threading* to improve performance, not ~~multi*tasking*~~ multi*processing*. So on a modern OS, ~~multithreading == multitasking?~~ multithreading == multiprocessing?
To explain the difference: you can do multithreading even on a single-core processor. The OS will give a thread some time to run, then interrupt it and run another one, switching between threads so fast that on a human timescale it looks like all of them run simultaneously. As a matter of fact it does the same for different processes; ever wondered why you can run more processes than you have processor cores?
~~Real multitasking~~ Symmetric multiprocessing, on the other hand, means that you actually run code simultaneously on different processor cores. So does a modern OS automatically detect that a process is divided into multiple threads and run them on different cores?
I suspect not, because what use would it be to define "don't interrupt this" code blocks if they *aren't* interrupted, but the interfering thread is simply running simultaneously on another core? If you interpreted "don't interrupt" as "don't run *any* other code", you would effectively be back to single-core performance. You would somehow need to define code blocks that are mutually exclusive to each other, so that when a core reaches the beginning of a block that must not run while an exclusive block is running on another core, it could continue with another thread in the meantime.
http://www.gamasutra.com/view/feature/1830/multithreaded_game_engine_.php
Quite accessible and gives some overview of the issue.
From a less technical perspective: for me NS2 seems to scale quite nicely with my CPU. I overclock it between 2 GHz and 3.2 GHz with a very noticeable difference in framerate. With a low-end graphics card it's pretty much a model testbed for CPU scaling. If it's that CPU-dependent, it seems logical that tapping a second core could further increase performance. It all depends on the engine, the gameplay (multithreading could introduce additional lag in certain situations), and cost-effectiveness.
Also, wulf 21, IMHO most people don't even think about how multitasking works, at least since the early Windows days. Since Windows NT there has been quite advanced process and thread management in Windows: you just create a child process and let Windows worry about whether to give it its own core or run it alongside other processes. Btw, that's one of the big improvements of NT 6.0.
Multithreading can't give you those kinds of performance increases, and can actually make it harder to optimize the program well due to its added complexity.
Do note that NS2 is already MT in some respects. Type "profile" in the console and you'll see that the client is running 2 threads, one for rendering and one for updating the game state. That doesn't make the game go twice as fast, but every little bit helps.
Some people have interesting ideas when it comes to solving the problem:
http://www.st.cs.uni-sb.de/edu/seminare/2005/advanced-fp/docs/sweeny.pdf
http://research.microsoft.com/apps/pubs/default.aspx?id=103220
http://research.microsoft.com/en-us/projects/revisions/
Exactly as I expected. The rendering thread on the CPU doesn't have much work to do: it just preprocesses the data for the graphics card and then tells it "render it!". The rest of the time it's waiting for the graphics card to finish and/or for a new game state to be calculated.
But if the game comes close to being fully optimized, wouldn't it make sense then? I mean, imagine the AI using about 50% of the CPU time at this point. Then moving it to another thread could almost double the performance.
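As a back-of-the-envelope check (assuming the two halves overlap perfectly and never wait on each other): if the AI takes a fraction a of a frame and everything else takes 1 - a, running them on two cores cuts the frame time to max(a, 1 - a). With a = 0.5 that's 0.5, i.e. nearly a 2x speedup; with a = 0.3 it's max(0.3, 0.7) = 0.7, only about 1.4x. The closer the split is to 50/50, the bigger the gain.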
QUOTE (IronHalik @ Jun 26 2011, 08:25 PM): Also, wulf 21, IMHO most people don't even think how multitasking works ...
Arrgh, I mixed something up, shame on me! Instead of multitasking (http://en.wikipedia.org/wiki/Computer_multitasking), what I really meant was symmetric multiprocessing (http://en.wikipedia.org/wiki/Symmetric_multiprocessing). I'll edit my post but leave the wrong terms in there (struck through), so people can still follow the discussion.
Some IT guy I know told me that. Either he was referring to running multiple tasks on multiple cores and I misunderstood him, or he wasn't that sure about the terms either. (Multi-core CPUs were new at the time.)
QUOTE: But if the game comes close to being fully optimized, wouldn't it make sense then? ...
Yeah... but the problem is resources and time. Usually, for MT to scale even close to linearly, you need to design the whole architecture for it right from the beginning... and that takes time.
And the architecture will be more complex, meaning that all coding done in it will take more time.
In addition, unless your MT architecture is foolproof (hah!), you will likely encounter MT-induced Heisenbugs (i.e. non-repeatable bugs caused by threads running things in different orders), which will cause premature baldness from tearing your hair out, and take lots and lots of time to understand and fix.
Basically, if UWE had gone for a heavily MT architecture, we wouldn't be playing this game now; it would have been too buggy, slow and unreliable. An extra year or three might have been necessary to get it to this point.
Matso is a professional and has been hacking and profiling NS2's Lua code for quite some time now. Like he said, improving the algorithms and the Lua VM goes much further than adding multithreading.
Multithreading is expensive to code, expensive to maintain, and eats more CPU on its own. Adding MT to Lua isn't a magic trick; it takes time, not to mention all the Lua refactoring that would have to be done. I wouldn't mind it on the client, but there are equally big problems on the server side, which would be expensive for server hosts if it were MT.
Not to mention all the crazy bugs you can get with critical sections. Race-condition bugs cause massive headaches to fix; I wouldn't wish debugging them on my worst enemy. Anyone who has read a bit of computer science knows what kind of rigorous programming is needed to write quality MT code.
There are probably some things you can easily MT, as NS2 does now, but multithreading the Lua game logic is a whole other story.
Physics calculations and matrix math can be MT'ed easily (as in, we don't have to worry about it); that's what the GPU does anyway.
MOOtant, your links were interesting, though I need to read them properly first. Not really relevant for NS2, but still interesting.
http://msdn.microsoft.com/en-us/library/ms644904(v=vs.85).aspx
MT'ing Lua would be a PITA to manage; the simplest solution I can think of would be to change the parser to deal with the Lua scripting, rather than attempting to MT the Lua itself.