How to Stop Wasting Your CPU Power

Published: 2014-06-10 15:33:13 Tags: CPU, GPU, Unity, quad-core, threading

Author: James Hicks

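On my machine, everything from Skyrim to Dwarf Fortress uses what Windows reports as 11% to 13% of available CPU power. On a 4-core machine with Hyper-Threading, that really means each game is using about 25% of my CPU, perhaps a little less, and that everything, whether framerate or actual game speed, is capped by that slice of the CPU.

Meanwhile, my GPU is almost always partly idle. It never reaches its maximum potential because the games are held back by the processing power of a single core.

When this happens my CPU is running at 4.3 GHz, and it is a reasonably modern Intel design. So the problem is not my hardware.

It is the software. Part of the problem lies with game engines: they flag almost everything as non-thread-safe, and nobody seems interested in extending their engine to overcome this.

The other half of the problem is that game developers, for whatever reason, refuse either to work around the limitations of their engines or to use the other available cores in new ways.

I will explain as plainly as I can how we do both. First, though, I want to get something out of the way. Many people will be tempted to say, “But James, we do use other threads.” No, you don't. If your game still bottlenecks on one core, I don't care what small jobs such as sound processing or flibbit gibbling you have handed off to another thread; you are still essentially a single-threaded application.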

strange-gas-giant(from gamasutra)

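On the right of the picture above is a gas giant with a procedurally generated texture. Rather than bundle terabytes of textures that I cannot afford artists to make into Ascent – The Space Game, we generate textures for every planet and moon in the game (hundreds of billions of them), and we do it all in other threads. As your ship approaches a planet, the thread runs again and again, producing textures of higher and higher resolution.

There is a trick to this, though. Ascent is built on the Unity engine, which does not let you modify a texture from another thread. The trick is nothing complicated: Unity lets you load an array of colours into a texture, so we simply work on our colour array in another thread.

This means that Ascent will at times use two, three, or even all four cores at once, for example when we approach a large planet with several moons. It lets us do so without affecting framerate. In fact, on most hardware our main thread is mostly idle, so you either run at maximum FPS or are GPU-limited. There is a small hit when we load the colour array into the texture back on the main thread, which I cannot avoid until Unity discovers threading.

Here is another example.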

terrain-engine4(from gamasutra)

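The picture above shows our terrain engine at work on a planet's surface. It is actually slightly out of date; today's terrain engine is smarter, but you can see what is going on. The player is at the bottom centre, almost at ground level, and we are viewing the scene from about 50 kilometres up. As is typical for a terrain engine, a lot of high-speed number crunching is going on. What is not typical is that none of it affects framerate (except where the GPU slows down because of all those triangles), because we do all our thinking in another thread.

Again, Unity does not let you change a mesh, such as our terrain sphere above, from another thread. That has to happen on the main thread. So once again we split the mesh into its component data structures (vertices, triangles, normals and tangents), work on those structures in our other thread, and periodically hand the data back so the main thread can update the mesh with it.

This has two big advantages. First, instead of only being able to process a little terrain each frame, for fear of clogging the main thread and slowing the framerate, we can do pretty much whatever we like. As a result, the two main limits on our terrain detail are GPU power and Unity's limit of about 65k triangles per mesh.

The second advantage, and to my mind the real beauty of threading, is that it gives us practically unlimited CPU power. We can think about a great many triangles without worrying about the game's performance, as long as we do not create too many of them. The result is a terrain engine that can render something the size of Earth onto a sphere without a height map as source data, because it has the CPU available to generate its own height map. Better still, with all that extra CPU the terrain engine does not only consider the shape of the landforms; it continually balances the planet's base texture against the level of detail in the terrain. Sometimes the texture is more detailed than the terrain, and sometimes it is the other way round. The engine has the CPU grunt to recognise this and pass it to the GPU through shaders. This is a large part of how we can have hundreds of billions of enormous planets in a game that is either 18 or 31 megabytes, depending on the platform you download it for.

Then there are some less conventional examples. One I hear about constantly is AI. Developers always say their AI is limited by CPU power, yet phones with fewer than two cores are now rare. As I see it, AI has two levels (three for an MMO, but I will stick to two for simplicity): the “live” AI, which handles the actual moving, shooting and control of the robot, person, tank, ship or aircraft, and the “tactical” or “thinking” AI, the part nobody codes because there is supposedly no CPU power available.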

combat1(from gamasutra)

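Imagine there were whole cores sitting idle, just waiting to start thinking about the entire situation your AI faces. Imagine those cores could think about each situation in depth without affecting framerate. Well, you do not have to imagine anything; that is exactly the situation.

Leaving server AI aside for now, this is how Ascent manages AI on the local client. At any given moment an NPC ship has a current task, or rather a task mode that usually covers two things at once. It is always moving to make itself a harder target, but its movement might be 100% evasive while its weapons or shields charge, it might be fleeing for its life, or it might be attacking.

If it is attacking, which weapon is it trying to point at you? If it is evading, which side does it want facing away from you, and in which direction does it want to fly? When should it turn back to shoot at you, and where should it aim? At what point does it decide it has lost the battle and flee? Should it switch the gravity anchor on for fine-tuned control or to cancel its sideways velocity, or off for maximum acceleration? These questions are all tactical, and a thinking thread answers them.

Every quarter of a second or so, the thinking thread talks to the main thread and may change the ship's current activity mode, its targets, the distance it wants to keep from you, and the weapons it is or is not using. Then it goes back to analysing the situation in whatever depth and detail we want. For the future, this means we can keep adding intelligence to our local AI without worrying about framerate. It means something quite different on the server, but that is not today's topic.

How does this affect our AI? Again leaving the server aside, the fact is that to make Ascent's space battles playable for a human at all, I have to severely hamper the AI's ships. That's right: instead of our AI cheating like everyone else's, I have to give the player all the advantages. The AI can move and turn at only a fraction of your speed, and in higher-end battles, if you blink, you lose. I tried fighting the AI on equal terms and, despite being the developer, found it impossible to beat. Its performance was practically perfect. It never misses, and it reacts to any change in the situation before my brain has registered it. As a result, fighting an NPC in Ascent is much more like fighting a human opponent in other games. It is more immersive and more challenging.

So, down to the nitty-gritty: how hard is threading? In my view it is easy if you keep it simple. It is harder than doing everything on the main thread, but easier than trying to optimise a complex game after you have crammed everything into the main thread. One piece of advice: think outside the strict OO box when you are threading. Treat everything as data and instructions for a moment and it becomes much more intuitive, probably because that is how CPU cores see everything. You are essentially stripping away a layer of abstraction, which helps, because botched threading can be a nightmare, and when you botch it you want that nightmare to be simple and short.

The process I follow every time is simple: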

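1. Set up the data structures that the main thread and the spawned thread will share. Include one bool that tells everyone who is working on the data right now. Once the spawned thread has done its job, that bool is how the main thread knows it is safe to open its Christmas presents.

2. Create a thread and give it the code to run. Mostly functional code seems to make this simpler and more intuitive, at least for me. I tend to use low priority for threads where I can, so they do not interfere with the main thread and the framerate even when many of them run at once.

3. Set the bool to true.

4. Start the thread.

5. The main thread yields while the bool is set against it.

6. The spawned thread does the heavy lifting (yielding occasionally; 99% of the time this will not slow you down, because there is so much unused CPU grunt that you get control back right away. I know some people hold that you should not need to yield if your priorities are set correctly, which is a lovely theory).

7. The spawned thread finishes and sets the bool to false, then either exits or yields until the bool is set again (depending on how you want to control it).

8. On the next frame, the main thread sees that the bool is false and collects its newly processed data.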

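In essence, that is all there is to it. The annoyances come during debugging: helpful error codes usually do not make it back from your spawned thread, though there are ways around that. The main thing is never to cross the streams: only one thread at a time works on the shared data structures. There are thread-safe structures that let you break this rule if you wish, but I have yet to find a need. In our AI code, the main thread constantly reads variables that the thinking thread writes, and that is about as close as I come to breaking the rule.

There are other ways to handle communication between threads, but I have found this simple process to be the most foolproof and reliable. When threading, I like my bugs as simple as possible.

So how does this help Skyrim? I am not sure, because I have never tried to profile what the game uses its main thread for. Presumably the terrain engine is part of it. Presumably Skyrim's limited AI could also have been more advanced if it had a core of its own.

Dwarf Fortress, on the other hand, could probably spawn ten threads and spread the dwarves and other characters across them, dramatically speeding up the game's framerate.

In conclusion, I look forward to seeing my CPU go above 13% while playing someone else's game. May that day come soon.

(This article was compiled by GamerBoom/gamerboom.com. Reproduction without the copyright notice is not permitted; to reprint, please contact GamerBoom.)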

Threading – USE IT. How to stop wasting most available CPU power!

by James Hicks

On my machine, everything from Skyrim to Dwarf Fortress consumes what Windows tells me is 11-13% of available CPU power. On a 4 core machine with Hyper-Threading, this means in reality that somewhere around 25%, maybe a little less, of my CPU is actually used by each game, AND that everything is limited in its maximum performance – either framerate or actual game speed – by my CPU. Or rather, the 25% or so that developers are using.

Meanwhile, my GPU is almost always partially idle – its own maximum potential never reached because the games are stuck, constrained by a single core of processing power.

When this happens, my CPU is running at 4.3 GHz and is a reasonably modern Intel design. The problem is not my hardware.

It’s the software. Part of the problem is game engines. They flag virtually everything as non-thread-safe, and nobody seems to be looking into expanding their engine to overcome this.

But the other half of the problem is games developers themselves refusing, for some reason, either to work around the limitations in their engines, or to use the other cores available to them in new and exciting ways.

I’m going to explain how we do both, in as close to plain English as I can. But first, I want to get something out of the way. A lot of folks might be tempted, at this point, to say “But James, we DO use other threads”. No, you don’t. If your game is still bottlenecking on one core and one core only, I don’t care what tiny little jobs like sound processing or flibbit gibbling you’ve palmed off to another thread – you’re still basically a single-threaded application, sorry!

To the right of this picture is a gas giant that has a procedurally generated texture. Instead of bundling a few terabytes of textures that I can’t afford artists to make into Ascent – The Space Game (http://www.thespacegame.com), we generate textures for every planet and moon in the game – hundreds of billions of them actually – and we do it all in other threads. As your ship gets closer to a planet, the thread kicks off again and again, making higher and higher resolution textures.

But there’s a trick to this; Ascent is built on the Unity engine, which doesn’t let you modify a texture in another thread. The trick is not rocket surgery – Unity lets you import an array of colours into a texture, and we just work on our array of colours in another thread.

This means that some of the time, Ascent will be using two, three, even all four cores at once if, for example, we approach a big planet with several moons. And it lets us do this without impacting framerates – in fact on most hardware our main thread is mostly idle and you either sit at max FPS, or you’re GPU limited. You get a small impact when we load the array of colours into the texture back in the main thread, but I can’t avoid that until Unity discovers threading.
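The colour-array trick can be sketched roughly as follows. This is a minimal illustration in Python, not Ascent's code: `generate_colours`, `TextureJob` and `apply_to_texture` are hypothetical names, the banding pattern stands in for real procedural noise, and a plain dict stands in for an engine texture. Note also that in CPython the global interpreter lock keeps pure-Python threads from running on separate cores, so this shows only the shape of the handoff.

```python
import threading

def generate_colours(width, height, seed):
    """Procedurally build a flat list of (r, g, b) tuples, row by row."""
    colours = []
    for y in range(height):
        for x in range(width):
            # Cheap deterministic banding pattern, standing in for real noise.
            band = (y * 7 + seed) % 256
            colours.append((band, (x * 3 + seed) % 256, 128))
    return colours

class TextureJob:
    """Shared state: the worker fills `colours`, the main thread applies it."""
    def __init__(self, width, height, seed):
        self.width, self.height, self.seed = width, height, seed
        self.colours = None
        self.done = threading.Event()

    def run(self):
        self.colours = generate_colours(self.width, self.height, self.seed)
        self.done.set()  # signal only after the array is complete

def apply_to_texture(texture, job):
    """Main-thread step; `texture` is a plain dict standing in for an engine texture."""
    texture["pixels"] = job.colours
    texture["size"] = (job.width, job.height)

job = TextureJob(width=64, height=32, seed=5)
worker = threading.Thread(target=job.run, daemon=True)
worker.start()

texture = {}
# A real game loop would check job.done.is_set() once per frame instead of blocking.
job.done.wait()
apply_to_texture(texture, job)
```

The only work left on the main thread is the final assignment, which corresponds to the small per-upload cost mentioned above.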

I’ll give you another example…

This is a picture of our terrain engine working away on a planet surface. Actually this picture’s a little old; today’s version of the terrain engine is smarter but you get the idea of what’s going on. The player is in the bottom centre, almost at ground level, looking ‘north’ and we’re looking at the scene from about 50 kilometers above. As is typical for a terrain engine, a lot of high speed mathematics is going on. But what’s not typical is that absolutely none of it is impacting framerates (except where you run into GPU slowness because of all those triangles) – because we do all of our thinking in another thread.

Again, Unity doesn’t let you change a mesh, such as our terrain sphere above, in another thread. It all has to be done in the main thread – but once again, we split the mesh into its component data structures (vertices, triangles, normals and tangents), do a whole bunch of stuff on these structures in our other thread, and periodically bring this data back and alter the mesh with it in the main thread.
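As a rough sketch of the same idea for meshes, the worker below builds plain vertex and triangle-index arrays for a small grid patch, and the main thread only assigns the finished arrays. Everything here is assumed for illustration: `build_patch`, the height function and the dict used as a mesh are hypothetical, and normals and tangents are left out for brevity.

```python
import threading

def build_patch(size, height_fn):
    """Return (vertices, triangles) for a size x size grid of quads."""
    n = size + 1  # vertices per side
    vertices = [(x, height_fn(x, z), z) for z in range(n) for x in range(n)]
    triangles = []
    for z in range(size):
        for x in range(size):
            i = z * n + x
            # Two triangles per quad, stored as indices into `vertices`.
            triangles += [i, i + n, i + 1, i + 1, i + n, i + n + 1]
    return vertices, triangles

result = {}

def worker():
    # All the number crunching happens here, away from the main thread.
    result["mesh"] = build_patch(4, lambda x, z: 0.5 * x + 0.25 * z)

t = threading.Thread(target=worker)
t.start()
t.join()  # a game would poll a flag each frame rather than block on join

vertices, triangles = result["mesh"]
mesh = {"vertices": vertices, "triangles": triangles}  # main-thread assignment
```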

There’s two huge advantages to this – firstly, instead of only being able to do a bit of terrain-mangling each frame, for fear of clogging up the main thread and slowing down framerates, we can do, well, pretty much whatever we like, as fast as we like. As a result, the two main limitations on our terrain’s detail are GPU power (too many triangles and the average GPU loses its lunch), and Unity’s limit on triangles per mesh of 65k or so. We could work around that second one, but there’s nothing to be done for the first but wait a few years for everybody to upgrade, and/or limit the engine based on available GPU power (which we do).

The second advantage, and the real beauty of threading in my view, is that it gives us virtually limitless CPU power. We can think about a lot of triangles there, and think a LOT about each triangle, and not worry about the game’s performance so long as we don’t make too many of them. As a result, we have a terrain engine that can render something the size of Earth, onto a sphere, without a height map as source data, because it has the available CPU to procedurally generate its own height map on the fly. Better yet, with all that extra CPU, this terrain engine isn’t just thinking about the shape of the landforms, but continually balancing between the planet’s base texture and the level of detail in the terrain – sometimes the texture is more detailed than the terrain and sometimes it’s the other way around. The engine has the CPU grunt to recognise this and communicate it to the GPU via shaders and their inputs. This is a big part of the recipe for how we can have hundreds of billions of crazily gargantuan planets in a game that’s either 18 or 31 megabytes, depending on which platform you download it for.

And then there’s some less conventional examples. One I keep hearing about is AI. Developers always seem to say their AI is limited by CPU power. Last time I checked though, it was 2014! Even mobile phones with less than two cores are now rare. The way I look at AI, there are two levels of it (three for an MMO but to simplify, I’ll talk about two) – “live” AI which deals with actually moving, shooting, controlling the robot/person/tank/ship/aircraft/attack chicken/you name it, and then there’s the “tactical” or “thinking” AI – you know, the part nobody codes because there’s no CPU power available?

Well, imagine if there were whole cores sitting idle, just waiting to start thinking strategically about the whole situation facing your AI? And imagine if those cores could be tasked to think about each situation in great depth, without affecting framerates! Well, you’re not imagining anything, that’s exactly the situation!

Leaving server AI out for the time being, the way Ascent manages local client AI is exactly in this way. At any given moment, an NPC ship has a current task it’s attempting to do, or rather, a task mode which is usually two things at once. It will always be moving, making itself a harder target, but its movements might be 100% evasive while its weapons or shields charge up or it flees for its life, or it could be attacking.

If it’s attacking, which weapon is it trying to point at you right now? If evading, which side does it most want facing away from you? What direction does it want to be flying off in? When should it start turning back to shoot at you and where should it be aiming? At what point does it decide it’s lost the battle and flee for its life? Should it have the gravity anchor on for fine-tuned control or to kill its sideways velocity, or off for maximum acceleration? Well, these questions are all tactical – they get answered by a thinking thread.

Every quarter second or so, the thinking thread talks to the main thread and potentially changes its current activity mode, targets, the distance it wants to be from you, the weapons it’s using or not using, you name it. Then it goes back to analysing the situation in any level of depth and detail we might want. What this means for the future is that we can keep loading intelligence into our local AI and never worry about framerates. This means something totally different on the server, but that’s not today’s discussion.
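One way to picture a thinking thread is sketched below, under loud assumptions: `Plan`, `NpcBrain` and the hull-based rule are invented for illustration and are not Ascent's AI. The thinking thread builds a fresh immutable plan and publishes it by swapping a single reference, so the main thread always reads a complete plan.

```python
import threading
from dataclasses import dataclass

@dataclass(frozen=True)
class Plan:
    mode: str               # "attack" or "evade"
    preferred_range: float  # distance the ship wants to keep from the player

class NpcBrain:
    def __init__(self):
        self.plan = Plan("evade", 500.0)  # read by the main thread every frame
        self.hull = 1.0                   # written by the main thread
        self.thought = threading.Event()  # set once the first plan is published
        self._stop = threading.Event()

    def think_once(self):
        # Hypothetical rule: press the attack while healthy, otherwise back off.
        if self.hull > 0.4:
            new_plan = Plan("attack", 150.0)
        else:
            new_plan = Plan("evade", 800.0)
        self.plan = new_plan              # publish by swapping one reference
        self.thought.set()

    def think_forever(self, interval=0.25):
        while not self._stop.is_set():
            self.think_once()
            self._stop.wait(interval)     # about a quarter second, wakes early on stop

    def stop(self):
        self._stop.set()

brain = NpcBrain()
thinker = threading.Thread(target=brain.think_forever, daemon=True)
thinker.start()

brain.thought.wait()   # in a game, the main loop would just read brain.plan each frame
current = brain.plan
brain.stop()
thinker.join()
```

Because each `Plan` is frozen and replaced whole, the main thread never sees a half-updated decision, which keeps this close to the single-writer rule described in the article.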

How does this impact our AI? Well, again leaving the server side out for a moment, the fact is that in order to make Ascent’s space battles even remotely playable for a human being, I have to severely hamper the AI’s ships. That’s right, instead of our AI cheating like everyone else’s, I have to give the players all the advantages. The AI can move and turn at only a fraction of the speed you can – and in the higher end battles if you blink, you lose. I tried battling the AI on equal terms myself, and despite being the actual developer I found it completely impossible to beat. Its performance was literally perfect. It never misses, and it thinks and reacts to a changing situation before my brain even sees what’s going on. As a result, fighting an NPC in Ascent is a lot more like fighting a human opponent in other games. It’s more immersive, and more challenging – for all the right reasons.

So, to the nitty gritty – how hard is it to do Threading? Well, in my opinion, it’s really easy if you keep it as simple as possible. It’s harder than just doing everything in the main thread, but it’s easier than trying to optimise a complex game after you’ve jammed everything into the main thread, so even in terms of net difficulty I’d say it’s fairly neutral. One piece of advice I would give is to think outside the strict OO box when you’re threading. Think about everything as data and instructions again for a moment, and it gets a lot more intuitive – probably because that’s exactly how CPU cores think about everything. You’re ripping off a layer of abstraction, basically, and that’s good because messed up threading can be a real nightmare, and when you mess it up, you want that nightmare to be simple – and short.

The process I follow every time is as simple as:

1. Set up data structures for the main thread and the spawned thread to share. Include one bool which tells everybody who is messing with the data right now. When the spawned thread has done its thing, that bool is how the main thread knows it’s safe to open its Christmas presents.

2. Create a thread and give it the code to use. Again, mostly functional code seems to make this easier and more intuitive, at least to me. I tend to use low priority for threads where I can, to prevent them from messing with the main thread and framerates, even if we get a lot of them going at once.

3. Set our bool to true.

4. Start the thread.

5. Main thread yields while that bool is set against it.

6. Spawned thread does the heavy lifting (yielding sometimes too; 99% of the time this won’t slow you down at all because there’s so much unused CPU grunt available and you’ll be back right away. I know there’s a school of thought that you shouldn’t need to yield if your priorities are set accurately, which is a lovely theory).

7. Spawned thread finishes and sets the bool to false, thereafter completing or yielding until the bool is back (depending how you want to control it).

8. In the very next frame, the main thread sees the bool is false, and takes its exciting newly processed data.
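The eight steps above can be sketched roughly like this. It is a minimal Python illustration with hypothetical names (`SharedJob`, `heavy_lifting`, `worker_owns_data`), not the author's code. Python's `threading` module has no thread priorities, so that part of step 2 is omitted, and in languages such as C# or C++ the flag would usually need to be volatile or atomic so the main thread reliably sees the change.

```python
import threading
import time

class SharedJob:
    """Step 1: shared data plus the single bool that says who owns it."""
    def __init__(self, numbers):
        self.worker_owns_data = False
        self.numbers = numbers   # input for the worker
        self.total = None        # output for the main thread

def heavy_lifting(job):
    # Step 6: the expensive work, done entirely off the main thread.
    total = 0
    for n in job.numbers:
        total += n * n
    job.total = total
    job.worker_owns_data = False   # step 7: hand the data back as the very last action

job = SharedJob(list(range(1000)))
thread = threading.Thread(target=heavy_lifting, args=(job,), daemon=True)  # step 2
job.worker_owns_data = True        # step 3
thread.start()                     # step 4

frames = 0
while job.worker_owns_data:        # step 5: main thread leaves the shared data alone
    frames += 1                    # render a frame, handle input, and so on
    time.sleep(0.001)

result = job.total                 # step 8: safe to open the presents
```

Clearing the flag only after the output is written is what makes the handoff safe: the main thread never touches `total` while the worker might still be writing it.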

In essence, that’s it. The annoyances come during debugging – typically nice error codes don’t make their way back from your spawned thread, but there are ways around that. The main thing to make sure of is never to cross the streams… one thread at a time is working on our shared data structures. There are thread-safe structures you can break this rule with if you’re so inclined, but I’ve yet to find a need. Our AI code’s main thread reads from variables the thinky thread writes to all the time, and that’s about as close as I get to breaking these rules.
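One possible way around the missing error reports, sketched here as an assumption rather than the author's approach, is to wrap the worker's entry point so that any exception is stored in the shared structure for the main thread to inspect. `JobResult`, `run_safely` and `risky_height` are hypothetical names.

```python
import threading
import traceback

class JobResult:
    def __init__(self):
        self.value = None
        self.error = None   # formatted traceback text if the worker failed
        self.done = threading.Event()

def run_safely(result, fn, *args):
    """Run fn in the worker and record either its value or its traceback."""
    try:
        result.value = fn(*args)
    except Exception:
        result.error = traceback.format_exc()
    finally:
        result.done.set()

def risky_height(x):
    return 100 / x          # fails when x == 0

ok, bad = JobResult(), JobResult()
threads = [
    threading.Thread(target=run_safely, args=(ok, risky_height, 4)),
    threading.Thread(target=run_safely, args=(bad, risky_height, 0)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Back on the main thread, failures can be logged like any other error.
if bad.error is not None:
    print(bad.error)
```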

There are other ways to handle communications between the threads, but I’ve found this idiotically simple process to be the most foolproof and reliable. When threading, I like my bugs as simple as possible.

So how does this help Skyrim? I’m not sure because I have never attempted to profile what the game’s using its main thread for. Presumably the terrain engine is part of it, so there’s something. Presumably Skyrim’s limited AI could also have been made more advanced if it had its own core (or a few).

Dwarf Fortress on the other hand could probably spawn ten threads, and spread the dwarves and other characters between them, dramatically speeding up its game framerates. Hmm now I feel like playing Dwarf Fortress, so it’s probably time to wrap up.

In conclusion, I look forward to seeing my CPU go over 13% while playing someone ELSE’s game in the future. May this day come soon. (source: Gamasutra)
