
阐述数据导向型游戏引擎设计的概念

发布时间:2014-06-06 16:55:17

作者:David Davidović

你可能已经听说过数据导向型游戏引擎设计,这是一个相对较新的概念,它提出了一种不同于传统的面向对象设计的想法。我将在本文解释数据导向型设计(DOD)的概念,以及为何有些游戏引擎开发者认为它可以实现极佳的效果。

历史渊源

在游戏开发的早期时代,游戏及其引擎是用老派语言(例如C语言)编写的。当时游戏属于小众产品,从缓慢的硬件中榨取每一个时钟周期是头等大事。在多数情况下,敲打单款游戏代码的只有少数人,他们对整个代码库了然于心。他们所使用的工具一直很称手,C语言提供的性能优势让他们能够最大限度地利用CPU——而当时的游戏在很大程度上仍受制于CPU,并要绘制到自己的帧缓冲区,这是极为重要的一点。

随着GPU的出现承担了三角形、纹素、像素等方面的大量运算工作,我们对CPU的依赖有所减少。与此同时,游戏行业持续发展:越来越多的人想玩越来越多的游戏,这也推动了越来越多开发团队的组建。

Moore's law(from gamedevelopment)

(摩尔定律表明,硬件的发展相对于时间是指数式而非线性的:这意味着每隔几年,单块芯片上能容纳的晶体管数量不是增加一个常量——而是翻倍!)

更大的团队需要更好的协作。很快,游戏引擎就因其复杂的关卡、AI、剔除(culling)和渲染逻辑,要求编码员更有纪律性,而他们选择的武器就是面向对象设计。

正如Paul Graham曾经所言:

在大型公司中,软件通常是由普通程序员组成(并且频繁变化)的大型团队所编写。面向对象型编程为这些程序员施加了一套准则,避免他们当中的任何一个人造成过多损害。

无论我们是否喜欢,都要承认这个事实——更大型的公司开始制作出更大更精良的游戏,并且随着标准化工具的出现,那些制作游戏的骇客也开始成为能够被轻易更换的零件。特定的单个开发者的优点开始变得无足轻重。

面向对象设计的问题

面向对象设计是个优秀的理念,它帮助开发者在游戏这类大型项目中创建多个抽象层,让每个人都只在自己的目标层上工作,而无需关心下层的实现细节;但它也必定会给我们带来一些麻烦。

我们看到了并行编程的爆发——编码员利用所有可用的处理器核心来实现极快的计算速度。与此同时,游戏场景变得越来越复杂,如果我们想跟上这一趋势,并仍然输出玩家所期待的帧率,我们也必须这么做。利用手头的全部速度,我们可以开启全新的可能性:例如,用CPU时间来减少发送到GPU的数据总量。

在面向对象编程中,状态保存在对象内部,如果你想从多个线程操作它,就得引入同步原语之类的概念。你所调用的每个虚函数都会增加一层间接。而以面向对象方式编写的代码所产生的内存访问模式可能很糟糕——Mike Acton(Insomniac Games,前Rockstar Games成员)就有一组幻灯片随手讲解了这方面的一个例子。

美国卡内基梅隆大学教授Robert Harper也曾发表过类似观点:

面向对象编程本质上兼具反模块化和反并行的特点,因此不适用于现代CS课程。

要这样讨论面向对象编程(OOP)是件棘手的事,因为OOP涵盖大量特性,并非人人都对OOP的含义有一致意见。在这里,我主要指的是C++所实现的OOP,因为它是目前游戏引擎领域占绝对主导地位的语言。

我们知道游戏必须并行化,因为CPU总有更多可做(但不必做)的工作,而把周期花在等待GPU完成处理上纯属浪费。我们还知道,常见的OO设计方法要求我们引入昂贵的锁竞争,与此同时,还可能破坏缓存局部性,或在最出乎意料的情况下产生不必要的分支(这可能代价很高)。

utilization(from gamedevelopment)

(如果我们不能利用多个核心,即便硬件变得更好(拥有更多核心),我们使用的CPU资源也保持不变。与此同时,我们可以把GPU推向极限,因为它天生就是并行设计,能够同时承担任意数量的工作。这会妨碍我们在玩家的硬件上提供最佳体验的目标,因为我们显然没有发挥其全部潜能。)

这就产生了一个问题:我们是否应该重新考虑我们的范例?

数据导向型设计

这种方法论的一些推崇者将其称为数据导向型设计,但事实上其普遍概念早已为人所知。其基本前提很简单:围绕数据结构来构建代码,并用对这些结构的操作来描述你想要达成的目标。

我们之前听过这类说法:Linus Torvalds(Linux和Git的创造者)曾在Git邮件列表的一篇帖子中指出,他是"围绕数据设计代码,而非反其道而行"这一方法的超级拥护者,并将其称为Git获得成功的原因之一。他甚至进一步声称,优秀程序员与糟糕程序员的区别就在于其关心的是数据结构还是代码本身。

乍一看这似乎有违直觉,因为它要求你把自己的思维模式颠倒过来。但不妨这样想:一款游戏在运行时会捕获用户的所有输入,而其中所有性能攸关的环节(即那些值得抛弃"一切皆对象"哲学的环节)都不依赖外部因素,例如网络或IPC。就你所知,游戏消耗用户事件(鼠标移动、摇杆按钮按下等)和当前游戏状态,并把它们搅拌成一组新数据——例如,发送到GPU的批次、发送到声卡的PCM样本,以及一个新的游戏状态。

这种"数据搅动"可以分解成更多子过程。动画系统取下一关键帧数据和当前状态,生成一个新状态。粒子系统取其当前状态(粒子位置、速度等)和一个时间推进量,生成一个新状态。剔除算法取一组候选可渲染对象,生成一个更小的可渲染对象集合。游戏引擎中几乎一切都可以看作操纵一块数据来生成另一块数据的过程。

处理器喜欢引用局部性和缓存利用率。所以,在数据导向型设计中,我们会尽可能地把一切组织进大型同类数组中,并尽可能运行优秀、缓存一致的暴力算法,来取代可能更花哨的算法(后者的大O复杂度更优,但却无法顺应其运行硬件的架构局限)。

objects(from gamedevelopment)

(当我们处理对象时,必须把它们当作"黑盒"并调用其方法,方法再去访问数据,取得我们想要的东西,或者做出我们想要的改动。这有利于可维护性,但不了解数据的布局方式却可能极大损害性能。)

例子

数据导向型设计让我们满脑子想的都是数据,所以让我们做一些与平常略有不同的事情。看看以下代码:

void MyEngine::queueRenderables()
{
    for (auto it = mRenderables.begin(); it != mRenderables.end(); ++it) {
        if ((*it)->isVisible()) {
            queueRenderable(*it);
        }
    }
}

虽然已经过大量简化,但这是面向对象游戏引擎中的常见模式。但是,如果大量可渲染对象实际上并不可见,我们就会遇到大量分支误预测,导致处理器废弃一些它在预期某个分支会被选中时已经推测执行的指令。

对于较小的场景来说,这显然不是问题。但想想你做这类事情的次数吧:不只是在排队可渲染对象时,在遍历场景光源、阴影贴图分片、区域等的时候不也是这样吗?AI或动画更新呢?把你在整个场景中的这些操作累加起来,看看你浪费了多少时钟周期,再算算你的处理器要以稳定的120FPS节奏送出所有GPU批次还剩多少时间,你就会明白这些东西可以累积到相当可观的程度。

如果一名开发网络应用的程序员竟然考虑这种微不足道的微优化,那会很好笑;但我们知道游戏是实时系统,资源限制极为严格,所以这种考虑对我们而言并不多余。

为避免这种情况,让我们换一种思路:如果我们在引擎中保留一份可见可渲染对象的列表会怎样?当然,我们会牺牲掉myRenderable->hide()这样简洁的语法,并违反相当数量的OOP准则,但我们之后就可以这样做:

void MyEngine::queueRenderables()
{
    for (auto it = mVisibleRenderables.begin(); it != mVisibleRenderables.end(); ++it) {
        queueRenderable(*it);
    }
}

好极了!这里没有分支误预测,并且假设mVisibleRenderables是一个规规矩矩的std::vector(即一个连续数组),我们也完全可以把它重写成一次快速的memcpy调用(外加对数据结构的少许额外更新)。

你可能会吐槽这些代码示例过于简陋,而你说得很对:它们被大量简化了。但说实话,我们所触及的还只是皮毛。思考数据结构及其关系会向我们打开大量之前从未想过的可能性。

并行化和向量化

如果我们以简单、定义明确、对大块数据进行操作的函数作为处理的基本构建单元,那就很容易生成4个、8个或16个工作线程,并给每个线程分配一块数据,让所有CPU核心保持忙碌。不需要互斥锁、原子操作或锁竞争;当你需要数据时,只需join所有线程并等待它们完工即可。如果你需要并行地排序数据(这是准备发送到GPU的内容时非常频繁的任务),你就得从另一个角度来思考这个问题。

在单个线程内部,你可以使用SIMD向量指令(例如SSE/SSE2/SSE3)来获得额外的加速。有时候,你只需换一种方式布置数据就能做到这一点,例如以数组结构体(SoA,即XXX…YYY…ZZZ…)的方式而非传统的结构体数组(AoS,即XYZXYZXYZ…)的方式来存放向量数组。

parallel(from gamedevelopment)

(当我们的算法直接处理数据时,并行化就变得轻而易举,我们也可以避免一些拖慢速度的缺陷)

你意想不到的单元测试

没有外部影响的简单函数很容易进行单元测试。对于那些你想方便地换入换出的算法,以回归测试的形式来做尤其有效。

例如,你可以针对剔除算法的行为创建一个测试套件,设置一个精心编排的环境,并精确衡量它的表现。当你设计出一个新的剔除算法时,你无需任何改动就能再次运行相同的测试。你衡量性能和正确性,这样评估结果便唾手可得。

你越深入研究数据导向型设计方法,就会发现自己越容易测试自己的游戏引擎。

以整体数据结合类和对象

数据导向型设计绝非面向对象编程的对立面,它只是反对其中的一些理念。因此,你完全可以采用数据导向型设计的理念,同时保留你所习惯的大多数抽象和心智模型。

例如,看看OGRE 2.0版本的工作:该项目的主导者Matias Goldberg选择将数据存储在巨大的同类数组中,并使用遍历整个数组的函数而非只处理单个数据的函数,以便加速Ogre。根据一项基准测试(他承认该测试很不公平,但测得的性能优势不可能只是这个原因造成的),它现在的运行速度快了3倍。不仅如此,他还保留了大量旧有的、熟悉的类抽象,所以其API远谈不上彻底重写。

它实用吗?

有大量证据显示,游戏引擎能够也将会以这种方式开发。

Molecule Engine的开发博客有一系列题为《Adventures in Data-Oriented Design》的文章,其中包含大量关于DOD在何处成功运用的有用建议。

DICE似乎也对数据导向型设计很感兴趣,他们在Frostbite Engine的剔除系统中采用了这一方法(并获得了显著的加速!)。他们的另一些幻灯片还谈到了在AI子系统中运用数据导向型设计,同样值得一看。

除此之外,之前提到的Mike Acton等开发者似乎也在拥抱这一理念。有一些基准测试证明它的确能带来很大的性能提升,但我已有相当一段时间没有看到数据导向型设计阵线上的太多动静了。当然,它也可能只是昙花一现,但其主要前提似乎非常合乎逻辑。这个行业(以及其他任何软件开发行业)确实存在大量惰性,这可能会阻碍这种哲学的大规模采用。又或者,它并不像看上去那么好。你怎么看?(本文为游戏邦/gamerboom.com编译,拒绝任何不保留版权的转载,如需转载请联系:游戏邦)

What is Data-Oriented Game Engine Design?

by David Davidović

You may have heard of data-oriented game engine design, a relatively new concept that proposes a different mindset to the more traditional object-oriented design. In this article, I’ll explain what DOD is all about, and why some game engine developers feel it could be the ticket for spectacular performance gains.

A Bit of History

In the early years of game development, games and their engines were written in old-school languages, such as C. They were a niche product, and squeezing every last clock cycle out of slow hardware was, at the time, the utmost priority. In most cases, there was only a modest number of people hacking at the code of a single title, and they knew the entire codebase by heart. The tools they were using had been serving them well, and C was providing the performance benefits that enabled them to push the most out of the CPU—and as these games were still by and large bound by the CPU, drawing to its own frame buffers, this was a very important point.

With the advent of GPUs that do the number-crunching work on the triangles, texels, pixels, and so on, we’ve come to depend less on the CPU. At the same time, the gaming industry has seen steady growth: more and more people want to play more and more games, and this in turn has led to more and more teams coming together to develop them.
Moore’s law shows that hardware growth is exponential, not linear in respect to time: this means that every couple of years, the number of transistors we can fit on a single board doesn’t change by a constant amount—it doubles!

Bigger teams needed better cooperation. Before long, the game engines, with their complex level, AI, culling, and rendering logic required the coders to be more disciplined, and their weapon of choice was object-oriented design.

As Paul Graham once said:

At big companies, software tends to be written by large (and frequently changing) teams of mediocre programmers. Object-oriented programming imposes a discipline on these programmers that prevents any one of them from doing too much damage.

Whether we like it or not, this has to be true to some degree—bigger companies started to deploy bigger and better games, and as the standardization of tools emerged, the hackers working on games became parts that could be swapped out way more easily. The virtue of a particular hacker became less and less important.

Problems With Object-Oriented Design

While object-oriented design is a nice concept that helps developers on big projects, such as games, create several layers of abstraction and have everyone work on their target layer, without having to care about the implementation details of the ones underneath, it’s bound to give us some headaches.

We see an explosion of parallel programming—coders harvesting all the processor cores available to deliver blazing computation speeds—but at the same time, game scenery becomes more and more complex, and if we want to keep up with that trend and still deliver the frames-per-second our players expect, we need to do it, too. By using all the speed we have at hand, we can open doors for entirely new possibilities: using the CPU time to reduce the amount of data sent to the GPU altogether, for example.

In object-oriented programming, you keep state within an object, which requires you to introduce concepts like synchronization primitives if you want to work on it from multiple threads. You have one new level of indirection for every virtual function call you make. And the memory access patterns generated by code written in an object-oriented manner can be awful—in fact, Mike Acton (Insomniac Games, ex-Rockstar Games) has a great set of slides casually explaining one example.

Similarly, Robert Harper, a professor at Carnegie Mellon University, put it this way:

Object-oriented programming is [...] both anti-modular and anti-parallel by its very nature, and hence unsuitable for a modern CS curriculum.

Talking about OOP like this is tricky, because OOP encompasses a huge spectrum of properties, and not everyone agrees what OOP means. In this sense, I’m mostly talking about OOP as implemented by C++, because that’s currently the language that vastly dominates the game engine world.

So, we know that games need to become parallel because there is always more work that the CPU can (but doesn’t have to) do, and spending cycles waiting for the GPU to finish processing is just wasteful. We also know that common OO design approaches require us to introduce expensive lock contention, and at the same time, can violate cache locality or cause unnecessary branching (which can be costly!) in the most unexpected circumstances.
If we don’t take advantage of multiple cores, we keep using the same amount of CPU resources even if the hardware gets arbitrarily better (has more cores). At the same time, we can push GPU to its limits because it is, by design, parallel, and able to take on any amount of work simultaneously. This can interfere with our mission to provide players the best experience on their hardware, as we are clearly not using it to full potential.

This raises the question: should we rethink our paradigms altogether?

Enter: Data-Oriented Design

Some proponents of this methodology have called it data-oriented design, but the truth is that the general concept has been known for much longer. Its basic premise is simple: construct your code around the data structures, and describe what you want to achieve in terms of manipulations of these structures.

We’ve heard this kind of talk before: Linus Torvalds, the creator of Linux and Git, said in a Git mailing list post that he is a huge proponent of “designing the code around the data, not the other way around”, and credits this as one of the reasons for Git’s success. He goes on even to claim that the difference between a good programmer and a bad one is whether she worries about data structures, or the code itself.

The task may seem counterintuitive at first, because it requires you to turn your mental model upside-down. But think of it this way: a game, while running, captures all the user’s input, and all performance-heavy pieces of it (the ones where it would make sense to ditch the standard everything is an object philosophy) do not rely on outside factors, such as network or IPC. For all you know, a game consumes user events (mouse moved, joystick button pressed, and so on) and the current game state, and churns these up into a new set of data—for example, batches that are sent to the GPU, PCM samples that are sent to the audio card, and a new game state.

This “data churning” can be broken down into a lot more sub-processes. An animation system takes the next keyframe data and the current state and produces a new state. A particle system takes its current state (particle positions, velocities, and so on) and a time advancement and produces a new state. A culling algorithm takes a set of candidate renderables and produces a smaller set of renderables. Almost everything in a game engine can be thought of as manipulating a chunk of data to produce another chunk of data.

Processors love locality of reference and utilization of cache. So, in data-oriented design, we tend to, wherever possible, organize everything in big, homogenous arrays, and, also wherever possible, run good, cache-coherent brute-force algorithms in place of a potentially fancier one (which has a better Big O cost, but fails to embrace the architecture limitations of hardware it works on).

When performed per-frame (or multiple times per frame), this potentially gives huge performance rewards. For example, the folks at Scalyr report searching log files at 20GB/sec using a carefully crafted but naive-sounding brute-force linear scan.
When we process objects, we have to think of them as “black boxes” and call their methods, which in turn access the data and get us what we want (or make changes that we want). This is great for working for maintainability, but not knowing how our data is laid out can be detrimental to performance.

Examples

Data-oriented design has us thinking all about data, so let’s do something also a bit different from what we usually do. Consider this piece of code:

void MyEngine::queueRenderables()
{
    for (auto it = mRenderables.begin(); it != mRenderables.end(); ++it) {
        if ((*it)->isVisible()) {
            queueRenderable(*it);
        }
    }
}

Although simplified a lot, this is a common pattern in object-oriented game engines. But wait—if a lot of renderables aren't actually visible, we run into a lot of branch mispredictions, which cause the processor to throw away instructions it had speculatively executed in the hope that a particular branch would be taken.

For small scenes, this obviously isn’t an issue. But how many times do you do this particular thing, not just when queuing renderables, but when iterating through scene lights, shadow map splits, zones, or the like? How about AI or animation updates? Multiply all that you do throughout the scene, see how many clock cycles you expel, compute how much time your processor has available to deliver all the GPU batches for a steady 120FPS rhythm, and you see that these things can scale to a considerable amount.

It would be funny if, for instance, a hacker working on a web app even considered such minuscule micro-optimizations, but we know that games are real-time systems with incredibly tight resource constraints, so this consideration is not misplaced for us.

To avoid this, let's think about it in another way: what if we kept the list of visible renderables in the engine? Sure, we would sacrifice the neat syntax of myRenderable->hide() and violate quite a few OOP principles, but we could then do this:

void MyEngine::queueRenderables()
{
    for (auto it = mVisibleRenderables.begin(); it != mVisibleRenderables.end(); ++it) {
        queueRenderable(*it);
    }
}

Hooray! No branch mispredictions, and assuming mVisibleRenderables is a nice std::vector (which is a contiguous array), we could have as well rewritten this as a fast memcpy call (with a few extra updates to our data structures, probably).

Now, you may call me out on the sheer cheesiness of these code samples and you will be quite right: this is simplified a lot. But to be honest, I haven’t even scratched the surface yet. Thinking about data structures and their relationships opens us to a whole lot of possibilities we haven’t thought about before. Let’s look at some of them next.

Parallelization and Vectorization

If we have simple, well-defined functions that operate on large data chunks as base building blocks for our processing, it’s easy to spawn four, or eight, or 16 worker threads and give each of them a piece of data to keep all the CPU cores busy. No mutexes, atomics or lock contention, and once you need the data, you need only to join on all the threads and wait for them to finish. If you need to sort data in parallel (a very frequent task when preparing stuff to be sent to the GPU), you have to think about this from a different perspective—these slides might help.

As an added bonus, inside one thread you can use SIMD vector instructions (such as SSE/SSE2/SSE3) to achieve an additional speed boost. Sometimes, you can accomplish this only by laying your data in a different way, such as placing vector arrays in a structure-of-arrays (SoA) manner (like XXX…YYY…ZZZ…) rather than the conventional array-of-structures (AoS; that would be XYZXYZXYZ…). I’m barely scratching the surface here; you can find more information in the Further Reading section below.
When our algorithms deal with the data directly, it becomes trivial to parallelize them, and we can also avoid some speed drawbacks.

Unit Testing You Didn’t Know Was Possible

Having simple functions with no external effects makes them easy to unit-test. This can be especially good in the form of regression testing for algorithms you'd like to swap in and out easily.

For example, you can build a test suite for a culling algorithm’s behavior, set up an orchestrated environment, and measure exactly how it performs. When you devise a new culling algorithm, you run the same test again with no changes. You measure performance and correctness, so you can have assessment at your fingertips.

As you get more into the data-oriented design approaches, you’ll find it easier and easier to test aspects of your game engine.

Combining Classes and Objects With Monolithic Data

Data-oriented design is by no means opposed to object-oriented programming, just some of its ideas. As a result, you can quite neatly use ideas from data-oriented design and still get most of the abstractions and mental models you’re used to.

Take a look, for example, at the work on OGRE version 2.0: Matias Goldberg, the mastermind behind that endeavor, chose to store data in big, homogenous arrays, and have functions that iterate over whole arrays as opposed to working on only one datum, in order to speed up Ogre. According to a benchmark (which he admits is very unfair, but the performance advantage measured cannot be only because of that) it works now three times faster. Not only that—he retained a lot of the old, familiar class abstractions, so the API was far from a complete rewrite.

Is It Practical?

There is a lot of evidence that game engines can and will be developed in this manner.

The development blog of Molecule Engine has a series named Adventures in Data-Oriented Design, and contains a lot of useful advice regarding where DOD was put to use with great results.

DICE seems to be interested in data-oriented design, as they have employed it in Frostbite Engine's culling system (and got significant speed-ups, too!). Some other slides from them also include employing data-oriented design in the AI subsystem—worth looking at, too.

Besides that, developers like the aforementioned Mike Acton seem to be embracing the concept. There are a few benchmarks which prove that it does gain a lot in performance, but I haven't seen a lot of activity on the data-oriented design front in quite some time. It could, of course, be just a fad, but its main premises seem very logical. There sure is a lot of inertia in this business (and any other software development business, for that matter) so this may be hindering large-scale adoption of such a philosophy. Or maybe it's not such a great idea as it seems to be. What do you think? Comments are very welcome! (source: gamedevelopment)

