
An In-Depth Look at Solving Mobile Game Performance Problems

Published: 2012-11-08

By Itay Duvdevani


Solving Smartphone Performance Problems

by Itay Duvdevani

Working on the phone side of the MoMinis Platform is an interesting job. Since our content developers work mostly on the Simulator for game mechanics, bugs and performance tuning — we always get a little nervous when a title becomes ready to start testing on actual phones.

Compilation of a game can reveal bugs (either in the simulator or the compiler) that will break the game’s logic, or performance problems that aren’t simulated in the Simulator correctly.

When a game works in the Simulator but not on a phone, it usually finds its way to my team pretty quickly; as compiler developers, we're the only ones who can pinpoint the problem.

The Problem with Jelly Jump

Our content developers were working on a new title — Jelly Jump. After several weeks of development, and a week or so from release, the game became mature enough to start testing on actual phones. On higher-end phones, the game ran smoothly and was actually very fun. However, on mid-range and low-end devices, we experienced random frame rate jitters that made the game unplayable. My team got to handle the situation.

We observed the problem on a relatively capable device (800MHz Snapdragon with Adreno 200 GPU). This wasn't a low FPS problem, but FPS jitter that occurred every few seconds, in what looked like random occurrences. The game would run at 40-50 FPS, then dip to 5-10 FPS for a second or so, and then continue.

My team was tipped off by the content developers about some testing they’d done, which showed that removal of graphics assets made the game run better.

At compile time, the asset-generation stage is responsible for distributing the game’s assets to texture atlases. Since this is an automatic process, it uses several heuristics to predict which objects will be drawn together, to minimize texture swapping. These heuristics are currently pretty basic, and rely on static analysis only — we don’t have performance-guided optimization facilities as of today. We suspected the assets were being grouped in an inefficient manner, causing texture thrashing due to memory exhaustion every time certain objects were being drawn.

This was a problem. We had no “magic solution” or even a workaround for this — we didn’t even have a quick ‘n’ dirty solution. The game simply had too many graphics for our asset-generation pipeline.

After spending a few days figuring out how to tackle the problem, yielding nothing practical for the time we had — we started doubting our assumptions, and tested whether there even was a problem with the textures. We configured the asset-generation pipeline to downscale all the graphics to a quarter of their size, making all the game’s assets fit in a single texture. To our surprise, the game ran just as bad (and was very ugly).

That meant that the game’s logic was the bottleneck. We tried profiling the game with the Dalvik profiler. Unfortunately, profilers are very useful when you debug a low-FPS problem, but almost useless when you debug an FPS jitter problem, unless you can predict exactly when your game is going to jitter — and of course, if we knew that, we probably would have known what the problem was.

Sitting down with the content developers, hearing about the game's logic, and using our intuition to plant some debug prints in the code, we came to a surprising conclusion: our Z-ordering code was misbehaving when a large number of objects was being created in a single logical iteration!

Why Keep Things in Z-Order?

It should come as no surprise that among the tasks our game engine is responsible for, besides driving the game's compiled logic, are drawing pretty things to the screen and responding to touch input.

For a 2D engine like the MoMinis Platform, objects need to be arranged according to their Z-order and traversed (either in a back-to-front or front-to-back order), to decide which objects receive touch input, and which get drawn first.

The Touch Handler

Our touch handler is a simple adapter which expects screen-level touch events from the OS, and dispatches game-level events — so when I click at a certain location on the screen, the object at that location receives a logical touch-down-event instead of just “The user clicked at: (x,y)”.

We provide this layer of abstraction to save game developers the need to handle low level touch events and decide which object should respond. It is very important that our development platform be kept suitable for entry-level developers, and be easy to use with as little hassle as possible.

The case where game objects are stacked on one another and a touch event is received at a location that’s within multiple objects can be difficult to handle at the game developer’s level. That’s why we decided that the touch event in this case is dispatched to the top-most object that’s visible under the finger, and that’s it.

The easiest way to do it is to scan all the objects in a front-to-back order and find the first that can handle the event.
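That scan can be sketched as follows; the types and names here are illustrative, not the engine's actual API:

```java
import java.util.List;

// Hypothetical sketch of the front-to-back touch scan; the engine's real
// types and names differ. The list is assumed ordered front-most first.
class TouchDispatcher {
    interface GameObject {
        boolean contains(int x, int y);  // is (x, y) inside the object's bounds?
        boolean handlesTouch();          // does the object listen for touch?
    }

    // Returns the top-most object under (x, y) that handles touch, or null.
    static GameObject dispatch(List<? extends GameObject> frontToBack, int x, int y) {
        for (GameObject obj : frontToBack) {
            if (obj.contains(x, y) && obj.handlesTouch()) {
                return obj;              // first hit wins; stop scanning
            }
        }
        return null;                     // the touch falls through to nobody
    }
}
```

Because the scan stops at the first hit, an object stacked under another never sees the event, which is exactly the contract described above.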

The Render Queue

The MoMinis game-engine is powered by OpenGL ES on both Android and iPhone. For portability reasons, we limit ourselves to the OpenGL ES 1.1 API.

To maximize fill rate, we separate objects into two groups: opaque and translucent. The opaque group gets drawn first, front-to-back — so fragments that are eventually invisible will be culled by the Z-buffer instead of being drawn-upon later, saving the entire rasterization pipeline.

For correct alpha-blending, the translucent group must be rendered back-to-front, after the opaque group has been rendered.
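The two passes can be sketched like this (a simplified model where a drawable carries only a Z value and an opacity flag; the names and the per-frame sort are assumptions of this sketch, since the engine traverses its Z-ordered collection instead of sorting every frame):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch of the two-pass draw ordering.
class RenderQueue {
    static class Drawable {
        final int z;               // higher z = closer to the viewer
        final boolean opaque;
        Drawable(int z, boolean opaque) { this.z = z; this.opaque = opaque; }
    }

    // Opaque objects front-to-back (the Z-buffer culls hidden fragments),
    // then translucent objects back-to-front (correct alpha blending).
    static List<Drawable> drawOrder(List<Drawable> objects) {
        List<Drawable> order = new ArrayList<>();
        objects.stream().filter(o -> o.opaque)
               .sorted(Comparator.comparingInt((Drawable o) -> o.z).reversed())
               .forEach(order::add);
        objects.stream().filter(o -> !o.opaque)
               .sorted(Comparator.comparingInt(o -> o.z))
               .forEach(order::add);
        return order;
    }
}
```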

Challenges to Platform Developers

The demands from our cross-platform development environment and the games it produces are challenging:

Inexperienced, non-programmer developers should be able to create nice games easily

Games should have good performance

Games should be pretty

Final game size should be small

Games should be portable

Developers should be bothered with porting issues as little as possible (preferably not at all)

These demands conflict most of the time, and make our game-engine development a challenge.

When you’re a content developer making a fully-fledged game, the game’s logic can get quite complex pretty quickly. You have all these power ups and bonuses and special animations and sounds that change your basic game mechanics temporarily, and in some games you have to dynamically generate the level as the user advances (as in Jelly Jump).

Our platform developers aren’t always seasoned programmers, or simply don’t know the ins-and-outs of the game engine’s implementation, so game logic is at times written in a suboptimal fashion. In addition, as the platform is still evolving, common practices used by content developers may worsen the situation, usually with no real alternative due to lack of features in the platform.

This means that sometimes a content developer will create and destroy many auxiliary objects in a short time during gameplay, making instance management a challenge.

An Always-Sorted Collection Problem

As the engine was originally written for J2ME, it allowed use of only the facilities offered by CLDC under a limited number of classes and code size, and some implementations are very far from optimal. For instance, the code relied on java.util.Vector — a class that has all its methods synchronized.

The entity that manages the game held all existing object instances in an array sorted by Z-order. Every time an object was being created, destroyed, or its Z-order changed — we would remove it from the array (moving all the elements after it one index back) and re-insert it at the correct location after a binary search (moving all elements after it one place forward).

Since scanning the array for the object is an O(n) operation, removing and inserting an element is again an O(n) operation, and finding the correct spot is an O(log n) operation: all the basic actions on an object had runtime proportional to the number of objects existing at the moment.
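In modern Java, and setting the stability contract aside, the old update amounted to something like this sketch (names are ours, not the engine's):

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.Vector;

// Sketch of the original approach: one Vector kept sorted by Z-order.
// Every Z change pays two O(n) element shifts plus an O(log n) search.
class SortedVectorZ {
    static class Obj { int z; Obj(int z) { this.z = z; } }

    final Vector<Obj> objects = new Vector<>();   // always kept sorted by z
    static final Comparator<Obj> BY_Z = Comparator.comparingInt(o -> o.z);

    void changeZ(Obj o, int newZ) {
        objects.remove(o);                        // O(n): scan + shift back
        o.z = newZ;
        int i = Collections.binarySearch(objects, o, BY_Z);  // O(log n)
        if (i < 0) i = -i - 1;                    // decode insertion point
        objects.add(i, o);                        // O(n): shift forward
    }
}
```

Creation and destruction paid the same shifting costs, which is why a burst of new objects stalled a logical iteration.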

Needless to say, as games got more complex and required more objects, this became a problem. Jelly Jump was creating 40-60 objects every few seconds, in a world containing 500-700 objects. On top of that, when an object is created, its Z-order is set to the maximal Z-order allowed, but then immediately set to its layer's Z-order by the object's constructor, doubling the number of Z-order operations the engine had to perform. This would explain the jitter: creating objects caused a very long logical iteration, dipping FPS temporarily.

Initial Optimization Experiments

We needed to fix it, and we needed to fix it fast. This was the main problem blocking the release.

At first we took a conservative approach, trying to make as few changes as possible so close to a release. We concluded that the collection didn't need to be kept in order at all times, only just before rendering and touch handling. Our first attempt was to defer sorting to these two occasions.

This should have addressed the object creation problem, reducing it from O(n) to O(1), as we intended to simply append the new object to the end of the list and query it for its Z-order just before sorting. This would also cut Z-order changes in half, as we were no longer adding a new object to the collection and then moving it like we used to.

We took the standard approach, and tried good-ol' quicksort for the just-in-time array sorting. Unfortunately, the game engine's contract with the developer is such that objects at the same Z-order are sub-ordered by the order their Z-value changed, meaning our sorting algorithm had to be stable, which quicksort isn't. (It took us a good while to figure out that this was what was causing the weird phenomenon we were seeing with the game's graphics.)

Somewhat humbled by our reckless arrogance in thinking this was such an easy problem to fix, we looked for a sorting algorithm that is both stable and efficient. Tree sort was the next thing we tried. Though the sort was now stable, it was also very slow. We failed to notice that we were using tree sort on a mostly-sorted collection, which is tree sort's Achilles' heel: in that case you end up with a degenerate tree, and the efficiency is gone.

Reading a little bit on the web we encountered all sorts of sorting algorithms and variations on existing algorithms that might or might not work, but we didn’t have the time to start learning and understanding each one and decide whether it was appropriate for our needs.

So we went back to quicksort. This time we applied a bias to the Z-order, giving a monotonic index for each object that got its Z-order changed. This wasn’t perfect, but it was reasonable for the time we had.

Eventually, this optimization didn’t make it to the release — we didn’t have the time to properly test the change and we were able to work around the problem enough to make the game releasable by avoiding creation of multiple objects in a single logical iteration, thus not triggering the Z-order overhead. Instead of creating a full screen of jellies at once, we instructed the content developers to create a screen line-by-line.

An O(1) Always-sorted Collection

Although we released Jelly Jump without fixing the problem, we had to fix it for the next game.

Back when I was running Red Hat Linux 8 with kernel 2.4.18 (or something), I remember reading that the under-development Linux 2.6 had an all-new O(1) scheduler. The news about this scheduler was that it could schedule many tasks in constant-time complexity, no matter how loaded the system was. This scheduler held up till 2.6.23, when it was replaced by the O(log n) Completely Fair Scheduler.

What made this an O(1) scheduler was that the author, Ingo Molnar, chose not to store priority information in the tasks themselves and sort them in a collection according to their priority, but to store the tasks' priority information in the collection itself: he created a different queue for each priority level, giving him direct access to the head of each queue.

Because the number of priority levels was predetermined, scanning all priority queues for insertion and dequeuing is done in constant time.
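That idea can be sketched with one FIFO queue per level; the API below is illustrative, not the kernel's actual code:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of the O(1)-scheduler idea: one FIFO queue per priority level
// (0 = highest). With a fixed number of levels, both operations run in
// constant time.
class PriorityBuckets<T> {
    private final Deque<T>[] queues;

    @SuppressWarnings("unchecked")
    PriorityBuckets(int levels) {
        queues = new Deque[levels];
        for (int i = 0; i < levels; i++) {
            queues[i] = new ArrayDeque<>();
        }
    }

    void enqueue(T task, int priority) {            // O(1): index + append
        queues[priority].addLast(task);
    }

    T dequeue() {                                   // O(levels), a constant
        for (Deque<T> q : queues) {
            if (!q.isEmpty()) {
                return q.pollFirst();               // highest priority first
            }
        }
        return null;                                // nothing runnable
    }
}
```

(The real scheduler also keeps a bitmap of non-empty queues so the scan is a single find-first-bit, but the principle is the same.)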

Since tasks behave almost erratically, much like objects in our games, but still must be kept in order for scheduling, we tried a new approach, based on the Linux solution.

Let’s review the requirements from our collection:

Collection should be traversable by order in both directions in no more than O(n)

Objects with the same Z-order should be ordered by sub-order of last-Z-action-on-top

Insertion, removal and changing of Z-order should be O(1)

Memory complexity should be kept low

All Z-order values are legal (including negative and non-integer values) — for backward-compatibility reasons

Simplification

We’ll start with a simpler case, and ignore the last requirement. We’ll define an object’s Z-order to be a natural, bounded number (say, between 0 and 100), where lower Z-order means an object is further back.

The idea is to have a “bucket” for every legal Z-order in an array, holding all objects that have that Z-order by order of insertion. We’ll implement a “bucket” with a double-ended linked-list, which allows us an O(1) insertion of a new object to its tail, and traversal in-order in both directions:

Simple data-structure

Adding a new object to a Z-order bucket involves looking up the Z-order in the buckets array, and appending the object to the list’s tail:

Addition of an object to a bucket

By appending the object to the tail of the list, we maintain sub-ordering by last-Z-order action (we chose back-to-front as the forward traversal direction).

Traversing the entire structure is trivial — all we have to do is scan the objects in the first (or last) bucket, and move to the next (previous) bucket until there are no more buckets left.

However, we still have to address the removal of objects from a bucket (either when it is being destroyed, or moved to another bucket as its Z-order changes). To avoid having to scan the entire collection to find the location of the object’s link in the bucket’s list, we’ll use an auxiliary field in the object itself – we’re going to cache the current linked-list link holding the object in the bucket inside the sprite, allowing us O(1) access to it.

We’re creating a back-reference from an object to its location in the bucket it is currently in:

Removal of an object from a bucket (blue arrow denotes the back-reference from object to linked-list link)

From here things look trivial. Since selecting a bucket is a matter of array lookup by index, adding to the end of a linked list is O(1), and removing a link from a linked-list is also O(1) (given the link object itself), we can achieve an always-sorted collection with O(1) runtime for all the actions we need.

Let's examine how it's implemented. First, a game object will have to be able to report its Z-order, and hold the auxiliary link.

For the sake of abstraction, we're not holding the actual linked-list link class in the object, but hold it through an interface that allows us to remove the link from the list.

Next step is to implement the list that will hold a bucket's contents. Remember that it has to expose the link that holds an object for caching, so we can't use any Java built-in collection.

Now all we have to do is to create a bucket for each legal Z-order.
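The full listing lives in the SimpleZCollection class in the repository; the following is a condensed, self-contained sketch of the same structure (the names, and the way the bound and the cached link are passed around, are ours):

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of the always-sorted collection: Z-orders are bounded
// naturals, one bucket (a doubly-linked list) per legal value. The Link
// returned by add() plays the role of the cached back-reference.
class SimpleZCollection<T> {

    static final class Link<T> {
        final T item;
        Link<T> prev, next;
        Link(T item) { this.item = item; }
    }

    // Minimal doubly-linked list with sentinel head/tail nodes.
    static final class Bucket<T> {
        final Link<T> head = new Link<>(null), tail = new Link<>(null);
        Bucket() { head.next = tail; tail.prev = head; }

        Link<T> addLast(T item) {               // O(1) append keeps sub-order
            Link<T> link = new Link<>(item);
            link.prev = tail.prev; link.next = tail;
            tail.prev.next = link; tail.prev = link;
            return link;
        }

        static <T> void unlink(Link<T> link) {  // O(1) given the link itself
            link.prev.next = link.next;
            link.next.prev = link.prev;
        }
    }

    private final List<Bucket<T>> buckets;

    SimpleZCollection(int maxZ) {               // one bucket per legal Z
        buckets = new ArrayList<>(maxZ + 1);
        for (int z = 0; z <= maxZ; z++) buckets.add(new Bucket<>());
    }

    Link<T> add(T item, int z) {                // O(1): lookup + append
        return buckets.get(z).addLast(item);
    }

    void remove(Link<T> link) {                 // O(1) via the cached link
        Bucket.unlink(link);
    }

    Link<T> changeZ(Link<T> link, int newZ) {   // O(1): unlink + re-append
        Bucket.unlink(link);
        return add(link.item, newZ);
    }

    List<T> backToFront() {                     // O(n) in-order traversal
        List<T> out = new ArrayList<>();
        for (Bucket<T> b : buckets)
            for (Link<T> l = b.head.next; l != b.tail; l = l.next)
                out.add(l.item);
        return out;
    }
}
```

Note that changeZ() always unlinks and re-appends, even for the same Z value, so a re-set object pops above its same-Z peers, as the engine's contract requires.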

And by that we’ve written our basic data structure. Let’s examine each action separately, and see how we make it an O(1) operation.

Object Creation

To add a new object to the collection, all we have to do is to append it to the end of the bucket of its current Z-order, and cache the link in the object. That will also ensure sub-order of same-Z objects.

Since this is a double-ended linked-list, appending to its tail is an O(1) action.

Destroying an Object

To remove an object from the Z-collection, all we have to do is unlink it from the bucket's list.

Since unlinking a link from a doubly-linked list is an O(1) operation, and we have direct access to the link, this is also the runtime complexity of removing an object from the collection.

Changing Z-order

Changing the Z-order of an object is as simple as removing it from the previous bucket, and adding it to the new one, just like before.

Note that we’re not skipping the operation even if the object is moved to the same Z as its current Z – this is on purpose as we have to pop the object up to be above all the rest according to the engine’s contract with the developer. Since we’re only using O(1) operations, this is also an O(1) operation.

Traversing in Order

Traversing in order (either back-to-front or front-to-back) is trivial — we traverse all the buckets in the desired order, and for each bucket we traverse its list in the desired order (for back-to-front we’ll iterate normally, and for front-to-back we’ll iterate in reverse on both buckets and bucket-lists).

Since advancing in a linked-list is an O(1) operation, traversing the entire collection is O(n).

Note: I’ve included only interface listings for things that are trivial to implement. For a full code listing, check out https://github.com/mominis/zorder and the SimpleZCollection class.

The Real World

Though the previous solution will work well in new designs, we couldn’t use it as it was in our engine — the reason being, as always, backward compatibility. We have to address the last bullets we neglected in the previous section.

For historic reasons, such as J2ME, the engine still works in fixed-point arithmetic and supports "non-integer" and negative Z-orders. In addition, the range of possible Z-orders spans the entire integer space, way too many for a single array. Moreover, every object created gets the default maximal Z-order possible (2^31 - 1).

Unfortunately, we couldn't come up with a solution to this problem in O(1) runtime complexity and reasonable memory complexity, and had to settle for a best-effort approach.

We allow negative and non-integer Z-levels, as well as very high or low values, but these aren't handled as efficiently as natural, bounded Z-orders. We also provide a special-case optimization for the default Z-order.

First thing we did was to replace the buckets array with another linked-list. This will allow us to open new buckets for non-integer Z-levels in between “optimized” buckets, or add new large/small buckets on-demand.

We also allocate a bucket at the end of the list for the maximal default Z-order. Every item in the buckets list holds a list of sortables, and the Z-order it represents.

Initially, the buckets list is initialized with a bucket for each optimized natural Z-order, plus the maximal level. Then, the links of the buckets list holding the buckets themselves are put in an array (the “quick-access” array). As before, we cache the internal structure the linked-list uses:

Buckets in linked-list with quick-access array

This ensures we can always access optimized Z-orders buckets quickly, even when new Z-order buckets are inserted in-between.

When we need to add a new object, we first check if its Z-order is the default order. If so, it is appended to that bucket, to which we hold a direct reference. Otherwise, we'll test whether it's an optimized Z-order. If it is, we can simply append it to the proper bucket after looking it up in the quick-access array.

If this is an un-optimized Z-order, we’ll fall back to an O(n) search method that will look for an existing bucket with the new Z-order. We simply traverse all the buckets in order, starting from the nearest optimized one, and try to find a bucket with that Z-order. If we can’t find one, we’ll insert a new bucket between links in the appropriate location in the buckets list:

Opening a new bucket between existing buckets

Support for negative Z-order is similar, except we cannot use the quick-access array, and traversal is done from zero downwards.

One little issue we still need to resolve is discarding empty non-optimized buckets. This can be done while traversing the collection. For instance, a counter could be kept for each non-optimized bucket holding how many times the bucket was empty when traversed. If during traversal the counter is over a threshold, the bucket can be removed.
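One possible shape for that pruning policy (the counter field and the threshold are illustrative, not the engine's actual mechanism):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch: drop a non-optimized bucket once it has been seen empty often
// enough during traversal. Optimized (fixed) buckets are never removed.
class BucketPruner {
    static class Bucket {
        final boolean optimized;
        final List<Object> items = new ArrayList<>();
        int emptyCount;                         // times seen empty in a row
        Bucket(boolean optimized) { this.optimized = optimized; }
    }

    static void traverseAndPrune(List<Bucket> buckets, int threshold) {
        Iterator<Bucket> it = buckets.iterator();
        while (it.hasNext()) {
            Bucket b = it.next();
            // ... visit b.items for rendering/touch here ...
            if (b.optimized) continue;          // fixed buckets always stay
            if (b.items.isEmpty() && ++b.emptyCount >= threshold) {
                it.remove();                    // bucket earned its removal
            } else if (!b.items.isEmpty()) {
                b.emptyCount = 0;               // reset while in use
            }
        }
    }
}
```

Piggybacking the bookkeeping on a traversal we perform anyway keeps the pruning itself effectively free.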

This allows us to support old games, but give new games that follow the guidelines better performance.

You can find the complete implementation of this collection at https://github.com/mominis/zorder in the FixedPointZCollection class.

Conclusion

First, there's a common misconception among game developers that performance bottlenecks are the result of the render process, neglecting the CPU cost of executing the game's logic. When we first started working on the Jelly Jump problem, we were given speculations from the content developers' experiments, and tried finding the problem in the rendering process before we had even proved the problem was there.

It took us several hours of finding nothing to stop for a second, ignore the initial testing results we got, and start working methodically; we quickly found that graphics had nothing to do with the problem. As developers working a bug, we need to focus the bug reporter on what happens, not why it happens. Finding the "why?" is our job.

As a young self-taught programmer, I had a period where I didn’t believe in Computer Sciences studies, thinking that in today’s world every language already has all the basic data structures and algorithms implemented for you, and there’s little need for this kind of knowledge “in the real world”. I know some experienced programmers that still think so. But after having my share of problems to solve over the years, I see it more as a 20-80 rule. Looking at our initial optimization attempts, we were able to improve performance by using what Java had to offer. The simple solution took us 80 percent of the way — which in most cases is good enough.

But as game engine developers, we can't be "good enough" programmers; we need to be the best programmers. We develop applications that should run as smoothly as possible on a wide variety of devices, squeezing out every last bit of available CPU, GPU and memory resources (except when we've got battery issues on mobile devices, but that's beside the point). For most applications, 80 percent is enough, but as game engine developers we must be able to do better, and for that we must have that extra edge over "non-believers".

In this case, the extra 20 percent made us adopt a different approach altogether, and employ methods from OS-kernel scheduler technology to achieve a better solution. In the end, we're sorting a collection without using any sorting algorithm! Employing "traditional" methods would never have gotten us there. Never underestimate knowledge you acquire, no matter how unrelated to your field it may seem; that's what makes brilliant moments. (source: Gamasutra)

