Game Testing: Exploring the Test Space

by Johan Hoberg

Introduction

So you are about to plan how to test the game you are developing or, preferably, the game you are about to start developing. Where do you start? Maybe you have some understanding of what it is that you are going to develop. There are requirements, or stakeholders telling you what should be in the game and, to some degree, how it should work. But often this only covers a small portion of the complete test space. I use “test space” to denote the complete set of tests you would have to run to test absolutely everything.

You know that you will never cover the entire test space with tests, since this would not be very cost effective. But you probably have some idea that just testing that the core functionality of the game works will not be enough. Which tests you actually perform will always be a question of priorities, but how do you think of all the tests that you could possibly need in the first place?

In this article I will try to give a list of the different types of tests that I think should at least be considered before development of a new game starts. I will use ISO 25010 [1] as a base, and add thoughts from James Bach [2] and James Whittaker [3], who have developed different approaches to exploratory testing, which will help to think about other ways to explore the test space. This will not be a complete, all-encompassing list, but my hope is that it will be a good start.

ISO 25010

ISO 25010 is a quality model for systems and software. It contains eight categories of software quality, which can be good to use as a starting point when designing tests.

Functional Suitability

This category covers a large test space, and it is easy to completely miss large parts of it if you do not take the time to think through your approach.

The degree to which the product provides functions that meet stated and implied needs when the product is used under specified conditions [1]. A user action should lead to some kind of result. When I press the “New Game” button something should happen. When I press the mouse button in game, it has some effect depending on where the cursor is located. When a user presses “Connect to Social Media Platform” the game should connect to said social media platform. And so on. A large chunk of your tests will probably end up in this category, and these are most likely the first ones you think of.
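
To make this concrete, a stated need like this can often be turned into a small automated check. The sketch below is only an illustration; the Game class and its press() method are hypothetical stand-ins for whatever interface the real game under test exposes.

    # Minimal functional-suitability check: pressing "New Game" must start a fresh game.
    # The Game class is a hypothetical stand-in for the real game interface.
    class Game:
        def __init__(self):
            self.state = "main_menu"
            self.level = None

        def press(self, button):
            if self.state == "main_menu" and button == "New Game":
                self.state = "in_game"
                self.level = 1

    def test_new_game_button_starts_a_fresh_game():
        game = Game()
        game.press("New Game")
        assert game.state == "in_game"   # the stated need: something should happen
        assert game.level == 1

    if __name__ == "__main__":
        test_new_game_button_starts_a_fresh_game()
        print("ok")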

Also included in this category, and equally obvious, would be artificial intelligence. Is the AI taking appropriate actions based on specified conditions? Same goes for audio – are the correct audio files played based on specified conditions? Verifying game mechanics is included here as well, and the same applies to physics engines.

Another big thing, depending on the game under test, would be verifying that the game world meets stated or implied needs. Is everything in the right place? A rock, a vendor, monsters, NPCs, caves, castles, houses, lakes, mountains etc.
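
One way to attack the game-world question is to diff the built world against a design manifest. The sketch below assumes a hypothetical JSON export of object placements; the file paths and format are made-up examples.

    # Sketch: compare placed world objects against the design manifest.
    import json

    def load_placements(path):
        with open(path) as f:
            return {(o["id"], o["x"], o["y"]) for o in json.load(f)["objects"]}

    def test_world_matches_design():
        expected = load_placements("design/act1_manifest.json")
        actual = load_placements("build/act1_world_export.json")
        assert expected - actual == set(), f"missing from world: {expected - actual}"
        assert actual - expected == set(), f"not in the design: {actual - expected}"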

Multiplayer functionality is also part of functional suitability.

There are obviously many different types of tests in this category, and they vary wildly between different genres and games. Trying to create a complete overview map of all functional suitability tests for different genres is a whole article in itself. And even then a specific game could deviate from the genre and require other unique tests.

How to come up with tests for all these different areas is not an easy task. In a previous article I mentioned using Systematic Inventive Thinking as one way to think creatively about what to test, by starting with a normal user behavior and applying SIT. [4]

Reliability

So apart from the functional suitability tests, another category of tests is reliability tests. The degree to which a system or component performs specified functions under specified conditions for a specified period of time[1].

Fault tolerance is one aspect. What happens when something goes wrong? Will the game crash, or will the game handle the fault in a suitable way?

Recovery is another. What happens when the game actually crashes? What happens with user data? Progression, items, scores, etc. Preferably the game should never crash, but if it does, it should have minimal impact on the user.
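
A recovery test can be scripted along these lines, assuming some kind of test harness around the game client. Everything named below (launch_game_client, complete_quest and so on) is a hypothetical stand-in, not a real API.

    # Sketch: user data should survive a hard crash.
    def test_progress_survives_a_crash(launch_game_client):
        client = launch_game_client()
        client.login("recovery_test_user")
        client.complete_quest("tutorial")
        client.kill_process()                 # simulate a crash: no clean shutdown, no save prompt

        client = launch_game_client()         # relaunch and log back in
        client.login("recovery_test_user")
        assert "tutorial" in client.completed_quests(), "progression lost after crash"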

Duration and maturity tests are also included in the reliability category. What happens if you leave the game on for 24 hours? A week? Memory leaks can typically be found by this kind of test. How does the game run after 12 hours of intensive playing? Any performance degradation? Any other degradation?
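
A soak test of this kind might look something like the sketch below, which simply samples the game process's memory use at intervals and flags unexpected growth. It uses the psutil package; the PID would come from however the test launches the game, and the interval and threshold are example values only.

    # Sketch: 24-hour soak test that watches the game process for memory growth.
    import time
    import psutil

    def soak_memory(pid, hours=24, sample_minutes=10, max_growth_mb=200):
        proc = psutil.Process(pid)
        baseline = proc.memory_info().rss
        for _ in range(int(hours * 60 / sample_minutes)):
            time.sleep(sample_minutes * 60)
            growth_mb = (proc.memory_info().rss - baseline) / (1024 * 1024)
            print(f"memory growth so far: {growth_mb:.1f} MB")
            assert growth_mb < max_growth_mb, "suspicious memory growth - possible leak"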

In this category I would also include random automated tests that click through the UI, and tests that drop automated bots into a game world and let them traverse the world randomly until they pass the walls of the game world – this could for example let you find places where character models can get stuck in the world.
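
A minimal random-input loop could look like the sketch below; the game object, its input API and the world-bounds check are hypothetical, and the fixed random seed is there so that any failure can be replayed.

    # Sketch: random "monkey" input loop with two invariants checked on every step.
    import random

    ACTIONS = ["move_left", "move_right", "jump", "attack", "interact", "open_menu"]

    def monkey_test(game, steps=100_000, seed=1234):
        rng = random.Random(seed)             # fixed seed so any failure can be reproduced
        for i in range(steps):
            game.send_input(rng.choice(ACTIONS))
            assert game.is_running(), f"crash after {i} random inputs"
            assert game.player_inside_world_bounds(), f"player left the world after {i} inputs"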

Stress testing is also part of reliability testing. In its simplest form it can include tests like pressing a button many times instead of once. It can also include thousands of users logging into a server at the same time, or hundreds of players fighting each other in PvP, or participating in a public non-instanced quest together. Another example is a large number of players performing microtransactions simultaneously. Under the reliability umbrella we are looking at whether the game can handle these situations without crashing, logging you out or something similar – not at the performance under these conditions, which is covered by the Performance Efficiency category below.
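
As a rough sketch of the login case, assuming a hypothetical test-harness client and a dedicated test shard (the login() call, module name and numbers are all made-up examples):

    # Sketch: 1,000 simultaneous logins against a test shard.
    from concurrent.futures import ThreadPoolExecutor
    from game_test_harness import login       # hypothetical test-harness client

    def try_login(i):
        try:
            login(server="test-shard-01", user=f"bot_{i:04d}", password="test")
            return True
        except Exception:
            return False

    def test_thousand_concurrent_logins():
        with ThreadPoolExecutor(max_workers=200) as pool:
            results = list(pool.map(try_login, range(1000)))
        failed = results.count(False)
        assert failed == 0, f"{failed} of 1000 logins failed under load"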

Operability

The degree to which the product has attributes that enable it to be understood, learned, used and attractive to the user, when used under specified conditions [1]. So what does this actually mean?

Ease of use, learnability, attractiveness. Is the game easy to start playing? Can the user understand the game mechanics? Is the user interface understandable? Is the tutorial good?

But this could also include fun factor testing when it comes to games. Is the game actually fun to play? Realism testing could also be included in this category, which covers the believability of the physics engine and many other things. Is the game realistic and immersive to the player? Balancing could also be included here, as it correlates with whether the game is fun and understandable.

Performance Efficiency

The performance relative to the amount of resources used under stated conditions [1]. This can include response time, loading time, and different kinds of time behavior. It can also include different types of resource utilization, such as RAM, GPU and CPU usage. FPS would be connected to that resource utilization.

This also includes performance tests under stressful conditions, such as relative FPS when fighting alongside 10 people in PvP versus 200 people.
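
One simple way to quantify this is to capture frame times during the stress scenario and summarize them afterwards. The sketch below assumes a plain text log with one frame duration in milliseconds per line; the file name is just an example.

    # Sketch: summarize a frame-time capture from a stress scenario.
    import statistics

    def summarize_frame_times(path="pvp_200_players_frametimes.log"):
        with open(path) as f:
            frame_ms = [float(line) for line in f if line.strip()]
        avg_fps = 1000.0 / statistics.mean(frame_ms)
        p99_ms = sorted(frame_ms)[int(len(frame_ms) * 0.99)]
        print(f"average FPS: {avg_fps:.1f}, 99th percentile frame time: {p99_ms:.1f} ms")
        return avg_fps, p99_ms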

Many of these tests also impact how attractive the game is to the user, which we talked about in the operability section.

Security

The degree of protection of information and data so that unauthorized persons or systems cannot read or modify them and authorized persons or systems are not denied access to them [1]. In a single player game without any microtransactions, security is important. But this importance explodes when testing an MMO, MOBA or similar game. Confidentiality, integrity, accountability, authenticity [1]. Security testing requires very specific competence compared to other types of testing, but is essential to modern games. How to battle botting and gold sellers can be included in this category.
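
As one narrow illustration of the integrity aspect, saved data can be signed so that casual tampering is detected when it is read back. The sketch below is deliberately naive about key handling and is not a defense against a determined attacker; it only shows the shape of such a check.

    # Sketch: detect tampering with saved data via an HMAC signature.
    import hmac, hashlib

    SAVE_KEY = b"example-key-do-not-ship-like-this"   # real systems need proper key management

    def sign(save_bytes: bytes) -> str:
        return hmac.new(SAVE_KEY, save_bytes, hashlib.sha256).hexdigest()

    def load_save(save_bytes: bytes, signature: str) -> bytes:
        if not hmac.compare_digest(sign(save_bytes), signature):
            raise ValueError("save data failed its integrity check")
        return save_bytes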

Compatibility

The degree to which two or more systems or components can exchange information and/or perform their required functions while sharing the same hardware or software environment [1].

Interoperability is part of this. Support for different game pads, gaming mice, keyboards, Oculus and Morpheus, and similar gaming paraphernalia. One software system – the game – and one or more hardware systems.

Co-existence between programs. Having Spotify running in the background. Running Team Speak, Ventrilo or other VoIP clients. Two or more different software systems co-existing.

Maintainability

The degree of effectiveness and efficiency with which the product can be modified [1]. Many activities included in this category are development activities. But securing testability [5] is definitely something of interest to the game tester.

You could include testing for modification [6] support in this category.

Transferability

The degree to which a system or component can be effectively and efficiently transferred from one hardware, software or other operational or usage environment to another [1].

Here we can test on different hardware and operating systems. Mobile games need to be tested on iOS, Android and Kindle, for example. Then there are different manufacturers of hardware, different OS versions, and different versions of the hardware from the same manufacturer. For PC there is an unlimited number of configurations. The key here is to have enough business information to be able to prioritize different hardware and OS configurations, framed by minimum and recommended hardware requirements.
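
One way to frame that prioritization is to weight hardware/OS combinations by estimated player share and test the heaviest ones first. The device names and share numbers in the sketch below are made-up examples; real figures would come from business or market data.

    # Sketch: prioritize hardware/OS combinations by estimated player share.
    from itertools import product

    device_share = {"Phone A": 0.30, "Phone B": 0.20, "Tablet C": 0.10}
    os_share = {"Android 4.4": 0.45, "Android 4.2": 0.35, "Android 4.1": 0.20}

    matrix = sorted(
        ((d_s * o_s, device, os_ver)
         for (device, d_s), (os_ver, o_s) in product(device_share.items(), os_share.items())),
        reverse=True,
    )
    for weight, device, os_ver in matrix[:5]:
        print(f"{weight:.3f}  {device} / {os_ver}")   # cover the highest-share pairs first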

Heuristics by James Bach [7]

James Bach, one of the most famous software testing experts, has created a list of general test techniques that are simple and universal enough to be applicable in many different contexts, and can also be applied to game testing. This is another way of looking at the test space compared to the categories that I have previously presented. Attacking the problem from a different angle.

Function Testing – Test what it can do

Domain Testing – Look for any data processed by the product.

Stress Testing – Overwhelm the product

Flow Testing – Do one thing after another

Scenario Testing – Test to a compelling story

Claims Testing – Verify every claim

User Testing – Involve the users

Risk Testing – Imagine a problem, then look for it

Automatic Checking – Check a million different facts

To better understand this, look into the reference and read the document.

Testing Tours by James Whittaker [8]

James Whittaker is a professor and software testing evangelist at Microsoft, and has previously worked at Google as well. During his time at Google he developed a way of performing exploratory testing [9] which he called Testing Tours. Testing Tours can also be used to explore the test space, but they take yet another approach, different from the two previous ones.

“Suppose you are visiting a large city like London, England, for the very first time. It’s a big, busy, confusing place for new tourists, with lots of things to see and do. Indeed, even the richest, most time-unconstrained tourist would have a hard time seeing everything a city like London has to offer. The same can be said of well-equipped testers trying to explore complex software; all the funding in the world won’t guarantee completeness.” [8]

So James Whittaker created a number of “Testing Tours” that you could perform to explore the software under test, based on this tourism metaphor.

The Guidebook Tour – Follow the user manual’s advice just like the wary traveler, by never deviating from its lead

The Money Tour – Run through sales demos to make sure everything that is used for sales purposes works

The Landmark Tour – Choose a set of features, decide on an ordering for them, and then explore the application going from feature to feature until you’ve explored all of them in your list.

The Intellectual Tour – This tour takes on the approach of asking the software hard questions. How do we make the software work as hard as possible?

The FedEx Tour – During this tour, a tester must concentrate on this data. Try to identify inputs that are stored and “follow” them around the software.

The Garbage Collector’s Tour – This is like a methodical spot check. We can decide to spot check the interface where we go screen by screen, dialog by dialog (favoring, like the garbage collector, the shortest route), and not stopping to test in detail, but checking the obvious things.

The Bad-Neighborhood Tour – Software also has bad neighborhoods—those sections of the code populated by bugs. This tour is about running tests in those sections of the code.

The Museum Tour – During this tour, testers should identify older code and executable artifacts and ensure they receive a fair share of testing attention.

The Back Alley Tour – If your organization tracks feature usage, this tour will direct you to test the ones at the bottom of the list. If your organization tracks code coverage, this tour implores you to find ways to test the code yet to be covered.

The All-Nighter Tour – Exploratory testers on the All-Nighter tour will keep their application running without closing it.

The Supermodel Tour – During the Supermodel tour, the focus is not on functionality or real interaction. It’s only on the interface.

The Couch Potato Tour – A Couch Potato tour means doing as little actual work as possible. This means accepting all default values (values prepopulated by the application), leaving input fields blank, filling in as little form data as possible, never clicking on an advertisement, paging through screens without clicking any buttons or entering any data, and so forth.

The Obsessive-Compulsive Tour – Perform the same action over and over. Repeat, redo, copy, paste, borrow, and then do all that some more.

To better understand this, look into the reference and read the document.

Conclusion

This has been my attempt to give a good base for continued exploration of the test space to help new testers or developers think of all the necessary tests they need to run for their game.

This is by no means the only way to tackle this problem, but I usually combine these three approaches when I think about a testing problem.

Unfortunately this is just the first step. The next step is to prioritize the different tests, since running them all is not feasible. But that is a topic for another article.

Johan Hoberg
(source:gamasutra)

 

