The POMMS Testing System: Definition and How It Works

Published: 2013-02-06 09:21:46 Tags: POMMS testing system, fairness, guiding, flexibility, management

Author: Robert Hoischen

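Minecraft, the standout hit of the indie scene, has changed what most players think “beta” means. On hearing the word, they no longer expect a buggy, frustrating, broken, and thoroughly unenjoyable experience. Instead, they now see a beta as an unfinished early development milestone that is already highly playable and fairly polished. You may wonder: “Who actually does the beta testing, then?” Most likely not the people who turn up on indie game development forums calling themselves qualified beta testers (GamerBoom note: because they once took part in the public testing of Minecraft).

First, some definitions. A beta version is a feature-complete work package that has not yet been bug-fixed or polished. It may be a single small feature or an entire software suite. As the example above shows, this definition is far from universal, but for the purposes of this article treat it as the undeniable truth. By this definition, Minecraft was released as a series of well-polished milestones, which Notch had probably alpha and beta tested internally before releasing them to the self-proclaimed “beta testers.”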

[Image: game testers (from maximumpc.com)]

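So how has this change in user mentality affected the testing requirements of future games? The answer depends largely on the target audience, but in general the free labor that is available drifts increasingly toward the “just playing around” crowd.

While Blizzard Entertainment has found a perfect use for such players, they can harm a small or medium-sized testing team. Overestimating how much (quality) work gets done per tester per unit of time quickly leads to frustration and delays, which in turn cost trust and, ultimately, sales.

Many indie teams seem to have solved this problem by planning very little and testing every aspect of the game themselves: a workable solution, but far from an ideal one.

This shortcut comes as no surprise: setting up a proper testing infrastructure, finding the right testers, and communicating ever-changing test goals all take a great deal of effort. That brings us to the core of this article: POMMS-based beta testing.

The acronym stands for Project-Oriented Modular Motivational System. The author developed it as a tool to bring structure and clarity to the chaos that passed for beta testing at our small indie game studio, Camshaft Software, where it is currently used in testing Automation: The Car Company Tycoon Game. The system builds modern game design elements into the testing process itself, an idea usually associated with “gamification.” A testing approach based on it addresses some of the biggest problems in beta testing: planning ahead, coordinating testers, motivating testers, keeping testing focused, and allowing for tester flexibility and individuality.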

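POMMS: Definition and How It Works

The system is quickly explained: well-defined subprojects are laid out, and testers earn “points” for each task they complete toward a subproject’s goals. This score is the tester’s power level; if it goes over 9000, the subproject covers too much or the amount of work per point is too small.

Depending on its content, a meaningful subproject usually lasts between 4 and 16 weeks, with a single point corresponding to roughly 15 minutes of work. Within any given subproject, every tester must reach a minimum score in order to continue beta testing. The highest-scoring testers are awarded stars, which accumulate across subprojects. The developers decide what rewards testers receive for the stars they have collected: special privileges, a place in the software’s credits, or a special forum status, for example.

If a tester scores above the minimum but does not make the star rankings, the final score carries over to the next subproject. It does not count toward that subproject’s minimum score, but it does count toward placement in the star rankings. This rewards testers who consistently deliver solid work, because they will earn a star eventually, and by the end of the project they will feel that they have made a real contribution.

Each subproject focuses on one specific aspect of the game while trying not to narrow the testers’ range of activities too far. A typical example is the addition of V8 engines to Automation. Over eight weeks, this subproject covered checking 2D/3D artwork, playtesting and balancing scenarios, gathering real-world engine data, cleaning up entries in the bug tracker, checking all new strings, and, last but not least, good old bug reporting.

At the start of each subproject, a subproject-specific scorecard is defined. Each tester has a post containing this scorecard in the “Hall of Heroes” thread on the beta forum and keeps it updated as work progresses. Below is an example scorecard from the V8 subproject.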

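Tester ID: PaLaDiN1337 (4*)

Tester Status: Active

Current Score: 17 (+0)

Scenario Testing: (V0.88) S1, S2, S4, S5, S7, S8, S11

Tracker Entries: ID28804129, ID28796925, ID28786633

Engine Data: Ford 4.6L Modular V8 3V

Engines Tested: Ford 5.4L Modular ‘Triton’ V8 DOHC

Forum Excellence: Proof-reading helps, 2p

Achievements: 3x Bug-tracker clean-up, 1p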

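Keeping it simple is the name of the game. Test a scenario: one point. Research engine data: one point. File a bug report: one point. The system should be painless to administer, simple to use, and easy to understand, with progress visible at all times.

At the end of each subproject, the testers’ scores are transferred to a spreadsheet that also records their current status, power level, and carry-over score, all laid out as a timeline across the different subprojects.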

[Image: example of POMMS beta administration (from Gamasutra)]

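Not every activity can be planned or set up in advance, and some tasks only come up during development. For this reason POMMS includes a Quest Board: a subproject-specific section attached to the subproject announcement in the Hall of Heroes thread and updated continually. It allows the testing focus to shift quickly when necessary and keeps the testing system dynamic. Here testers find non-standard tasks that earn achievements and points when completed. Such tasks might be “clean up this spreadsheet,” “find information on this topic,” or even “recruit some additional good testers.”

Testers have lives too, so they often need to take a break or even leave altogether. To accommodate this, the developers should set passivity rules that allow breaks but quickly remove people who never start working or who stop caring without telling the development leads.

The “Rule of Thirds” works not only in photography but here as well: testers who are new to the team, returning from passive status, or coming off a failed active status must reach at least one-third of the subproject’s minimum score within the first third of the subproject, or they are removed. Administering this is simple, too: change a single flag in the spreadsheet and check the scorecards of all the candidates in question one-third of the way into the subproject.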

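Key benefits of POMMS for developers:

Guiding: Developers can estimate how much testing will get done on average.

Rewarding: It creates a positive working environment for both developers and testers.

Flexible: Communicating and creating new tasks is simple and fast.

Manageable: The system is easy to administer, and user participation is self-regulating.

Key benefits of POMMS for testers:

Guiding: Testers do not have to guess what to do next; they only have to choose.

Rewarding: Progress and each tester’s value to the team are clearly measurable.

Flexible: There are tasks to suit every tester’s preferences.

Manageable: The system is easy to understand, and the scorecard is easy to maintain.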

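Potential Problems

As described above, one potential problem with POMMS is that it rewards the quantity of a tester’s work rather than its quality. This can be avoided by setting high minimum requirements for what counts as a point. A bug report, for example, must be filed in the bug tracker and must include at least a proper title, meaningful tags, and a detailed description of how to reproduce the bug.

The quality of the testers’ work is controlled only indirectly, since evidence of shoddy work usually surfaces early. The system also reinforces the necessary trust between testers and developers; direct control would only hurt it, making testers afraid of making mistakes and reluctant to post what they find. Everyone makes mistakes; what matters is the density of mistakes, not their absolute number.

Another important issue is the scalability of the system. Switching between testing systems is difficult, so once a system is set up, the developers had better be sure that it will not be outgrown any time soon.

So far, our implementation of POMMS has shown almost logarithmic scaling for small numbers of testers (up to about 50). We estimate that managing 100 testers is about twice the work of managing 10, although many more than 100 testers would be hard to manage without automated tools.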

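System Fairness and Performance

Many testers competing for the same points can lead to fierce competition. This is especially true of bug reporting, which always works on a first-come, first-served basis. At first glance this seems good for the developers because progress is rapid, but it may ultimately harm the overall quality of the testing environment. The problem can generally be avoided by offering alternative activities to testers who are less competition-minded.

It is easy to overlook how important system fairness is. In essence, your newly implemented POMMS beta test behaves like an MMO, and the testers are its players. Even the tiniest imbalance or unfairness will surface, magnified ten-thousandfold, and be thrown at the developers as though the tester’s life had just been ruined. At times the effect is strikingly similar to the average MMO forum.

The key is to make sure the scoring and reporting system is 110 percent fair. Make no exceptions, because if you do, everyone else will demand one too. At first the testers will be on your side, but that can change quickly if you do not set up and maintain the system with care. Be open about any changes to rules or structure and explain the reasons for them. Show appreciation for the testers’ work and accept that things can go wrong on both sides. Even small, direct, personal recognition from the developers can do wonders for the testers’ motivation.

Six months after our first implementation, the results are very positive. About 90 percent of the testers very much like working within the POMMS framework; the rest mainly complained about its competitive nature. Unfortunately, probably because of mistakes made during the setup phase, the rules at first did not feel fair to everyone, competitive or not.

Comparing the amount of work the testing team completes now with what was done during our earlier unstructured beta, we find that productivity has increased tenfold. Testers now coordinate calmly, and the general atmosphere is positive, if somewhat competitive at times.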

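Summary

In summary, the testing structure described above offers great advantages for small and medium-sized software teams, with only a few drawbacks. POMMS does not require much manpower, and its first implementation has shown it to be highly efficient. (This article was compiled by GamerBoom/gamerboom.com. Reproduction without the copyright notice is not permitted; to reprint, please contact GamerBoom.)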

POMMS: A Way to Get Your Players to Test Your Game!

by Robert Hoischen

The indie elephant in the room, Minecraft, has changed the perception of what a “beta” truly entails in the mind of the average gamer. No longer do people hearing the word “beta” expect the buggy, frustrating, horribly broken, and not-fun-at-all experience they should. Instead it is now viewed as a supposedly incomplete early development milestone with a high degree of playability and quite a bit of polish. “Who does the actual beta testing of these milestones, then?” you ask. Most probably not the people coming to an indie-game dev forum stating that they are qualified beta testers because they’ve “beta-tested Minecraft.”

Let us get some definitions clear-cut first. A beta version is a feature-complete, but not yet bug-fixed or polished, version of a given working package in development. That might be a single little feature in a game, or a full software suite. As the example above already has shown, this definition is far from being universal, but herein thou shalt see it as the undeniable, utmost truth. With this definition, Notch’s brilliant creation was released in no more (and no less) than well-polished milestones, probably internally alpha and beta tested prior to the release to the self-proclaimed “beta testers.”

How have these recent changes in mentality affected the beta testing requirements of future game-related products? The answer to that question very much depends on target audience, but in general, the freely available workhorses out there drift increasingly towards the “I just play around” category.

While Blizzard Entertainment has found a perfect use for these people, they could become a potentially harmful ingredient in any small- or medium-scale testing team. Overestimating how much (quality) work will be done per unit time and tester quickly leads to frustration and delays, causing, in turn, loss of trust and ultimately, sales.

Many indie teams seem to have solved this issue by not planning much at all and/or testing everything by themselves — a rather workable, but far from optimal, solution.

This blissful ignorance comes as no big surprise – there is a lot of work involved in setting up proper testing infrastructure, finding the right people to do the testing, and communicating the ever-changing aims of the tests. This leads us to the gist of this write-up: POMMS-based beta testing.

The acronym stands for Project-Oriented Modular Motivational System and was incubated by the author as a tool to bring structure and clarity into the chaos that was called beta testing at our little indie game dev studio Camshaft Software, currently working on Automation: The Car Company Tycoon Game. This system implements modern game design elements into the testing process itself, a concept usually connected with the buzzword “gamification.” The POMMS-based approach to testing attacks some of the biggest challenges in beta testing: the ability to plan ahead, tester coordination, tester motivation, testing focus, tester flexibility and individuality.

POMMS: What It Is, and How It Works in Testing

The system itself is quickly explained: Well-defined subprojects are outlined, and a specific amount of “points” is awarded for every task a tester completes towards realizing the subproject goals. This score is the so-called tester power level, and when it’s over 9000, your subprojects are either too long or the amount of work per point too small.

Depending on the contents of a subproject, a meaningful subproject could be anywhere between 4 and 16 weeks long, with a single point being the equivalent of around 15 minutes of work. Within any given subproject, each tester has to reach a certain minimum score to continue participating in beta testing. The highest scoring testers are awarded stars that are cumulative across subprojects. It is up to the developers to decide what the rewards for attaining these stars are. For example, they could give testers special privileges, a place in the software’s credits, provide them with special forum statuses, etc.

If a tester achieved more than the minimum score in a subproject, but didn’t reach star rankings, the final score is carried over to the next subproject. This carryover does not count towards reaching the next subproject’s minimum score, but towards the placement in the star rankings. This rewards testers who continually deliver solid work over several subprojects, as they will grab a star eventually, and by the end of the project, they will not feel like they have been anonymous testing machines after all.
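The minimum-score, star, and carry-over rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the bookkeeping Camshaft Software actually used: the function name `settle_subproject`, the `star_slots` parameter (how many stars are handed out per subproject), the reset of a star winner's carry-over, and the choice to carry the full ranking total forward are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Result:
    continues: bool  # met the minimum score in this subproject
    star: bool       # placed in the star rankings
    carry_out: int   # points carried into the next subproject


def settle_subproject(scores, carry_in, min_score, star_slots):
    """Settle one subproject (hypothetical helper).

    scores:   points earned in this subproject, keyed by tester
    carry_in: points carried over from the previous subproject
    """
    # Carry-over counts toward star placement, never toward the minimum.
    ranking = {t: s + carry_in.get(t, 0) for t, s in scores.items()}
    eligible = [t for t, s in scores.items() if s >= min_score]
    # Highest ranking total first; ties broken by name so the result is stable.
    eligible.sort(key=lambda t: (-ranking[t], t))
    stars = set(eligible[:star_slots])

    results = {}
    for tester, score in scores.items():
        met_minimum = score >= min_score
        got_star = tester in stars
        # Assumption: only testers who met the minimum but missed a star
        # carry points forward; star winners start again from zero.
        carry_out = ranking[tester] if met_minimum and not got_star else 0
        results[tester] = Result(met_minimum, got_star, carry_out)
    return results


outcome = settle_subproject(
    scores={"ann": 30, "bob": 18, "cy": 5},
    carry_in={"bob": 15},
    min_score=10,
    star_slots=1,
)
for tester, result in outcome.items():
    print(tester, result)
```

In this made-up run, bob's carry-over lifts him past ann in the star ranking even though ann scored more in the current subproject, which is the "steady workers grab a star eventually" effect described above.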

Each subproject focuses on a specific aspect of the software, while attempting to not narrow down activities too far. A good example of this would be the addition of V8 engines to the Engine Designer in Automation. Within eight weeks’ time, this subproject encompasses checking 2D/3D artwork, playtesting and balancing scenarios, gathering real-life engine data, polishing up entries in the bug-tracking tool, checking all new strings in the game, and last but not least: good old bug reporting.

At the start of each subproject, a subproject-specific scorecard is defined. Every tester has a single post with this scorecard in the beta forum’s “Hall of Heroes” thread, which each individual continually updates as work progresses. The following is an example of such a scorecard in the V8 subproject mentioned above.

Tester ID: PaLaDiN1337 (4*)

Tester Status: ACTIVE

Current Score: 17 (+0)

Scenario Testing: (V0.88) S1, S2, S4, S5, S7, S8, S11

Tracker Entries: ID28804129, ID28796925, ID28786633

Engine Data: “Ford 4.6L Modular V8 3V”

Engines Tested: “Ford 5.4L Modular ‘Triton’ V8 DOHC”

Forum Excellence: “Proof-reading helps, 2p”

Achievements: 3x “Bug-tracker clean-up, 1p”

Keeping it simple is the name of the game. Tested a scenario? Get a point. Researched engine data? Get a point. Posted a bug report? Get a point. The system needs to be painless to administer, simple to use, and easy to understand, with progress readily visible.
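As a quick check of the one-point-per-item rule, the example scorecard can be tallied in Python. The data layout and the `total_score` helper are hypothetical; the point values are an assumption inferred from the scorecard itself (one point per listed item, plus the explicitly marked "2p" and "1p" entries).

```python
# One point per listed item (assumption based on the rule stated above).
scorecard = {
    "Scenario Testing": ["S1", "S2", "S4", "S5", "S7", "S8", "S11"],
    "Tracker Entries": ["ID28804129", "ID28796925", "ID28786633"],
    "Engine Data": ["Ford 4.6L Modular V8 3V"],
    "Engines Tested": ["Ford 5.4L Modular 'Triton' V8 DOHC"],
}

# Entries that state their own value: (description, points each, count).
bonus = [
    ("Proof-reading helps", 2, 1),
    ("Bug-tracker clean-up", 1, 3),
]


def total_score(scorecard, bonus):
    """Sum one point per listed item plus the explicitly valued entries."""
    base = sum(len(items) for items in scorecard.values())
    extra = sum(points * count for _, points, count in bonus)
    return base + extra


print(total_score(scorecard, bonus))  # 17, matching "Current Score: 17"
```

The tally is 7 + 3 + 1 + 1 from the listed items and 2 + 3 from the valued entries, which agrees with the score shown on the card.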

At the end of each subproject, the Hall of Heroes thread is locked and scores are transferred to a spreadsheet that lists all testers, their current status, and their power levels, along with carry-over scores — all displayed as a timeline throughout different subprojects.

Not all activities can be planned for, or set up well in advance, and some tasks may pop up as development chugs along. For exactly this purpose there is the Quest Board — a subproject-specific, continually updated section attached to the subproject announcement post in the Hall of Heroes thread. It allows for quick shifts in testing focus if required, and makes the testing system dynamic if need be. Here, testers will find non-standard tasks that yield achievements and points when completed. These tasks could be things like “clean up this spreadsheet for us,” or “find some information on this topic,” or maybe even “find additional testers that don’t suck.”

Testers do have a life — or at least sometimes pretend to — so more often than not, they may need to either take a break from testing or even retire. To accommodate this, there are passivity rules that allow for taking breaks, but quickly get rid of people that either never get started working or decided to not care anymore without notifying the development leads.

The “Rule of Thirds” works well not only in photography but also here: testers who are new to the team, returning from passive status, or coming off a failed active status need to achieve at least one-third of the subproject’s minimum score within the first third of the subproject to avoid being expelled. The administration of this system is rather trivial, too: it involves changing a single flag in a spreadsheet and checking the corresponding scorecards of all arguable candidates one-third into the subproject.
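The checkpoint test itself is a one-line comparison. The sketch below assumes hypothetical status labels ("new", "passive", "failed") for the three groups the rule applies to, and it is meant to be run once, one-third of the way into a subproject.

```python
# Hypothetical status labels for testers who are subject to the rule.
ON_PROBATION = {"new", "passive", "failed"}


def expelled_at_checkpoint(status, score, min_score):
    """Apply the Rule of Thirds one-third of the way into a subproject.

    Returns True if the tester should be removed from the beta.
    """
    if status not in ON_PROBATION:
        return False  # established active testers are not checked here
    # Integer form of "score < min_score / 3" to avoid rounding issues.
    return 3 * score < min_score


print(expelled_at_checkpoint("new", 3, 12))     # below one-third of 12
print(expelled_at_checkpoint("new", 4, 12))     # exactly one-third of 12
print(expelled_at_checkpoint("active", 0, 12))  # rule does not apply
```

Multiplying the score by three rather than dividing the minimum keeps the comparison exact when the minimum score is not a multiple of three.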

The key benefits of POMMS to the developers:

Guiding: Developers can gauge and plan how much testing will be done on average.

Rewarding: The system creates a positive working environment for both devs and testers.

Flexible: Communicating and creating new tasks is simple and efficient.

Manageable: The system is easy to administer and user participation is self-regulating.

The key benefits of POMMS to the testers:

Guiding: There is no guessing what things to do next, only options.

Rewarding: There are clear measures of progress and the tester’s value to the team.

Flexible: There are always tasks that fit all kinds of tester preferences.

Manageable: The system is easy to understand and the scorecard simple to maintain.

Potential Problems & Scalability

A potential concern with POMMS as presented above would be that testers are rewarded only for quantity, but not quality. This is averted by having very high minimum requirements placed on what counts as a point. For instance, a bug report must be filed in the bug-tracking tool and contain at least a proper title, meaningful tags, and a detailed description of how to reproduce it.

The quality of the work done by the testers is only controlled indirectly — evidence of shoddy work will often surface sooner rather than later. This system also reinforces the much needed trust between the testers and the developers, where direct control would only hurt the system as people start to be afraid of making errors instead of posting what they find. We all make mistakes; it is the density of mistakes that matters, not their absolute number.

Another important issue is the scalability of the system. Switching between testing systems is a difficult and tedious prospect, and once a system such as POMMS is set up, the developers have to be certain that it is not outgrown any time soon.

To date, our implementation has shown a very favorable, almost logarithmic scalability for small numbers of testers up to about 50. As an estimate, managing 100 testers is about twice as much work as having only 10 testers, although many more than 100 testers would not be easily manageable without automated tools.
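The "100 testers is about twice the work of 10" estimate is what a purely logarithmic model predicts, since log(100) / log(10) = 2. The sketch below assumes exactly that model, with no fixed overhead term; the article only claims "almost logarithmic" behavior up to about 50 testers, so the numbers are illustrative, and `relative_admin_effort` is a hypothetical helper.

```python
import math


def relative_admin_effort(testers, baseline=10):
    """Effort relative to a team of `baseline` testers, assuming log scaling."""
    if testers < 2 or baseline < 2:
        raise ValueError("need at least two testers for a meaningful ratio")
    # The ratio of logarithms does not depend on the base chosen.
    return math.log(testers) / math.log(baseline)


for n in (10, 25, 50, 100):
    print(n, round(relative_admin_effort(n), 2))
```

Under this assumption the ratio for 100 testers comes out at 2, in line with the estimate in the text; a linear model would instead predict ten times the work.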

System Fairness and Performance

Many testers battling for the very same points can lead to pretty stiff competition. This is specifically the case for focused bug reporting, which always works on a first-come, first-served basis. At first glance, this seems to be a good thing for the developers, as progress is rapid, but it may ultimately harm the overall quality of the testing environment. This problem can generally be circumvented by always providing rewarding alternative testing activities via the Quest Board for less competition-minded people.

One easy-to-miss point is the immense importance of system fairness. Basically, your freshly implemented POMMS beta testing behaves like an MMO, and the testers are its players. Even the tiniest imbalances and unfairness will surface, multiplied by 10k in magnitude, to then be thrown into the developers’ faces in a way that would suggest the tester’s life just has been destroyed. At times it is scarily similar to the average MMO forum.

The crux is to make sure the scoring and reporting system is 110 percent fair. Make no exceptions to rules, because if you do, everyone else will cry for exceptions too. Initially, your testers will be on your side, but that can change very quickly if you don’t handle the set-up of the system and its maintenance with great care. Always be transparent with any changes to rules or structures, and explain the why and when. Show appreciation for the work people do and accept that things can go wrong on both sides. Even minor direct and personal acknowledgement and appreciation from the devs toward individual testers can work wonders for motivation and the testing climate.

In the six months since our first implementation, the conclusion that can be drawn so far is very positive. About 90 percent of the testers very much like working within the framework of POMMS, while 10 percent mainly complained about its competitive nature and left. This is unfortunate, and probably due to several mistakes that were made during the setup phase, before the rules were fair to both competitive and non-competitive testers.

Comparing the amount of work that is done by the testing team now to how much was done during the time we ran an unstructured beta, we have effectively multiplied productivity by 10.

Efforts are coordinated without much shouting, and the general atmosphere is positive albeit a bit competitive at times.

Summary and Conclusion

In summary, the testing structure described herein comes with huge advantages for small- and medium-scale software productions, entailing only a few complications. It does not require a lot of manpower once set up, and has proven to be very efficient already in its first implementation. (Source: Gamasutra)
