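On the Relationship Between Focus Groups, Game Testing, and Metrics Analysis

Published: 2011-10-09 17:24:20 Tags: metrics analysis, focus groups, usability testing, game testing

Author: Tristan Donovan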

This article was translated and compiled by GamerBoom (gamerboom.com); contact GamerBoom for reprint permission.

Focus Groups, Testing, And Metrics: Developers Speak


[Developers and testing firms discuss the merits of different kinds of testing and analytics, and Gamasutra investigates whether focus groups really have become, as has been suggested by Vertical Slice's Graham McAllister, "the f-word."]

When it comes to research methods, few attract as much disdain and ridicule as the focus group. Compared to many other forms of research, focus groups are quick, cheap, and widely used. They’re also adaptable, being as useful for gathering opinions on high-level concepts as on the particulars of a product, even though they stop short of the systematic quantitative data gathering of usability testing.

Yet they remain very much the research method everyone loves to hate. And the game industry is no exception. As Graham McAllister, director of game usability testing firm Vertical Slice, told the Develop conference in Brighton back in July, “Focus groups are the f-word.”

He’s not alone — there are plenty of celebrated examples of game industry successes that snubbed the “wisdom” of focus groups. Nintendo ignored the feedback from its focus groups, and pushed ahead with the launch of the NES in the U.S. anyway.

The Sims overcame intense criticism from its focus groups and became the biggest PC game of all time. Even now, as the industry becomes ever bigger and more professional, there’s a strong distrust of the focus group.

“My instinct about it is that it feels like it gets more and more like a process of designing by committee, and diluting the spirit of the developer,” says Neil Barnden, the executive director at Magic: The Gathering – Duels of the Planeswalkers 2012 developer Stainless Games.

“My main raison d’être at Stainless is to push forward the new Carmageddon, and my gut instinct is that the last thing I would want to do is to use focus groups to test reactions to that. We certainly wouldn’t expect to improve our vision for the game by having it tested on a random selection of people.”

But now that games appeal to a true cross-section of society and not just the console-owning males of the past, some think the need for focus group testing of games is growing. “In the game business, we’re very used to writing games for ourselves,” says David Braben, the founder of Kinectimals studio Frontier Developments. “But as soon as your audience includes people who aren’t you, you absolutely have to see how they react to it.”

“Everyone who reads Gamasutra is already massively knowledgeable about games. That’s fine, and a lot of games target our group, but if you want to cover people who don’t play games every single waking hour but only once or twice a week then you have to be aware that they might have a different mindset.”

Frontier started using focus groups in the early 2000s and hasn’t looked back since.

“There’s lots of rubbish spoken about focus groups, particularly in relation to political parties, where people see them as a way of abdicating responsibility for something. There’s a lot of sneering,” he says. “But you have to make sure people don’t get confused or hung up on the user interface or the instructions and that the game is not using buzz words.”

While focus groups have helped shape many of Frontier’s releases, one example stands out for Braben. “When we did our first Wallace & Gromit game, we did an informal focus test ourselves, because we wanted to make sure that it worked well for kids,” he says. “It was astonishing because half didn’t understand how to overcome a puzzle.”

Frontier responded by adding a message that appeared after a set period of time advising players to look up a tree to solve the puzzle, before putting it to the test with another focus group made up of different children. “What was interesting was that then something like two-thirds were then fine with the puzzle, but a third of them didn’t do as the message suggested,” says Braben.

“We asked them afterwards why they didn’t do what the message said, and they said ‘because it told me to do it’. The problem was they said they were happy doing other stuff, but then wrote that the game was dull because there wasn’t much to do: they just went around collecting stuff and not proceeding with the puzzle.”

The results of the second focus group led to a further tweak to the game. “We changed the message to say ‘don’t, whatever you do, go up the tree — it’s dangerous’. Then every single kid on the following focus group did it and said the game was great because there was tons to do,” says Braben.

Braben says picking up issues like this is why focus testing matters for games. “I don’t think we would have picked up on that bit of reverse psychology without having seen it in the wild, so to speak,” he says. “One of the tragedies is when you’ve written a game that you are really proud of and then you roll it out and hear from a friend or relation that they got stuck right near the beginning. It’s often the tiny things that put someone off.”

One of Frontier’s more unexpected findings came during the development of Kinectimals when the focus group sessions highlighted that American children throw differently from British kids thanks — presumably — to the popularity of baseball in the U.S. and cricket in the UK.

Braben feels that the hostility to focus groups is largely an image problem and it would be better if game developers just forgot the f-word. “Drop the word ‘focus’, just call it ‘testing’, because then you begin to look like an idiot for not doing it,” he says.

“Focus is a fancy word put in front of the word testing to validate it. If you don’t test or validate ideas with the people you think will like your game then you are missing a trick because catching problems early is really important.”

The usefulness of focus groups is often about how they are used, too. Professional research agencies offer advantages, not least their experience of managing focus groups and dealing with the danger of individuals within the group dominating the discussion.

But, says Braben, it is often “better to do more tests than just one or two at greater expense” because, as his Kinectimals example shows, people are not homogeneous. “Because the States is such a big country, you find there are different opinions in different parts of it, and similarly in Europe: a focus test in London might not be valid in Germany, where there are different sensibilities,” he says.


Another tip is to test different implementations of a game with different people, rather than the same group. “We often implement something in two or three different ways, and put different groups of people in front of each approach and see how they react,” says Braben. “That is very useful, because you can see the different techniques side-by-side, and whether people got it instantly or floundered for 20 minutes.”
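Braben’s side-by-side comparison comes down to a simple tally per variant. The sketch below is a hypothetical illustration, not Frontier’s tooling. It assumes each playtest session is logged as the variant that group saw plus the seconds until the player “got it” (`None` if they never did). The cut-offs are also assumptions: under 60 seconds counts as “instantly”, and 20 minutes or more (or never) counts as “floundered”. The names `Session`, `summarize`, and the thresholds are all invented.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cut-offs for "got it instantly" and "floundered for 20 minutes".
INSTANT_SECONDS = 60
FLOUNDER_SECONDS = 20 * 60


@dataclass
class Session:
    variant: str                       # which implementation this group saw, e.g. "A"
    seconds_to_solve: Optional[float]  # None if the player never got it


def summarize(sessions: list[Session]) -> dict[str, dict[str, float]]:
    """For each variant: player count, share who got it instantly, share who floundered."""
    groups: dict[str, list[Session]] = {}
    for s in sessions:
        groups.setdefault(s.variant, []).append(s)

    summary: dict[str, dict[str, float]] = {}
    for variant, group in sorted(groups.items()):
        times = [s.seconds_to_solve for s in group]
        instant = sum(1 for t in times if t is not None and t < INSTANT_SECONDS)
        floundered = sum(1 for t in times if t is None or t >= FLOUNDER_SECONDS)
        summary[variant] = {
            "players": len(group),
            "instant_share": instant / len(group),
            "floundered_share": floundered / len(group),
        }
    return summary


# Invented example data: two different groups, each shown a different implementation.
sessions = [
    Session("A", 35), Session("A", 1500), Session("A", None),
    Session("B", 20), Session("B", 45), Session("B", 300),
]
for variant, stats in summarize(sessions).items():
    print(variant, stats)
```

Each session belongs to exactly one variant, which mirrors the advice to put different groups in front of each approach rather than reusing the same group.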

But now that online connectivity is allowing game companies to harvest vast quantities of data about how players actually behave within games, is the focus group, with its reliance on participant memory and honesty, a dated idea? Given that online games firm Bigpoint employs a dedicated team of data analysts and boasts about the enormity of its data-mining operations, it would be a fair assumption that the German giant would have no need of focus groups.

But the truth is rather different. “The basic principle at Bigpoint is listen to your community, but believe your numbers,” says Philip Reisberger, the chief games officer at the free-to-play specialist. “We are gathering a lot of data, but we like focus groups a lot because they provide you with instant feedback and you see the reaction of the people. A number on a spreadsheet is just a number. If it goes up 5 or 3 percent, that’s something, but there’s intelligence behind that change.”

In fact, Reisberger feels that focus groups might actually be more important for an online game provider like Bigpoint than for publishers and developers of boxed retail games. “We’re a service business, not a product business,” he says. “So understanding how the service is consumed and viewed by the user you only see when you talk to them.”

Even game research agencies that focus on quantitative data agree that metrics are just one part of the testing puzzle. “Metrics are complementary,” says Justin Johnson, chief technical officer at game analytics firm Playmetrix. “The more awareness and understanding you can get about how your target demographic is playing your game, the better shape you are going to be in.”

Metrics, he suggests, can shed light not just on what’s happening but also on when to trust and to distrust the opinions gathered from focus groups. “You can still give a lot of sway to what people say, but sometimes you get a bit of contrast with what’s actually happening inside the game such as what they’re doing and what they’re not doing, which you might not get from a survey or focus group,” he says.
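Johnson’s point is that telemetry can show when to trust what players say. A simple cross-check illustrates it. This is a hypothetical sketch, not Playmetrix’s method. It assumes each participant gave a yes/no answer about whether they used some feature, and that gameplay logs count how often they actually did. `Participant`, `contradictions`, and the sample panel are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    player_id: str
    says_used_feature: bool  # answer given in the survey or focus group
    logged_uses: int         # count recorded by in-game telemetry


def contradictions(participants: list[Participant], min_uses: int = 1) -> list[str]:
    """Return IDs whose stated behavior disagrees with their logged behavior.

    A participant counts as having used the feature if logged_uses >= min_uses.
    """
    flagged = []
    for p in participants:
        actually_used = p.logged_uses >= min_uses
        if p.says_used_feature != actually_used:
            flagged.append(p.player_id)
    return flagged


panel = [
    Participant("p1", True, 12),  # says yes, logs agree
    Participant("p2", True, 0),   # says yes, never used it
    Participant("p3", False, 7),  # says no, used it repeatedly
    Participant("p4", False, 0),  # says no, logs agree
]
print(contradictions(panel))  # ['p2', 'p3']
```

The flagged players are the ones whose comments deserve a follow-up question rather than being taken at face value.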

Reisberger says it is important that the results of focus groups are not taken as gospel. “If I was to listen to my community, I would have game mechanics that the community loves, but we don’t only seek to do games that people love, but games that should monetize,” he says.

“You might listen to the community and they will say, ‘We don’t like clicking this 50 times.’ If you just listen to them you would have it as a one-time click.

“But, since you want to monetize it, you could think outside the box, and say let’s monetize this thing so that we sell you, or you could earn, a virtual assistant to do that for you. If you follow the focus group blindly, you could design a game the community loves, but does not monetize.”

Another problem with focus groups is human behavior, says Braben. “If you have a group, one or two individuals will tend to lead the group, which is very frustrating,” he says. “You can have some very loudmouth, opinionated people, particularly in informal focus groups, who will tell others how to do things. That actually invalidates the point of the exercise, as you’re trying to find out if each individual would have got it on their own.”

Vertical Slice’s McAllister adds that a common problem is that focus groups are misused. “There is value in focus groups, they’re just misused in some ways,” he says. “Focus groups are very good at gathering opinion, but some use focus groups in place of usability, user experience, or play testing, which is about people’s behaviour.”

Although Vertical Slice does run focus groups on games that are in the earliest stages of development, the firm sees a marriage of biometrics and interviews as a better solution for testing games at later stages in the production cycle. In Vertical Slice’s case, this involves monitoring people’s eye movements, facial expressions and sweating as they play and conducting a post-play interview to shed light on why their body responded in certain ways at different points in the game.

“When doing the interview, we know that something’s happened that we want to validate or identify,” says McAllister. “With galvanic skin response, for example, we might see a spike but we won’t be sure why. That information has to come from the player.”
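The “spike” McAllister mentions is exactly what the post-play interview is meant to explain. Below is a minimal, hypothetical sketch of flagging such moments so they can be raised with the player afterwards; it is not Vertical Slice’s method. It assumes skin-conductance samples arrive at a fixed rate. A sample counts as a spike when it exceeds the mean of a trailing window by more than a threshold, expressed in the signal’s own units. The function name, window length, and threshold are assumptions. Real electrodermal analysis typically filters the signal and separates slow (tonic) from fast (phasic) components, which this crude version does not.

```python
from statistics import mean


def find_spikes(samples: list[float], rate_hz: float,
                window_s: float = 5.0, threshold: float = 0.5) -> list[float]:
    """Return times (seconds) where conductance jumps above its recent baseline.

    Baseline = mean of the preceding `window_s` seconds of samples.
    A spike starts when a sample exceeds baseline + threshold; consecutive
    samples above that line are merged into a single event.
    """
    window = max(1, int(window_s * rate_hz))
    spike_times = []
    in_spike = False
    for i in range(window, len(samples)):
        baseline = mean(samples[i - window:i])
        above = samples[i] > baseline + threshold
        if above and not in_spike:
            spike_times.append(i / rate_hz)  # record only the start of each event
        in_spike = above
    return spike_times


# Synthetic trace at 1 Hz: flat, a brief rise at t = 10 s, then flat again.
trace = [2.0] * 10 + [3.0, 3.1] + [2.0] * 10
for t in find_spikes(trace, rate_hz=1.0):
    print(f"Ask the player what happened around {t:.0f} s")
```

The output is a list of timestamps to bring up in the interview. It does not explain the reactions, because, as McAllister says, that explanation has to come from the player.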

The kind of usability testing done by companies like Vertical Slice overcomes many of the limitations of focus groups, but what of Barnden’s gut feeling that such research is a step towards game design by committee? Doesn’t the feedback from focus groups, usability tests and online data simply pave the way for neutering game designers and ensuring that games become more and more middle-of-the-road?

McAllister says it’s not the job of researchers to tell game developers how to do their job. “We do not try to tell you how to do game design,” he says. “We’re experts in understanding people; they’re experts in making games. The relationship is ‘How can developers use this data to make the game better?’”

It’s also important to remember who is in charge, adds Braben. “You’re still in control. You can ignore the information, but you do so at your own peril,” he says. “If you think of it as just running ideas past different people and seeing how they respond, then we all do it, even when writing a game for ourselves, because you show it to your mates. Ultimately, the biggest focus group is when you release previews, but that’s a high-risk focus group, because there’s no going back then.” (Source: Gamasutra)
