
How to Collect Physiological Data in Player-Testing Experiments

Published: 2013-12-02 14:31:43

Author: Isla Schanuel

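Why Track Physiological Data?

You have just finished the first test build of your game. During user testing, you want to collect useful information about players’ actual ability to complete a level, how immersed they were while playing, and whether they were really having fun.

The first part is fairly straightforward quality assurance. You can get the gameplay information you need by recording the game and tracking in-game metrics such as the time taken to complete a level and the number and nature of the bugs encountered. The rest is harder.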

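Traditional Methods

If you want to know what others think of your product, the traditional approach is to ask questions, record the answers, and hope that the people you asked are competent, honest, and blessed with good memories and adequate language skills.

Traditional methods mainly include:

Surveys: You can ask many questions about many different things, but every answer is an after-the-fact recollection. People may forget what they experienced during play, so you have to account for confirmation and recall biases, which can seriously affect the accuracy of your data.

Direct observation: Well suited to seeing how people actually play, but because you are standing right there, the subject’s play may be affected.

Verbal reports: These can be used before, during, and after play. They usually take the form of interviews (gamerboom note: structured or unstructured) or of asking subjects to narrate what they are doing as they play. This method may miss valuable information.

All of these are ways of collecting information about gameplay and user experience, and I am not going to dispute their usefulness, but I will say that traditional user-testing methods have two major limitations:

Subjectivity. Whenever you ask a subject a question, the answer rests entirely on that person’s subjective experience, so it cannot be reliably compared with anyone else’s answer.

Not quantifiable. It is certainly possible to run statistical analysis on survey results, and you can say “9 out of 10 people said they had fun,” but that is not a reliable measurement. A verbal report, for example, cannot tell you that someone was definitely more aroused at point A than at point B, and direct observation cannot tell you whether a person who has been circling to the right for an hour is growing more anxious.

To get this information, we have to turn to physiological measurement.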

EEG(from empowher.com)


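Combining Physiological Data with User-Testing Experiments

When it comes to user testing of video games, most experiments fall into one of two categories:

1. Physiological data is used to supplement traditional data.

In these experiments, physiological data is treated as an additional source of information about subjects’ mental states. Ideally, this means that by the end of your experiment you can say things like: “Most players showed heightened arousal during play. This is further supported by information collected in post-play interviews. We can therefore be confident that this level is fun for players.”

2. Physiological data is used to shape traditional data collection.

Unlike the approach above, these “shaping” experiments have three phases: data is collected, mapped onto a timeline or recording of the play session, and then used to guide post-test data collection by pinpointing important moments or events during play. For example: “Your heart rate rose noticeably at this point. Do you remember what you were thinking?” This method is especially effective at identifying important events; according to Pejman et al., it uncovered 63% more issues than observation alone.

In both kinds of user-testing experiment, physiological data is valuable not only as an additional source of information about players’ mental states but also because the same recordings can be used to guide and adjust traditional methods, yielding data that is more accurate and more complete than traditional methods alone provide.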

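How Do You Collect Physiological Data?

Now that we know why you would collect physiological data in player-testing experiments, let’s look at how.

The first step is deciding what kind of data you want to collect. Different monitoring devices have different strengths and weaknesses. Measuring how tightly someone grips the controller, for example, can track that person’s arousal but cannot tell you whether the emotion is positive or negative. The devices also vary in physical size and shape, and in their potential effect on gameplay and player experience.

Examples of Physiological Monitoring Methods

The following are techniques I am familiar with that have been used in player-testing experiments and shown to produce reliable, valid data. The list is by no means exhaustive.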

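Electroencephalography (EEG)

I have already covered this technique in another post. EEG records electrical signals from the scalp using a sensor net or an EEG cap. The technology has also reached the consumer market in headsets and in toys such as the Nekomimi. EEG has proven quite useful for identifying game events, but it is invasive, expensive, and time-consuming, and the data it produces is hard to interpret.

Useful for: monitoring attention/boredom; it can also be used to identify general behavior patterns and to pinpoint game events that may trigger changes in attention.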

EEG_children(from neurogadget)


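Eye Tracking and Electrooculography (EOG)

Eye tracking tells you what players are looking at during play and how fast their eyes are moving. The most common method today is a camera-based eye-tracking system.

The second method, EOG, measures the resting potential of the retina, which changes with eye orientation. EOG is already used in motion capture to record the position of actors’ eyes. It has the added advantage of being fairly non-invasive, because the electrodes do not obstruct the subject’s field of vision.

On the other hand, the lack of a standard electrode layout makes it hard to compare your results with those of other researchers, the signal can be contaminated by blink artifacts, and the whole setup requires a higher sampling rate than other methods.

Useful for: determining where players looked and for how long, identifying distractions, and assessing players’ ability to pick out game elements visually. Unfortunately, gaze is not a substitute for “attention,” and in 3D environments it tells you little about how far into the scene a player was looking.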

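Electromyography (EMG)

EMG records the electrical activity produced by skeletal muscles. Facial EMG is particularly useful because it tracks the muscles that produce expressions such as smiling or frowning, which tells you about subjects’ emotional state during play.

Unfortunately, even if subjects are completely comfortable with sensors on their faces, they cannot talk during the session. Ordinary recording artifacts already need handling, so you do not want speech adding more.

Useful for: serving as a proxy measure of emotion, since it captures the muscle activity involved in making facial expressions.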

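Galvanic Skin Response (GSR)

Galvanic skin response, or skin conductance, measures the skin’s electrical conductance, which changes with how moist the skin is. GSR serves as an indicator of arousal because your sweat glands are controlled by the sympathetic nervous system.

The potential problems with this method come from several directions. The temperature and humidity of the environment strongly affect readings, and comparing readings across sessions is quite difficult. Internal factors, whether physiological or psychological, can depress readings or produce no meaningful variation at all, depending on the subject.

Useful for: monitoring arousal/stress and identifying in-game situations (not one-off events) that may raise stress or arousal.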

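Cardiac Responses

There are in fact many cardiac responses that can be tracked to measure arousal, including interbeat interval, heart rate, heart rate variability, and blood pressure. How obtrusive these methods are depends on what is being tracked. Metrics such as heart rate variability and systolic blood pressure, for example, are reliable indicators of “invested effort” and can be used to detect immediate changes in mental workload. The limitations depend on the technique used and, as with other metrics, individual differences can be very large.

Useful for: spotting players’ immediate responses to game events, monitoring arousal, and possibly serving as a proxy for subjects’ mental workload.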

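Respiration

Respiration does respond sensitively to changes in mental workload and emotional state, but it is also one of the few activities almost everyone can consciously control without much effort or training. So although it is easy to measure, it may be less useful than the other methods above. In addition, as with facial EMG, subjects cannot talk while you are monitoring their breathing.

Useful for: testing games whose play may involve physical activity as an input method.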

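Designing the Experiment

Once you know what information you need to collect, you can begin designing your experiment. Consider the following questions:

1. Are we making sure the game works, looking for general input and information, or running a true game-design experiment?

Player testing is not quality assurance. Quality assurance is, and should always be, carried out as a separate activity. It has a specific goal, and the people who do it must know the game they are testing very well; they cannot provide the kind of information we look for in the other two situations.

If you are looking for feedback, collect as much information about the game as possible. Your goal is to find anything that might be interesting, problematic, or worth expanding. Do not ask only about gameplay (“Were the goals clear?”); ask about the emotional experience too.

If, however, you want to test something specific, something you can use as a real test variable, then you are running a true experiment and should conduct it accordingly. Strive for consistency and repeatability, and change only one variable at a time.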

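2. How will we use the physiological data we collect?

Decide when you will use this information. The data can serve as material for post-test analysis, as recordings played back immediately to guide post-play interviews, and so on (or in some way you have just dreamed up).

3. How will we control for the effect of the monitoring equipment?

Some equipment can distract subjects and affect their performance, while other equipment may have little or no effect. Depending on the method used, you may feel no need to control for the monitoring equipment.

If you do want to control for it, you can create three experimental groups: one with no physiological data collection, one with no traditional data collection, and one with both. Alternatively, if time and resources allow, you can build the monitoring devices into the normal input/game hardware.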

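Conclusion

Integrating physiological monitoring into player testing and research can significantly affect both the quality and the quantity of the data collected. If you are making a product or designing a service, it is essential to perform quality assurance and user testing as early and as often as possible. Make sure you know what information you are looking for and the strengths and weaknesses of the methods you use to collect it, so that you can design suitable questions and experiments. (This article was compiled by gamerboom.com; reposting without the copyright notice is prohibited. To repost, please contact gamerboom.)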

Integrating Biometrics: A Rough Guide to Monitoring Methods

by Isla Schanuel

Why would we want to track physiological data?

You’ve just finished the alpha version of your game. During the user-testing stage, you would want to be able to collect information about the players’ abilities to actually complete the level, as well as get some idea of how engaged they were while they were playing (and whether or not they were actually having any fun).

That first part is fairly straightforward quality assurance. You can capture all the information you need about in-game activity just by recording the game and tracking in-game metrics like the amount of time taken to complete the level and the number and nature of any bugs encountered during those trial runs. It’s the latter bit where things get tricky.

Traditional Methods

Traditionally, if you wanted to get an idea of what people thought about something you made, your options for getting that information were primarily based on asking questions, writing down the answers you received, and praying your testers were competent, honest humans with good memories and decent language skills.

Examples include:

Surveys – These allow you to ask a lot of questions about a lot of different things, but it’s all after-the-fact. People may forget to bring up things they noticed during gameplay, and you have to deal with things like confirmation and recall biases, which can mess up the accuracy of your data.

Direct observation – This is great for seeing how people interact with the game, but you’re physically around, and that’s going to affect how people play and what they do.

Verbal reports – These can be done before, during, and after gameplay, and usually take the form of interviews (either structured or unstructured) or of asking people to narrate what they’re doing as they play. Again, though, you may end up with people leaving out valuable information.

All of these are fantastic methods of gathering information about gameplay and user experience. I’m not going to debate that, but I will say that traditional user-testing methods suffer from two major limitations.

They are not objective. Any time you ask someone a question, the answer you receive will be based entirely upon that person’s subjective experience, and as such, cannot be reliably compared to that of any other person.

They are not quantifiable. Yes, it is possible to perform statistical analysis on survey results, and yes, you could totally say “9 out of 10 players said they had fun”, but there is no way a talk-aloud trial, for example, could tell you a player was definitively more aroused at point A than point B, nor would direct observation be able to tell you if that one guy who kept turning right for an hour and sending himself in circles was getting more or less agitated over time.

To get this information, we have to turn to biology.

Integrating Physiological Data Into User Testing Experiments

When talking about video game user testing, most of the experiments to date have fallen into two categories.

1. Experiments in which physiological data is collected in addition to traditional data.

In these experiments, physiological data is treated as an additional source of information about subjects’ mental states. Ideally, this means that at the end of your experiment, you will be able to say things like “The majority of players showed signs of heightened arousal during gameplay. This is further supported by information collected during post-gameplay interviews. As such, we can confidently assume that this particular level is engaging and fun for users.”

2. Experiments in which physiological data is used to shape traditional data-collection.

In contrast to the “Traditional plus physiological data” experiments described above, these “shaping” experiments are a three-phase process in which data is collected, mapped to a timeline or recording of the users’ test run, and then used to guide post-testing data collection by identifying critical moments/events during gameplay. E.g. “Your heart rate increased dramatically at this point. Do you remember what you were thinking at that time?” This particular method has been shown to be extremely effective at identifying significant events, with one experiment finding 63% more issues in the biometric-supplemented trials compared to observation alone. (Pejman et al. 2011)

In either case, physiological data is valuable for user testing in that it can serve not only as an additional source of information about users’ psychological states, but also because these same records can be used to guide and modify traditional methods so as to produce data that is both more reliable, and more comprehensive, than traditional methods alone.
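As a rough illustration of the “shaping” workflow described above, here is a minimal Python sketch that scans a heart-rate recording for sudden rises and turns each one into an interview prompt. Everything specific in it is my own assumption and not something from the cited study: the function names, the rolling-baseline rule, the window of 5 samples, the 2-standard-deviation threshold, and the 1 bpm floor on the baseline spread.

```python
from statistics import mean, stdev


def find_critical_moments(samples, window=5, threshold=2.0, min_spread=1.0):
    """Return timestamps where heart rate jumps well above the recent baseline.

    samples: list of (timestamp_seconds, beats_per_minute) in time order.
    window: number of preceding samples used as the baseline (at least 2).
    threshold: standard deviations above the baseline mean that count as a spike.
    min_spread: floor on the baseline spread, so a very flat baseline
                does not make tiny fluctuations look dramatic.
    """
    moments = []
    for i in range(window, len(samples)):
        baseline = [bpm for _, bpm in samples[i - window:i]]
        spread = max(stdev(baseline), min_spread)
        timestamp, bpm = samples[i]
        if (bpm - mean(baseline)) / spread > threshold:
            moments.append(timestamp)
    return moments


def interview_prompt(timestamp):
    """Format a post-gameplay interview question for one flagged moment."""
    minutes, seconds = divmod(int(timestamp), 60)
    return (f"At {minutes}:{seconds:02d} your heart rate increased sharply. "
            "Do you remember what you were thinking at that time?")


if __name__ == "__main__":
    recording = [(0, 70), (1, 71), (2, 70), (3, 72), (4, 71),
                 (5, 70), (6, 95), (7, 96)]
    for t in find_critical_moments(recording):
        print(interview_prompt(t))
```

Only the onset of the rise is flagged here, because the elevated samples quickly become part of the baseline. That suits interview prompts, where one question per event is enough.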

Integrating Physiological Monitoring Methods

So now that we know why you might want to include physiological data in your user-testing experiments, let’s get into the how.

The first step in this process is going to be determining what kind of information you want to collect. Different monitoring devices have different strengths and weaknesses. Measuring how tightly someone grips a controller, for example, can be a good way to track how aroused someone is, but it doesn’t tell you much about valence. The physical size and shape of the equipment you will be using is also going to vary, as will their potential effects on gameplay and user experience.

Some Physiological Monitoring Methods

The items on this list are techniques which have been used in the user-testing experiments with which I am familiar and which have already been shown to produce reliable/valid data. It is by no means a completely comprehensive list of all possible user-testing methods.

Electroencephalography (EEG)

This topic actually got its own post over on my other blog, but to review: Electroencephalography is the recording of electrical signals along the scalp. You can track them with a sensor net, an EEG cap, and, now that there’s a market for them, headsets and toys like the Nekomimi. These devices have been shown to be pretty useful at identifying critical in-game events, but they can be rather invasive, expensive, and time-consuming. Also, the data collected from them can be fairly difficult to interpret.

Useful for: Monitoring attention/boredom, could be useful for identifying common patterns and behaviors, identifying in-game events which may trigger significant changes in focus

Eye Tracking and Electrooculography (EOG)

Eye tracking is a good way to get an idea of where your users are looking during play, as well as how fast their eyes are moving. By far the most common method for doing this is by using camera-based eye tracking systems.

The second method, Electrooculography, measures the resting potential of the retina, which changes based on eye orientation. EOG measurements are already used in motion-capture to faithfully track the positions of actors’ eyes, and have the added advantage of being fairly non-invasive, as the electrodes used do not interfere with the subjects’ field of vision.

On the other hand, the lack of any standardized electrode configuration means it’s difficult to compare your results with those of other researchers, the signal itself can be cluttered with blinking-artifacts (which are exactly what they sound like), and the whole setup requires a much higher sampling rate than other methods.

Useful for: Determining where people were looking, how long they were looking at whatever was there, identifying distractions or areas of interest and user’s ability to identify important in-game elements visually. Unfortunately gaze is not a functional proxy for “attention” nor would it give you much information (in 3D environments) about how far “out” someone was looking.

(Image source: Krupiński et al. 2012)
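To make “where people were looking, and for how long” concrete, here is a minimal sketch that totals gaze time per area of interest. It assumes things the article does not specify: gaze arrives as fixed-rate (x, y) screen coordinates, areas of interest are axis-aligned rectangles, and the names `dwell_times`, `minimap`, and `health_bar` are purely illustrative.

```python
from collections import defaultdict


def dwell_times(gaze_samples, areas, sample_interval):
    """Total time the gaze spent inside each named rectangular area.

    gaze_samples: iterable of (x, y) screen coordinates at a fixed sampling rate.
    areas: dict mapping a name to a (left, top, right, bottom) rectangle.
    sample_interval: seconds between consecutive samples.
    """
    totals = defaultdict(float)
    for x, y in gaze_samples:
        for name, (left, top, right, bottom) in areas.items():
            if left <= x < right and top <= y < bottom:
                totals[name] += sample_interval
                break  # count each sample toward at most one area
    return dict(totals)


if __name__ == "__main__":
    areas = {"minimap": (0, 0, 100, 100), "health_bar": (100, 0, 300, 50)}
    gaze = [(10, 10), (50, 50), (150, 20), (500, 500)]
    print(dwell_times(gaze, areas, sample_interval=0.5))
```

Note that this only reports where the eyes pointed. As said above, dwell time is not a measure of attention.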

Electromyography (EMG)

EMG records the electrical activity produced by skeletal muscles. Facial EMG in particular can be useful because it allows one to track the muscles involved in making facial expressions like smiling or frowning, and as such, can give you a good idea of the nature (positive or negative) of subjects’ emotional states during play.

Unfortunately, even if your test subjects were totally comfortable with a bunch of sensors on their faces, you’d still have to sacrifice their ability to speak. You’re going to get enough hassle from normal recording artifacts, so you don’t want to create any more from people talking.

Useful for: Using as a proxy for measuring valence, as it enables you to capture the activity of the muscles involved in making facial expressions.

Galvanic Skin Response (GSR)

Galvanic Skin Response/Skin conductance is a measure of the electrical conductance of your skin, and varies based on how moist your skin is at any given point in time. GSR can be used as an indicator of arousal because your sweat glands are controlled by the sympathetic nervous system.

Possible issues with this method come from several places. The temperature and humidity in which you are operating can have a significant effect on readings, and make the task of comparing readings from different sessions rather difficult. Internal factors, both biological and psychological, can also lead to depressed readings, or a complete lack of significant variation, depending on the subject.

Useful for: Monitoring arousal/stress, identifying in-game situations (not one-off events) which may increase stress or arousal over time.
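One way to soften the session-to-session comparison problem is to rescale each session’s readings to that session’s own range before comparing anything. The sketch below does that. To be clear, this rescaling is my own suggestion and not a method the article prescribes, the function name is made up, and it does nothing for a subject whose readings barely vary.

```python
def range_correct(readings):
    """Rescale one session's conductance readings to the interval 0..1.

    readings: non-empty list of skin conductance values from one subject in one session.
    Returns 0.0 for every sample when the session shows no variation at all.
    """
    low, high = min(readings), max(readings)
    if high == low:
        return [0.0 for _ in readings]
    return [(value - low) / (high - low) for value in readings]


if __name__ == "__main__":
    warm_room = [6.0, 8.0, 10.0]   # higher raw values
    cool_room = [2.0, 3.0, 4.0]    # lower raw values, same relative pattern
    print(range_correct(warm_room))
    print(range_correct(cool_room))
```

After rescaling, both example sessions show the same relative pattern even though the raw values differ, which is the property you want when room conditions changed between sessions.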

Cardiac Responses

There are actually a number of different cardiac responses one can track for the purposes of measuring arousal, including: interbeat intervals, heart rate, heart rate variability, and blood pressure. The obtrusiveness of these methods depends on what’s being tracked, and can range anywhere from arm cuffs to video analysis. Metrics like heart rate variability and systolic blood pressure, for example, have been shown to be fairly reliable indicators of “invested effort” and can be used to detect immediate changes in mental workload. The limitations of these methods are dependent upon the technique being used, and as with the other metrics, can show significant differences between individuals.

Useful for: Identifying immediate responses to game events, monitoring arousal, and may be used as a proxy for subjects’ mental workload.
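Several of the cardiac measures listed above can be derived from the same stream of interbeat intervals. The sketch below computes average heart rate and one heart rate variability figure from intervals given in milliseconds. RMSSD (the root mean square of successive differences) is my choice of HRV measure for illustration; the article does not say which one to use, and the function names are hypothetical.

```python
from math import sqrt


def heart_rate_bpm(ibis_ms):
    """Average heart rate in beats per minute from interbeat intervals in ms."""
    mean_ibi = sum(ibis_ms) / len(ibis_ms)
    return 60000.0 / mean_ibi


def rmssd(ibis_ms):
    """Root mean square of successive interbeat-interval differences, in ms.

    Needs at least two intervals.
    """
    diffs = [later - earlier for earlier, later in zip(ibis_ms, ibis_ms[1:])]
    return sqrt(sum(d * d for d in diffs) / len(diffs))


if __name__ == "__main__":
    intervals = [800, 810, 790, 800]
    print(heart_rate_bpm(intervals))
    print(round(rmssd(intervals), 2))
```

Because individuals differ so much, figures like these are more useful compared against the same person’s resting values than against other players.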

Respiration

While respiration is indeed sensitive to changes in mental workload and emotional states, it’s also one of the few physiological responses used in user testing that almost everyone can control, consciously, without much effort or training. As such, while it can be easy to measure, it may not be as useful as the other methods outlined above if you’re looking for raw physical responses. Also, as with facial EMGs, your subjects wouldn’t be able to talk while you’re recording their breathing.

Useful for: Testing games which may contain an actual physical element to gameplay in addition to, or in lieu of, traditional input.

Designing the Experiment

Once you know what you will be collecting, keep the following questions in mind when designing your experiment:

1. Are we making sure it works, looking for general input and information, or are we doing an actual game-design experiment?

User testing is not quality assurance. Quality assurance is, and should always be, conducted as a separate activity. It has a specific goal, and the people who do it need to be intimately familiar with the game they’re testing; they literally cannot give us the kind of information we’re looking for in the other two situations.

If you are looking for feedback, then you want to collect as much information as possible, about as many game elements as you can. Your goal will be to identify anything which might be interesting, troublesome, or worth expanding upon. Ask questions not only about gameplay (“Were the goals clear?”) but about emotional experiences as well.

If, however, you would like to test something specific, something that you can use as an actual test variable, then you will be performing an actual experiment, and you should act accordingly. Strive for consistency, repeatability, and never change more than one variable at a time.

2. How is the physiological data we collect going to be used?

Next, make a decision about when you are going to use this information. You can collect data for post-test analysis, record data to be played back immediately as a means of guiding post-gameplay interviews, or do both. (…or some shiny new method that you just came up with, in which case I want to hear about it.)

3. How will we attempt to control for the presence of the monitoring equipment? (If at all?)

Certain equipment can distract test subjects and affect their performance during gameplay, while others may have little to no effect whatsoever. Depending on the method being used, you may not feel the need to control for the presence of your monitoring devices, and that’s okay.

If you do want to control for such things, however, you can do so by creating three experimental groups: one with no physiological data collection; one with no traditional data collection; and one with both. Alternatively, if you have a whole lot of time and resources at your disposal, you could just try to integrate monitoring devices into the normal input/gameplay hardware itself.

It would totally work, too. I want to help develop and test them. Also, my eye is already messed up, so that’s one less thing to worry about.
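For the three-group design described above, here is a minimal sketch of randomly assigning participants so the groups end up the same size, or within one person of each other. The group labels, the function name, and the shuffle-then-deal approach are my assumptions; the article only says to create the three groups.

```python
import random


def assign_groups(participants, seed=None):
    """Randomly split participants across the three conditions.

    traditional_only: no physiological data collection.
    physiological_only: no traditional data collection.
    both: both kinds of data collection.
    Pass a seed to make the assignment reproducible.
    """
    groups = {"traditional_only": [], "physiological_only": [], "both": []}
    shuffled = list(participants)
    random.Random(seed).shuffle(shuffled)
    names = list(groups)
    for index, participant in enumerate(shuffled):
        # Deal shuffled participants round-robin so group sizes stay balanced.
        groups[names[index % len(names)]].append(participant)
    return groups


if __name__ == "__main__":
    testers = [f"P{n}" for n in range(1, 10)]
    for name, members in assign_groups(testers, seed=1).items():
        print(name, members)
```

Random assignment matters here because letting testers pick their own condition would mix the effect of the equipment with whatever made them choose it.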

Conclusion

The integration of physiological data into user testing and research can have significant impacts on both the quality and quantity of the data collected. If you are creating a good or service of any sort, it is of the utmost importance that you perform both quality assurance and user-testing as early and as often as possible. Make sure you understand what you are looking for, as well as the pros and cons of the methods you can use to gather that information so that you can frame your questions properly, and design your experiments appropriately. (source: gamasutra)

