The fact that people cannot accurately assess probabilities puts game designers in an advantageous position. Even simple mechanics, such as the dice-based resource generation system in Settlers of Catan, are difficult for players to grasp intuitively.
Moreover, luck can slow a game down unnecessarily. The board games History of the World and Small World have very similar conquest mechanics (note: except that the former uses dice while the latter does not). Rolling dice for every attack makes History of the World last three to four times longer than Small World. The reason is not just the logistics of rolling so many dice: knowing that the outcome of a decision is predictable lets players plan all of their moves in advance, without worrying about surprises. Dealing with the unexpected is usually a core part of game design, but pacing is important too, so designers need to make sure the trade-off is worth it.
A notable recent example of embracing randomness is Spelunky, in which indie developer Derek Yu combined the randomly generated levels of NetHack with the 2D platforming of Lode Runner. The game's lasting appeal comes from its endless supply of new levels to explore, but the unexpected monsters and tunnels it throws at the player can make its difficulty frustrating.
Some card games, such as Magic: The Gathering or Dominion, put chance front and centre by focusing the experience on which cards are drawn from a deck the player constructs. Players who manage the ratio of rare to common cards, and who know that each card can be drawn from the deck only once, can win consistently at these games. This idea could be extended to other games of chance by providing a virtual "deck of dice" (note: to guarantee that dice rolls stay balanced over time).
Another interesting idea from gaming history is the "luck" option in the turn-based strategy game Lords of Conquest. Three settings (low, medium, and high) determine whether luck is used only to break ties or plays a major role in deciding battles. The right amount of chance in a game is highly subjective, and giving players a knob to adjust lets a game appeal to a much wider range of tastes.
Question 1. Suppose you are designing a brand-new MMORPG in which a special item, the Orc Nostril Hair, has a 10% chance of dropping whenever a player kills a monster. One tester reports that he killed 20 monsters and found the Orc Nostril Hair 4 times, while another reports that she killed 20 monsters and never found it at all. Is there a bug in the code?
First things first: you need to roll the first "6" (a 1-in-6 chance), and then you need to roll another "6" (again a 1-in-6 chance). When one outcome requires a chain of events to all come up, you multiply the individual probabilities to get the overall probability. Here, that is 1/6 × 1/6 = 1/36, the chance of rolling two "6"s in a row.
Armed with this newfound notion of conditional probability, we can easily work out the odds of all sorts of wild dice rolls. What is the chance of rolling four "6"s in a row? It is 1/6 × 1/6 × 1/6 × 1/6, or more simply (1/6)^4 = 0.0008 = 0.08%. And ten "2"s in a row? (1/6)^10, a vanishingly small percentage.
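A quick way to sanity-check these chained probabilities is to multiply them in code. The sketch below uses Python's `fractions` module to keep the arithmetic exact; the helper name `p_all` is purely illustrative:

```python
from fractions import Fraction

def p_all(*probs):
    """Probability that a chain of independent events all happen:
    multiply the individual probabilities together."""
    result = Fraction(1)
    for p in probs:
        result *= p
    return result

six = Fraction(1, 6)  # chance of one particular face on a fair d6

print(p_all(six, six))            # two 6s in a row: 1/36
print(float(p_all(*[six] * 4)))   # four 6s in a row: (1/6)^4, about 0.08%
print(float(p_all(*[six] * 10)))  # ten of a kind: (1/6)^10, vanishingly small
```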
Stepping it up a notch: what is the chance of rolling a "5" or higher, followed by a "3" or higher? That is 2/6 × 4/6 = 8/36 = 2/9 = 22.2%.
Mistake 1: The probability of rolling a "5" on a six-sided die is 1/6 = 16.7%. This never changes. It has nothing to do with whether you just rolled eight "5"s in a row or have not seen a "5" in ages. 16.7% remains the magic number. "The dice have no memory," as the saying goes, and it is entirely true.
Suppose you are about to roll a six-sided die. What is the chance of rolling a "6"? We already know the answer, but this time let us reason through the inverse probability. The chance of not rolling a "6" is 5/6, so the chance of rolling a "6" is 1 − 5/6 = 1/6, or 16.7%. In other words, if the chance of missing is 5/6, the chance of hitting must be 1/6. No surprises there.
The only question is how to compute the overall probability of drawing a heart on either of the two draws. It is tempting to say 9/47 + 9/46, but that is wrong. It is the same mistake as claiming the chance of rolling a "6" in six rolls is 1/6 + 1/6 + 1/6 + 1/6 + 1/6 + 1/6 = 1.0 = 100%. Sadly, rolling the die six times does not guarantee a "6".
It turns out that both of these problems are much easier to solve with inverse probability. We ask instead: "What is the chance of not drawing a heart?" On the first draw it is (47 − 9)/47 = 38/47; on the second, (46 − 9)/46 = 37/46. Using what we know about conditional events, the chance of both events happening, that is, missing the heart on both draws, is 38/47 × 37/46 = 65.0%. Since what we care about is completing the flush, we subtract this from 1.0: 1.0 − 0.65 = 0.35 = 35%. So the chance of making the flush is 35%.
Note: the dice problem works the same way. The chance of rolling at least one "6" in six rolls comes from the chance of rolling no "6"s at all. On each roll, the chance of not rolling a "6" is 5/6, so the chance of six rolls with no "6" is 5/6 × 5/6 × 5/6 × 5/6 × 5/6 × 5/6 = 0.33 = 33%. The chance of at least one "6" is therefore 1.0 − 0.33 = 0.67 = 67%, about a two-in-three chance.
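Both of these complement-style calculations follow the same template, which can be captured in one small helper (the name `p_at_least_one` is hypothetical, used only for illustration):

```python
def p_at_least_one(miss_probs):
    """P(at least one success) = 1 - P(every trial fails),
    where miss_probs lists each trial's probability of failure."""
    p_none = 1.0
    for q in miss_probs:
        p_none *= q
    return 1.0 - p_none

# Flush draw: miss all 9 outs among 47 unseen cards, then among 46.
print(round(p_at_least_one([38/47, 37/46]), 2))  # ~0.35

# At least one 6 in six rolls of a d6.
print(round(p_at_least_one([5/6] * 6), 2))       # ~0.67
```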
Question 1. The drop rate of Orc Nostril Hair
In this case, both testers' results are well within the range of chance. If the base probability of finding an Orc Nostril Hair (ONH) on each monster kill is 10%, then the probability of finding at least 4 ONHs in 20 kills is 13.3%. How did I get that number, you ask? I used a more advanced concept called the binomial distribution, which is beyond the scope of this article.
The probability of finding no items at all in 20 tries is (0.90)^20 = 12.2%.
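For the curious, both figures can be reproduced directly from the binomial distribution. The helper `binom_pmf` below is an illustrative sketch built on Python's `math.comb`:

```python
from math import comb

def binom_pmf(n, k, p):
    """Probability of exactly k successes in n independent trials,
    each with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 20, 0.10  # 20 monster kills, 10% drop chance each

p_none = binom_pmf(n, 0, p)  # the tester who found nothing
p_four_or_more = 1 - sum(binom_pmf(n, k, p) for k in range(4))  # the tester who found 4

print(f"0 drops in 20 kills:  {p_none:.1%}")          # ~12.2%
print(f"4+ drops in 20 kills: {p_four_or_more:.1%}")  # ~13.3%
```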
Probability of 2x or greater damage = 0.75 × 0.75 = 56.3%
Probability of 4x or greater damage = (0.75)^4 = 31.6%
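These two figures are consistent with a mechanic in which each level of the multiplier requires an independent 75% roll to continue. Under that assumption (an inference, since the rule is not spelled out here), the calculation is:

```python
P_STEP = 0.75  # assumed chance that the multiplier climbs one more level

def p_multiplier_at_least(m):
    """Chance of reaching an m-x damage multiplier or higher, assuming
    each level requires an independent roll with probability P_STEP."""
    return P_STEP ** m

for m in (2, 4):
    print(f"{m}x or higher: {p_multiplier_at_least(m):.2%}")
```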
For a variety of reasons, the "Big Guy" (the normal distribution) explains why so many things in life tend to follow the same pattern, or distribution.
The larger your sample, the smaller the error in the result. Mo data is bettuh.
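A short simulation illustrates the point: as the sample grows, the estimated probability settles toward the true value. The seed and sample sizes below are arbitrary choices for illustration:

```python
import random

random.seed(7)  # fixed seed so the run is reproducible

# Estimate the 1/6 chance of rolling a 6 from samples of increasing size.
# The error of the estimate tends to shrink as the sample grows.
for n in (10, 100, 10_000, 1_000_000):
    hits = sum(1 for _ in range(n) if random.randint(1, 6) == 6)
    print(f"n={n:>9,}  estimate={hits / n:.4f}  (true value 0.1667)")
```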
People misinterpret statistical reports all the time. It is far too easy to infer a deep relationship that does not exist instead of a mere correlation. My favourite example is the "Pirates vs. Global Temperature" graph from the Open Letter to the Kansas School Board by the famous Church of the Flying Spaghetti Monster (note: a satirical parody religion):
Paul Williams is undertaking a PhD in Cognitive Psychology at the University of Newcastle, under the supervision of Dr. Ami Eidels. He is interested in developing online gaming platforms suitable for the investigation of cognitive phenomena, and is currently focused on refining and implementing a novel paradigm to study the behavioral phenomenon known as the “hot hand.”
Balancing Risk and Reward to Develop an Optimal Hot-Hand Game
This paper explores the issue of player risk-taking and reward structures in a game designed to investigate the psychological phenomenon known as the ‘hot hand’. The expression ‘hot hand’ originates from the sport of basketball, and the common belief that players who are on a scoring streak are in some way more likely to score on their next shot than their long-term record would suggest. There is a widely held belief that players in many sports demonstrate such streaks in performance; however, a large body of evidence discredits this belief. One explanation for this disparity between beliefs and available data is that players on a successful run are willing to take greater risks due to their growing confidence. We are interested in investigating this possibility by developing a top-down shooter. Such a game has unique requirements, including a well-balanced risk and reward structure that provides equal rewards to players regardless of the tactics they adopt. We describe the iterative development of this top-down shooter, including quantitative analysis of how players adapt their risk taking under varying reward structures. We further discuss the implications of our findings in terms of general principles for game design.
Key Words: risk, reward, hot hand, game design, cognitive, psychology
Balancing risk and reward is an important consideration in the design of computer games. A good risk and reward structure can provide a lot of additional entertainment value. It has even been likened to the thrill of gambling (Adams, 2010, p. 23). Of course, if players gamble on a strategy, they assume some odds, some amount of risk, as they do when betting. On winning a bet, a person reasonably expects to receive a reward. As in betting, it is reasonable to expect that greater risks will be compensated by greater rewards. Adams not only states that “A risk must always be accompanied by a reward” (2010, p. 23) but also believes that this is a fundamental rule for designing computer games.
Indeed, many game design books discuss the importance of balancing risk and reward in a game:
* “The reward should match the risk” (Thompson, 2007, p.109).
* “… create dilemmas that are more complex, where the players must weigh the potential outcomes of each move in terms of risks and rewards” (Fullerton, Swain, & Hoffman, 2004, p.275).
* “Giving a player the choice to play it safe for a low reward, or to take a risk for a big reward is a great way to make your game interesting and exciting” (Schell, 2008, p.181).
Risk and reward matter in many other domains, such as stock-market trading and sport. In the stock market, risks and rewards affect choices among investment options. Some investors may favour a risky investment in, say, nano-technology stocks, since the high risk is potentially accompanied by high rewards. Others may be more conservative and invest in solid federal bonds which fluctuate less, and therefore offer less reward, but also offer less risk. In sports, basketball players sometimes take more difficult and hence riskier shots from long distance, because these shots are worth three points rather than two.
Psychologists, cognitive scientists, economists and others are interested in the factors that affect human choices among options varying in their risk-reward structure. However, stock markets and sport arenas are ‘noisy’ environments, making it difficult (for both players and researchers) to isolate the risks and rewards of any given event. Computer games provide an excellent platform for studying, in a well-controlled environment, the effects of risk and reward on players’ behaviour.
We examine risk and reward from both cognitive science and game design perspectives. We believe these two perspectives are complementary. Psychological principles can help inform game design, while appropriately designed games can provide a useful tool for studying psychological phenomena.
Specifically, in the current paper we discuss the iterative, player-centric development (Sotamaa, 2007) of a top-down shooter that can be used to investigate the psychological phenomenon known as the ‘hot hand’. Although the focus of this paper is on the process of designing risk-reward structures to suit the design requirements of a hot-hand game, we begin with an overview of this phenomenon and the current state of research. In subsequent sections we describe three stages of game design and development. In our final section we relate our findings back to more general principles of game design.
The Hot Hand
The expression ‘hot hand’ originates from basketball and describes the common belief that players who are on a streak of scoring are more likely to score on their next shot. That is, they are on a hot streak or have the ‘hot hand’. In a survey of 100 basketball fans, 91% believed that players had a better chance of making a shot after hitting their previous two or three shots than after missing their previous few shots (Gilovich, Vallone, & Tversky, 1985).
While intuitively these beliefs and predictions seem reasonable, seminal research found no evidence for the hot hand in the field-goal shooting data of the 1980-81 Philadelphia 76ers, or the free-throw shooting data of the 1980-81 and 1981-82 Boston Celtics (Gilovich et al., 1985). With few exceptions, subsequent studies across a range of sports confirm this surprising finding (Bar-Eli, Avugos, & Raab, 2006) – suggesting that hot and cold streaks of performance could be a myth.
However, results of previous hot hand investigations reveal a more complicated picture. Specifically, previous studies suggest that a distinction can be made between tasks of ‘fixed’ difficulty and tasks of ‘variable’ difficulty. A good example of a ‘fixed’ difficulty task is free-throw shooting in basketball. In this type of shooting the distance is kept constant, so each shot has the same difficulty level. In a ‘variable’ difficulty task, such as field shooting during the course of a basketball game, players may adjust their level of risk from shot-to-shot, so the difficulty of the shot varies depending on shooting distance, the amount of defensive pressure, and the overall game situation.
Evidence suggests it is possible for players to get on hot streaks in fixed difficulty tasks such as horseshoe pitching (Smith, 2003), billiards (Adams, 1996), and ten-pin bowling (Dorsey-Palmateer & Smith, 2004). In variable difficulty tasks, however, such as baseball (Albright, 1993), basketball (Gilovich et al., 1985), and golf (Clark, 2003a, 2003b, 2005), there is no evidence for hot or cold streaks – despite the common belief to the contrary.
The most common explanation for the disparity between popular belief (the hot hand exists) and actual data (lack of support for the hot hand) is that humans tend to misinterpret patterns in small runs of numbers (Gilovich et al., 1985). That is, we tend to form patterns based on a cluster of a few events, such as a player scoring three shots in a row. We then use these patterns to help predict the outcome of the next event, even though there is insufficient information to make this prediction (Tversky & Kahneman, 1974). In relation to basketball shooting, after a run of three successful shots, people would incorrectly believe that the next shot is more likely to be successful than the player’s long-term average would suggest. This is known as the hot-hand fallacy.
A different explanation for this disparity suggests shooters tend to take greater risks during a run of success, with no loss of accuracy (Smith, 2003). Under this scenario, a player does show an increase in performance during a hot streak – as they are performing a more difficult task at the same level of accuracy. This increase in performance may in turn be reflected in hot-hand predictions, but would not be detected by traditional measures of performance. While this account receives tentative support from the distinction between fixed- and variable-difficulty tasks (the hot hand is more likely to appear in fixed-difficulty tasks, where players cannot opt for a more difficult shot), the hypothesis requires further study.
Unfortunately, trying to gather more data to investigate the hot hand phenomenon from sporting games and contests is fraught with problems of subjectivity. How can one assess the difficulty of a given shot over another in basketball? How can one tell if a player is adopting an approach with more risk?
An excellent way to overcome this problem is to design a computer game of ‘variable’ difficulty tasks that can accurately record changes in player strategies. Such a game can potentially answer a key question relevant to both psychology and game design – how do people (players) respond to a run of success or failure (in a game challenge)?
The development of this game, which we call a ‘hot hand game’, is the focus of this paper. Such a game requires a finely tuned risk and reward structure, and the process of tuning this structure provides a unique empirical insight into players’ risk-taking behaviour. At each stage of development we test the game to measure how players respond to the risk and reward structure. We then analyse these results in terms of player strategy and performance and use this analysis to inform our next stage of design.
This type of design could be characterised as iterative and player-centric (Sotamaa, 2007). While the game design in this instance is simple, due to the precise requirements of the psychological investigation, player testing is more formal than might traditionally be used in game development. Consequently, changes in player strategy can be precisely evaluated. We find that even subtle changes to risk and reward structures impact players’ risk-taking strategies.
Game Requirements and Basic Design
A hot hand game that addresses how players respond to a run of success or failure has special requirements. First and foremost, the game requires a finely-tuned risk and reward structure. The game must have several (5-7), well-balanced risk levels, so that players are both able and willing to adjust their level of risk in response to success and failure. If, for example, one risk level provides substantially more reward than any other, players will learn this reward structure over time, and be unlikely to change strategy throughout play. We would thus like each risk level to be, for the average player, equally rewarding. In other words, regardless of the level of risk adopted, the player should have about the same chance of obtaining the best score.
The second requirement for an optimal hot hand game is that it allows measurement of players’ strategy after runs of both successes and failures. If people fail most of the time, we will not record enough runs of success. If people succeed most of the time, we will not observe enough runs of failure. Thus, the core challenge needs to provide a probability of success, on average, somewhere in the range of 40-60%.
The game developed to fulfil these requirements was a top-down shooter developed in Flash using ActionScript. While any simple action game based on a physical challenge with hit-miss scoring could be suitably modified for our purposes, a top-down shooter holds several advantages. Firstly, high familiarity with the style means the learning period for players is minimal, supporting our aims of using the game for experimental data collection. Secondly, the simple coding of key difficulty parameters (i.e. target speeds and accelerations) allows the reward structure to be easily and precisely manipulated. Lastly, a ‘shot’ of a top-down shooter is analogous to a ‘shot’ in basketball, with similar outcomes of ‘hit’ and ‘miss’. This forms a clear and identifiable connection between the current experiment and the origins of the hot hand.
In the top-down shooter, the goal of the player is to shoot down as many alien spaceships as possible within some fixed amount of time. This means the number of overall shots made, as well as the number of hits, depend on player performance and strategy. The game screen shows two spaceships, representing an alien and the player-shooter (Figure 1). The simple interface provides feedback about the current number of kills and the time remaining. During the game the player’s spaceship remains stationary at the bottom centre of the screen. Only a single alien spaceship appears at any one time. It moves horizontally back-and-forth across the top of the screen, and bounces back each time it hits the right or left edges. The player shoots at the alien ship by pressing the spacebar. For each new alien ship the player has only a single shot with which to destroy it. If an alien is destroyed the player is rewarded with a kill.
Figure 1: The playing screen.
Each alien craft enters from the top of the screen and randomly moves towards either the left or right edge. It bounces off each side of the screen, moving horizontally and making a total of eight passes before flying off. Initially the alien ship moves swiftly, but it decelerates at a constant rate, moving more slowly after each pass. This game therefore represents a variable difficulty task; a player can elect a desired level of risk as the shooting task becomes less difficult with each pass of the alien.
The risk and reward equation is quite simple for the player. The score for destroying an alien is the same regardless of when the player fires. Since the goal is to destroy as many aliens as possible in the game period, the player would benefit from shooting as quickly as possible; shooting in the early passes rewards the player with both a kill and more time to shoot at subsequent aliens. However, because the alien ship decelerates during each of the eight passes, the earlier a player shoots the less likely this player will hit the target. If a shot is missed, the player incurs a 1.5 second time penalty. That is, the next alien will appear only after a 1.5 second delay which is additional to the interval experienced for an accurate shot.
Stage One–Player Fixation
After self-testing the game, we deployed it so that it could be played online. Five players were recruited via an email circulated to students, family and friends. Players were instructed to shoot down as many aliens as possible within a given time block. They first played a practice level for six minutes before playing the competitive level for 12 minutes. The number of alien ships a player encountered varied depending on the player’s strategy and accuracy. A player could expect to encounter roughly 10 alien ships for every 60 seconds of play. At the completion of the game the player’s response time and accuracy were recorded for each alien ship.
Recall that one of the game requirements was that players take shots across a range of difficulty levels, represented by passes (later passes mean less difficult shots). This simple test provides evidence of whether a player is willing to explore the search space and alter his or her risk-taking behaviour throughout the game. Typical results for Players one and two are shown in Figure 2. In general players tended to be very exploratory during the practice level of the game, as indicated by a good spread of shots between alien passes one and eight. During the competitive game time, however, players tended to invest in a single strategy, as indicated by the large spikes seen in the competition levels of Figure 2. This suggests that players, after an exploratory period, attempted to maximise their score by firing on a single, fixed pass.
Figure 2: Results for two typical players in Stage one of game development. The upper row shows data for Player 1, and the bottom row shows data for Player 2. The left column presents the frequency (%) of shots taken on each pass in the practice level, while the right column indicates the frequency (%) of shots taken on each pass in the competition level. Note that players experimented during the practice level, as evidenced by evenly spread frequencies across passes in the left panels, but then adopted a fixed strategy during the competitive block, as evidenced by spikes at pass 4 (Player 1) and pass 5 (Player 2). For each panel, n is the overall number of shots attempted by the player in that block, m is the mean firing pass, and sd is the standard deviation of the number of attempted shots.
In experimental terms, this fixation on a single strategy is known as ‘investment’. At the end of the game the players reported that, because of the constant level of deceleration, they could always shoot when the alien was at a specific distance from the wall if they stuck to the same pass. Players thus practiced a timing strategy specific to a particular alien pass (i.e., a specific difficulty level). The number of kills per unit time (i.e., the reward) was therefore always highest for that player when shooting at the same pass. In the example graphs (Figure 2), one player ‘invested’ in learning to shoot on pass four, the other, on pass five. This type of investment runs counter to one requirement of a hot-hand game, creating a major design flaw that needed to be fixed in the next iteration.
Stage Two–Encouraging Exploratory Play
The aim of the second stage of design was to overcome the problem of player investment in a single strategy. The proposed solution was to vary the position of the player’s ship so that it no longer appeared in the same location at the centre of the screen but rather was randomly shifted left and right of centre each time a new alien appeared (Figure 3). Thus, on each trial, the shooter’s location was sampled from a uniform distribution extending 100 pixels to the left and right of centre. This manipulation was intended to prevent the player from learning a single timing-sequence that was always successful on a single pass (such as always shooting on pass four when the alien was a certain distance from the side of the screen).
Figure 3: The screen in Stage two of game development. The blue rectangle appears here for illustration purposes and indicates the potential range of locations used to randomly position the player’s ship. It did not appear on the actual game screen.
Once again we deployed an online version of the game and recorded data from six players. Players once again played a practice level for six minutes before they played the competitive level for 12 minutes.
The results for all individual players in the competitive game level are shown in Figure 4. Introducing random variation into the players’ firing position significantly decreased players’ tendency to invest in and fixate on a single pass. This decrease in investment is highlighted by the increase in the variance seen in Figure 4 when compared to Figure 2. Thus, the slight change in gameplay had a significant effect on players’ behaviour, encouraging them to alter their risk-taking strategy throughout the game. Furthermore, this change helps to meet the requirements necessary for hot hand investigation.
Figure 4: Individual player results for the competition level in Stage two testing. Player’s tendency to fire on a single pass in the competition level has been significantly reduced compared to Stage One, as evidenced by the reduction in spikes and, in most cases, increase in variance. For each panel, n is the overall number of shots attempted by the player in that block, m is the mean firing pass, and sd is the standard deviation of the number of attempted shots.
In Figure 5 we present data averaged across all players for both the practice and competition levels. This summary highlights how the game’s reward structure influenced player strategy throughout play. The left column corresponds to the practice level (not shown in Figure 4), while the right column corresponds to the competition level.
Figure 5: Average player results for Stage two. The left column presents the frequency (%) of shots taken on each pass in the practice level, while the right column indicates the frequency (%) of shots taken on each pass in the competition level. For each panel, m is the mean firing pass and n is the overall number of shots attempted by all players in that block. A comparison of mean firing pass for practice and competition levels highlights that as the game progressed, players fired later.
An inspection of Figure 5 highlights the fact that players’ shooting strategy altered in a predictable manner as the game progressed. For example, the mean firing pass for the practice level (m = 5.8) was smaller than that seen in the competition level (m = 6.21). Thus players tended to shoot later in the competition level. This suggests that the reward structure of the game was biased towards firing at later passes, and that as players became familiar with this reward structure they altered their gameplay accordingly.
Given the need to minimise such bias for hot hand investigation, we examined the risk and reward structure on the basis of average player performance. We were particularly interested in the probability of success for each pass, and how this probability translated into our reward system. Recall that firing on later passes takes more time but is also accompanied by a higher likelihood for success. As the aim of the hot hand game is to kill as many aliens as possible within a 12 minute period, both the probability of hits as well as the time taken to achieve these hits are important when considering the reward structure.
We therefore analysed how many kills per 12-minute block the average, hypothetical player would make if he or she were to consistently fire on a specific pass for each and every alien that appeared. For example, given the observed likelihood of success on pass one, how many kills would a player make by shooting only on pass one? How many kills on pass two, and so on. Results of this examination are reported in Figure 6. Figure 6A shows the average number of shots taken by players on each pass of the alien (overall height of bar) along with the average number of hits at each pass (height of yellow part of the bar). Figure 6B uses this data to plot the observed probability of success and shows that the probability for success is higher for later passes. This empirically validates that later passes are in fact ‘easier’ in a psychological sense.
Figure 6: Averaged results and some modelling predictions from Stage two of game development. In Panel A, the frequency (%) of shots attempted on each pass is indicated by the overall height of each bar. The proportion of hits and misses are indicated in yellow and blue. Panel B depicts the average probability of a hit for each pass, given by the number of hits out of overall shot attempts. Based on the empirical results, Panels C and D show the predicted number of successful shots if players were to consistently shoot on only one pass for the entire game (see text for details).
These probabilities allow empirical estimation of the number of total kills likely to be attained by the hypothetical average player if they were to shoot on only one pass for an entire 12 minute block. By plotting the number of total kills expected for each pass number, we produce an optimal strategy curve for the current game, as shown in Figure 6C. The curve is monotonically increasing, indicating that the total number of kills expected of an average player increases as the pass number increases. In other words, players taking less difficult shots are expected to make more hits within each game. The reward structure is clearly biased toward later passes, which validates the change in player strategy (i.e. firing on later passes) as the game progressed. As the players became accustomed to the reward structure, their strategy shifted accordingly to favour later, easier shots.
In game terms it might be considered an exploit to shoot on pass eight. Figure 6C indicates that consistently firing on pass eight would clearly result in the greatest number of kills, making it the ‘optimal’ strategy for the average player. Given that an exploit of this kind reduces the likelihood that players will fire earlier in response to a run of successful shots, the current design still failed to meet the requirements for our hot hand game.
One simple adjustment to overcome this issue was to reduce the penalty period after an unsuccessful shot. The current time penalty for a missed shot was set to 1.5 seconds, and the ability to vary this penalty allows a good deal of flexibility within the reward structure. Given that players who choose to fire on early passes take many more shots, and thus record many more misses, decreasing the time penalty for a miss substantially increases the relative reward for firing on early passes.
In line with this thinking, Figure 6D shows the predicted number of kills in 12 minutes for the average player if the penalty for missing is reduced from 1.5 seconds to 0.25 seconds. This seemingly small change balances the reward structure so that players are more evenly rewarded, at least for passes three to eight. Estimates of the accuracy rate on passes one and two were based on a small number of trials, which makes them problematic for modelling; participants avoided taking early shots, perhaps because the alien was moving too fast for them to intercept. Allowing players to fire on passes three to eight still provided us with a sufficient number of possible strategies for a hot hand investigation.
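The analysis behind Figures 6C and 6D can be sketched as a small model: expected kills per block for a player who always fires on the same pass, under a given miss penalty. The pass timings and hit probabilities below are hypothetical stand-ins, not the study's measured values; only the shape of the computation is the point.

```python
BLOCK_SECONDS = 12 * 60  # one competition block

# Hypothetical stand-ins, NOT the study's measured values: seconds elapsed
# before a shot on each pass 1..8 (the decelerating alien makes later passes
# arrive later), and the hit probability on each pass.
TIME_TO_PASS = [1.0, 2.1, 3.3, 4.6, 6.0, 7.5, 9.1, 10.8]
HIT_PROB = [0.10, 0.20, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85]

def expected_kills(pass_idx, miss_penalty):
    """Expected kills per block for a player who always fires on pass_idx
    (0-based): hit rate divided by the mean time spent per alien."""
    t, p = TIME_TO_PASS[pass_idx], HIT_PROB[pass_idx]
    mean_time_per_alien = t + (1 - p) * miss_penalty
    return BLOCK_SECONDS * p / mean_time_per_alien

for penalty in (1.5, 0.25):  # Stage two penalty vs. the proposed reduction
    curve = [round(expected_kills(i, penalty)) for i in range(8)]
    print(f"penalty={penalty}s  expected kills by pass: {curve}")
```

Shrinking the miss penalty lowers the cost of early, risky shots and so flattens the curve relative to later passes; the exact shape, of course, depends entirely on the assumed timings and accuracies.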
Stage Three–Balancing Risk and Reward
In stage two of our design we uncovered an exploitation strategy in the risk and reward structure of the game where players could perform optimally by shooting on pass eight of the alien. We suspect this influenced players to fire at later passes of the alien, particularly as the game progressed. Using empirical data to model player performance suggested that reducing the time penalty for a miss to 0.25 seconds would overcome this problem.
A modified version of the game, with a 0.25 second penalty after a miss, was made available online and data were recorded from five players. Averaged results show that players shot at roughly the same mean pass of the alien in the practice level and the competitive level (Figure 7). This pattern is in contrast with Figure 4, which highlighted a tendency for players to fire at later passes in the 12 minute competitive level. These data confirm the empirical choice of a 0.25 second penalty, and provide yet another striking example of how subtle changes in reward structure may influence players’ behaviour.
Figure 7: Average player results for Stage three of game development. The left plot presents the frequency (%) of shots attempted on each pass in the practice level, while the right plot indicates the frequency (%) of shots attempted on each pass in the competition level. For each panel, m is the mean firing pass and n is the overall number of shots taken by all players in that block. As indicated by the mean firing pass, under a balanced reward structure players no longer attempted to shoot on later passes as the game progressed.
Recall that we began the development of a hot-hand game with the requirement that for each level of assumed risk the game should be equally rewarding (total number of kills) for the average player. By balancing the reward structure, the design from stage three is now consistent with this requirement for investigating the hot hand.
Finally, we required the game to have an overall level of difficulty such that players would succeed on about 40-60 percent of attempts. Performance within this range would allow us to compare player strategy in response to runs of both success and failure. That is, testing for both hot and cold streaks. As highlighted by Figure 8, the overall probability of success does indeed meet this criterion; the overall probability of success (hits) was 43%. Thus, the game now meets the essential criteria required to investigate the hot hand phenomenon.
Figure 8: Averaged results from the competition level of Stage three of game development. In Panel A, the frequency (%) of shots attempted on each pass is indicated by the overall height of each bar. The proportion of hits and misses are indicated in yellow and blue. Panel B depicts the average probability of success for each pass, given by the number of hits out of overall shot attempts. In Panel B, ps is the overall probability of success (hits).
We set out to design a computer game as a tool for studying a fascinating and widely studied psychological phenomenon called the ‘hot hand’ (e.g., Gilovich, Vallone, & Tversky, 1985). For this we needed a game that allowed us to investigate player risk-taking in response to a string of successful or unsuccessful challenges.
We designed a simple top-down shooter game where players had a single shot at an alien spacecraft as it made eight passes across the screen. During the game the player faced this same challenge a number of times. The goal of the game was to kill as many aliens as possible in a set amount of time. The risk in the gameplay reduced on each pass as the alien ship slowed down. Shooting successfully on earlier passes rewarded the player with a kill and made a new alien appear immediately. Missing a shot penalised the player with an additional wait time before the next alien appeared.
As a hot hand game, it was required to meet specific risk and reward criteria. Players should explore a range of risk-taking strategies in the game, and they should be rewarded in a balanced way commensurate with this risk. We also wanted the game challenge to have an average success rate roughly equal to the failure rate, between 40 and 60 percent, so that we could use the game to gather data about players’ behaviour in response to both success and failure.
To achieve our objective we developed the game in an iterative fashion over three stages. At each stage we tested an online version of the game, gathering empirical data and analysing the players’ strategy and performance. In each successive stage of design we then altered the game mechanics so they were balanced in a way that met our specific hot hand requirements. The design changes and their effects are summarised in Table 1.
Table 1. A summary of changes to design in each of the stages and the effect of these changes on meeting the hot hand requirements.
Books on game design tend to prescribe an iterative design process. Iterative processes allow unforeseen problems to be addressed in successive stages of design. This is especially important in games where the requirements for the game mechanics are typically only partially known and tend to emerge as the game is built and played. Salen and Zimmerman describe this iterative process as “play-based” design and also emphasise the importance of “playtesting and prototyping” (2004, p. 4). For this purpose successive prototypes of the game are required. Indeed we began with only high-level requirements and used this same iterative, prototyping approach to refine our gameplay.
The main difference in our approach is that we more formally measured players’ strategies and exploration behaviours in each stage of design. Given that our game requirements are unusual, it is unlikely that subjective feedback alone would have allowed us to make the required subtle changes to game mechanics. For example, during the initial testing of the game we found that players tended to invest in a single playing strategy. Further analysis also revealed a potential exploit in the game, as players could easily optimise their total number of kills by shooting on the last pass of each alien ship.
The issue of exploits in games is often debated in gaming circles and is also well studied in psychology. Indeed, trade-offs between exploitation and exploration exist in many domains (e.g., Hills, Todd, & Goldstone, 2008; Walsh, 1996). External and internal conditions determine which strategy the organism, or the player, will take in order to maximise gains and minimise losses. For example, when foraging for food, the distribution of resources matters. Clumped resources lead to a focused search in the nearby vicinity where they are abundant (exploitation), whereas diffused resources lead to broader exploration of the search space.
Hills et al. showed that exploration and exploitation strategies compete in mental spaces as well, depending on the reward for desired information and the toll incurred by search time for exploration. In the context of our game, a shooting strategy of consistently attempting the easiest shooting level produced the highest reward. This encouraged players to drift toward later firing as the game progressed, and in turn inhibited players from exploring alternate (earlier firing) strategies. It is unlikely we could have predicted this without collecting empirical data from players.
A further advantage of gathering empirical data was that it allowed us to remodel our reward structure based on precise measures of player performance. In stages one and two players lost 1.5 seconds each time they missed an alien. In stage three we reduced this penalty to 0.25 seconds based on our analysis and modelling of player behaviour. This relatively minor change was enough to change players’ behaviour and encourage them to risk earlier shots at the alien. The fact that our game is quite simple in nature reinforces both the difficulty and importance of designing a well-balanced risk and reward structure.
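The balancing exercise described here can be sketched as a simple expected-value calculation. Everything numeric below is an illustrative assumption of ours (the pass duration, the hit probabilities, and the function name are not from the paper), except the 0.25-second miss penalty from stage three:

```python
# Hypothetical sketch of the risk/reward balance described above.
# Pass timing and hit probabilities are assumed values, not the paper's.

PASS_DURATION = 1.0          # seconds per pass across the screen (assumed)
MISS_PENALTY = 0.25          # extra wait after a miss, as in stage three

# Assumed hit probability for a shot taken on each of the eight passes:
# the ship slows down, so later passes are easier.
hit_prob = [0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85]

def kills_per_second(pass_index: int) -> float:
    """Expected kill rate if the player always fires on this pass."""
    p = hit_prob[pass_index]
    time_to_shot = (pass_index + 1) * PASS_DURATION
    expected_time = time_to_shot + (1 - p) * MISS_PENALTY
    return p / expected_time

for i in range(8):
    print(f"pass {i + 1}: {kills_per_second(i):.3f} kills/s")
```

A balanced design tunes the hit probabilities and penalties until these rates are roughly equal across passes, so that no single firing strategy dominates the player's total score.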
Another common principle referred to in the game-design literature is player-centred design, defined by Adams as “a philosophy of design in which the designer envisions a representative player of a game the designer wants to create” (2010, p. 30). Although player-centred design is frequently invoked in game-design texts, there is some suggestion that design is often based purely on designer experience (Sotamaa, 2007). Involving players in the design process typically relies on subjective feedback from approaches such as focus groups and interviews, which have generally been used in usability design. In our study, even when designing a simple game challenge, it is clear that using empirical data to measure how players approach the game and how they perform can be another vital element in balancing the gameplay.
We also recognise some dangers with this approach, as averaging player performance can hide important differences between players. It would be nice to have a model of an ideal player, but it is unlikely such a player exists. In fact there are many different opinions about who the ‘player’ is (Sotamaa, 2007). The empirical data therefore need to be gathered from the available player population. If there are broad differences among these players then the designer may need to sample different groups, for example, a group of casual players and a group of hard-core gamers.
Importantly for future research, the game design at which we arrived is now suitable for investigating the hot hand phenomenon. Such a game can potentially answer a number of questions:
1. How do players respond to a run of success or failure in a game challenge?
2. Will a player take on more difficult challenges if they are on a hot streak?
3. Will they lower their risk if they are on a cold streak?
4. How will this variable risk level impact on their overall measure of performance?
5. How can the hot hand principle be used in the design of game mechanics?
Answers to such questions will not only be of interest to psychologists, but could also further inform game design. For example, it might allow the designer to engineer a hot streak so that players would take more risks or be more explorative in their strategies. Of course in a game it might even be appropriate to use a cold streak to discourage a player’s current strategy. The game mechanics could help engineer these streaks in a very transparent way without breaking player immersion. Further investigations of the hot hand hold significant promise for both psychology and game design.
Article 2: Game Developer Column 9: Playing the Odds
By Soren Johnson
One of the most powerful tools a designer can use when developing games is probability, using random chance to determine the outcome of player actions or to build the environment in which play occurs. The use of luck, however, is not without its pitfalls, and designers should be aware of the trade-offs involved – what chance can add to the experience and when it can be counterproductive.
Failing at Probability
One challenge with using randomness is that humans are notoriously poor at accurately evaluating probability. A common example is the Gambler’s Fallacy, which is the belief that odds will even out over time. If the Roulette wheel comes up black five times in a row, players often believe that the odds of coming up black again are quite small, even though clearly the streak makes no difference whatsoever. Conversely, people also see streaks where none actually exist – the shooter with a ‘hot hand’ in basketball, for example, is a myth. Studies show that, if anything, a successful shot actually predicts a subsequent miss.
Also, as designers of slot machines and MMO’s are quite aware, setting odds unevenly between each progressive reward level makes players think that the game is more generous than it really is. One commercial slot machine had its payout odds published by wizardofodds.com in 2008:
* 1:1 per 8 plays
* 2:1 per 600 plays
* 5:1 per 33 plays
* 20:1 per 2,320 plays
* 80:1 per 219 plays
* 150:1 per 6,241 plays
The 80:1 payoff is common enough to give players the thrill of beating the odds for a big win but still rare enough that the casino is at no risk of losing money. Furthermore, humans have a hard time estimating extreme odds – a 1% chance is anticipated too often and 99% odds are considered to be as safe as 100%.
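Under one plausible reading of that pay table (each line meaning a payout of k times the wager occurring once every N plays on average; real pay tables are more involved than this), the expected return per play is just the sum of k/N, and the 80:1 line does indeed dominate it:

```python
# Rough expected-return calculation for the payout table above.
# Assumption: "k:1 per N plays" means a k-times payout occurring on
# average once every N plays; real slot pay tables are more complex.

payouts = [(1, 8), (2, 600), (5, 33), (20, 2320), (80, 219), (150, 6241)]

expected_return = sum(k / n for k, n in payouts)
print(f"expected return per unit wagered: {expected_return:.3f}")  # → 0.678

# The 80:1 line alone contributes 80/219 ≈ 0.365, over half of the total.
```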
Leveling the Field
These difficulties in accurately estimating odds actually work in the favor of the game designer. Simple game design systems, such as the dice-based resource generation system in Settlers of Catan, can be tantalizingly difficult to master with a dash of probability.
In fact, luck makes a game more accessible because it shrinks the gap – whether in perception or in reality – between experts and novices. In a game with a strong luck element, beginners believe that, no matter what, they have a chance to win. Few people would be willing to play a chess Grandmaster, but playing a backgammon expert is much more appealing – a few lucky throws can give anyone a chance.
In the words of designer Dani Bunten, “Although most players hate the idea of random events that will destroy their nice safe predictable strategies, nothing keeps a game alive like a wrench in the works. Do not allow players to decide this issue. They don’t know it but we’re offering them an excuse for when they lose (‘It was that damn random event that did me in!’) and an opportunity to ‘beat the odds’ when they win.”
Thus, luck serves as a social lubricant – the alcohol of gaming, so to speak – that increases the appeal of multiplayer gaming to audiences which would not normally be suited for cutthroat head-to-head competition.
Where Luck Fails
Nonetheless, randomness is not appropriate for all situations or even all games. The ‘nasty surprise’ mechanic is never a good idea. If a crate provides ammo and other bonuses when opened but explodes 1% of the time, the player has no chance to learn the probabilities in a safe manner. If the explosion occurs early enough, the player will immediately stop opening crates. If it happens much later, the player will feel unprepared and cheated.
Also, when randomness becomes just noise, the luck simply detracts from the player’s understanding of the game. If a die roll is made every time a StarCraft Marine shoots at a target, the rate of fire will simply appear uneven. Over time, the effect of luck on the game’s outcome will be negligible, but the player will have a harder time grasping how strong a Marine’s attack actually is with all the extra random noise.
Further, luck can slow down a game unnecessarily. The board games History of the World and Small World have a very similar conquest mechanic, except that the former uses dice and the latter does not (until the final attack). Making a die roll with each attack causes a History of the World turn to last at least three or four times as long as a turn in Small World. The reason is not just the logistical issue of rolling so many dice – knowing that the results of one’s decisions are completely predictable allows one to plan out all the steps at once without worrying about contingencies. Often, handling contingencies is a core part of the game design, but game speed is an important factor too, so designers should be sure that the trade-off is worthwhile.
Finally, luck is very inappropriate for calculations to determine victory. Unlucky rolls feel the fairest the longer players are given to react to them before the game’s end. Thus, the earlier luck plays a role, the better for the perception of game balance. Many classic card games – pinochle, bridge, hearts – follow a standard model of an initial random distribution of cards that establishes the game’s ‘terrain’ followed by a luck-free series of tricks which determines the winners and losers.
Probability is Content
Indeed, the idea that randomness can provide an initial challenge to be overcome plays an important role in many classic games, from simple games like Minesweeper to deeper ones like NetHack and Age of Empires. At their core, solitaire and Diablo are not so different – both present a randomly-generated environment that the player needs to navigate intelligently for success.
An interesting recent use of randomness was Spelunky, which is indie developer Derek Yu’s combination of the random level generation of NetHack with the game mechanics of 2D platformers like Lode Runner. The addictiveness of the game comes from the unlimited number of new caverns to explore, but frustration can emerge from the wild difficulty of certain, unplanned combinations of monsters and tunnels.
In fact, pure randomness can be an untamed beast, creating game dynamics that throw an otherwise solid design out of balance. For example, Civilization 3 introduced the concept of strategic resources which were required to construct certain units – Chariots need Horses, Tanks need Oil, and so on. These resources were sprinkled randomly across the world, which inevitably led to large continents with only one cluster of Iron controlled by a single AI opponent. Complaints of being unable to field armies for lack of resources were common among the community.
For Civilization 4, the problem was solved by adding a minimum amount of space between certain important resources, so that two sources of Iron could never be within seven tiles of each other. The result was a still unpredictable arrangement of resources around the globe but without the clustering that could doom an unfortunate player. On the other hand, the game actively encouraged clustering for less important luxury resources – Incense, Gems, Spices – to promote interesting trade dynamics.
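A minimum-spacing rule like this can be sketched with simple rejection sampling: place resources at random, discarding any candidate that lands too close to one already placed. This is an illustration of the idea only, not Firaxis's actual map code; the grid size, resource count, and function name are all made up:

```python
import random

# Illustrative sketch: scatter a resource across a grid while
# enforcing a minimum spacing, as described for Civ 4's Iron.
# Not actual game code; all parameters are invented for the example.

def place_resources(width, height, count, min_dist, rng=random):
    placed = []
    attempts = 0
    while len(placed) < count and attempts < 10_000:
        attempts += 1
        x, y = rng.randrange(width), rng.randrange(height)
        # Chebyshev distance approximates "within N tiles" on a square grid.
        if all(max(abs(x - px), abs(y - py)) >= min_dist for px, py in placed):
            placed.append((x, y))
    return placed

iron = place_resources(40, 40, count=8, min_dist=7)
print(iron)
```

The result is still unpredictable from game to game, but the worst-case clustering is ruled out by construction.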
Showing the Odds
Ultimately, when considering the role of probability, designers need to ask themselves ‘how is luck helping or hurting the game?’ Is randomness keeping the players pleasantly off-balance so that they can’t solve the game trivially? Or is it making the experience frustratingly unpredictable so that players are not invested in their decisions?
One factor which helps ensure the former is making the probability as explicit as possible. The strategy game Armageddon Empires based combat on a few simple die rolls and then showed the dice directly on-screen. Allowing the players to peer into the game’s calculations increases their comfort level with the mechanics, which makes chance a tool for the player instead of a mystery.
Similarly, with Civilization 4, we introduced a help mode which showed the exact probability of success in combat, which drastically increased player satisfaction with the underlying mechanics. Because humans have such a hard time estimating probability accurately, helping them make a smart decision can improve the experience immensely.
Some deck-building card games, such as Magic: The Gathering or Dominion, put probability in the foreground by centering the game experience on the likelihood of drawing cards in the player’s carefully constructed deck. These games are won by players who understand the proper ratio of rares to commons, knowing that each card will be drawn exactly once each time through the deck. This concept can be extended to other games of chance by providing, for example, a virtual “deck of dice” that ensures the distribution of die rolls is exactly even.
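A “deck of dice” can be implemented as a shuffled bag of faces that only reshuffles when empty, exactly like drawing through a deck of cards. A minimal sketch (the class name and the number of copies per face are our own choices):

```python
import random

# A "deck of dice": every face appears a fixed number of times per
# shuffle, so results are perfectly even over each pass through the
# deck. A sketch of the idea in the text, not any particular game's code.

class DiceDeck:
    def __init__(self, sides=6, copies=3, rng=random):
        self.rng = rng
        self.sides, self.copies = sides, copies
        self.deck = []

    def roll(self):
        if not self.deck:                      # reshuffle when exhausted
            self.deck = [f for f in range(1, self.sides + 1)
                         for _ in range(self.copies)]
            self.rng.shuffle(self.deck)
        return self.deck.pop()

deck = DiceDeck()
rolls = [deck.roll() for _ in range(18)]       # exactly one full deck
print(sorted(rolls))                           # each face appears 3 times
```

Streaks can still happen locally within a deck, but the long-run distribution is guaranteed even, which is exactly the property the text describes.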
Another interesting – and perhaps underused – idea from the distant past of gaming history is the “Element of Chance” game option from the turn-based strategy game Lords of Conquest. The three options available – Low, Medium, and High – determined whether luck was only used to break ties or to play a larger role in resolving combat. The appropriate role of chance in a game is ultimately a subjective question, and giving players the ability to adjust the knobs themselves can open up the game to a larger audience with a greater variety of tastes.
Article 3: Emotions and Randomness – Loot Drops
by Chris Grey
Even though randomness can be used to greatly influence a player’s experience with a game, I haven’t seen many people put much thought into crafting it. We’ve all got war stories about a rare drop that took us hours to get, if not tens of hours. GameFAQs is loaded with forum threads talking about the despair of the random drop. Even worse are the threads made by people who got the drop in one go, bragging and taunting the rest of the community, as if luck with the random number generator were something they actively controlled. For better or worse, randomness currently colors the play experience tremendously; why not talk about crafting it more actively from our side so that these experiences are less accidental?
Today, I’d like to focus on drops. I’m being a bit loose with the word because I’d like the ability to talk about both items dropped by defeated monsters and the monster taming process in Ni No Kuni, where enemies randomly become recruitable after you beat them up. I’m going to avoid giving hard numbers wherever possible; my aim here is to give a few heuristics about how randomness feels to the player.
First, let’s look at the way it’s done now. Typically, designers look at the in-game economic value of an item and decide how scarce it should be. More powerful items either appear later in the game or drop with a much lower constant percentage chance. The idea here is that players should feel some sense of accomplishment when they obtain the item, or at least see how lucky they’ve been. Either way, it’ll bring players to value the item, hopefully in accord with the designer. If the designer has valued the item correctly, a player who gets the drop in the average number of tries will typically end up with a similar valuation of, and appropriate attachment to, the item.
With a constant drop rate, here’s the graph that captures the farming experience. You might be expecting a bell curve here, but I want to illustrate something else born from this data. To do so, we’re going to change the vertical axis to reflect the following: assuming your players kill enemies until they get one of the items, here’s how long the player population will be farming.
Pay attention to the shape; the key point to notice here is that the graph never actually hits zero. That means some of your players are never going to acquire the item, and they will have a terrible time trying to farm it, spending tremendous amounts of time on a task the designer had only pictured them doing for a fifth of that time. Even the good feeling of finally getting the drop is generally overshadowed at this point. What’s worse, the time spent farming the item will skew a player’s valuation of it; most players will resent having to grind for a massive amount of time when others did not have to, and they will focus their resentment on the item in question. Naturally, this resentment will also spill over to the game, and they will undoubtedly vent about how unfair the game is to anyone who will listen. These players will also end up over-levelled by the nature of the task, which ruins the otherwise carefully crafted difficulty curve. Their frustration can lead to quitting the game, and if a player was dedicated enough to stick it out that long, you have probably alienated an incredibly passionate player. All this angst for a random drop that probably didn’t matter much in the bigger picture of the game.
…and the Queen save the poor souls who feel compelled to get the ‘collect all random drops’ achievement. That synergy can quickly lead to tens of hours of despair and compulsion if there are many items, or especially rare ones.
The problems with using averages to balance in this case are myriad. In the graph, notice that about sixty percent of players will receive the drop before the average number of attempts, and half of the population gets the item significantly before the average. This means most players won’t be seeing the event as many times as the designer probably designed for, and in reality, since any one player usually only goes through this process once per drop, this will become the general consensus on how long the experience takes. Potentially a happy mistake, but it does diminish the feeling of effort the designer probably wanted most players to feel. Of the rest, it can be expected that about twenty-five percent of players will take more than one and a half times the average to get the drop, and more than ten percent will take more than twice as long as average. If these players look to the rest of the player population, they will see their experience taking more than two to three times as long as the lucky half, respectively.
A designer with fixed resources would be drawn to craft the average experience when, in all honesty, it’s the fifty percent who finished significantly early and the twenty-five percent on the tail that need the attention more. Additionally, the latter will be the ones to really begin to see the activity for the warts it has. If the designer neglects the tail experience and has several different drops required or encouraged in the game, the designer will eventually fail all of their players; the more drops the player needs, the more likely the player will be in that tail at some point in the game. By focusing on the mathematical average experience, the designer is effectively neglecting seventy-five percent of their players on any single drop.
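The percentages quoted above follow directly from the geometric distribution that a constant drop rate produces. A quick check, using an assumed 5% drop rate (for small drop rates these fractions barely depend on the rate at all, approaching 1 − e⁻¹, e⁻¹·⁵, and e⁻²):

```python
# Checking the percentages above for a constant drop rate.
# With drop chance p per kill, the attempt count is geometrically
# distributed with mean 1/p. The 5% rate here is illustrative.

p = 0.05
mean = 1 / p                             # 20 kills on average

within_mean = 1 - (1 - p) ** mean        # got the drop by the average attempt
over_1_5x  = (1 - p) ** (1.5 * mean)     # still farming past 1.5x the average
over_2x    = (1 - p) ** (2 * mean)       # still farming past 2x the average

print(f"got it by the average:  {within_mean:.0%}")   # ~64%
print(f"take > 1.5x average:    {over_1_5x:.0%}")     # ~21%
print(f"take > 2x average:      {over_2x:.0%}")       # ~13%
```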
Other Kinds of Randomness – Escalating Drops
I want to present two simple alternatives. The first is an escalating drop rate. Each time a player fails to get the drop at the end of the event, the probability it drops next time increases. This probability caps at a guaranteed drop, and once the item drops, the probability resets to some level. It can reset at zero if you only ever want one in the game; it can reset at the initial probability if you want to make the experience to get another item take the same amount of time, more or less, as the first time; it can reset at a high probability if you want the item to be valuable now but easy to come by later.
Here is the new chart for this experience.
Notice how the line now hits zero on the right of the graph. It eliminates the abysmal experience we spoke of above. There will be unlucky players, but there’s a cap on the amount of time they’ll have to spend with their misfortune. There will still be war stories, but the worst-case player experience can now be planned for more easily, as it will more closely match the average. This can lead to the kind of war stories that enhance the player experience: players feel like they struggled, but not much harder than the designer expected, which is a nice way to give a bit of fiero. The effort of trying to get the item will always pay off eventually.
Additionally, if you set the initial drop rate low and let the growth rate accelerate, you’ll have fewer lucky people as well. This could help if you want to make the player master a challenging fight through repeated attempts to potentially get a powerful item. It’s worth noting that the player who gets the item on the first try will have their difficulty curve distorted, even though this case tends to be more subtle than the player who takes many tries. Empowerment is not a bad thing, but it can lead the lucky player to think the game is much easier than it is because of a fortuitous break. In general, the escalating drop approach will make the experience a little more uniform for any given player, and usually, it will be relatively invisible to them.
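A sketch of the escalating mechanic with assumed numbers (a 5% base chance that grows by 5% per miss; both values are illustrative). The distribution of attempts can be computed exactly, and the total probability summing to 1 confirms that the endless-farming tail is gone:

```python
# Exact attempt distribution for an escalating drop rate.
# Illustrative numbers: 5% base chance, +5% per miss, which
# guarantees a drop within 20 kills.

BASE, STEP = 0.05, 0.05

def chance_on_attempt(k):                # k = 1, 2, 3, ...
    p_fail = 1.0
    for i in range(1, k):
        p_fail *= 1 - min(1.0, BASE + (i - 1) * STEP)
    return p_fail * min(1.0, BASE + (k - 1) * STEP)

dist = [chance_on_attempt(k) for k in range(1, 21)]
print(f"total probability: {sum(dist):.4f}")   # sums to 1: no endless farm
print(f"worst case: {len(dist)} kills")
```

Because the chance hits 100% on the twentieth kill, every player's farming time is bounded, which is exactly the property the chart illustrates.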
There’s a temptation here to wonder what would happen if you had to kill several of the same kind of monster before the item could even become available. If players understand what’s happening, and they know that they will be fighting several times before they could even get a drop, that fighting suddenly becomes work. Gambling in this form works because the payoff is potentially always right around the corner. It cannot be overstated how powerful a motivator this is. Asking someone to do something fifty times makes it a chore, and times ten through forty will not be savored, because after the initial novelty wears off you know the task will not net a reward any time soon. If a task could be rewarded randomly after any one attempt, the player will put more attention to detail and care into it. The player will appreciate the experience more if they feel like what they are doing could pay off at any moment, not just some long time in the future.
Other Kinds of Randomness – Diminishing Returns
This is the inverse of the above. The idea is that the player has a limited number of chances to get an item before it goes away completely. Typically, the initial probability of the drop starts high and either decreases with each failure, or the event disappears after a set number of attempts. Either way, the drop becomes impossible to get after a certain number of chances.
This randomness is tricky to deal with as you are, in no uncertain terms, guaranteeing that a percentage of your player base will never get the item. It can be more humane than the traditional way as you are giving no option to exchange time (farm) for in-game value. If the item has significant value to the player, and the player knows the stakes, there will generally be a significant amount of urgency put on the outcomes, and a skilled designer could use this as a way to make a large emotional mark.
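The fraction of players guaranteed never to get such a drop is easy to compute up front. With illustrative numbers of our own (a 50% chance that halves after each failure, five chances total):

```python
# How many players never get a diminishing-returns drop? A sketch
# with invented numbers, not taken from any particular game.

chances = [0.5, 0.25, 0.125, 0.0625, 0.03125]

p_never = 1.0
for p in chances:
    p_never *= 1 - p

print(f"players who never get the item: {p_never:.1%}")   # → 29.8%
```

Nearly a third of players miss out under these numbers, which is exactly the kind of figure a designer should have in front of them before committing to this mechanic.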
There is an unspoken rule with these kinds of drops. They can be gamed by reloading. As with permadeath mechanics, players can still get some tension from the outcome while using the load function to try as many times as they want to obtain the drop. If the ability to reload is removed, as it was in Demon’s Souls, then you may want to consider making the game short, but replayable, or having several different drops, only one achievable in the game. This can force players to actually have to adjust their playstyle based on what they got. Be careful with this kind of randomness, as it can easily inspire rage. You are very close to a core expectation of most players: “I am master of this game world, and given effort, I should not be deprived of anything I want.”
Some General Heuristics
Since most people aren’t taught well to think about probability, I wanted to give a few guidelines to work with.
When in doubt, make a simulation. When you use any type of probability distribution besides the constant percentage drop, you do not need to do a full mathematical workout of all cases. I highly recommend writing a program (or bribing your friend the coder to do so) to simulate the effects and generate graphs of how the system behaves when tested a huge number of times. That information, while not guaranteed to be exactly right, will be good enough, and the calculations required to get an exact answer are not worth the time required to compute them in most cases.
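A minimal version of such a simulation, for the constant-drop case (the 10% drop rate and the player count are assumed purely for illustration):

```python
import random
import statistics

# Simulate many players farming a constant 10% drop and summarise
# the resulting distribution of kills needed.

def kills_until_drop(p, rng):
    kills = 1
    while rng.random() >= p:
        kills += 1
    return kills

rng = random.Random(42)
runs = sorted(kills_until_drop(0.10, rng) for _ in range(100_000))

print("mean:  ", statistics.mean(runs))                 # ≈ 10
print("median:", runs[len(runs) // 2])                  # ≈ 7
print("95th percentile:", runs[int(len(runs) * 0.95)])  # ≈ 29
```

The gap between the median and the long tail is the shape problem discussed above, made visible in a few lines of code.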
Generally speaking, the more random drops the player is compelled to farm, the closer their total experience will be to the average experience overall, and the more likely they are to face the worst case short term scenario sometime in their experience. Look at it this way: if everyone rolls fifty dice, it’s likely that the roll totals won’t differ much, and everyone will have probably rolled at least a couple of ones. The trap here is subtle: you cannot assume that poor luck will only affect some players in this case; it is almost guaranteed to strike everyone. Design accordingly.
The reverse of this is true, too. A small number of random drops in your game will mean that the player experience will be very uneven and different from person to person.
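Both points are easy to quantify for the fifty-dice example above:

```python
import math

# Quantifying the fifty-dice point: with enough independent rolls,
# at least one bad roll is nearly certain for every player, while
# the totals cluster tightly around the mean.

p_some_ones = 1 - (5 / 6) ** 50          # at least one 1 in fifty rolls
mean_total = 50 * 3.5                    # expected sum of fifty d6
sd_total = math.sqrt(50 * 35 / 12)       # spread of that sum

print(f"at least one 1: {p_some_ones:.4%}")        # ≈ 99.99%
print(f"total: {mean_total:.0f} ± {sd_total:.1f}")
```

The standard deviation is only about 7% of the mean, so everyone's totals land close together even though everyone also sees some bad rolls.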
People tend to be terrible at estimating probabilities in their head, and dry spells leave bigger scars than lucky breaks feel good. The lower the probability, the worse the estimation ability. This can manifest especially with rare drops; people tend to start becoming frustrated long before the average if they know the drop is rare going into the session. Additionally, people will typically experience negative emotion for a significant portion of a farming session they consider to be long, while players who get lucky tend to move on quickly after experiencing the short-lived joy over a drop.
People conflate luck and skill quite often. It might be interesting to investigate mechanics that would reinforce this: increased drops for skilled play would allow those who have already mastered what the game is teaching to move on to something more interesting to them, while giving the less skilled players a way to both potentially improve and still get whatever item is at stake. This is something I’ve rarely seen, but I think would have huge potential.
Randomness is lovely, and if players buy into what is at stake, gambling can be used to craft incredible emotional experiences. It’s a shame that something so close to our hearts is so ill-understood, because a little extra crafting of the probabilities behind the game mechanics could yield incredibly diverse experiences, both from session to session for any one player and between players. There is an amazing amount of potential here, and I was only able to scratch the surface with a huge amount of text, so all I can recommend to those who are willing is: experiment.
As I didn’t get to show examples this time, I’m splitting them off into another entry. When it is done, I’ll link to it here.
Article 4: Statistically Speaking, It’s Probably a Good Game, Part 1: Probability for Game Designers
by Tyler Sigman
Quiz Time — Whoopee!
Q1) You are designing a new MMORPG, and you set a particular item — Orc Nostril Hair – to drop 10% of the time when a certain species of monster is killed. One of your testers reports back that he killed 20 of the monsters, and found the Orc Nostril Hair 4 times. Another tester killed 20 of the monsters and never found a single Orc Nostril Hair. Is there a programming bug?
Q2) You are designing a combat system for a game and have decided to include a critical hit mechanic. If the character lands a successful hit (say 75% base chance to hit), then you roll another hit check. If the second hit check is successful, the player will do double damage (2x). However, if this happens, you roll another hit check, and if that’s successful, then the damage is upgraded to triple damage (3x). As long as each hit check is successful, you keep making new checks, and the damage multiplier keeps increasing until a hit check is missed. What percentage of the time will the player get at least double damage (2x)? What percentage of the time will the player get quadruple damage (4x) or better?
Q3) You have decided to include a gambling mini-game in your latest magnum opus RTS-FPS-tamagotchi-sports hybrid game. The gambling mini-game will be very simple: the player can wager rubies on whether a coin flip will come up heads or tails. The player always receives even money on his winning bets. You will make the coin flip as fairly programmed as possible (50%/50%), but you will include an extra feature for the player: a list of the last 20 coin flip results will be shown on the right side of the screen. Should you beg the programmers to include any extra logic to prevent the player from taking advantage of this 20-flip history and using it to bankrupt your entire in-game economy?
We’ll attack the answers to these captivating questions at the end of this piece (if you’re still awake).
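In the spirit of the earlier article's advice to simulate when in doubt, here is a quick, spoiler-light way to build intuition for Q1 before reading the answers. The 10% drop rate and the 20 kills come from the question; the simulation size and function name are our own:

```python
import random

# Simulate Q1: many testers each kill 20 monsters with a 10% drop
# chance. How often do the two reported outcomes (4 drops, or none
# at all) actually occur?

rng = random.Random(0)

def drops_in_20_kills():
    return sum(rng.random() < 0.10 for _ in range(20))

results = [drops_in_20_kills() for _ in range(100_000)]
print("saw 4 drops:", results.count(4) / len(results))
print("saw 0 drops:", results.count(0) / len(results))
```

Run it and judge for yourself how surprising the two testers' reports really are.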
Game Designers – Renaissance People and Anti-Experts
Being a designer in this day and age requires a pretty wide variety of skills. Designers are the generalists of the development team, needing to bridge the gap between Art and Engineering, competently communicating with each — or at least competently faking it. A good designer requires a basic understanding of a lot of different things, because game design is a haphazard amalgamation of subjects.
It’s pretty common to hear designers debating or waxing poetic on the finer points of linear or non-linear storytelling, human psychology, control ergonomics, or the integration of non-interactive sequences; less often do you catch them mulling over the bare bones details of the hard sciences like calculus, physics, or statistics. Sure, there are the Will Wrights, determined to find fun in celestial goo and the dynamics of city traffic planning. Most, though, wince when equations start breathing down their necks.
Probability + Statistics = Cool
Probability (P) and Statistics (S) are two hard sciences that are hugely important to game designers—or at least should be! They go together like peas and carrots, but like those yummy veggies, they aren’t the same thing. Coarsely put:
Probability: predicts the chance that an event will happen
Statistics: draws conclusions based upon events that have already happened
Taken together, P and S allow you to perform amazing parlor tricks: you can both predict the future and analyze the past! What power! Remember, though: “With great power comes great responsibility.”
P and S are simply tools in your Designer’s Toolbox (you know—the one under your desk). You can and should use them to your advantage to design games that are better balanced and ultimately more fun!
Good and Bad Things Come in Threes
There are lots of scary, thick textbooks out there about P and S, and this discussion isn’t meant to be a substitute for you going out, doing your due diligence and reading them. (Note: falling asleep with them on your chest *does not* count as learning.)
What this series of 3 articles *will* do is give you a basic understanding of some key topics from both P and S. Specifically, we’ll focus on things that designers should give a rat’s behind about.
1. Part 1 (You’re reading it, buster): Probability for Game Designers
2. Part 2: Statistics for Game Designers
3. Part 3: Shaping Game Mechanics with Probability and Statistics
Remember, being a well-rounded designer doesn’t mean you have to be an expert in these things; you just have to be able to fool anyone else who isn’t!
TIP: Increasing your usage of “theoretically”, “codify”, and “taxonomy” will most certainly impress your coworkers towards these ends. Other disciplines love it when designers use big words! You can thank me for this wisdom later.
Ok, enough beating around the bush — on to the good stuff!
P R O B A B I L I T Y
Most games have one or more elements of probability incorporated into their base mechanics. Even chess requires the flipping of a coin to determine who takes white. Usually, we call probabilistic mechanics “random events”. Of course, the term random really might mean “completely random” or “sculpted random.” Regardless, whether you’re talking Texas Hold’em, World of Warcraft™, or Bomberman™, random events are integrated into key game mechanics.
Probability: It’s not Just a Good Idea…It’s the Law!
You’ve probably heard the term “according to the laws of probability.” The key word in the phrase is “laws.” Probability is all about indisputable facts, not guesses. Ok, technically it’s all Probability Theory, but for the purposes of game design you can compute probabilities absolutely. When you roll a six-sided die, the chance of rolling a “6” is 1/6 = 16.7%—assuming a fair ‘throw’ and a perfectly manufactured die, of course. This 16.7% is not a guess, nor anything of the like. It is as good as fact*. Many of the most common thought errors that people make concerning probability have to do with the belief that probability is not based on laws, but rather on approximations or guidelines. Don’t fall into the traps! I’ll mention a few of the most common ones below, and try to draw some big DANGER! signs around them.
*I guess there might be quantum mechanical concerns that make the 16.7% not exactly fact. I mean, the die could suddenly warp out of existence or maybe your act of looking at it unfairly forces it to collapse its wave function (a severe inconvenience, to be sure).
Independent and Related Events
Let’s start our whirlwind tour of Probability’s Greatest Hits with a key distinction: whether events are independent or related. It’s vital to know before you can start calculating probabilities.
Independent Events: The chance of each event occurring does not depend in any way on what happened in the other event. For example, rolling a six-sided die (event #1) and then rolling it again (event #2) are independent events. The first and second rolls are not related in any way. The number you rolled in event #1 has absolutely zero influence on event #2 (see the “Fallacy of Equipartition”, later). Another example of independent events is drawing a card from a poker deck and then drawing a card from a second, totally different deck.
Related Events: The chance of each event happening is related in some way to the other event. For example, drawing a card from a poker deck (event #1) and then drawing a second card from the same deck (event #2). The chance of drawing a Jack on event #2 is affected by event #1—if you drew a Jack on event #1, then there’s a smaller chance of getting one in event #2 because there are fewer Jacks remaining in the deck.
One of the most useful bits of probability to know is how to calculate the chances of conditional events—that is, events that rely on other events occurring. For example, I used to play lots of old Warhammer™ tabletop games which are d6 based. According to the “to Hit” charts, if you had a somewhat unskilled warrior (with a low Weapon Skill) matched up with a superior enemy, you might have to roll a “6” followed by a “6” in order to hit. Just what is the chance of rolling a “6” followed by another “6”?
Well, first things first, you have to get the first “6” out of the way (a 1/6 chance). Then, you need to roll another “6” (a 1/6 chance again). Whenever one event depends on another’s success, you multiply the chances to get the cumulative chance of both occurring. In this case, it’s a 1/6 x 1/6 = 1/36 chance to roll a “6” followed by a “6”. (Note: If you have an irrational fear of rational numbers—har har—then you can always convert the fractions to decimals by using your calculator. In this case, 1/36 = .028 = 2.8%)
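If fractions make your eyes cross, a few lines of Python will confirm the arithmetic. This is just a sketch (the function names are mine, not from any library), checking the 1/36 both exactly and by brute force:

```python
import random

def p_two_sixes_exact():
    # Independent events multiply: a 1/6 chance, twice in a row
    return (1 / 6) * (1 / 6)

def p_two_sixes_simulated(trials=200_000, seed=42):
    # Brute force: roll two dice a couple hundred thousand times
    # and count how often both come up "6"
    rng = random.Random(seed)
    hits = sum(
        rng.randint(1, 6) == 6 and rng.randint(1, 6) == 6
        for _ in range(trials)
    )
    return hits / trials

print(p_two_sixes_exact())      # 0.0277... = 1/36
print(p_two_sixes_simulated())  # should land very close to 1/36
```

The simulated number wobbles a bit from seed to seed, but it hugs 2.8% — the dice agree with the multiplication rule.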
Armed with this newfound power of Conditional Probability, it’s very easy to calculate the chances of crazy dice throws. What are the chances that you can roll four “6”s in a row? The answer is 1/6 x 1/6 x 1/6 x 1/6. Or more simply, (1/6)^4 = .0008 = .08%. How about ten “2”s in a row? (1/6)^10 = AnIncrediblySmall%.
Ratcheting up the difficulty, how about the chances of rolling a “3” or above followed by a “5” or above? It’s just 4/6 x 2/6 = 8/36 = 2/9 = 22.2%. Now we’re rockin’ the free world!
Superstition and the Fallacy of Equipartition — aka “The Gambler’s Fallacy”
One of the most common and widespread thought errors that people make concerning probability is blurring the line between independent and related events. This typically takes one of the following forms:
Mistake 1: Believing that a “5” is less likely than normal to appear again because the last dice roll was a “5”.
Mistake 2: Believing that a “6” has a very high chance of being thrown because 10 rolls have gone by without a “6” being thrown. Dressed in another outfit, this is believing that “red” is due on a roulette wheel because it has been several spins since the last “red” hit.
Mistake 3: After flipping a coin 10 times and getting 8 heads and 2 tails, believing that the next 10 flips will have more tails then heads in order to “even out.”
These all loosely fall under the appropriately well-named “Gambler’s Fallacy.” Basically, this is just the name for confusing independent and related events. Another name for this fallacy is “I just lost all my money at roulette because the Laws of Probability defied me Fallacy.” It is closely related to the lesser-known “Why do casinos allow me to keep a written log of the recent roulette spins — surely they know that I’ll be able to figure out the pattern and beat the wheel Fallacy?” (Note: that last fallacy typically is followed quickly by the previous one.)
Don’t fall into these traps! Rolling a die multiple times or spinning a roulette wheel are independent events, pure and simple. Let’s examine each of the above mistakes more closely:
Mistake 1: The chance of rolling a “5” on a d6 is 1/6 = 16.7% This never changes. It doesn’t matter if you’ve thrown eight “5”s in a row or haven’t seen a “5” since Gilligan’s Island premiered. 16.7% is still the magic number. “Dice don’t have memory” is a common phrase overheard…and it’s correct!
Mistake 2: Same as above. The chance of rolling a “6” or hitting “red” has absolutely nothing to do with the rolls or spins that came before. Roulette wheels don’t have memories either (unless they are actually magnetized and “Vinnie the Spinnie” is making sure that your number never comes up).
The most common argument that people make for mistakes 1 and 2 goes something along these lines:
The Law of Averages Got Vetoed
Mistake #3 (from back before Scene 24b) is a similar, but extended error: believing that over the long run, everything will “even out” — the Law of Averages. It’s true that, out of 1000 flips of a coin, you’d expect to see roughly 50% heads and 50% tails. But there is no such thing as a “correction.” If you flip a coin ten times and get 8 heads against 2 tails, there is no global essence or power that is going to squeeze a few more tails into the next 10 flips. You would be making a grave philosophical error to assume that “tails are due”, and an even graver error to put big money on it. Peter Webb has an excellent short discussion on this subject at his website (see recommended reading at the end of this article).
The gist is, if you flip a coin 1 million times, you’ll expect the heads and tails split to be close to 50%. But don’t expect the NUMBER of heads flips to equal the NUMBER of tails flips — in fact, it’s very likely that they will be off by hundreds or even thousands. Remember, you could have 10,000 fewer heads than tails and the split would still be very close to 50%/50% (49.5%/50.5%, to be exact). So don’t put money on assuming that an 8-to-2 heads lead (+6 heads) will be corrected as you flip more coins! Even though the heads/tails split will close in on 50%/50% over the long run, the actual difference between the number of heads and the number of tails will probably grow as the total number of flips grows.
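You can watch this happen in a quick simulation (a sketch; `flip_stats` is my own naming): the heads fraction hugs 50% as the flip count grows, while the raw head-versus-tail gap is free to wander.

```python
import random

def flip_stats(n_flips, seed=1):
    # Flip a fair coin n_flips times; return the heads fraction
    # and the absolute gap between head and tail counts
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips, abs(heads - (n_flips - heads))

for n in (100, 10_000, 1_000_000):
    fraction, gap = flip_stats(n)
    print(f"{n:>9} flips: {fraction:.4f} heads, gap of {gap}")
```

Typically the fraction creeps toward 0.5000 while the gap, which scales roughly with the square root of the flip count, tends to get bigger in absolute terms — exactly the point above.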
It’s easy to find formulas that will help you calculate the chances of independent or related events. Sometimes, though, it can be very, very difficult to calculate more involved probabilities. One trick that you can pull out of your hat to save the day is the concept of “converse probability.” To calculate converse probability, instead of trying to determine the chances that something will happen, you instead calculate the chances that something won’t. Then, you subtract this number from 1.0 (100%) to get the probability that you are looking for.
Converse Prob 101: Easy Example
You are about to roll a six-sided die. What are the chances that you’ll roll a “6”? Although we already know the answer, we’ll use converse probability to verify it. The chances you won’t roll a “6” are 5/6 (5 out of 6 of the die sides are not a 6). Therefore, the chance of rolling a “6” is 1 – 5/6 = 1/6, or the familiar 16.7%. In other words, if you won’t roll a “6” 5 out of 6 times, then you will roll a “6” 1 out of 6 times. That almost makes sense!
Converse Prob 201: Flush with Excitement
Here’s a situation where converse probability really is a money saver. It’s Texas Hold’em, and you are four cards to a heart flush with two cards to come. In other words, if a heart comes on the Turn or the River, you’ll complete your flush**. What are the chances this will happen?
**Given the choice between the two, I recommend completing your flushes on the River — this has the greatest chance of psychologically destroying your opponents and sending them into apoplectic fits. Nobody likes getting Rivered.
It’s very easy to calculate the chances of a heart coming on the next card. There are 9 hearts remaining “in the deck” that have not been flipped up yet (13 to begin with minus the 4 already showing between the flop and your hand). There are 47 cards total in the deck (52 minus the two in your hand and the three flopped on the board). Therefore, the chances of flipping a heart on the next card are 9 out of 47, or 9/47. If that card isn’t a heart, then the chance of flipping a heart on the following card is 9/46 (there are still 9 hearts remaining, but one less card total in the deck).
Great, we’re off to the races! Only problem is, how can we easily calculate the total chance of flipping a heart, accounting for both cards? It would be easy to make the mistake of assuming that it would be 9/47 + 9/46. Not true, however. This is the same mistake that can lead you down the dark path of believing that the chance of rolling a “6” on six throws of a die is 1/6 + 1/6 + 1/6 + 1/6 + 1/6 + 1/6 = 1.0 = 100% = SureThingTM. Unfortunately, there is not a 100% chance of rolling a “6” on six throws of a die***.
Turns out that the solutions to both of these problems are made easier by using converse probability. We must ask “what are the chances we won’t draw a heart?” For the first card (the turn), the answer is (47 – 9)/47 = 38/47. For the second card (the river), the answer is (46 – 9)/46 = 37/46. From a study of conditional events (see earlier in this article), it’s easy to calculate the chances of BOTH of these events happening. In other words, we must calculate the chance that no hearts are drawn on either card. This is just the product of 38/47 x 37/46 = 65.0%. Since we are actually interested in the chances of making the flush, we just subtract this result from 1.0 to get it. 1.0 – .65 = .35 = 35% So there is a 35% chance of drawing the flush. Now do you go all-in?
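Here’s the same flush arithmetic as a short sketch (`flush_chance` and its defaults are my own naming), using exact fractions so no rounding sneaks in:

```python
from fractions import Fraction

def flush_chance(hearts_left=9, unseen=47):
    # Converse probability: chance of missing the flush on BOTH the
    # turn and the river, then subtract from 1 to get the chance of hitting
    miss_turn = Fraction(unseen - hearts_left, unseen)           # 38/47
    miss_river = Fraction(unseen - 1 - hearts_left, unseen - 1)  # 37/46
    return 1 - miss_turn * miss_river

print(float(flush_chance()))  # ~0.35, the 35% from the text
```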
***Note: the answer to the little dice problem is figured out the same way. The chance of rolling at least one “6” in six throws is found by looking at the chance of rolling no “6”s. On each throw, the chance of no “6”s is 5/6. Cumulatively, in order for no “6”s to come up in six throws, the chance is just the product of all six throws: 5/6 x 5/6 x 5/6 x 5/6 x 5/6 x 5/6 = .33 = 33%. So, the chance of at least one “6” coming up is 1.0 – .33 = .67 = 67%. Thus, you should see at least one “6” on six throws about 2/3 of the time. Now go use this to make money off somebody.
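The footnote’s dice version generalizes nicely to any repeated try; a tiny sketch (my function name, not a standard one):

```python
def p_at_least_one_hit(chance_per_try=1/6, tries=6):
    # Converse probability: 1 minus the chance of missing every try
    return 1 - (1 - chance_per_try) ** tries

print(round(p_at_least_one_hit(), 3))        # ~0.665: at least one "6" in six throws
print(p_at_least_one_hit(0.5, 2))            # 0.75: at least one head in two flips
```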
I didn’t mean to get all “mathy” above, and I was a little sneaky with my blitzkrieg treatment of Hold’em odds. The important thing to remember about converse probability is that sometimes it’s much easier to figure out the chance of something NOT happening than it is to figure out whether it WILL.
An Aside on Random Number Generators
Another thing all digital game designers should know about probability: random number generators are not random! Random number algorithms require a “seed number,” which is a base from which the algorithm can get all medieval on itself and do gyrations that ultimately result in a seemingly-random number. Most of the time, programs sample the CPU clock time or something similar to use as the seed number — this helps the algorithm be pretty darn random. But for high-intensity games with tons and tons of random number generations, sometimes that’s not random enough. Take for example online poker providers. Players (who often gamble for real money) need to know unconditionally that there are no underlying patterns to the random numbers that control card shuffling. In extreme cases like this where money is riding on the outcome, programmers must get super-fancy and start doing things like using CPU heat and entropy as seed numbers instead of clock times.
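A tiny demonstration of the “seed number” idea (the function is my own; `random.Random` is Python’s standard pseudo-random generator):

```python
import random
import time

def first_rolls(seed, n=5):
    # A generator started from the same seed produces the exact
    # same "random" sequence, every single run
    rng = random.Random(seed)
    return [rng.randint(1, 6) for _ in range(n)]

# Identical seed, identical rolls -- run it a million times, same result:
assert first_rolls(1234) == first_rolls(1234)

# Which is why games typically seed from the clock (or something better):
print(first_rolls(time.time_ns()))
```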
The take-home here is just that digital games don’t have truly random numbers. Most of the time, that’s fine, but if your game crunches insane amounts of numbers, then beware of patterns.
Quiz Show Redux
If you’ve read this far, then we’re both exhausted. Hopefully, though, we have also developed a nice golden-brown brain tan. So let’s revisit the questions from the beginning of this article.
Q1) Orc Nostril Hair Drop Rates
It’s too early to panic. Never panic unless you are sure you should panic. If you are sure you should panic, then panic, and panic well.
In this case, both testers’ results are certainly within the realm of probability. If there is a base 10% chance of finding the Orc Nostril Hair (ONH) on each monster-slaying, then the chance of finding at least 4 ONH in 20 tries is 13.3%. Where did I get that number, you might ask? Well, I cheated and used an advanced concept called the Binomial Distribution, but sadly (or happily?) it is beyond the scope of this article.
The chance of finding zero ONH at all through 20 tries is determined through converse probability:
10% chance of finding item on each kill means 90% chance of not finding it (0.90).
Chance of not finding it through 20 kills = (.90)^20 = 12.2%
So there is about the same chance of finding 4 ONH in 20 tries (13.3%) as there is finding zero ONH in 20 tries (12.2%). Not yet cause for justified panic.
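For the curious, the “advanced concept” is only a few lines of code. A sketch (my naming) that combines the binomial distribution with the converse-probability trick:

```python
from math import comb

def binom_pmf(k, n=20, p=0.10):
    # Chance of exactly k drops in n kills at a 10% drop rate:
    # (ways to pick which kills drop) x (those drop) x (the rest don't)
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_zero = binom_pmf(0)                                  # (0.9)^20, the 12.2%
p_four_plus = 1 - sum(binom_pmf(k) for k in range(4))  # the 13.3%
print(round(p_zero, 3), round(p_four_plus, 3))
```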
To really determine if you should panic, you need more info. You need lots of data points from your testers in order to draw an informed, statistically-based conclusion (OK, I’m jumping ahead to part 2. Indulge me.)
Let’s say you collect info on 100 play sessions, each of which involve 100 kills. That’s a respectable amount of data. If out of those sessions, players are finding ONH a lot less or a lot more than 10% of the time, then you probably have a bug that is affecting your reward rates. In that case, panic with all haste! Tip: Sprinting around the office screaming “No!” generally gets quick results.
Q2) 2x-3x-4x+ Critical Hits
The chances of doing at least 2x damage are found by the conditional probability of hitting twice in a row:
Chance of 2x or better = 0.75 x 0.75 = 56.3%
Chance of 4x or better = (0.75)^4 = 31.6%
Wow. Players will do 4x or better damage almost 1/3 of the time. Fix your system, dude/dudette! Either drop the base hit percentage or make the successive critical levels harder to achieve.
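The chain of hit checks is a textbook conditional-probability ladder, so the tuning math fits in one line. A sketch (my naming) you could use to try out different knobs:

```python
def crit_chance(multiplier, hit_chance=0.75):
    # Chance of doing at least `multiplier`-times damage:
    # that many successful 75% hit checks in a row
    return hit_chance ** multiplier

for m in (2, 3, 4):
    print(f"{m}x or better: {crit_chance(m):.1%}")

# One possible fix: a 50% re-check chance tames the curve considerably
print(f"4x or better at 50% re-checks: {crit_chance(4, 0.5):.1%}")
```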
Q3) Will Flip Coins for Money
This question is a silly trap, not-so-elaborately laid. First you give the player help by showing them the last 20 flips, then you need to shore up your system to prevent exploits. Sheesh!
The answer, of course, is that providing the player with this 20-flip history changes nothing about the fact that each coin flip is a 50/50 proposition****. Let the player wreck himself with the Gambler’s Fallacy.
Heck, I even recommend paying out less than even money every time a player bets “heads” after 2 successive “tails” results. Just tell ‘em you are adjusting to their unfair advantage of knowing “heads” is due. They’ll believe you, they will…
****Natch, discounting any flaws in your random number generator that simulates the coin flip.
What are the Chances that this Article was Interesting?
I may be a gambling man, but I won’t dare to give odds on that.
If you survived the last few thousand words or even enjoyed them, stay tuned. In Part 2, we’ll explore the “Two-Drink Minimum” science of Statistics. And finally, in Part 3 (the riveting conclusion), we’ll look at the anatomy of a number system and explore how your choices as a game designer can sculpt a game’s mechanics into a true work of art. No, really!
Article 5: Statistically Speaking, It’s Probably a Good Game, Part 2: Statistics for Game Designers
by Tyler Sigman
If you’re reading this, then chances are you also read Part 1, “Probability for Game Designers.”
If you haven’t read it, you really should, and that’s not to say it is full of good stuff (the article is tripe, actually). I just recommend reading it because if you don’t, you might be unprepared for the silliness that may ensue during this serious *ahem* and erudite *cough* discussion of statistics.
This article focuses on a few select statistical topics that I believe should be understood by game designers. In particular, statistics really is useful and important for system designers, mechanicians, balancers, and other subclasses of designer that are usually relegated to steerage.
Disclaimer taken care of, let’s move on to the fizzy stuff!
Statistics: A Two-Drink Minimum Science
Although heavily grounded in mathematics, statistics is…well…weird! Seriously – if you ever have to start dealing heavily in two-sided confidence intervals and Student’s T-tests and chi-squared tests (or anything else squared, for that matter), it can get a little hard to digest at times.
You see, people like me really prefer physical metaphors. I’ve always liked physics and mechanics, because a lot of the time you can give yourself a reality check simply by analyzing reality. When you’re calculating the rate and direction at which an apple falls from a tree, you can reality check it in your head if your result says the apple should shoot off straight upward at 1,224 MPH.
At its best, statistics is understandable and rational; at its worst, it’s a little strange. Hence, I recommend libations and togas for any involved statistics discussion. I have asked the fine editors at Gamasutra to provide such togas and an open digital bar. What, didn’t you get your passcode? Hmmm, weird.
In any case, the topics in this article aren’t weird at all. For the most part, they are tangible, crunchy bits of statistics that you can develop gut feels for.
Statistics: The Dark Science
Statistics is, of all the sciences, the one that is very prone to misuse by the Forces of Evil. That is, if you had to attribute one science to the villain you are creating for your new book (you are writing a book, aren’t you?), you could do much worse than pick statistics. You could also give him a cape, dress him in black, and refer to him as “The Spider” or “Mr. Jones”, but I digress.
The reason that statistics can be loosely compared to villainy is that, used improperly, this branch of science can be called upon to infer all sorts of relationships that aren’t actually meaningful or even true (see the end of this article for an example of what I mean). When in the hands of politicians and other ne’er do wells, this can guide big decisions. Big decisions based upon inaccurate conclusions are never good.
All this is to say, statistics is incredibly useful and helpful when used properly. But like any stuperpower, it can be applied in nefarious ways, or even just plain dumb ways.
Statistics – What’s All The Fuss About?
I was going to crack my knuckles and write a tight summary, but then noticed that Wikipedia already had something that was darn near poetry. Here it is:
Statistics is a mathematical science pertaining to the collection, analysis, interpretation, and presentation of data. It is applicable to a wide variety of academic disciplines, from the physical and social sciences to the humanities; it is also used for making informed decisions in all areas of business and government. (Courtesy Wikipedia.org)
That’s actually a very moving passage. In particular, the last bit is the tour de force of the paragraph:
…it is also used for making informed decisions…
Of course, the writer forgot to add “in game design,” but we can forgive him his condescension towards our burgeoning industry.
Here’s my own try:
Statistics is a mathematical science that deals with collecting and analyzing data in order to determine past trends, forecast future results, and gain a level of confidence about stuff that we want to know more about. (Courtesy Tylerpedia)
And if I were to modify it for Game Design, I would say (and am, in fact, saying):
Statistics can help you shine a flashlight upon your broken mechanics and shattered design dreams. It does this by giving you actual hard, scientific data to support meaningful design decisions.
What Do We Need to Know?
Statistics, like any hard science, is deep and complex. Like the tour of Probability in Part 1, this article only touches on a few selected topics that I, in my unlimited hubris, have deemed Important Enough to Know®. (Yep – unlike the many TMs I throw around, this one is so potent it’s registered!)
Pop Quiz Again
I’m sad to say that I have resorted to another test. Don’t hate the Quizza, hate the Quiz.
Q1a) Focus testers have just finished playing through a level in your new snail racing game “S-car GO!” Twenty testers played, and you are informed that the lap times came back in a range from 1 min 24 seconds at the low end to 2 min 32 seconds at the high end. You were expecting an average time of 2 minutes or so. Was the test a success?
Q1b) You collect more data for the same level, do some analysis, and find that the stats are: mean = 2 min 5 sec, standard deviation = 45 sec. Should you be satisfied?
Q2) You design a casual game that will surely soon be the talk of soccer moms everywhere (an admirable goal). In final QA, you release a beta build and then take data on a whole bunch of trial sessions. Over 1,000 play sessions are recorded, with over 100 unique players (some players were allowed to play repeated sessions). Crunching the data shows a mean score of 52,000 pts with a standard deviation of 500 pts. Is the game tuned up enough to release?
Q3) You design an RPG, and then collect data on how long it takes new players to progress from level 1 to level 5. The data comes in as follows: 4.6 hrs, 3.9 hrs, 5.6 hrs, 0.2 hrs, 5.5 hrs, 4.4 hrs, 4.2 hrs, 5.3 hrs. Should you calculate the mean and standard deviation?
Populations and Samples
The base of statistics is the analysis of data. When dealing with data, there are two main terms that you need to know:
1. Population: the entirety of a field for which measurements are to be taken. The population is arbitrary, and is dependent only on what you wish to measure. For example, say you want to know what people think about a particular issue. Your chosen population could be all of the people on earth, all of the people in Iowa, or just all the people on your street.
2. Sample: a portion of the population for which measurements are actually taken. For very obvious reasons, it’s often too hard to gather data for an entire population. Instead, you gather data for a portion of the population. This is your sample.
Accuracy and Sample Size
The strength of a statistical conclusion is extremely sensitive to the size of your sample.
In a perfect world, you’d always like your sample size to be equal to your population—that is, you want to collect data on the entirety of whatever matters to you—because anything less means you have to infer trends (a mathematical inference, but an inference nonetheless). Furthermore, the more data points, the better; you’d rather have a giant sample than a tiny one.
Marketers and politicians would give their left brains to get a sample that is equal to their (large) population of interest. For example, instead of polling 10,000 junior high school kids to get an idea of how they feel about Fruit Roll-Ups®, imagine if they could poll *every junior high school kid*. Failing that, polling 1,000,000 would be super. Failing that, 100,000 would be dang nice. Failing that…okay, 10,000 will do.
It is for reasons of time and money that studies are performed on samples rather than entire populations.
1. The Common Sense Rule of Statistics: mo is bettuh
You can’t predict a trend with one data point. If you know I like chocolate ice cream, you can’t draw any meaningful conclusions about what all Sigmans like. Now if you ask many members of my family, then you might be able to draw a reasonable conclusion about what the rest think…or at least know *whether* you can draw a reasonable conclusion. Ain’t stats fun?
Population Explosions and Wide Distributions (BEEP! BEEP!)
For reasons that only The Big Guy can explain, many things in life tend to follow similar patterns, or distributions.
One of the most common is the aptly-named “normal distribution.” That’s right, anything not matching this is abnormal, and therefore weird (and should be shunned appropriately).
The normal distribution is also known as a “Gaussian” distribution, primarily because “normal” doesn’t sound scientific enough.
The normal distribution is also commonly called a “bell curve” because, well, just look at the durned thing, will ya!?
The distinguishing characteristics of a bell curve distribution are that most of the population are clustered closely around the mean, or average, value, and comparatively few are scattered at the extremes (high or low). This middle-clustering leads to the bell-curve appearance; the highs and lows are the flange of the bell.
We see the bell curve around us in a million different things. If you measured the heights of all the people in your city, they’d probably match this distribution. That is, a tiny few would be super-abnormally short, a tiny few would be super-Yao Ming tall, and a great many would be within a few inches of the average.
The bell curve typically holds true whenever you are looking at people’s skill levels, too. Take sports – a tiny few are good enough to play professionally, a great many are good enough to get by, and a tiny few are so bad that they don’t get picked to be on teams (like me).
The normal distribution, despite being swell, isn’t the only distribution around. It’s just amazingly common.
For examples of some additional distributions that are directly related to gaming and game design, just take a look at the probability distributions of dice throws, in this case a d6 and then a 2d6 throw:
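Since the charts can’t be reprinted here, a quick sketch (my naming) computes both distributions exactly — note how flat the single d6 is compared to the tent-shaped 2d6:

```python
from fractions import Fraction
from itertools import product

def throw_distribution(dice=1, sides=6):
    # Exact probability of each possible total when throwing `dice` dice
    counts = {}
    for rolls in product(range(1, sides + 1), repeat=dice):
        total = sum(rolls)
        counts[total] = counts.get(total, 0) + 1
    outcomes = sides ** dice
    return {t: Fraction(c, outcomes) for t, c in sorted(counts.items())}

print(throw_distribution(1))  # flat: every face is 1/6
print(throw_distribution(2))  # peaked: a 7 is 1/6, but 2 and 12 are only 1/36
```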
In part 3 of this series, which should hit Gamasutra shelves around 2010, I’m going to spend a bunch more time talking about these dice distributions. For now, all I’m going to say is that the first one looks nothing like a bell curve, whereas the second throw is starting to resemble one (but still isn’t quite there yet).
Means to an End
Consider this tiny section an intermission embedded within an otherwise tedious article. This tiny, self-referential section serves only one purpose in life: to remind you of what a “mean” is. This tiny, self-referential, and pedantic section would like to passively remind you that a mean is the mathematical average of a set of data.
This tiny, self-referential, pedantic, passive, and well-meaning section hopes that you take something meaningful away from reading it; for it is now that this tiny, self-referential, pedantic, passive, and pun-throwing paragraph must end.
Variance and Standard Deviation
Variance and standard deviation are very important to understand, and have a lot of tangible value. Aside from helping us draw valuable statistical conclusions, these terms enable us to speak a lot more intelligently about distributions. Instead of saying “a great many data points cluster about the middle”, we can say “68.2% of the sample falls within one standard deviation of the mean.” Chicks dig that speak; guys dig that speak; heck, who doesn’t dig that speak?
Variance and standard deviation are related to each other, and they both measure the same thing: data scatter. Intuitively, a high variance or standard deviation means your data is all over the place. When I play darts, I get a high variance in my throws.
Variance and standard deviation can be easily calculated from any set of data that you have. I’d put the equations in here, but that would break my “don’t sound like a textbook” rule. So instead of an equation, here’s a description:
Standard Deviation: the average amount by which data points in the sample or population differ from the mean. Standard deviation is represented by the Greek letter σ (sigma).
In other words, say you test 100 people on how long it takes them to complete Level 1 in your newest game. Let’s assume the average (mean) of all the data is 2 minutes 30 seconds. Now assume the standard deviation calculates out to be 15 seconds. This standard deviation indicates the grouping or “clumping” of the play sessions. In this case, it’s saying that on average, play sessions are within ±0.25 minutes (15 seconds) of the 2.5-minute mean. That’s pretty consistent.
What does this mean and why do you care? Easy. Pretend that instead of the above results, you got these results:
Mean = 2.5 minutes (same as above)
σ = 90 seconds = 1.5 minutes
So here we have the same mean but a vastly different standard deviation. This set of numbers means that you have much more scatter in the play times. On average, play times are about 90 seconds off of the mean play time. Given that the mean play time is only 2.5 minutes, that’s huge! And it’s probably not good to have that much scatter, for various game design reasons.
It would be much different if you were talking about a standard deviation of 90 seconds (1.5 minutes) on play times of 15 minutes.
Consistency is measured by a small standard deviation. Ratio your standard deviation against your mean to get a good warm-fuzzy number. In the first example, 15 sec / 150 sec = 10%. In the second, 90 sec / 150 sec = 60%. A standard deviation of 60% is bigggggg with indulgently repeated g’s. In the third, 90 sec / 900 sec = 10% again…respectable.
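That warm-fuzzy ratio has an official name, the coefficient of variation, and it’s a one-liner to compute. A quick sketch of the three examples above:

```python
def warm_fuzzy(std_dev: float, mean: float) -> float:
    """Standard deviation as a fraction of the mean
    (a.k.a. the coefficient of variation)."""
    return std_dev / mean

# The three examples from the text, all in seconds:
print(warm_fuzzy(15, 150))   # 0.10 -> consistent
print(warm_fuzzy(90, 150))   # 0.60 -> bigggggg
print(warm_fuzzy(90, 900))   # 0.10 -> respectable again
```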
This is not to say that a large standard deviation is *always* bad. Sometimes as designers we want a large standard deviation in whatever we’re measuring. But a lot of times it’s bad, because it represents a lot of scatter and variability.
The important thing is that calculating standard deviation will tell you a lot about your game/mechanic/level/etc. Examples of useful things to measure standard deviation for:
1. Level play times
2. Whole-game play times
3. Number of combat rounds it takes to defeat a typical enemy
4. Number of coins collected (games with small Italian plumbers)
5. Number of rings collected (games with fast, blue hedgehogs)
6. Times controller is thrown at screen during your tutorial
Margins of Error
Margins of Error go hand in hand with statistical conclusions. Think of every Gallup Poll you’ve ever seen; there is always a margin of error expressed, such as ±2.0%. Because polls use samples to estimate a population, there can never be 100% confidence (see later in the article). Margin of Error indicates how accurate the results are. It is absolutely vital to know Margin of Error whenever you are talking about a population bigger than your sample.
If you take data on your entire population, then theoretically you don’t need a Margin of Error – you already know all the data! For example, if I ask everyone on my street whether they prefer Chess or Go, then I don’t need a Margin of Error as long as I am just reporting about people on my street. But if I want to draw a conclusion about everyone in my town based upon the data points from my street, then I have to calculate Margin of Error.
The bigger your sample size is, the smaller your Margin of Error. Mo data is bettuh.
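For poll-style percentages, the usual 95% Margin of Error works out to z·sqrt(p(1−p)/n) with z = 1.96. A quick sketch showing how “mo data is bettuh” plays out (the 43% figure and the sample sizes are just illustrative):

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """95% margin of error for an estimated proportion p
    from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# Same 43% poll result, two very different sample sizes:
print(f"n=100:  ±{margin_of_error(0.43, 100):.1%}")   # roughly ±9.7%
print(f"n=2500: ±{margin_of_error(0.43, 2500):.1%}")  # roughly ±1.9%
```

Quadruple the sample, halve the Margin of Error; the square root is why big polls are expensive.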
You can use inferential statistics to draw conclusions about future data. One useful trick is the calculation of confidence intervals. Conceptually, confidence intervals are closely related to standard deviation, and are basically a mathematical way of saying how certain we are that a given piece of data will fall in a specified range.
Confidence interval: a mathematical way of saying “we can guarantee with A% confidence that B% of the data will be between values C and D.”
That’s a mouthful. But it’s useful to know, with a specified amount of confidence, what a value is likely to be. For a good example, I’m going to step back into my previous career for a blissful yet ultimately unsatisfying moment:
I used to do stress analysis and design of aircraft bits and bobs. If you know, or need to know, anything about aircraft – and commercial aircraft in particular – it’s that it is the most regulated form of transportation that exists. People don’t like it when wings fall off of planes. ‘nuff said.
One of the methods we engineers use to keep said wings on said planes is designing to a very high confidence interval of material strength properties. A typical confidence interval used for aircraft design is the “A-basis allowable”, which means we are 95% confident that 99% of the values in any given shipment of a specified material fall above a certain value. Then, we design to that value against the worst possible air conditions, and then finally apply a big factor of safety on it. Gotta be sure.
Confidence intervals are very informative and useful whenever you *really want to know* what kind of data values to expect. Fortunately, games are not typically a matter of life and death, but if you are trying to balance an (unpatchable) console game, you probably want to have more than gut feel and intuition to go on. Calculating confidence intervals could be used to give you hard facts about how your game plays, and whether there are obvious exploits.
Whenever you want to calculate good confidence intervals, the ol’ standby rule of statistics still holds true: mo is bettuh. The more data points you have in your sample, the better your confidence interval calculation will be.
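Here’s a minimal sketch of the simplest flavor: a large-sample 95% confidence interval for a mean. Note that this bounds the mean itself, not individual data points, and the play-time numbers are made up for illustration:

```python
import math
import statistics

def confidence_interval(data, z=1.96):
    """Approximate large-sample 95% confidence interval for the mean."""
    m = statistics.mean(data)
    half_width = z * statistics.stdev(data) / math.sqrt(len(data))
    return (m - half_width, m + half_width)

# Hypothetical play-time sample, in minutes:
times = [2.4, 2.6, 2.5, 2.7, 2.3, 2.5, 2.6, 2.4, 2.5, 2.5]
low, high = confidence_interval(times)
print(f"95% CI for the mean: {low:.2f} to {high:.2f} minutes")
```

More data points shrink the sqrt(n) in the denominator’s favor, which is exactly the “mo is bettuh” rule in equation form.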
You Can Never Be Sure
This brings up another rule of statistics (and probability, actually):
100% Does Not Exist: You will never achieve a confidence interval of 100%. You can never guarantee through inferential statistics that a predicted data point will be of a certain specified value.
The only sure things in life are death, taxes, and the inability to find the last Yeti Hide you need when trying to complete a World of Warcraft quest. Accept these facts and move on.
I mentioned earlier that statistics works as a skill of villainy. To illustrate why, I wrote this short, bullet-form love poem:
Sonnet 1325: Beautiful statistics, let me count the ways that I abuse and misuse you.
1. Misunderstanding statistical statements
2. Not stating confidence intervals
3. Discarding valid conclusions because you don’t like them
4. Drawing conclusions based upon flawed or influenced data
5. Sportscaster errors – blending errors of probability and statistics
6. Drawing conclusions based upon unrelated factors
People misunderstand statistical statements all the time. I know, it’s hard to believe.
Not Stating Confidence Intervals or Margins of Error
Confidence intervals and margins of error are vital pieces of information. There is a huge difference between saying 43% of PC owners have purchased a downloadable game in the past 30 days (Margin of Error 40%) and the same statement with a MoE of 2%. When MoE is left out, always assume the worst. Remember, small sample = high MoE.
Discarding Valid Conclusions Because You Don’t Like Them
When used properly, statistics don’t lie. But people lie to themselves all the time. We see this a lot in politics, where statistical studies will be ignored simply because the conclusions don’t match those that were hoped for. Same thing sometimes happens with focus groups. Of course, we also see statistics misused terribly in politics, so it’s a wash, I guess.
Drawing Conclusions Based Upon Flawed Data
This one happens a lot, especially in market research. Your statistical conclusions are only as good as the data you make them from. If the data is flawed, then the conclusions are worthless. Flawed data can come in a variety of forms, with causes ranging from honest errors to severe manipulation. Asking loaded questions is one easy way to get flawed data that supports whatever conclusion you were hoping to make anyway. “Do you prefer Product X, or that crappy Product Y that only idiots use?” quickly leads to seemingly bullet-proof statements like “95% of consumers prefer Product X!”
Sportscaster Errors – Blending Errors of Probability and Statistics
Sportscasters are the shamans of our day. They take a little statistics, a little probability, a little gut feel, and then mix them together to make something terrible. If you ever want to see a bunch of statistics thrown around with tenuous conclusions that typically have no basis, just watch a football game.
For instance, an announcer might say that “Team A hasn’t blocked a kick against Team B in the last 5 games.” The dangling conclusion is that Team A is now less likely to block a kick than they would be if they had blocked one recently. But you could just as easily argue the reverse: maybe they are more likely to block one, since they haven’t in a while!
The truth is, there isn’t enough information to say either one. And it’s probably more a matter of probability, anyway. Does the chance of blocking a kick really depend on whether one was blocked the game before? They are probably independent events, unless there are recognizable interrelated factors.
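You can convince yourself of the independence point with a quick simulation. If kick blocks are independent 10% events (an assumed, made-up rate), the block rate in games that follow a 5-game drought is still about 10%:

```python
import random

random.seed(42)           # fixed seed so the run is repeatable
p_block = 0.10            # assumed per-game chance of a blocked kick
games = [random.random() < p_block for _ in range(200_000)]

# Games that follow a 5-game stretch with no blocked kicks:
after_drought = [games[i] for i in range(5, len(games))
                 if not any(games[i - 5:i])]

overall_rate = sum(games) / len(games)
drought_rate = sum(after_drought) / len(after_drought)
print(f"overall: {overall_rate:.3f}, after drought: {drought_rate:.3f}")
```

The two rates come out essentially identical, which is just the “dice have no memory” rule wearing shoulder pads.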
This is not to say that all sports conclusions are flawed. Statistics is very important to baseball, for instance. Statistical analysis sometimes guides what pitch is thrown or what the batting lineup will be.
It all comes down to data: when you have a lot of data, you get better statistical conclusions. Baseball supplies a lot of data: 162 games per season! With football, there just aren’t enough games to go around. Margins of Error are bigger. I’m not saying statistics is never useful for football…it is just harder to mine useful, contextual data.
Drawing Conclusions Based Upon Unrelated Factors
People misunderstand statistical statements all the time. In particular, given a correlation between two factors, it’s easy to infer a deeper relationship that doesn’t actually exist. My all-time favorite example of this is the well-known Pirates vs. Global Warming graph featured in the Church of the Flying Spaghetti Monster’s Open Letter to the Kansas School Board:
Please, for the love of all that is statistical, go look at the graph contained in that article. PLEASE, I BEG YOU!
Please, Can We Just Bookend the Quiz and Be Done?
Okay, okay, I hear you.
Q1a ANSWER – Level Times
The answer to this one is easy: you haven’t been given enough info to calculate the average yet. Just because the values ranged from 1:24 to 2:32 doesn’t mean they average out at 2 minutes. (Those two numbers average to 1.97 minutes, but we don’t know the other 18 results!) You need to know all 20 results to calc the average, and you really ought to calc the standard deviation as well…see below.
Q1b ANSWER – Level Times Part Deux
Okay, in this case you probably shouldn’t be satisfied because the standard deviation is pretty high…over 40% of the mean. This sounds like a bit too much variation in your level. There is potentially a sizable exploit that skilled players are using to their advantage. Alternatively, you might be punishing less-skilled players too much. As the game designer, you ultimately have to be the judge as to whether these results (high variation) are intended.
Q2 ANSWER – Soccer Moms
Stats only gets you part of the way there; you still need game design smarts. In this case, the score grouping is *way too close*…to have a standard deviation that low (500 / 52000 = 1%) means you are getting hardly any score variation, which means in turn that differences in player skill aren’t really mattering in the end game result. Therefore, players will most likely be turned off because they won’t see much of a progression in their scores as they get better at the game.
Here’s a situation where you’d really love to see a much higher standard deviation, because that hopefully shows that increased skill leads to increased scores. In other words, your current game scores the same no matter who plays it.
Q3 ANSWER – Play Times
This one is sorta tricky and underhanded but illustrates an important point about data collection: you need to watch out for obviously bad data. That one value, 0.2 hrs, looks suspiciously like an error. Could be a typo, could be an equipment malfunction, who knows. In any case, you should either convince yourself without a doubt that the 0.2 hrs is a valid data point before doing any calculations with it, or just throw it out and perform your calcs on the remaining data points.
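A sketch of that clean-first-then-calculate workflow, using a crude two-sigma cutoff to flag the suspicious point (the play-time values are hypothetical):

```python
import statistics

# Hypothetical recorded play times, in hours; that 0.2 looks suspicious.
play_times = [5.1, 4.8, 6.2, 5.5, 0.2, 5.9, 4.6, 5.3, 6.0, 5.4]

mean_all = statistics.mean(play_times)
sigma_all = statistics.stdev(play_times)

# Crude rule of thumb: drop anything more than 2 sigma from the mean,
# then redo the calcs on the surviving points.
cleaned = [t for t in play_times if abs(t - mean_all) <= 2 * sigma_all]

print(f"with outlier:    mean = {mean_all:.2f} hrs")
print(f"without outlier: mean = {statistics.mean(cleaned):.2f} hrs")
```

A single bad point drags the mean down by roughly half an hour here, which is exactly why you vet the data before trusting the stats.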
Insert Other Cool Stuff Here
In efforts to keep this article under 723 pages, I have to skip over many other intriguing topics. Suffice it to say that a good understanding of statistics will help not only your game design, but your consumer decisions, voting decisions, and financial decisions. I’m 23.4% sure that at least 40% of what I just said is true.
As a designer, statistics is most useful when crunching data from a set of recorded play sessions (your sample), and trying to form conclusions about a larger field of unrecorded play sessions (your population).
Learn By Doing
For example, in the game I just finished, we recorded data from play sessions and then set challenge levels in the game based upon the mean and standard deviation values from those recorded data. We set Medium difficulty equal to the mean values, Easy difficulty equal to the mean minus a certain number of standard deviations, and Hard difficulty equal to the mean plus a certain number of standard deviations. Had we collected much more data, it would’ve actually been accurate!
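A minimal sketch of that tuning approach, with made-up playtest scores and an arbitrary spacing of one standard deviation between tiers (the spacing is a design knob, not a law):

```python
import statistics

# Hypothetical scores recorded from playtest sessions:
scores = [820, 910, 760, 880, 950, 840, 890, 800, 930, 870]

mean = statistics.mean(scores)
sigma = statistics.stdev(scores)
k = 1.0  # how many standard deviations separate the tiers

difficulty = {
    "Easy":   mean - k * sigma,  # below-average target
    "Medium": mean,              # target equals the observed average
    "Hard":   mean + k * sigma,  # above-average target
}
for tier, target in difficulty.items():
    print(f"{tier:6s} target: {target:.0f}")
```

Tweak k until playtesters stop throwing controllers at the screen.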
Just like probability theory, statistics becomes more and more useful the bigger and bigger the scope of your project. A lot of the time, you can fumble your way through without applying any formal theory in either case. But the bigger your game, the bigger your audience, and the bigger your budgets, then the more there is to risk from embedded flaws in an unbalanced, seat-of-the-pants designed game.
Stats, like probability, won’t do your game design work for you. It’ll just help you do it better!
The Long Road Ahead
In the rousing conclusion to this series, I’ll be taking bits from parts 1 and 2 and then putting them together in ways that actually have some relevance to games. Or I’ll croak trying!