
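Predicting Churn Among New Players and Veteran Players

Published: 2012-09-01 14:06:20

By Dmitry Nozhnin

In a previous article I described our process for predicting churn among new users (those who had just registered for the game), based on data collected during their first one or two days of play. At the other end of the spectrum are veteran players who have put months into the game and who eventually abandon it for a variety of reasons. Predicting their intention to leave is possible, and in this article we share our data mining methodology.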

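Tech Side

Our dataset held up to six months of recorded gameplay for about 38,000 veteran players.

Defining the Moment of Churn

Pinning down the moment of churn for new players is simple: they leave within minutes or hours of entering the game. Their last day of play is easy to identify, and data mining models built on that definition of churn are well established. For veteran players, it took several iterations to define churn accurately. Our first assumption was that a player enjoys the game for a while and then decides to leave. Marking play days in green, we expected a picture like this: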

image 01(from gamasutra.com)


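Our guess was that the churn point could be defined directly as the player's last day of play. Reality was not that simple; most players' play days look like this: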

image02(from gamasutra.com)


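August 25 is the last day the player was seen, but is it the churn point? Or is it August 16, by which time the player had gone seven consecutive days without playing? Or July 31, the first time the player went more than seven days without launching the game? We tried several hypotheses, and the simple ones did not hold. Defining churn the simple way (treating a particular play day as the player's last) yielded only 65 percent precision.

Manual investigation of the data showed that most churners have a "long tail" of play days: occasional days of activity spread over several weeks or even months, as in the second calendar. They stopped playing actively long ago but still log in from time to time. In effect they have already quit, and the occasional logins are for auction sales, chatting with friends, or possibly because the account has been handed over to guildmates.

The next step was to cut off this tail with empirical thresholds in order to find when the player's activity began to fade. The most effective query was along the lines of "the last day of play for which the total number of play days in the preceding 30 calendar days was fewer than 9". Even so, precision stayed below 80 percent, and the empirical rules did not work for loyal but very casual players.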

image03(from gamasutra.com)


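Redefining the Moment of Churn

The key to this project's success was reframing the moment of churn from "the player has left the game" to "the player's activity has dropped below the churn threshold". We already stored and widely used a Frequency metric, defined as "the number of days with game logins in the last 30 calendar days". In short, it tells us how often a player plays: every day, every other day, on weekends, or only a few days a month. We segment players by play frequency:

The next step was to redefine churn as the moment a player falls into "The Pit", a zone of extremely low activity and very high probability of churn. This makes good business sense: instead of detecting churners on the day they leave for good, we now focus on early detection and prediction of players who are losing interest, which gives us several weeks to encourage them to keep playing.

The new approach was to predict which players in the 7–9, 10–15, and 16–20 cohorts will fall into "The Pit" within two weeks, and which players in the 21–25 cohort will do so within three weeks. In other words, we look for players who are losing momentum and whose activity will drop sharply over the next several weeks: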

image04(from gamasutra.com)


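Choosing the Metrics

One key insight from the first project was how important general activity metrics are for predicting churn.

We expected them to matter for veteran players as well, but still decided to try some game-specific and social metrics:

Chat activity: whispers, guild chat, and general chat messages

Crafting and resource harvesting

PvP and PvE instances visited

Maximum character level on the account

Remaining paid subscription days

Daily activity and playtime were expected to be the main predictors, and they proved to be.

Choosing the Right Math

When analyzing new players we had only a few days of data, so we had to use simple instantaneous values. For veterans we can draw on weeks or even months of data, so aggregating it over time calls for a different approach. Moving totals and moving averages, along with derivatives and intercepts, are all useful here.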

image05(from gamasutra.com)


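We used a 30-day moving total to analyze long-term daily activity, approximated by a linear equation. The metrics that actually went into the data mining model were the slope and intercept of the approximating line. Computing them with ordinary least squares is straightforward in T-SQL during data preparation.

For daily playtime, days of inactivity with zero playtime had to be removed before applying the approximation: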

image06(from gamasutra.com)


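We had to rewrite the ETL (extract, transform, load) procedures and reload all the data, but it was worth it: even the first results from raw, untuned data mining models for the 16–20 cohort reached about 80 percent precision.

In the end, with about 30 metrics built from different aggregation periods and methods, we reached 80 to 90 percent precision in predicting which players would fall into "The Pit" within two to three weeks. This was a good result, but we then hit a plateau that no new metric or method could break. The breakthrough finally came when we introduced detailed past metrics.

Detailed Past Experience

At that point the timeline of our model looked like this: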

image07(from gamasutra.com)


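Point zero is the day from which we predict the future. Depending on the cohort, the forecast looks two or three weeks ahead. The data mining models are fed metrics computed over past periods, for example the first derivative of the moving average of daily playtime. All metrics were computed over the last X days counting back from point zero: the last three days, five days, a week, and so on.

A new idea was to compute metrics relative to a point in the past. For example, we could compute the first derivative of the moving average of daily playtime over 7 days, but looking back 14 days before point zero. Remember the long-tail effect in players' activity mentioned earlier? In essence, the idea is to cut the tail into separate parts and analyze each as an independent metric. We tried several combinations of such detailed past queries, such as (7,-21), a 7-day period 21 days in the past, as well as (7,-14), (7,-7), and (14,-14).

This was our winning move, pushing precision to 95 percent for almost all cohorts: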

image08(from gamasutra.com)


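Black Box

The final data mining models with the best precision were in fact based entirely on derivatives and calculations of just two metrics: days of activity and daily playtime. Different cohorts relied on different derivatives. For the 21–25 cohort, the detailed past calculations mattered most; for the 7–9 cohort, 30-day averages and metrics for the 3 to 5 days before point zero were more important. Either way, the math for predicting veteran churn is far more complex than for new players. The following is an example of the final data mining models: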

image09(from gamasutra.com)


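If this looks to you like a black box full of mysterious numbers, you are right. Back when we worked on predicting new players' churn, we found that even though the model was highly precise, we still did not know the actual reasons users were leaving. The same is true for veteran players: we have not yet grasped the real causes of churn. All we have is a black box with 95 percent precision.

Results

We can now predict a drop in veteran players' activity two to three weeks before they leave the game, so our community managers can reach out to these players early, resolve their problems, or offer incentives to strengthen their loyalty.

Compared with the new-player churn project, this data mining project leaned more heavily on math and the black-box approach, and it took more time to tune and verify the results before we reached 95 percent precision and recall. No gameplay metrics made it into the final data mining models. Predictions are based purely on metrics derived from days of activity and daily playtime, which apply to all games and probably to web services as well.

(This article was compiled by GamerBoom/gamerboom.com. Reposting without the copyright notice is not permitted; to repost, please contact GamerBoom.)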

Predicting Churn: When Do Veterans Quit?

by Dmitry Nozhnin

In my previous article, I showed the process we developed for predicting churn of our freshest users, who had just registered for the game, based on data collected during the first couple of days of their adventures. However, on the other end of the spectrum are seasoned gamers who have spent months and months in the game, but for various reasons decided to abandon it. Predicting their desire to leave the game is possible, and in this article, we’re sharing our data mining methodology.

Tech Side

Nothing changed from the first data-mining project; we were still on two Dual Xeon E5630 blades with 32GB RAM, 10TB cold and 3TB hot storage RAID10 SAS units. Both blades were running MS SQL 2008R2 — one as a data warehouse, and the other for MS Analysis Services. Only the standard Microsoft BI software stack was used.

Our dataset had up to six months of recorded gameplay for about 38,000 veteran players.

Defining the Moment of Churn

For new players, defining churn was dead simple — they just leave the game after a couple of minutes or hours. That’s it. The last day of play was clearly defined, and data mining models on such churn factors were already well established. However, for veterans, it took us several iterations to define churn correctly. Our first assumption was this: the player is enjoying the game for some time, but then he decides to quit and leaves. Marking his play days with green, we expected something like this:

Our guess was that defining the churn point would be straightforward — the last game day. The reality, however, was more complex; the majority of players behave like this:

Is August 25th, when we saw the player for the last time, the churn point? Or is it in fact August 16th, the day by which we hadn’t seen the player for seven consecutive days? Or July 31st, the first time she hadn’t launched the game for more than seven days? We tried several hypotheses, and the simple ones didn’t work out. Defining churn in a simple way (predicting that a particular play day will be the last one) resulted in an unimpressive 65 percent precision.
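Since precision (and later recall) is the yardstick used throughout, a minimal sketch of how the two figures are computed may help. The player ids and the set-based representation are hypothetical; the article does not describe how its evaluation was implemented.

```python
def precision_recall(predicted, actual):
    """Compute precision and recall for a churn prediction.

    predicted: set of player ids the model flagged as churners
    actual:    set of player ids that really churned
    """
    true_positives = len(predicted & actual)
    # Precision: share of flagged players who really churned.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of real churners the model managed to flag.
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical ids, for illustration only.
predicted = {"p1", "p2", "p3", "p4"}
actual = {"p1", "p2", "p5"}
precision, recall = precision_recall(predicted, actual)
```

Here two of the four flagged players really churned, and two of the three real churners were flagged.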

Manual data investigation revealed that the majority of churners have a “long tail” of play days: occasional days of activity spread over several weeks, or even months, as shown in the second calendar example. They effectively stopped actively playing the game, but still log in from time to time. In fact, they had already quit; occasional logins are for auction sales, random chats, or probably indicate that the account has been passed on to guildmates.

The next step was to cut off this tail using some empirical thresholds in order to trace back to the day when the player’s activity decline started. The most effective query was something like “the last day of play when total game days for the previous 30 calendar days were fewer than 9”. Still, the precision was under 80 percent, and empirical rules didn’t work for loyal but very casual players.
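The quoted rule is only given as "something like", so the following is one possible reading rather than the actual query: walk back from the most recent play day, treat every play day whose trailing 30-day count is below 9 as part of the tail, and stop at the latest play day that is still above the threshold. The function names, the inclusive window, and the sample dates are assumptions.

```python
from datetime import date, timedelta


def trailing_play_days(play_days, day, window=30):
    """Count play days in the `window` calendar days ending at `day` (inclusive)."""
    start = day - timedelta(days=window - 1)
    return sum(1 for d in play_days if start <= d <= day)


def estimate_churn_day(play_days, window=30, threshold=9):
    """Return the latest play day whose trailing count is still >= threshold.

    Later play days are treated as the long tail. Returns None if the
    player never reached the threshold.
    """
    days = sorted(set(play_days))
    for day in reversed(days):
        if trailing_play_days(days, day, window) >= threshold:
            return day
    return None


# Hypothetical player: daily play July 1-20, then two stray logins in August.
history = [date(2012, 7, d) for d in range(1, 21)]
history += [date(2012, 8, 16), date(2012, 8, 25)]
churn_day = estimate_churn_day(history)
```

In this example the two August logins are discarded as tail and the estimate falls on July 20.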

Redefining the Moment of Churn

The key success factor of this project was reframing the moment of churn from “the player has left the game” to “the player’s activity has dropped below the churn threshold”. We already store and widely use the Frequency metric, defined as “days with game logins in last 30 calendar days”. In short, it means how often the player has been playing: every day, every other day, on weekends, or just a few days a month. We segment players according to their play frequency:

The next step is redefining churn as the moment they fall into The Pit, an area of extreme inactivity with a very high probability of churn. This idea really makes sense from a business point of view: instead of detecting churners the day they leave the game forever, we’re now focusing on early detection and prediction of disinterested players, and have several weeks to incentivize them to keep playing.
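As a rough sketch, the Frequency metric and the segmentation could look like the following. The cohort ranges 7-9, 10-15, 16-20, and 21-25 are the ones the article uses for its predictions; the Pit boundary (anything below 7) and the "26-30" label are assumptions, since the exact thresholds are not stated.

```python
from datetime import date, timedelta

PIT_UPPER = 6  # assumption: The Pit is everything below the 7-9 cohort
COHORTS = [(7, 9), (10, 15), (16, 20), (21, 25)]


def frequency(login_days, as_of, window=30):
    """Days with at least one login in the `window` calendar days ending at `as_of`."""
    start = as_of - timedelta(days=window - 1)
    return len({d for d in login_days if start <= d <= as_of})


def segment(freq):
    """Map a Frequency value to a cohort label."""
    if freq <= PIT_UPPER:
        return "The Pit"
    for low, high in COHORTS:
        if low <= freq <= high:
            return f"{low}-{high}"
    return "26-30"  # hypothetical label for near-daily players


# Hypothetical player who logs in every other day during June.
logins = [date(2012, 6, d) for d in range(2, 31, 2)]
freq = frequency(logins, as_of=date(2012, 6, 30))
label = segment(freq)
```

A player logging in every other day ends up with a Frequency of 15 and lands in the 10-15 cohort.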

The new approach was to predict players who will fall down into The Pit in two weeks for 7-9, 10-15, and 16-20 cohorts, and in three weeks for the 21-25 cohort. So we’re looking for players who are losing momentum, and whose activity will drop significantly over the next several weeks:

Choosing the Metrics

One of the key insights from the first project was how important general activity metrics are for predicting churn.

We expected them to also be important for analyzing veteran players, but nonetheless decided to try some game-specific and social metrics as well:

Chat activity — whispers, guild chat, and common chat messages

Crafting and resource harvesting

PvP and PvE instances visited

Max character level on account

Remaining paid subscription days

Daily activity and playtime were expected to be key predicting factors, and so they are.

Choosing the Right Math

When we analyzed new players, we had only a couple of days of data, so we used simple instant values. For veterans, however, we think in terms of weeks and months, so a different approach is required for data aggregation over time. Moving totals and moving averages, derivatives and intercepts are useful in this case.

We used a moving 30-day sum for analyzing long-term daily activity, approximated by a linear equation. The actual metrics that went into the data-mining model were the slope and intercept of the approximation line. Their calculation with the ordinary least squares method is fairly easy with T-SQL during the data preparation process.

For the analysis of daily playtime, days of inactivity with zero playtime were stripped before applying the approximation:

ETL procedures had to be rewritten from scratch and all data reloaded, but the idea was well worth it: even the first results on raw, untuned data mining models for the 16-20 cohort were around 80 percent precision.
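The original calculation was done in T-SQL; the sketch below shows the same idea in Python under stated assumptions: the input is a list of daily activity flags (1 for a play day, 0 otherwise), the x-axis is simply the day index, and the function names are illustrative.

```python
def moving_sum(values, window=30):
    """Trailing sums, one per day once a full window of history exists."""
    sums = []
    running = 0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the day that left the window
        if i >= window - 1:
            sums.append(running)
    return sums


def ols_line(ys):
    """Fit y = slope * x + intercept by ordinary least squares, x = 0..n-1."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical player: 40 days of daily play, then 20 days of silence.
activity = [1] * 40 + [0] * 20
totals = moving_sum(activity)
slope, intercept = ols_line(totals)
```

A negative slope on the moving total signals declining activity, and the intercept captures the level from which the decline started.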

In the end, with about 30 metrics with different aggregation periods and methods, we achieved 80 to 90 percent precision in predicting players who are about to fall into The Pit in two to three weeks. This is quite an impressive result, but for a couple of months we were stuck with it, no matter what new metrics and methods we tried. Finally, a breakthrough happened with the introduction of detailed past metrics.

Detailed Past Experience

The timeline of our model, by that time, was like this:

The zero point is the day for which we’re predicting the future. The forecast is being made for two to three weeks ahead, depending on cohort, as described earlier. Data-mining models are fed with various metrics for past periods, like the first derivative of the moving average of daily playtime per game day, calculated for the past X days. All metrics were calculated based on the last X days from point zero: the last three days, five days, a week, and so on.

A fresh idea was to calculate some metrics relative to points in the past. For example, we could calculate the first derivative of the moving average of daily playtime per game day for 7 days, but looking back 14 days before point zero. Remember what I said about the long tail effect of players’ activity? Essentially the idea is to dissect the tail into separate parts and analyze them as independent metrics. We tried some combinations of such detailed past queries, like (7,-21), a 7-day period 21 days into the past, as well as (7,-14), (7,-7) and (14,-14).
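A minimal sketch of such (length, offset) windows follows. It assumes the window of `length` days ends `|offset|` days before point zero (the article does not say whether the offset marks the start or the end of the window), and it uses the window mean and the average day-to-day change as simple stand-ins for the moving-average and derivative metrics.

```python
def past_window(series, length, offset):
    """Return `length` consecutive days ending |offset| days before point zero.

    `series` is ordered oldest to newest; its last element is point zero.
    """
    end = len(series) - abs(offset)
    start = end - length
    if start < 0:
        raise ValueError("not enough history for this window")
    return series[start:end]


def window_features(series, length, offset):
    """Mean and average daily change of the metric inside one past window."""
    days = past_window(series, length, offset)
    if len(days) < 2:
        raise ValueError("window must span at least two days")
    mean = sum(days) / len(days)
    avg_change = (days[-1] - days[0]) / (len(days) - 1)
    return mean, avg_change


# Hypothetical 60 days of playtime shrinking by one minute per day.
playtime = list(range(60, 0, -1))
windows = [(7, -21), (7, -14), (7, -7), (14, -14)]
features = {w: window_features(playtime, *w) for w in windows}
```

Each window becomes its own pair of model inputs, which is what lets a model distinguish a decline that began three weeks ago from one that began last week.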

This idea was our epic win, boosting precision and recall after some manual tuning to 95 percent for almost all cohorts:

Black Box

Most fascinating is the fact that the final data mining models with the best precision were entirely based on derivatives and calculations of only two metrics: days of activity and daily playtime! For different segments, different derivatives were important. For the 21-25 cohort models, all our detailed past calculations were important. But for the 7-9 cohort, models were based on 30-day averages as well as near-past metrics for 3 and 5 days before point zero. At any rate, the math is much more complex than it was for new players’ churn predictions. The following is an example of the final data mining models (click to see full picture):

And if it looks like a black box with some mystical math inside, well, you’re right. Back when we learned how to predict new players’ churn, an alarming fact was that despite the great precision of the model we arrived at, we knew little about the actual reasons for churn. It’s the same for veterans: we have no human-comprehensible results about the nature of churn. Just an awesome, 95 percent accurate black box.

Results

We’re now able to predict dramatic drops of veteran players’ activity two to three weeks ahead of their exit from the game, allowing our community managers to take care of those players, resolve their issues, or offer some incentives to boost their loyalty.

This data-mining project was heavier on math and the black box approach than the one for newbie churn prediction, requiring more time for fine-tuning and verifying the results, but leading to 95 percent precision and recall rates. Notably, no gameplay metrics made their way into the final data mining models. Prediction was purely based on metrics derived from days of activity and daily playtime, which are generic for all games, and probably even for web services. (Source: Gamasutra)

 

