Intro to User Analytics
by Anders Drachen
The science of game analytics has gained a tremendous amount of attention in recent years. Introducing analytics into the game development cycle was driven by a need for better knowledge about the players, which benefits many divisions of a game company, including business, design, etc. Game analytics is, therefore, becoming an increasingly important area of business intelligence for the industry. Quantitative data obtained via telemetry, market reports, QA systems, benchmark tests, and numerous other sources all feed into business intelligence management, informing decision-making.
Two of the most important questions when integrating analytics into the development process are what to track, and how to analyze the data. The process of choosing what to collect is called feature selection. Feature selection is a challenge, perhaps especially when it comes to user behavior. There is no single right answer or standard model we can apply to decide what behaviors to track; there are instead several strategies that vary in goals: e.g., improve the user experience or increase monetization. In this article, we will attempt to outline some of the fundamental concerns in user-oriented game analytics, with feature selection as an overall theme. First, we’ll walk through the types of trackable user data, and then introduce the feature selection process, where you select how and what to measure. Importantly, this article is not focused on F2P and online games — analytics is useful for all games.
Data for Analytics
The three main sources of data for game analytics are:
Performance data: These are related to the performance of the technical- and software-based infrastructure behind a game, notably relevant for online or persistent games. Common performance metrics include the frame rate at which a game executes on a client hardware platform, or in the case of a game server, its stability.
Process data: These are related to the actual process of developing games. Game development is to a greater or lesser degree a creative process, but it still requires monitoring, e.g., via task-size estimation and the use of burndown charts.
User data: By far the most common source of data, these are derived from the users who play our games. We view users either as customers (sources of revenue) or players, who behave in a particular way when interacting with games. The first perspective is used when calculating metrics related to revenue — average revenue per user (ARPU), daily active users (DAU) — or when performing analyses related to revenue (churn analysis, customer support performance analysis, or microtransaction analysis).
The second perspective is used for investigating how people interact with the actual game system and the components of it and with other players, by focusing on in-game behavior (average playtime, damage dealt per session, and so forth). This is the type of data we will focus on here. These three categories do not cover general business data, e.g., company value, company revenue, etc. We do not consider such data the specific domain of game analytics, but rather as falling within the general domain of business analytics.
Figure 1: Hierarchical diagram of sources of data for game analytics emphasizing user metrics.
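As a rough illustration of the customer perspective on user data, the sketch below computes DAU (distinct users active on each day) and ARPU (total revenue divided by distinct users over the period) from a small session log. The log layout, the user IDs, and the revenue figures are made-up placeholders, not data from any real game.

```python
from collections import defaultdict
from datetime import date

# Hypothetical session log: (user_id, day played, revenue in USD for that session).
sessions = [
    ("u1", date(2024, 1, 1), 0.00),
    ("u2", date(2024, 1, 1), 4.99),
    ("u1", date(2024, 1, 2), 1.99),
    ("u3", date(2024, 1, 2), 0.00),
    ("u1", date(2024, 1, 2), 0.00),  # second session by u1 on the same day
]

def daily_active_users(rows):
    """Count distinct users per calendar day."""
    users_by_day = defaultdict(set)
    for user_id, day, _revenue in rows:
        users_by_day[day].add(user_id)
    return {day: len(users) for day, users in users_by_day.items()}

def arpu(rows):
    """Total revenue divided by the number of distinct users in the period."""
    users = {user_id for user_id, _day, _revenue in rows}
    total = sum(revenue for _user_id, _day, revenue in rows)
    return total / len(users) if users else 0.0

dau = daily_active_users(sessions)
for day, count in sorted(dau.items()):
    print(day.isoformat(), "DAU:", count)
print("ARPU over the period:", round(arpu(sessions), 2))
```

Note that a user who plays twice on the same day is counted once in DAU, which is why sets are used rather than simple counters.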
Developing Metrics From User Data
Many ways of classifying user data have been proposed over the past few years. From a top-down perspective, a development-oriented classification is useful because it funnels user metrics toward three different classes of stakeholders, as follows.
Customer metrics: Covers all aspects of the user as a customer — for example, cost of customer acquisition and retention. These types of metrics are notably interesting to professionals working with marketing and management of games and game development.
Community metrics: Covers the movements of the user community at all levels of resolution, such as forum activity. These types of metrics are useful to community managers.
Gameplay metrics: Any variable related to the actual behavior of the user as a player inside the game (object interaction, object trade, and navigation in the environment, for example).
Gameplay metrics are the most important for evaluating game design and user experience, but are furthest from the traditional perspective of the revenue chain in game development, and hence are generally underprioritized. These metrics are useful to professionals working with design, user research, quality assurance, or any other position where the actual behavior of the users is of interest.
Customer metrics: As customers, users can download and install a game and purchase any number of virtual items from in-game or out-of-game stores and shops, spending real or virtual currency, over shorter or longer timespans. At the same time, customers interact with customer service, submitting bug reports, requests for help, complaints, and so on. Users can also interact with forums, official or not, or other social-interaction platforms, from which information about these users, their play behavior, and their satisfaction with the game can be mined and analyzed. We can also collect information on customers’ countries, IP addresses, and sometimes even age, gender, and email addresses. Combining this kind of demographic information with behavioral data can provide powerful insights into a game’s customer base.
Community metrics: Users interact with each other if they have the opportunity. This interaction can be related to gameplay (combat or collaboration through game mechanics) or social (in-game chat). Player-player interaction can occur in-game, out-of-game, or in some combination of the two; for example, sending messages bragging about a new piece of equipment using a post-to-Facebook function. In-game, users can interact with each other via chat functions; out-of-game, they can interact via live conversation (TeamSpeak or Skype) or via game forums.
These kinds of interactions between players form an important source of information, applicable in an array of contexts. For example, a social-network analysis of the user community in an F2P game can reveal players with strong social networks, who are likely to help retain a large number of other players in the game by creating a good social environment (think guild leaders in MMORPGs). Likewise, mining chat logs and forum posts can provide information about problems in a game’s design; datasets derived from chat logs in an online game can, for example, reveal bugs or other problems. Monitoring and analyzing player-player interaction is important in all situations where there are multiple players, but especially in games that attempt to create and support a persistent player community and that have adopted an online business model, a group that includes many social online games and F2P games. These examples are just the tip of a very deep iceberg, and the collection, analysis, and reporting of game metrics derived from player-player interaction is a topic that could easily take up several volumes.
Gameplay metrics: This subcategory of the user metrics is perhaps the most widely logged and utilized type of game telemetry currently in use. Gameplay metrics are measures of player behavior: navigation, item and ability use, jumping, trading, running, and whatever else players actually do inside the virtual environment of a game (whether 2D or 3D). Four types of information can be logged whenever a player does something or something happens to a player in a game: What is happening? Where is it happening? At what time is it happening? And: Who is involved?
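The four questions above (what, where, when, who) map naturally onto the fields of a single telemetry record. The following is a minimal sketch of such a record, assuming a JSON-lines log format; the class name GameplayEvent, the field layout, and the example values are all hypothetical and not taken from any particular game or telemetry system.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple

@dataclass(frozen=True)
class GameplayEvent:
    """One logged gameplay event (hypothetical schema)."""
    what: str                          # the action or occurrence, e.g. "axe_swing"
    where: Tuple[float, float, float]  # position in world coordinates
    when: float                        # seconds since the session started
    who: Tuple[str, ...]               # IDs of the players/entities involved

def to_log_line(event: GameplayEvent) -> str:
    """Serialize an event as one JSON line, ready to append to a log."""
    return json.dumps(asdict(event), sort_keys=True)

event = GameplayEvent(
    what="axe_swing",
    where=(12.5, 0.0, -3.0),
    when=84.2,
    who=("player_17", "npc_orc_3"),
)
line = to_log_line(event)
print(line)
```

Keeping every event in the same four-field shape makes it straightforward to filter later by action, location, time window, or participant, whatever the specific game.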
Gameplay metrics are particularly useful for informing game design. They provide the opportunity to address key questions, including whether any game world areas are over- or underused, whether players utilize game features as intended, and whether there are any barriers hindering player progression. These kinds of game metrics can be recorded during all phases of game development, as well as following launch.
Players can generate thousands of behavioral measures over the course of a single game session — every time a player inputs something to the game system, it has to react and respond.
Accurate measures of player activity can include dozens of actions being measured per second. Consider, for example, players in a typical fantasy MMORPG like World of Warcraft: Measuring user behavior could involve logging the position of the player’s character, its current health, mana, stamina, the time of any buffs affecting it, the active action (running, swinging an axe), the mode (in combat, trading, traveling), the attitude of any NPC enemies toward the player, the player character name, race, level, equipment, currency, and so on — all these bits of information simply flow from the installed game client to the collection servers.
From a practical perspective, you may want to further subdivide gameplay metrics into the following three categories (in order to make your metrics more searchable, for instance):
In-game: Covers all in-game actions and behaviors of players, including navigation, economic behavior, as well as interaction with game assets such as objects and entities. This category will in most cases form the bulk of collected user telemetry.
Interface: Includes all interactions the player performs with the game interface and menus. This includes setting game variables, such as mouse sensitivity and monitor brightness.
System: System metrics cover the actions game engines and their subsystems (AI system, automated events, MOB/NPC actions, and so on) initiate to respond to player actions. For example, a MOB attacking a player character if it moves within aggro range, or progressing the player to the next level upon satisfaction of a predefined set of conditions.
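One way to make the three subcategories above searchable is to tag each event name with its category and tally events per category. The sketch below assumes a hand-maintained lookup table; the event names and the EVENT_CATEGORY mapping are illustrative placeholders, not a standard taxonomy.

```python
from collections import Counter
from enum import Enum

class MetricCategory(Enum):
    IN_GAME = "in_game"
    INTERFACE = "interface"
    SYSTEM = "system"

# Hypothetical mapping from event name to gameplay-metric subcategory.
EVENT_CATEGORY = {
    "item_traded": MetricCategory.IN_GAME,
    "area_entered": MetricCategory.IN_GAME,
    "mouse_sensitivity_changed": MetricCategory.INTERFACE,
    "mob_aggro": MetricCategory.SYSTEM,
    "level_up_granted": MetricCategory.SYSTEM,
}

def count_by_category(event_names):
    """Tally events per category; names missing from the table are returned separately."""
    counts = Counter()
    unknown = []
    for name in event_names:
        category = EVENT_CATEGORY.get(name)
        if category is None:
            unknown.append(name)
        else:
            counts[category] += 1
    return counts, unknown

stream = [
    "area_entered", "mob_aggro", "item_traded",
    "mouse_sensitivity_changed", "area_entered", "emote_used",
]
counts, unknown = count_by_category(stream)
for category in MetricCategory:
    print(category.value, counts[category])
print("uncategorized:", unknown)
```

Reporting uncategorized event names separately, rather than silently dropping them, makes it easier to notice when new events have been added to the game without being classified.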
To sum up, the array of potential measures from the users of a game (or game service) can be staggering, and generally we should aim for logging and analyzing the most essential information. This selection process imposes a bias, but is often necessary to avoid data overload and to ensure a functional workflow in analytics.
Bias is introduced in the dataset both by the selection of the features to be monitored and also by the measuring strategies adopted, and that happens to a large degree when analysts work in a vacuum. If those responsible for analytics cannot communicate with all relevant stakeholders, critical information will invariably end up missing and the full value of analytics will not be realized.
Analytics groups are placed differently across companies because analytics arrived in the industry from different directions, notably user research, marketing, and monetization. This can lead to a situation where the analytics team only services or prioritizes its parent department. Strong lateral integration (for example, making sure that the analytics team communicates with all the other teams) helps to avoid this issue. It also helps alleviate the common problem that analytics teams without sufficient access to design teams are forced to self-select features to track and analyze, without proper grounding in the design of the game and its monetization model.
Even for a small developer with a part-time analyst this can be a problem. Another typical problem is that the decision about which behaviors to track is made without involving the analytics team. This can lead to a lot of extra time spent later on trying to work with data that are not exactly what is needed, or needing to record additional datasets. Good communication between teams also helps alleviate friction between analytics and design.
Importantly, analytics should be integrated from the onset of a production, all the way back in the early design phases. Early on, the team should plan what kinds of behavior to track and how frequently to sample them. This allows for optimal planning of how analytics can deliver value to design, monetization, marketing, etc. Analytics should never be slapped on sometime after the beta. In this way analytics is similar to other tools like user research, in that it is ideally embedded throughout the development process and after launch.
Knowing that there is an array of things we can measure about user behavior, how do we then select among them? And do we really have to make choices here? Sadly, yes. In real life, we rarely have the resources to track and analyze all possible user behaviors, which means we have to develop an approach to analytics that considers cost-benefit relationships between the resources required for tracking, storing, and analyzing user telemetry/metrics on one hand, and the value of the insights obtained on the other. It is also important to be aware that the analyses needed during different stages of production and post-launch vary. For example, during the later phases of development, tuning design is vital, but many metrics related to monetization cannot be calculated because purchases have not been made by the target audience yet.
We will discuss this in more detail below, but in short, by following this line of reasoning, the minimum set of user attributes that should be tracked, stored, and analyzed should include considerations as to the following:
1) General attributes: The attributes that are shared by users (as customers and players) across all games. These can always be collected for any computer game: for example, the time at which a user starts or stops playing, a user ID, user IP, entry point, and so on. They form the core of any game analytics dataset.
2) Core mechanics/design attributes: The essential attributes related to the core of the gameplay and mechanics of the game (for example, attributes related to time spent playing, virtual currency spent, number of opponents killed, and so on). Defining the core design attributes should be based directly on the key gameplay mechanics of the game, and should provide information that lets designers make inferences about the user experience (whether players are progressing as planned, if flow is sustained, death ratios, level completions, point scores).