Testing Methods
Think-Aloud Protocol
Goal: To identify usability and playability issues by capturing players' thoughts as they play. Clayton Lewis introduced the think-aloud protocol to user experience professionals in 1982, and it has since been adapted for the gaming industry; it is one of the most effective ways to produce data-rich results for researchers and game designers.
Procedure: In think-aloud, a player verbally expresses thoughts, feelings, actions, and experiences during gameplay. The researcher conducts the session, prompting the player when necessary to keep sharing experiences aloud. This is a lab study, so participants are invited to a lab. The test is typically run until the results saturate (nothing new is being learned) or until time or money runs out.
Data Collection: Data collected within think-aloud sessions are notes (handwritten, audio recorded, or video recorded) on what players said and where in the gameplay session they said it.
Data Analysis: These data are then analyzed to develop recommendations for resolving the usability issues participants encountered while interacting with the game.
Resources:
Lewis, C.H. (1982). Using the "Thinking Aloud" Method in Cognitive Interface Design. Tech. Report RC 9265, IBM.
Wright, P.C. and Monk, A.F. (1991). The Use of Think-Aloud Evaluation Methods in Design. ACM SIGCHI Bulletin, vol. 23, no. 1, pp. 55–57.
Retrospective Testing
Goal: Like think-aloud, this method is used to identify playability or usability issues with the game.
Procedure: Retrospective testing involves recruiting participants to play a game in a lab and then retrospectively discussing their experience. This is often done by videotaping a play session and then asking participants to describe their experience while viewing and commenting on their own gameplay. Researchers often prompt participants to explain their choices or behaviors. The test is typically run until the results saturate (nothing new is being learned) or until time or money runs out. This method takes longer than the think-aloud protocol, since participants must both play the game and then comment on their video. However, unlike think-aloud, it does not require players to talk while playing.
Data Collection: Data collected within retrospective sessions are notes (annotations in the video, handwritten, or audio recorded) on what players said and where in the gameplay session they said it.
Data Analysis: These data are then analyzed to identify the issues participants faced during the interaction and to understand their thought processes as they encountered events in the game.
Resources:
Nielsen, J. (1993). Usability Engineering. Academic Press.
Heuristic Evaluation
Goal: Traditionally, heuristic evaluation is a quick, low-cost way to identify product usability problems early in production.
Procedure: One or more experts inspect the software against a set of heuristics, noting each place where the design violates one of them.
Example heuristics: Nielsen developed a set of heuristics for productivity software. Games, however, are different: they need to be fun, not just usable. Heuristics for games therefore need to inspect pace, flow, immersion, engagement, game mechanics, soundtrack, sound effects, camera angles, narrative, and emotional connection.
Different types of heuristics have been developed to inspect game playability. Examples:
PLAY Heuristics: A validated set of foundational heuristics that can be applied to many game types and genres. These were based on Noah Falstein and Hal Barwood’s research for the 400 Project (www.finitearts.com/Pages/400page.html), which included theories of fun and narrative, and interviews with designers of AAA (big budget) games.
Game Approachability Principles: A set of principles developed to test tutorials and introductory levels, which are often where players quit or form a negative impression of a game after getting stuck.
GameFlow: Similar in spirit to the PLAY heuristics, GameFlow is a model that integrates heuristics on Concentration, Challenge, Player Skills, Control, Clear Goals, Feedback, Immersion, and Social Interaction.
Data Collection: Data collected are the evaluators' notes on issues pertaining to each of the heuristics.
Data Analysis: These notes are then analyzed to develop recommendations for addressing the playability or usability issues found in the game.
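To make the analysis step concrete, here is a minimal sketch of how evaluator notes might be tallied by heuristic, assuming severity ratings in the style of Nielsen (1994); the heuristic names, findings, and 0–4 scale shown are illustrative assumptions, not a prescribed format.

```python
from collections import defaultdict

# Hypothetical findings: (evaluator, heuristic violated, severity).
# Severity follows Nielsen's 0 (not a problem) to 4 (catastrophe) scale.
findings = [
    ("eval_1", "Clear Goals", 3),
    ("eval_1", "Feedback", 2),
    ("eval_2", "Clear Goals", 4),
    ("eval_3", "Challenge", 1),
]

by_heuristic = defaultdict(list)
for evaluator, heuristic, severity in findings:
    by_heuristic[heuristic].append(severity)

# Rank heuristics by mean severity so the most serious problem areas surface first.
for heuristic, severities in sorted(
    by_heuristic.items(), key=lambda kv: -sum(kv[1]) / len(kv[1])
):
    avg = sum(severities) / len(severities)
    print(f"{heuristic}: {len(severities)} issue(s), mean severity {avg:.1f}")
```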
Resources:
Federoff, M.A. (2002). Heuristics and Usability Guidelines for the Creation and Evaluation of Fun in Video Games. MS thesis, Department of Telecommunications, Indiana University, Bloomington, Indiana, USA.
Desurvire, H., Caplan, M., and Toth, J.A. (2004). Using Heuristics to Evaluate the Playability of Games. CHI '04 Extended Abstracts on Human Factors in Computing Systems, ACM, pp. 1509–1512.
Desurvire, H. and Wiberg, C. (2009). Game Usability Heuristics (PLAY) for Evaluating and Designing Better Games: The Next Iteration. Proc. 3rd Int'l Conf. Online Communities and Social Computing (OCSC 09), Springer, pp. 557–566.
Desurvire, H. and Wiberg, C. (2008). Master of the Game: Assessing Approachability in Future Game Design. CHI '08 Extended Abstracts on Human Factors in Computing Systems, ACM, pp. 3177–3182.
Nielsen, J. (1994). Heuristic Evaluation. In J. Nielsen and R.L. Mack (eds.), Usability Inspection Methods, John Wiley & Sons, pp. 25–62.
Koeffel, C., Hochleitner, W., Leitner, J., Haller, M., Geven, A., and Tscheligi, M. (2010). Using Heuristics to Evaluate the Overall User Experience of Video Games and Advanced Interaction Games. In R. Bernhaupt (ed.), Evaluating User Experience in Games, Springer.
Sweetser, P. and Wyeth, P. (2005). GameFlow: A Model for Evaluating Player Enjoyment in Games. ACM Computers in Entertainment, vol. 3, no. 3.
Rigby, S. and Ryan, R. (2007). Rethinking Carrots: A New Method for Measuring What Players Find Most Rewarding and Motivating About Your Game. Gamasutra, 16 Jan. 2007 (accessed 9 Dec. 2014).
Playtesting
Goal: Playtesting is used to understand first-time players' experience and to identify onboarding, usability, or playability issues; it can also be used to explore the broader player experience, including pacing, difficulty, and balancing.
Procedure: Playtesting often involves recruiting several players (depending on the game and the research needs, anywhere from 6 to 25). Participants are invited to a lab to play the game at separate stations, and they play through all or part of it (for example, the first hour). Sessions can continue over several days or weekends so participants can complete the game.
Data Collection: Researchers can use several instruments, such as surveys, interviews, or game logs, to measure against the goals identified above; see the instruments section for more information. Data collected are measurements, such as survey or interview data, which can be quantitative or qualitative.
Data Analysis: These data are then analyzed qualitatively or quantitatively to develop design recommendations, often around motivation or engagement with the game over time. Analysis methods vary and are discussed in detail in the analysis section of this website.
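As one illustration of the quantitative side, the sketch below averages per-level difficulty ratings from a hypothetical playtest survey; the 1–7 scale and field names are assumptions made for illustration, not a standard instrument.

```python
from statistics import mean

# Hypothetical survey responses: each participant rates the perceived
# difficulty of each level on a 1 (too easy) to 7 (too hard) scale.
responses = [
    {"participant": "P01", "level": 1, "difficulty": 2},
    {"participant": "P01", "level": 2, "difficulty": 6},
    {"participant": "P02", "level": 1, "difficulty": 3},
    {"participant": "P02", "level": 2, "difficulty": 7},
]

for level in sorted({r["level"] for r in responses}):
    ratings = [r["difficulty"] for r in responses if r["level"] == level]
    print(f"Level {level}: mean difficulty {mean(ratings):.1f} (n={len(ratings)})")

# A spike in mean difficulty (level 2 here) flags a candidate balancing issue.
```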
Resources:
Capra, M.G. (2006). Usability Problem Description and the Evaluator Effect in Usability Testing. PhD dissertation, Virginia Tech.
Isbister, K. and Schaffer, N. (2008). Game Usability: Advancing the Player Experience. Morgan Kaufmann.
Nielsen, J. and Landauer, T.K. (1993). A Mathematical Model of the Finding of Usability Problems. Proceedings of the INTERACT '93 and CHI '93 Conference on Human Factors in Computing Systems, ACM.
Game Analytics
Goal: Game Analytics is used prominently to monitor retention, engagement, churn, and core key performance indicators (KPIs) such as ARPDAU (Average Revenue Per Daily Active User). As a method, it can also contribute to understanding frequency and patterns of play through analysis of objective behavioral data recorded in real time for many players.
Procedure: Game analytics is the process by which game data (behavioral, system, or other log data), automatically recorded with time stamps, is analyzed to derive actionable results for different stakeholders, including business, marketing, design, and software development. It comprises a large group of theories, methods, processes, architectures, and technologies used to transform raw data into meaningful information. Game analytics is now implemented in nearly every shipped title by default, as it holds clear promise for game evaluation and assessment. While game analytics enables better assessment, the approach is limited to a record of actual behavior, with little insight into players' perceptions, needs, feelings, or attitudes, all of which are essential to explain the reasons behind the observed behaviors.
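As a minimal sketch of what "automatically recorded with time stamps" can look like in practice, the snippet below serializes gameplay events as timestamped JSON lines; the event names and schema are assumptions for illustration, not a standard telemetry format.

```python
import json
import time

def log_event(player_id: str, event: str, **attributes) -> str:
    """Serialize one gameplay event as a timestamped JSON line."""
    record = {"ts": time.time(), "player": player_id, "event": event, **attributes}
    # In a real pipeline this line would be appended to a log file or
    # sent to a collection service for later analysis.
    return json.dumps(record)

print(log_event("p42", "level_start", level=3))
print(log_event("p42", "player_death", level=3, cause="fall_damage"))
```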
Data Collection: Data collected are quantitative accounts of primitive actions distributed over time.
Data Analysis: These data are then analyzed quantitatively over time to develop recommendations addressing churn, engagement, and retention. See the analysis section for quantitative analysis methods.
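To ground the KPIs named above, here is a minimal sketch computing ARPDAU and day-1 retention from hypothetical day-level aggregates; the data layout is an assumption, but the formulas follow the standard definitions (ARPDAU = a day's revenue divided by that day's active users; day-N retention = the share of a day's new players who are active N days later).

```python
# Hypothetical per-day aggregates: revenue plus sets of active and newly installed player IDs.
daily = {
    1: {"revenue": 120.0, "active": {"a", "b", "c", "d"}, "new": {"a", "b", "c"}},
    2: {"revenue": 90.0, "active": {"a", "c", "e"}, "new": {"e"}},
}

def arpdau(day: int) -> float:
    """ARPDAU: a day's total revenue divided by that day's active users."""
    d = daily[day]
    return d["revenue"] / len(d["active"])

def day_n_retention(install_day: int, n: int) -> float:
    """Share of players who installed on install_day and were active n days later."""
    cohort = daily[install_day]["new"]
    returned = cohort & daily[install_day + n]["active"]
    return len(returned) / len(cohort)

print(f"ARPDAU, day 1: ${arpdau(1):.2f}")               # 120 / 4 active users = $30.00
print(f"Day-1 retention: {day_n_retention(1, 1):.0%}")  # {a, c} of {a, b, c} = 67%
```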
Resources:
Seif El-Nasr, M., Drachen, A., and Canossa, A. (2013). Game Analytics: Maximizing the Value of Player Data. Springer.
Seif El-Nasr, M., Nguyen, T.H.D., Canossa, A., and Drachen, A. (2021). Game Data Science. Oxford University Press.
Physiology-Based Playtesting
Goal: To gain as much information as possible about players' emotions and mental states, as well as the usability of the game, as unobtrusively as possible; the method removes the need for players to self-report as they would in a think-aloud test.
Procedure: Physiology-based tests are lab tests similar to the think-aloud protocol in that participants are invited to a specific location (such as a lab) and asked to play the game(s) being tested. Unlike think-aloud tests, however, players are not asked to state their feelings and emotions aloud. Instead, measurements of the participants' physical state are taken during gameplay and used to identify critical events.
Data Collection: Different monitoring methods have different strengths and weaknesses. Measuring how tightly someone grips a controller, for example, can be a good way to track excitement, but it does not reveal whether the player is positively engaged or simply frustrated. The physical size and shape of the equipment also vary, as do its potential effects on gameplay and the user experience.
Data Analysis: Data collected during play are used to identify critical events, either actual in-game events or experiences created by the players themselves. Post-play interviews can provide additional information about players' thoughts at those points in time, especially when combined with gameplay videos, which let players watch themselves playing and recall their thoughts more easily.
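As one hedged example of how physiological data can flag critical events, the sketch below marks moments where a hypothetical, already-cleaned galvanic skin response (GSR) signal rises well above its baseline; real pipelines require sensor-specific filtering and calibration, and the cutoff used here is an arbitrary assumption.

```python
from statistics import mean, stdev

# Hypothetical GSR samples, one per second of play, already cleaned of artifacts.
gsr = [0.8, 0.9, 0.8, 0.9, 1.0, 2.4, 2.6, 1.1, 0.9, 0.8, 2.2, 1.0]

baseline = mean(gsr)
threshold = baseline + 1.5 * stdev(gsr)  # arbitrary cutoff: 1.5 SDs above the mean

# Seconds at which arousal spikes; these timestamps are then matched against the
# gameplay video to see what the player was doing at those moments.
critical_seconds = [t for t, value in enumerate(gsr) if value > threshold]
print(critical_seconds)  # [5, 6] for this toy signal
```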
Resources:
Mirza-Babaei, P., et al. (2011). Understanding the Contribution of Biometrics to Games User Research. Proc. DiGRA 2011.
Ravaja, N., et al. (2006). Phasic Emotional Reactions to Video Game Events: A Psychophysiological Investigation. Media Psychology, vol. 8, no. 4, pp. 343–367.
Tzovaras, D., et al. Multimodal Monitoring of the Behavioral and Physiological State of the User in Interactive VR Games.