The Playtesting Reformation


You may already know about the Level 99 Games Open Playtesting Program, and perhaps you’re even a member! If not, you can go to the Playtesting Portal and sign up right away!

In the past, we’ve had a tough time with our game testing. In this article, I’m going to talk about the difficulties that we’ve had, what changes we’re making to address them, and why we chose the solutions that we did. 

These changes are ultimately intended to help us get our games out on time and with higher quality. With more focused testing, we’re going to be able to deliver better games to you, and to deliver a better experience to testers as well.

1. It Starts in Design


One of the biggest problems that we’ve seen in testing is a lack of guidance from our Design teams.

In recent sets of Exceed, we didn’t convey the full intent behind each fighter or the reasons we chose specific mechanics for specific characters. This led to a sense of disconnect between balance and mechanics: if balance is the ultimate goal, with no other fixed points, then it seems acceptable to drastically change the fundamentals of fighters in order to achieve that goal.

This is the worst outcome, because testing can actually leave the game further from completion than when it began, since the design teams then have to re-assess the fun-factor of the new mechanics.

The fault here really lies with the design team, though. If we had gone into development with clear goals, fixed points, and reasons for our decisions, these things wouldn’t have been a problem. Instead, many of the important design decisions were essentially punted to development. Saying “balance will figure out the details” was the worst mistake we’ve ever made in design. It’s the job of design to figure out ‘what’ and ‘why’; development is only responsible for ‘how’ to make that vision work.

So what are we going to do? We have a two-step approach to this problem. Both parts occur during the design phase for our games.

First, the most important rule is to design with intent. Our design teams aren’t just out to make something; they also need to understand why they’re making it and what it’s going to accomplish. Many design decisions are made to achieve specific goals. The design team must record these decisions and the goals that led to them, then pass these notes to our testers in a Playtester’s Guide. This guide will detail the goals for testing and, in the case of a competitive game, the target gameplay and feel for each faction or side that’s going into balance.

The second part is to design with understanding. Because the skills of game design and game mastery are very different, design teams can easily become disconnected from the competitive reality of their decisions. Just because an archetype seems great conceptually doesn’t mean it’s healthy for the game; it takes developers and balancers to reveal this. To bring this understanding to the design team, we will bring playtesters and developers onto the design team early, so their voices are heard in the creation of gameplay concepts.

2. What Are We Testing For, Anyway?


Metrics are one of the most important and frustrating parts of game development. How do you really know when something is “done”? What do terms like “done” and “balanced” even mean? Is it balanced if a matchup is 4-6? What if it’s 45-55? What about 495-505? As you can see, there’s really no end to it. Even these numbers are suspect if the players’ skill levels differ. Not all playtesters are going to be at the pro-player level, nor should they be.

What we’re recording during playtest sessions, and by extension, what we are looking for during these sessions, needs to be far clearer. “Play some games and let us know when you think it’s good enough” just isn’t going to cut it when you’re trying to deliver multiple, highly-polished, competitively-balanced games per year. We have to have a standard for what “good enough” is, so that we can ship games to you with consistent quality and on a consistent schedule.

We’ve identified four metrics that we look at during playtesting to determine whether a piece of content is “complete”. Together, these four points give us a full definition of completion in playtesting. The exact requirements and numbers change for each game: Exceed requires more plays for balance, while Millennium Blades is fairly balanced across the board.

Fun - "Content is played and enjoyed."

Completion: X players have played and enjoyed the content.

Fun is a difficult thing to test, and means a lot of things to a lot of people. The important thing here is that we have a bottom limit of players who will stamp “Fun” on the content. We don’t need to please everyone with every piece of content, but if a certain number (not a certain percentage) of testers enjoy the content, we can be confident it will find an audience in the real world. The reason to avoid a percentage here is that playtesters are largely a self-selected pool, and tastes may become homogenous in the group. In order to give each voice equal weight and better reflect the demographics of the real world, we use a fixed number rather than a percentage of our playtest pool.

Balance - "Content is mathematically & heuristically balanced." 

Completion: X experienced players have played the current version of the content and found no need for further updates.

For Balance, we specifically look for the word of experienced players. As with fun, player skill levels and experience levels will vary, and balance will need to be determined carefully. Where possible, mathematical and mechanical balance comes first. After this, the heuristic (or “learned”) balance of actual players is applied. This “gut check” is intended as a final step to balancing the system, not as the primary driver of balance. Like fun, we look for a certain number of sign-offs from experienced players, rather than a percentage consensus. Competitive game balance is a divisive subject, after all.

Function - "Designer intent is expressed in the content." 

Completion: X new players have played content correctly & without confusion. 

The Function portion of our assessment is what we would more colloquially term as “Blind Testing”. It’s there to guarantee that Designer Intent (that is, the way the game is played) is clearly transferred to new players by the game’s rules and the text on the cards. In this pass, we seek to eliminate any hidden or obscure interactions with the game contents. If a card doesn’t reveal its true value or correct playstyle immediately, then it needs to be reworked in order to improve its functionality. 

We want players to be making strategic decisions, after all, not exploiting hidden interactions. This streamlining of function is different from depth: depth of play comes from the interaction of strategic decisions. It is our belief that making strategic play clearer and more accessible increases total game depth, since more strategies can be accessed more easily by players.

Clarity - "Content is unambiguous and templated." 

Completion: X experienced players have proofread and approved the content.

To pass muster for Clarity, the interactions of the component must be clear and unambiguous. This covers wording, symbols, templating, and consistent use of language. A templating guide is provided to ensure that the game’s text is clear and clean. As a rule, text is minimized in favor of symbols, diagrams, and (where appropriate) keywords. Developers should suggest such updates as needed.
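The fixed-count sign-off rule behind these four metrics can be sketched in code. This is purely a hypothetical illustration: the threshold numbers and the `is_complete` helper below are assumptions for the sake of example, not Level 99 Games’ actual tooling or real requirements.

```python
# Hypothetical sketch of the fixed-count completion check described above.
# Threshold numbers are illustrative placeholders, not real requirements.
COMPLETION_THRESHOLDS = {
    "fun": 10,       # X players have played and enjoyed the content
    "balance": 5,    # X experienced players signed off on balance
    "function": 8,   # X new players played correctly, without confusion
    "clarity": 3,    # X experienced players proofread and approved
}

def is_complete(sign_offs: dict) -> bool:
    """Content is 'complete' once every metric reaches its fixed
    sign-off count. These are absolute counts, not percentages of
    the tester pool, for the reasons given in the Fun section."""
    return all(
        sign_offs.get(metric, 0) >= needed
        for metric, needed in COMPLETION_THRESHOLDS.items()
    )

# Example: one fighter's current sign-off tallies.
fighter = {"fun": 12, "balance": 5, "function": 8, "clarity": 2}
print(is_complete(fighter))  # clarity is one short, so prints False
```

The point of the fixed-count design is visible here: adding more testers to the pool never raises the bar, it only adds more chances for each metric to reach its target.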

3. Let’s Get Together


One reason that playtesting has taken us so long in the past is the absence of regular playtesting events. We’ve got a great community on Discord, but for many months, our policy was simply to sign in and play whatever you felt like, whenever you felt like it. Unless the community numbers in the thousands, it’s difficult to find someone in your own time zone who is interested in your game and has a compatible schedule.

Without a massive community to provide always-on matchmaking, it was often difficult to find an opponent to test with, particularly when there are 4 or 5 different games in the testing pool and a player is often looking for just one of them.

And even when you did find someone, there wasn’t always a staff member or authority on hand to answer questions about the game, or to help new testers with recording their results. This led to a huge amount of wasted time and potential in our playtesting community.

We began our efforts to remedy this early this year. Now, weekly events are announced to playtesters via email every Sunday night. Playtesters can sign on to those specific events and appear online to test with us. Furthermore, these tests are guided by a playtest lead, who is ready to answer questions and to take notes, and to ensure that players are properly credited for their participation to receive playtesting rewards.

4. Sprints, not Relays


While the events have been successful, they haven’t been perfect. We’ve been running them with the idea of doing multiple games in parallel, getting in a full day of testing each week for each of three different games. Working in parallel like this has actually slowed us down significantly, though: switching between tasks leads to lower-quality testing. To illustrate, playing a game like BattleCON for 3 days straight is going to yield better testing on days 2 and 3 than playing one day per week for 3 weeks.

In our updated testing schedule, we will be running sprints of game testing: a three-day tournament event, or simply continuous days of focused testing, followed by a few days off for revisions, then pushing forward with the next testing event. Our goal is to get higher-quality results from intensive tests and turn them around faster, so we can clear out a new competitive season of Exceed or a new batch of BattleCON Fighters in 6 weeks instead of 6 months.

5. Smaller Targets & Better Aim


Many of our competitive games have a lot of different factions, characters, or starting positions that need to be balanced. In these cases, it’s important to make sure that we’re testing each one evenly and thoroughly. However, in a pool with 9 Millennium Blades characters or 20 Exceed Fighters, it can be hard to know what’s been tested and what remains to test. Even if the dev team knows (which it often doesn’t), the playtesters certainly don’t. And without directed playtesting, players tend to play the factions that interest them most, leading to an uneven distribution of testing.

In upcoming projects, we will break the playtesting pool into phases, testing a small batch of only 2 or 3 things exclusively until they’re complete. We’re also planning to test them against fixed targets. In the case of Exceed, this means battling everyone against Ryu and a handful of other fixed points. In Millennium Blades, it means looking at absolute point totals for different characters across fixed store card pools, to achieve consistency and make sure every store card gets seen.

By creating more narrow targets for our individual events, we can achieve greater consistency in testing the content of our games.

6. Better Rewards


The rewards for playtesting a game with us have always been good, but never really well-structured. We’ve gotten a bit better about delivering these rewards on time, and recording plays correctly, but there’s still a lot to be done to improve this.

In the coming months, we’re improving our organized play rewards with more cards, better presentation, and an eye towards supporting collectors who want to have something really unique in their games. Since these are the main reward we give out to playtesters, we’re building these improvements with the mindset that they will serve as special rewards for playtesters as well. 

We look forward to giving out more of these rewards in the coming months to the friends who help us make these great games!