NS2 balance and feedback process
We believe one of the most important things for us to do is take feedback from players constantly. This feedback decides which features, fixes, and balance changes should go into the next version of the game we release.
This cycle of getting feedback and making a new release is the core "loop" that our company lives and dies by. The better we can reflect the spoken and unspoken needs of our players, the better the game gets, and our income improves. This is especially important in our new knowledge economy, one where traditional marketing isn't as important as organic word-of-mouth spread.
So this player input is incredibly important to us and something we spend significant time gathering, exposing and filtering. This input must come from many sources, including unstructured public play, competitive pick-up games, tournament play and playtesting. It’s important to get feedback from all these sources, or the game could evolve to suit only one class of player. Some examples:
- We don't want to only take feedback from public games, because those games are more focused on fun instead of winning at all costs.
- But we don't want to take feedback only from competitive players, because organized play where everyone knows each other isn't representative of the wider audience either.
- Taking qualitative feedback from discussions, meetings, message boards and anecdotes is useful for finding emotional hotspots for players, but doesn't accurately represent the size or frequency of a problem: the squeaky wheel gets the grease, as they say, and there is a silent majority (~90%) which doesn't even participate.
- Raw game statistics can reveal player lifetimes, win rates, time spent without a commander, average game length and all sorts of powerful information. But they're no good at capturing the "feel" of a particular game, and cannot address game pacing or other intuitive problems.
- Etc.
So it's clear that a broad strategy which takes into account feedback from many different sources must be employed to accurately get a grip on "the state of the game". It cannot be distilled down to a single win-loss ratio or a thumbs-up or thumbs-down; it's much bigger than that.
Here is a diagram of the overall process we are currently using to direct development on Natural Selection 2.
In the upper left is the current, live, public build of the game. In the middle-right is the new version we release. The diamonds are input sources and those are connected to rectangles which represent actions the development team takes. Listed below are descriptions of these input sources, and what they are good at revealing:
FEATURES
Public play (game feel, frustrations, smurfing, social implications)
Playing public games (most often anonymously!) shows how the game is working in the wild, with a mix of experienced and new players.
Google Moderator voting (quantitative player desires)
We recently started using Google Moderator to gauge player desire more quantitatively. Players from all over the web can submit new features, fixes and improvements. They can be voted up or down, and we select new work largely based on these votes. As of this writing, we've had 761 suggestions and 48,774 votes by 1,440 people.
We also use this to convey which suggestions are currently in progress and which have been addressed. As an aside, we also use this tool internally to direct our own work on our level editor: our level designers submit suggestions and vote on those that would help them, and we use this directly to decide what to work on next.
Public forums (compatibility, exploits, exploration of new ideas)
Our forums can be an emotional and difficult place to spend a lot of time, but they are invaluable too.
Developer ideas (business direction, holiday features, new weapons, map ideas)
These are our own (usually qualitative and instinctual) ideas of where we want to see the game go. This doesn't usually address balance, but is a much higher-level "what we want" source.
Core playtester via in-game and internal forums (early, qualitative feedback)
We play the current dev build on a daily basis with a group of around 30-40 playtesters, hand-picked from the community. This is the first time new features and changes will be evaluated. There are often dev team members playing these games (especially Brian and especially as we get closer to releasing a patch).
I also play and/or listen in on Teamspeak. This build tends to be rough and the total volume of games is low, so its focus is on early, qualitative feedback rather than balance or quantitative data.
GAME BALANCE
Strike team discussion (game diversity, game longevity)
Discussion on private forums. This is an invite-only team composed of top-tier competitive players. We recently started this private forum; it is roughly comparable to the NS1 Veteran program.
Watching competitive play (competitive win/loss %, game diversity)
Watching experienced players go to any length to win gives a unique view of the game and highlights strategies and builds that are more powerful than others. This is a good place to see which traits, build orders and research are being overused, underused, or ignored completely. It also gives a good intuitive feel for balance - seeing two historically matched teams both win 5 games as aliens starts to illuminate a problem to be fixed.
Internal game statistics (public win/loss %, unit/ability timings, game length, player lifetimes)
This is the only formal non-human feedback source. We collect tons of data on thousands of games and hundreds of thousands of kills. This data lets us look at win/loss ratios (per map and overall), comparing resource flow, win/loss depending on starting locations and just about anything else you can imagine. These stats don’t show a single game in detail though, they only show aggregate stats, so they aren’t a replacement for playing and watching public and competitive games.
Our system has recently been rewritten to give us much richer ad-hoc data queries instead of pretty graphs. Previously, a question would come up about (for instance) shotgun vs. skulk stats. We would create a generic graph showing this, then I would look at the chart once and it would prompt a totally new question...which meant we needed a new graph. So instead of trying to figure out what we need ahead of time, we now have a system which gives us the answer to whatever question we ask. We no longer have pretty graphs, but we do have a system where we can write SQL queries to get the answer NOW. For example, to view total wins and average game time based on the winning team, we could write a query like:
select count(*), avg(length), winner from raw_endgame where version = 234 group by winner
With tight patch schedules and players being very sensitive to game problems, we need these answers immediately, so we can respond with changes ASAP.
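To give a concrete sense of how an ad-hoc query like the one above behaves, here is a minimal sketch in Python using SQLite. The raw_endgame table name and its version, length and winner columns are taken from the query in the article; the sample rows and the in-memory database are invented for illustration and are not our actual stats pipeline.

```python
import sqlite3

# Hypothetical, minimal version of the raw_endgame table; the real schema
# is not public, so everything beyond the three queried columns is assumed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_endgame (version INTEGER, length REAL, winner TEXT)")
conn.executemany(
    "INSERT INTO raw_endgame VALUES (?, ?, ?)",
    [
        (234, 1200.0, "marines"),
        (234, 900.0, "aliens"),
        (234, 1500.0, "aliens"),
        (233, 600.0, "marines"),  # older build, filtered out by the WHERE clause
    ],
)

# The ad-hoc query from the article: total wins and average game length,
# grouped by the winning team, for a single build.
rows = conn.execute(
    "SELECT COUNT(*), AVG(length), winner "
    "FROM raw_endgame WHERE version = 234 GROUP BY winner"
).fetchall()

for count, avg_length, winner in rows:
    print(winner, count, avg_length)
```

One row comes back per winning team, so a single query answers both "how often does each side win?" and "how long do those games run?" without building a chart first.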
Official balance mod (verifying and sanity-checking numerical balance changes)
After discussing and doing private tests on balance, we make the more significant or risky balance changes live in an official gameplay mod. Feedback is given directly in the comment stream, in-game or via e-mail. As our live audience size increases, it is increasingly important to get balance changes correct out of the gate. If we make a mistake, hotfixing can cost us a day or more of development and countless frustrated players and e-mails. Being able to test this live with anyone that's interested is a massive improvement and serves as "QA" for balance.
MAP FLOW AND BALANCE
Organized map playtesting (map balance, map layout)
Bi-weekly playtesting of new maps and changes to existing maps. This is a hand-picked group of playtesters, clan-players and trusted players that have demonstrated their deep knowledge of the game. Our level designers often participate in these tests so they can capture the copious qualitative feedback over Teamspeak. We also have our own internal forum where players can draw diagrams or present their thoughts in a more thorough discussion.
TAKEAWAY
It's my hope that this article gives you a better understanding of our complex, ever-changing and nuanced feedback process. One thing we've learned for sure is that no system is complete or perfect. We are always striving to improve and refine our process, as well as make it increasingly efficient.