In 2017 I concluded my Machine Learning course on Udacity with this capstone project. Using data from a real-time strategy game, I prototyped a machine learning pipeline and a model that can predict which users are more likely to make purchases and spend money in a free-to-play mobile game after some time of play.
Years later, this model is still surprisingly applicable. XGBoost (boosted trees) continues to be a powerful model architecture in AI for tabular data such as this, although more modern solutions such as LightGBM, CatBoost, or FT-Transformer (a type of neural network) will probably perform and scale better as of this writing (2023).
[pdf-embedder url="https://texpine.com/wp-content/uploads/2023/01/predicting_paying_users_f2p_game_Tiago_Tex_Pine_v1.pdf" title="Predicting Paying Users in a Free-to-Play Game"]
The mobile games industry is dominated by the free-to-play model, but fewer and fewer companies are succeeding in making their business work. Not only is a huge amount of consolidation happening, but the cost of acquiring a single new user has also skyrocketed. A Live Ops / free-to-play game can only survive by:
- Keeping Retention as high as possible, so the game and its community remain alive and the company doesn’t have to acquire more new users in the expensive ad market.
- Converting as many players as possible - Paying Users are not only a source of revenue, they are also the most enthusiastic users and create network effects that help keep other users engaged, which helps stabilize retention too.
The ability to scan every new user who downloads the app and predict which ones are more likely to become Paying Users can be a big help in keeping both retention and revenue at healthy levels.
The goal of our model is to predict potential Paying Users with 3 days or less of data. In other words, we want to predict whether a new user who downloads the game and starts playing will eventually become a Paying User, looking only at their first 3 days after installation.
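To make the 3-day setup concrete, here is a minimal sketch of how one training example per user could be built. It is an illustration only, not the pipeline from the report: the event schema (`timestamp`, `event_type`, `amount`), the event names, and the three features are hypothetical, and the label is assumed to mean "made at least one purchase at any point in the available log".

```python
from datetime import datetime, timedelta

# Observation window taken from the stated goal: the first 3 days after install.
OBSERVATION_WINDOW = timedelta(days=3)


def build_example(install_time, events):
    """Return (features, label) for one user.

    `events` is a list of (timestamp, event_type, amount) tuples.
    This schema and the event names are hypothetical.
    """
    cutoff = install_time + OBSERVATION_WINDOW

    # Features may only use events inside the observation window.
    early = [e for e in events if install_time <= e[0] < cutoff]
    features = {
        "sessions_3d": sum(1 for _, kind, _ in early if kind == "session_start"),
        "battles_3d": sum(1 for _, kind, _ in early if kind == "battle"),
        "active_days_3d": len({(ts - install_time).days for ts, _, _ in early}),
    }

    # Label (assumption): 1 if the user ever made a purchase in the full log.
    label = int(any(kind == "purchase" for _, kind, _ in events))
    return features, label


install = datetime(2017, 1, 1, 12, 0)
events = [
    (install + timedelta(hours=1), "session_start", 0.0),
    (install + timedelta(hours=2), "battle", 0.0),
    (install + timedelta(days=1, hours=3), "session_start", 0.0),
    (install + timedelta(days=5), "session_start", 0.0),  # after the window: not a feature
    (install + timedelta(days=9), "purchase", 4.99),      # after the window: label only
]
features, label = build_example(install, events)
print(features, label)
```

The point of the split is that features are computed strictly from the observation window, while the label is allowed to look further ahead. Mixing the two would leak future information into the model.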
An undisclosed game company has agreed to give us access to their data on a Mobile Multiplayer Strategy Game with RTS-like mechanics similar to notable mobile titles such as Clash of Clans, Castle Crash, Game of War, Rival Kingdoms and Siegefall.
All player information has been anonymized - no information that could identify users, such as demographics or personal emails, is available. Only gameplay data and a little additional information, such as device/OS, is present.
Also, the identity of the company, the name of the game and very specific information on how each feature works in the game (like the resource operations) will remain undisclosed - but this won’t affect our capacity to train models for the purpose of this paper.