Learning to Read Replays

The first step of creating CRustAIcean was to wrap the provided C bindings in a usable Rust crate. With that done, I’ve turned my attention to the inner workings of CRustAIcean. Since I want this to be a machine learning AI, I’m going to need some training data to work with. Thankfully there are a lot of replays readily available on BAR’s website, which makes for a great source of training data! The only problem is that the replays are stored in a custom format that is not text-based, so I’ve had to dig through a bunch of code to figure out how these replays are laid out and how to read them. Specifically, I wanted to gather statistics throughout the game (like resources and unit counts) along with the commands that each player gave. Using these I can make a very basic version that takes the current game statistics and gives a relevant command for the situation.

NOTE: Below is my journey of discovering BAR/Spring’s replay format. The information is not completely accurate. I’m sure there are mistakes, misinterpretations, or other bugs. This is just my best understanding over the last couple weeks of research.

My journey began by finding the Spring Stats Viewer, a project from 2018 that has some basic ability to read statistics out of the replay files. I attempted to compile this project, but it was so far out of date that it wasn’t worth it to try and get it working. Instead I opened up the SpringDemoFile.py file to begin trying to understand how it generated statistics from the replay. Very quickly I realized that it was going to be a complex process as the 1600 lines did not have the clearest layout. Plus, with how old the project was I figured my best bet would be to search elsewhere.

From here I went over to the Spring Replay Site, whose parse_demo_file.py file has a much clearer layout and shows how to read the replay files in general. I generated a short 1-minute replay between a couple of SimpleAIs and built a replay parser that could read it. Here is a visual representation of what I discovered the replay files contain:

Replay Format

You can see that the file begins with a header containing basic information about the file itself. It identifies the file as a “demofile” (Spring’s internal name for replays) and gives the file version, time information, and the sizes of all the sections of the file. Following the header is the script section, which contains a copy of _script.txt, the file that tells the engine all the game, mod, and map settings for this replay.
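As a rough illustration of that first check, here is a minimal sketch of validating the start of the header. It assumes the file opens with the 16-byte ASCII magic "spring demofile" plus a NUL terminator, immediately followed by a little-endian 32-bit version number. The function name `read_version` is hypothetical, and the exact layout should be checked against the engine source before relying on it.

```rust
/// Assumed magic bytes at the start of a demofile: the ASCII text
/// "spring demofile" followed by one NUL byte (16 bytes in total).
const DEMOFILE_MAGIC: &[u8; 16] = b"spring demofile\0";

/// Returns the file version if `bytes` starts with the assumed magic,
/// or `None` if the magic is missing or the input is too short.
/// Assumption: the version is a little-endian i32 right after the magic.
fn read_version(bytes: &[u8]) -> Option<i32> {
    // Drop the magic; bail out if the file does not start with it.
    let rest = bytes.strip_prefix(DEMOFILE_MAGIC)?;
    // Take the next four bytes, if present.
    let b = rest.get(..4)?;
    Some(i32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}
```

Returning `Option` keeps the "not a replay" and "truncated file" cases out of the happy path, which is handy when scanning a directory full of downloads.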

After this is the demostream, the meat and potatoes of the replay file, which contains all of the actual game information that BAR/Spring uses to run the replay for you (I’ll get into the details later). Next comes the winningallyteam section, which records the team that won the battle. The player_statistics section contains information about how the human players interacted with the game (like mouse clicks and keyboard presses). Finally, team_statistics keeps track of all the important game statistics, like resources and units, over the course of the game. This final section is exactly what I wanted to find to be able to train off of the replays!
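Because the header stores the size of every section and the sections sit back to back, finding any one of them is just a running sum. The sketch below shows that arithmetic. The struct and its field names are hypothetical stand-ins for whatever the header actually calls these values, and it assumes the sections appear in the order described above with no padding between them.

```rust
use std::ops::Range;

/// Section sizes in bytes, as read from the header.
/// Field names are hypothetical, not the engine's own.
struct SectionSizes {
    header: usize,
    script: usize,
    demostream: usize,
    winning_ally_teams: usize,
    player_stats: usize,
    team_stats: usize,
}

/// Computes the byte range of each section, assuming they are stored
/// contiguously in this order with no padding.
fn section_ranges(s: &SectionSizes) -> Vec<(&'static str, Range<usize>)> {
    let parts = [
        ("header", s.header),
        ("script", s.script),
        ("demostream", s.demostream),
        ("winning_ally_teams", s.winning_ally_teams),
        ("player_stats", s.player_stats),
        ("team_stats", s.team_stats),
    ];
    let mut start = 0;
    parts
        .iter()
        .map(|&(name, len)| {
            // Each section starts where the previous one ended.
            let range = start..start + len;
            start += len;
            (name, range)
        })
        .collect()
}
```

A zero-length range for a section means that section is simply absent from the file, which is worth checking for before trying to decode it.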

So let’s dive a little deeper into demostream. Each demostream entry contains the time, the length of the entry, the data for the entry, and the command type of the entry. The command type determines how you should read the data. Below I’ve included the data formats of each type of command that I was able to find in Spring Stats Viewer, Spring Replay Site, and the BAR Engine Source.
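To make the entry layout concrete, here is a minimal sketch of walking the demostream. It rests on several assumptions that are not confirmed above: the time is a little-endian 32-bit float, the length is a little-endian unsigned 32-bit integer counting the bytes that follow, and the command type is the first of those bytes. The `Chunk` type and `read_chunks` function are hypothetical names.

```rust
/// One demostream entry, borrowed from the underlying buffer.
/// The field layout is an assumption (see the text above).
#[derive(Debug, PartialEq)]
struct Chunk<'a> {
    game_time: f32,
    command_type: u8,
    payload: &'a [u8],
}

/// Splits a demostream buffer into entries. Stops at the first entry
/// that is truncated or has a zero length (no command type byte).
fn read_chunks(mut stream: &[u8]) -> Vec<Chunk<'_>> {
    let mut chunks = Vec::new();
    // Each entry needs at least 8 bytes: 4 for time, 4 for length.
    while stream.len() >= 8 {
        let game_time = f32::from_le_bytes([stream[0], stream[1], stream[2], stream[3]]);
        let length =
            u32::from_le_bytes([stream[4], stream[5], stream[6], stream[7]]) as usize;
        let rest = &stream[8..];
        if length == 0 || rest.len() < length {
            break;
        }
        let (data, tail) = rest.split_at(length);
        chunks.push(Chunk {
            game_time,
            command_type: data[0], // assumed: first data byte is the type
            payload: &data[1..],
        });
        stream = tail;
    }
    chunks
}
```

Borrowing the payload instead of copying it keeps this pass cheap; decoding each payload according to its command type can then happen in a separate step.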

Replay Command Format

Using this data we can identify players (with player number & name), keep track of our game progress (with the frame commands), and see what commands were given by AIs and players. A lot of the information within the replays is not important to this project, but the AI and player commands are very important to us. Of course, this is another layer of formats we have to unpack (because it couldn’t be simple!). These commands are packed into a special format to be sent over the network, and this same format is used in the replay files. So one last journey through the BAR Engine Source to understand this version of the command system, and we get these translations of the commands and parameters:

Replay Game Command Format

Replay Game Command Parameters Format
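Once the frame commands and the player/AI commands are decoded, pairing them up gives each command a position in the game timeline. The sketch below illustrates that idea only. The `Event` enum is a hypothetical, heavily simplified stand-in for the decoded demostream entries, and it assumes each frame command advances the game by exactly one frame.

```rust
/// Already-decoded demostream events (hypothetical simplification).
enum Event {
    /// A frame command: the game advanced by one frame (assumed).
    NewFrame,
    /// A command issued by a player or AI.
    Command { player: u8, command_id: i32 },
}

/// A command stamped with the frame it was issued on.
#[derive(Debug, PartialEq)]
struct TimedCommand {
    frame: u32,
    player: u8,
    command_id: i32,
}

/// Walks the events in order, counting frames and tagging every
/// command with the number of frames seen so far.
fn tag_with_frames(events: &[Event]) -> Vec<TimedCommand> {
    let mut frame = 0u32;
    let mut out = Vec::new();
    for event in events {
        match event {
            Event::NewFrame => frame += 1,
            Event::Command { player, command_id } => out.push(TimedCommand {
                frame,
                player: *player,
                command_id: *command_id,
            }),
        }
    }
    out
}
```

With commands stamped this way, each one can later be matched to the statistics sample closest to its frame when building training pairs.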

With a decent amount of elbow grease I was able to parse the replay successfully using the formats above. It was nothing pretty, I assure you, but even with a fairly rough read-through of the replay file I was able to read about 90% of the data successfully, and most importantly I had all the statistics and commands (which was my focus to begin with). With this I graduated to reading a replay from BAR’s site…

And discovered that those replays don’t include game statistics.

None of them do. (or at least none that I could find)

Which means I have no input data to train my AI off of.


We are officially two steps forward, one step back. I can read commands and statistics from replays, but only commands are uploaded to the site, not the statistics. So now I have to figure out how to get statistics for those games (foreshadowing for the next article?).