Announcing Paparouna CI

Effective Continuous Integration for Fan Translation Projects

Submitted by kmeisthax on Fri, 11/16/2018 - 14:59 in ROM Hacking

So, one of the new things for Patch 125 that didn't make it into the patch notes at all is a new continuous integration workflow. I call it Paparouna CI. It's a Buildbot system for validating changes made to the Telefang translation patch, and it's been a huge help: it double-checks our work to ensure no change has subtly broken our build system. It even works on pull requests, which is really important since some of our developers can't set up the build system for themselves. I'm going to go over how much of a pain in the butt it was to set up and how you too can get your own fancy ROM hacking CI system.

I Complain Because I Care

Buildbot is not the most well-documented system known to man. It has a lot of cool functionality without a lot of documentation or flexibility backing it. For example, I wanted to make use of a feature called EC2 Latent Workers. This is a special worker configuration that launches an EC2 instance at spot pricing when a build needs to be done, and then terminates it afterwards. Using this, we can build on otherwise very expensive instances without spending lots of money, since we're only keeping the instances running for a few minutes.

To use any sort of EC2 functionality, you either need:

  • Amazon EC2 login credentials - not a good idea to hard-code these in your master instance
  • IAM roles - assigns a given EC2 instance a set of access credentials only readable from inside the instance

Obviously we want to do the latter. Unfortunately, setting up an IAM access role is extremely frustrating and involves lots of trial and error. Buildbot's documentation does not list which permissions that role needs, so you have to keep restarting Buildbot and watching its logs for permission errors to know what else needs to be adjusted. Or just looking for all of the following fun things in your EC2 console:

  • Instances that start and do nothing, causing Buildbot to start more of them in a panic
  • Instances that don't get tagged correctly, cluttering up your EC2 console
  • Inability to make spot requests, causing your instances to not start at all
  • EBS volumes that never get deleted and rack up a surprising amount of charges
  • Instances that don't get their security groups, can't connect, and fail

If you are doing this by trial-and-error, this can be extremely frustrating and expensive. (Okay, I may have paid a max of 70 cents on the whole project, but it felt expensive to me!) In fact, one thing Amazon fails to note is that IAM role grants don't take effect until instance restart. No, not reloading Buildbot via SIGHUP, nor shutting down and restarting the systemd unit. You need to shut down and restart the whole damned server, which is really great if you have things other than buildbot on it, such as your personal blog.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "ec2:AttachVolume",
                "ec2:DescribeInstances",
                "ec2:DescribeAddresses",
                "ec2:TerminateInstances",
                "ec2:RequestSpotInstances",
                "ec2:ImportKeyPair",
                "ec2:DescribeTags",
                "ec2:CreateKeyPair",
                "ec2:CreateTags",
                "ec2:RunInstances",
                "ec2:DescribeSpotInstanceRequests",
                "ec2:ReportInstanceStatus",
                "ec2:StopInstances",
                "ec2:DescribeSecurityGroups",
                "ec2:GetConsoleOutput",
                "ec2:DescribeSpotPriceHistory",
                "ec2:DescribeImages",
                "ec2:CancelSpotInstanceRequests",
                "ec2:StartInstances",
                "ec2:DescribeVolumes",
                "ec2:DescribeKeyPairs",
                "ec2:DeleteKeyPair",
                "ec2:DescribeInstanceStatus"
            ],
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "iam:CreateServiceLinkedRole",
                "iam:DeleteServiceLinkedRole"
            ],
            "Resource": "arn:aws:iam::*:role/*"
        }
    ]
}

If you just so happen to have a configuration exactly like mine, you can save yourself some pain and copypaste my IAM policy. You'll still need to create a role for your master server, grant it this policy, and then reboot the master server.

Furthermore, Buildbot has a facility called Secrets, which lets you keep important passwords out of your Buildbot configuration. Unfortunately, half of the things I want to use don't actually make use of them. This is actually really shitty, because I want to keep my Buildbot config in version control and keep all the secrets locked to the instance. I came up with a workaround, which involves reading out the secrets from the filesystem, but that won't work with any custom secrets vaults.

Off the top of my head, the following functions don't actually read Buildbot Secrets:

  • MySQL database passwords
  • GitHub personal access tokens (for pull request polls)
  • GitHub OAuth consumer tokens & secrets

Whereas I am able to use it for:

  • Worker passwords (that already have to be burned into an AMI anyway on the other end, smh)
  • GitHub personal access tokens (for status push & comment services)

This is an entirely random assortment of things that did and did not work. It's not even consistent based on secret type - I can secretize GitHub tokens, but only for the status and comment functionality. I have a feeling I need to trawl through the GitHub issue queue sometime...

Finally, if you're serving Buildbot from behind nginx - and you definitely should be - then you will probably notice that websockets don't work. Actually, not so much "notice" as "cry deeply because nginx won't proxy this stupid thing". You need to add a separate location directive for /ws that tells nginx to allow WebSocket upgrades on this path:

location /ws {
    proxy_pass http://127.0.0.1:8010;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "Upgrade";
    proxy_read_timeout 7d;
    proxy_send_timeout 7d;
}

Also, those proxy timeouts are important. Whenever the websockets connection times out, Buildbot reloads the page. So we set it to an unreasonable seven days of allowed inactivity. You may want more or less. Just know that if Buildbot is reloading every time you go back to an old tab, that means your read and send timeouts are set incorrectly on the server side.

Do You Want To Make A Buildbot

Assuming none of the above complaints have scared you off, continuous integration is actually a fairly worthwhile service to establish, for any software project. Fan projects and ROM hacks are a little bit more of a pain to do, however, since we usually can't rely on services like Jenkins CI that publish everything under the sun. We want to keep the base game files a secret and only distribute patch files useful to people who own the game. Therefore, a privately-hosted service like Buildbot is more useful, since we can be selective about what is public and what isn't.

Buildbot consists of two services: masters and workers. Masters are long-running Python web services that check for new changes from your upstream version control service and host a web UI, while workers handle build requests. Workers can be remotely hosted or even started on-demand, which is what I wound up doing, but it's not necessary. The bare minimum of Buildbot configuration can be achieved by just following the quick start tutorial on their website, which despite the previous section is actually well written. So I'm only going to cover the things I did differently from a standard config instead of rewriting the whole tutorial outright.
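
For orientation, here's a minimal sketch of the pieces a master.cfg ties together, roughly what the quick start tutorial leaves you with. The repository URL, worker name, password, and ports are placeholders, not anything from Paparouna's actual config:

# Minimal master.cfg sketch (placeholders throughout, not Paparouna's real config).
from buildbot.plugins import changes, schedulers, steps, util, worker

c = BuildmasterConfig = {}

# One statically-configured worker, identified by a name/password pair.
c['workers'] = [worker.Worker('example-worker', 'example-password')]
c['protocols'] = {'pb': {'port': 9989}}

# Watch the upstream repository for new commits.
c['change_source'] = [changes.GitPoller(
    'https://github.com/example/project.git', branches=['master'])]

# Queue a build on the builder below whenever a change arrives.
c['schedulers'] = [schedulers.AnyBranchScheduler(
    name='example-scheduler', builderNames=['example-builder'])]

# The builder checks out the repository and runs make on the worker.
factory = util.BuildFactory()
factory.addStep(steps.Git(repourl='https://github.com/example/project.git'))
factory.addStep(steps.ShellCommand(command=['make']))
c['builders'] = [util.BuilderConfig(
    name='example-builder', workernames=['example-worker'], factory=factory)]

# Web UI; put this behind nginx as described above.
c['title'] = 'Example CI'
c['buildbotURL'] = 'https://ci.example.com/'
c['www'] = {'port': 8010}
c['db'] = {'db_url': 'sqlite:///state.sqlite'}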

Put Your Configuration in Version Control

It's tempting to just install everything using the standard tools and leave it there, but version controlling your configuration files is a must. Create a directory on your machine to hold a Git repository. This path should match wherever on your server you are going to install Buildbot. For example, I put my buildbot files in /srv/buildbot, with a directory for the Python environment (master-env), the master directory itself (paparouna), and secrets storage (secrets). That means my Git repo should have master-env and paparouna directories, with a .gitignore on secrets so we don't commit anything secret. You should take care not to commit everything under the sun, of course, since most of it is going to be Python packages we don't need to track. So I ultimately wound up only committing master-env/pyvenv.cfg, paparouna/buildbot.tac, and paparouna/master.cfg.

If you're following along at home, you may be building your Python environment locally and then committing it to send it to the server. That's probably not the best approach; it'd be better to not commit virtualenv packages. The environment on your server is not likely to match what you have locally, so any binary packages (e.g. CPython extensions) won't load correctly. The best approach is to store only configuration in Git, then pull that repo down to your server in order to update configuration.

Save Yourself An Activation Step

Speaking of Python: when creating my virtualenv, I made sure to create a symlink named activate pointing to master-env/bin/activate. This makes working on it easier; you just go to the buildbot directory and source activate.

Trust me. You will be coming back to the virtualenv multiple times while you work on this.

Keep It To 24 Hours Or Less

As I hinted at before, we need a facility to recall a base ROM image and get it to the worker at build time. We don't want it stored in version control or otherwise made public. So I wound up creating a directory to store them called paparouna/baseroms. (This, of course, should be gitignored.) In your master.cfg, you can copy the base ROMs (or other secret files) into the worker's build directory with a command such as:

factory.addStep(steps.FileDownload(mastersrc="baseroms/my_games_baserom_(J).gbc", workerdest="baserom.gbc"))

This will, of course, copy the base ROM file from the master's baseroms directory into the corresponding build directory. If your project stores baseroms elsewhere, adjust the workerdest parameter appropriately.

Serving Up Your Patches

What good is a ROM hacking CI if you can't get the patches out of it? Buildbot doesn't really have a facility to handle build artifacts, but it has all the pieces necessary to do so. First off, you need to create a directory on the master to store them. I usually have them in paparouna/artifacts. Second, you need to tell your builder to compress and upload those patches back to the master. I use commands like:

factory.addStep(steps.ShellSequence(name="Package build artifacts for distribution",
    workdir="build/build",
    commands=[
        util.ShellArg(command=['mkdir', 'build'], logfile="stdio mkdir"),
        util.ShellArg(command=['cp', 'my_translation.ips', 'my_translation.map', 'my_translation.sym', 'build'], logfile="stdio cp"),
        util.ShellArg(command=['tar', 'cvjf', util.Interpolate('%(prop:revision)s.tar.bz2'), 'build'], logfile="stdio tar")
    ]))
factory.addStep(steps.FileUpload(name="Retrieve build artifact from worker",
    workersrc=util.Interpolate('build/%(prop:revision)s.tar.bz2'),
    masterdest=util.Interpolate("artifacts/%(prop:revision)s.tar.bz2"),
    mode=0o644,
    url=util.Interpolate("artifacts/%(prop:revision)s.tar.bz2"),
    urlText="Download patch and debugging symbols"
))

The first step enters the build directory, creates another directory, copies just the files we care about into it, and then creates a tarball. If your worker has zip installed, you can use that too, but bzip2 is a better compression algorithm and more readily available on Ubuntu. If your project isn't on GB and doesn't use rgbds, then adjust the filenames appropriately to grab whatever artifacts you care about. We use the Git commit ID for the name of the tarball because, in the next step, we store it in the artifacts directory with all of the other builds we've done, so that it can be linked to.

There's one additional wrinkle: Buildbot is a persistent web application, and it won't serve arbitrary files for you. If you've been following the guide, you probably haven't put the master behind a real web server yet. You'll need to do that, because the only way to actually serve these artifacts is to add the following to your nginx configuration:

location /artifacts {
    root /srv/buildbot/paparouna;
    allow all;
}

Adjust the root path as necessary. nginx will serve files out of the root path plus the location, which is a little confusing, and I imagine you'd have to do some rewrite nonsense if you wanted the URL path to not match the filesystem path.

Keep Your Disassembly And Patch Separate

In our current configuration, we have two branches we care about: master and patch. However, they have two different sets of build instructions. We want to be able to pull IPS patches and symbol lists (see above), but that only applies to the patch branch. So we need to set up two separate Builders and configure the schedulers to run the correct one. We can do this by setting up a change filter on each scheduler that checks the branch of the change being considered.

However, there's one wrinkle here: pull requests. If we want to build people's pull requests with CI, the branch will be wrong and neither scheduler will fire. The PR changesource uses a special GitHub ref that always points to the result of merging the PR into the base branch. Instead, we need to configure the changesource to categorize changes that come from a PR with the branch they were applied to. This also means having multiple GitHubPullrequestPoller sources instead of just one, so that each one filters by branch. This is roughly what the result looks like:

def trusted_pr_filter(apidata):
    return apidata['user']['id'] in [108736, 1041815, 23729870]

c['change_source'].append(changes.GitHubPullrequestPoller(
    name='GitHubPullrequestPoller:telefang/telefang:master',
    owner='telefang', repo='telefang', branches=['master'],
    magic_link=True, category='telefang-master',
    pullrequest_filter=trusted_pr_filter))
c['change_source'].append(changes.GitHubPullrequestPoller(
    name='GitHubPullrequestPoller:telefang/telefang:patch',
    owner='telefang', repo='telefang', branches=['patch'],
    magic_link=True, category='telefang-patch',
    pullrequest_filter=trusted_pr_filter))

def branch_pr_filter(project, branch):
    def change_filter(c):
        # Match on project/branch for ordinary pushes, or on the category
        # assigned by the pull request pollers above.
        return ((c.project == project and c.branch == branch)
                or c.category == project + "-" + branch)

    return change_filter

c['schedulers'].append(schedulers.AnyBranchScheduler(
    name="telefang-master-scheduler",
    change_filter=util.ChangeFilter(branch_pr_filter("telefang", "master")),
    builderNames=["telefang-master"]))
c['schedulers'].append(schedulers.AnyBranchScheduler(
    name="telefang-patch-scheduler",
    change_filter=util.ChangeFilter(branch_pr_filter("telefang", "patch")),
    builderNames=["telefang-patch"]))

This is roughly what our current setup looks like, abbreviated for the sake of this blog post. Any pull requests that come in are categorized by which builder we want to build them with. Note that we have to check both the project/branch combo and the category, since the standard GitPoller does not support categorizing changes in this fashion. Also of note: I've configured the changesources to only report PRs filed by a small handful of trusted users. This prevents arbitrary users from filing PRs that cause us to continually mine cryptocurrency on a fairly expensive instance type. A better approach would be to check if the user is a member of a particular GitHub organization, but I did not have that ready and tested at the time of writing; a rough sketch of the idea follows.
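
If you want to try it, something along these lines ought to work as a pullrequest_filter. This is an untested sketch: it assumes the filter still receives the pull request JSON from the GitHub API (as trusted_pr_filter does above), that the requests library is available on the master, and that a personal access token able to read org membership lives in the secrets directory.

import requests

GITHUB_ORG = 'telefang'  # hypothetical organization name

def org_member_pr_filter(apidata):
    login = apidata['user']['login']
    token = open('../secrets/github_api_token').read().strip()
    # GitHub answers 204 if the user is a visible member of the org,
    # and 404 otherwise.
    r = requests.get(
        'https://api.github.com/orgs/%s/members/%s' % (GITHUB_ORG, login),
        headers={'Authorization': 'token ' + token})
    return r.status_code == 204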

Latent Workers Aren't Lazy, Just Disengaged

You can configure Buildbot to fire up and shut down workers automatically. These are known as "Latent Workers". There's support for EC2 instances, as well as OpenStack "private cloud", libvirt instances, and Docker containers. The most interesting option is EC2, if you're willing to trust your AWS bill to Buildbot. I configured it to fire up a spot request to launch a worker AMI I had prepared. After doing some extensive benchmarking, I determined the best bang-for-the-buck was a t3.2xlarge instance. Its 8 CPUs (cores? threads? Amazon pls help) are enough to chomp through a build job in about 30 seconds. We can also save a bunch of money by configuring the latent worker to make spot requests, which are significantly cheaper than just firing up an on-demand instance.
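
For reference, the worker definition ends up looking something like this sketch. The AMI ID, region, key pair, security group, and spot price ceiling are all placeholders to substitute with your own, and note there are no AWS credentials here, because the IAM instance role from earlier supplies them:

c['workers'].append(worker.EC2LatentWorker(
    'keshi',                          # must match the name in the worker's user data
    util.Secret('keshi_worker_password'),
    't3.2xlarge',                     # instance type picked after benchmarking
    ami='ami-0123456789abcdef0',      # placeholder: your prepared worker AMI
    region='us-east-1',               # placeholder region
    keypair_name='paparouna-worker',  # placeholder key pair
    security_name='paparouna-worker', # placeholder security group
    spot_instance=True,               # request spot capacity instead of on-demand
    max_spot_price=0.15,              # placeholder ceiling, in dollars per hour
    # plus the user_data= line shown further down
))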

I can't share the worker AMI I made just yet. There's a hardcoded password it uses to authenticate itself to the master. I would really love it if I could either remove such sensitive material from the AMI or, better yet, find a way to build AMIs from a spec so that it's easier to update. As it stands, I built the AMI by firing up an Ubuntu instance, customizing it a bunch, and then imaging that. You could do the same thing, of course. The process of doing so is roughly:

  • Launch a fresh Ubuntu 18.04 LTS instance
  • Create a Python virtual environment and worker according to what the Buildbot guide says to do
  • Hardcode (urgh) the master password in here
    • On the master we can at least use Secrets
  • Build whatever tools you need, then install them somewhere world-visible like /usr/local/bin
  • Add a systemd unit file to launch the buildbot at boot time and shut down when the buildbot exits
  • Image your AMI and reference it in the EC2LatentWorker configuration

I like to use user data to configure the latent worker AMI. I configured the worker's buildbot.tac to fetch the instance's current user data, parse it as JSON, and use it to configure the worker. On the master end, you can add a line to your EC2LatentWorker like so:

user_data=base64.b64encode(json.dumps({"name": "keshi", "maxretries": 5}).encode("utf8")).decode("utf8")

Setting maxretries acts as a safety net: if the worker can't connect to the master, it won't sit there forever racking up EC2 charges. The name parameter is there so that we can use the same AMI for multiple workers. If we decide we want to allow our master to fire up multiple latent workers, we'll just duplicate the same entry and change the name around. Note that the name in the user data must match the name of the EC2LatentWorker, or it won't be able to connect and things will be bad.
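
For completeness, the worker-side half is roughly the following sketch, not lifted verbatim from my AMI: it assumes you edit the generated buildbot.tac to fetch user data from the EC2 instance metadata endpoint, and whether you need the extra base64 decode depends on how your boto/Buildbot versions submit the user data, so check what the endpoint actually returns on your instance.

import base64
import json
import urllib.request

def load_user_data():
    try:
        raw = urllib.request.urlopen(
            'http://169.254.169.254/latest/user-data', timeout=2).read()
        return json.loads(base64.b64decode(raw).decode('utf8'))
    except Exception:
        # Not running on EC2, or no user data supplied: fall back to defaults.
        return {}

conf = load_user_data()
workername = conf.get('name', 'keshi')
maxretries = conf.get('maxretries', None)
# These values then replace the hard-coded workername (and, if you use it,
# maxretries) assignments in the buildbot.tac that create-worker generated.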

Keep It A Secret

Since I told you to use version control, that also means you need to keep your secrets outside of it. The easiest way to do so is with Buildbot's secrets managers. With them, you just replace anything like a password with an invocation of util.Secret("my_secret_password"). If you use the file-based secrets manager, it'll look for a file of the same name and replace the Secret with the contents of that file. This is kind of important because, as I mentioned above, Buildbot doesn't really support this all that well. For things where Secrets don't work, you'll get a cryptic and unhelpful error message in your twistd.log and a dead master. You need to replace the above with something like open('../secrets/my_secret_password').read().strip() instead. Of course, this forecloses the use of any non-filesystem-based secrets vault, which is really terrible.
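
Put together, the relevant bits of my master.cfg look roughly like this sketch; the secret names and the MySQL URL are placeholders, assuming a file-based provider rooted at ../secrets relative to the master directory, as laid out earlier:

# Tell Buildbot where the file-based secrets live.
c['secretsProviders'] = [util.SecretInAFile(dirname='../secrets')]

# Where Secrets are honored, a placeholder renderable is enough.
github_status_token = util.Secret('github_status_token')

# Where they aren't (the database URL, for one), read the file directly.
db_password = open('../secrets/mysql_password').read().strip()
c['db'] = {'db_url': 'mysql://buildbot:%s@localhost/buildbot' % db_password}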

Open Invitation To Translation Projects

I want to extend Paparouna CI to other translation projects. We already have a few eligible candidates, such as Bugsite. I also want to make this more broadly available for other uses. If you have a fan translation project and want continuous integration, I'm willing to consider extending it to your project. However, I do have a few requirements:

  • Your translation project must be entirely non-commercial. Projects which solicit crowdfunding or donations of any monetary value will not be accepted.
  • Your project must use a version control system supported by Buildbot.
    • In practice, most version control systems are supported.
    • If you don't have version control, go get version control.
  • Project must have published source code.
    • Limited consideration will be made for projects in private repositories, provided we are given read/write access to those repositories and organizations.
    • If your project does not have source code, then you probably won't benefit from Paparouna CI, and you should wait for my upcoming article on the ModBase build system.
  • Your project must be buildable on Ubuntu 18.04 LTS. (Or any future LTS, should I upgrade.)
    • If your project requires additional build tools, we will consider modifications to our worker image to install any well-known third-party packages.
  • Your build system must be capable of producing patch files as I am not interested in distributing unlicensed ROM images.
  • Your project must not interact in any way with any intellectual property (copyright, patent, or trademark) owned by a company known to legally harass non-commercial fan projects.
    • In particular, that means no projects translating games owned by Nintendo, The Pokemon Company, or Square-Enix.

If you believe your project is within these limitations, please send a message to @ft_org on Twitter detailing what game is being translated into which languages and how to access its source code and build it. I will notify you via direct message if your project has been deemed suitable for Paparouna CI.