Details

Quickstart

The main repo for the Gym Retro environments is at https://github.com/openai/retro. Follow the setup instructions there, then come back here.

Now you can create a simple random agent in Python, random-agent.py:

import retro


def main():
    env = retro.make(game='Airstriker-Genesis', state='Level1')
    obs = env.reset()
    while True:
        obs, rew, done, info = env.step(env.action_space.sample())
        env.render()
        if done:
            obs = env.reset()


if __name__ == '__main__':
    main()

When you run this, you should see a video game show up on your screen with a spaceship shooting randomly. Press Ctrl-C in the console to kill the program.

You are free to train your agent however you'd like, but we recommend using Sonic 1, 2, and 3 & Knuckles, which are available for purchase on Steam.

Once you have them, you can import the ROMs of the games with the provided script:

python -m retro.import.sega_classics

You will need to enter your Steam username, password, and, if applicable, a Steam Guard code. If you don't use the Steam mobile app for Steam Guard, you can get a Steam Guard code via email by logging into Steam in a new browser or private session, or you can change your Steam Guard settings. The import process may take several minutes to run.

If you already have the games installed, you can import them directly:

python -m retro.import <path to steam folder>

To install the repo for the contest, which includes environment wrappers to set things up in the official way, run this:

git clone --recursive https://github.com/openai/retro-contest.git
pip install -e "retro-contest/support[docker,rest]"

Using a game and state from sonic-train.csv, you can create a copy of the contest environment in Python. Here is a random agent script, random-agent-contest.py:

from retro_contest.local import make


def main():
    env = make(game='SonicTheHedgehog-Genesis', state='LabyrinthZone.Act1')
    obs = env.reset()
    while True:
        obs, rew, done, info = env.step(env.action_space.sample())
        env.render()
        if done:
            obs = env.reset()


if __name__ == '__main__':
    main()

Note that it imports from retro_contest.local instead of retro. When you run this script you should see Sonic jumping randomly around a level. At this point, you can submit a simple agent to the evaluation server.

Also, be sure to check out our Discord server to chat with us or other contestants.


Submit an Agent

To have your agent show up on the leaderboard, you can submit it to be evaluated on our servers. Here's how to submit an example agent and then customize it.

To allow submissions without requiring specific frameworks, contest submissions are Docker containers. If you are on a Mac, you can use Docker for Mac; Windows users can use Docker for Windows (which requires Hyper-V support, available in Pro editions) or Docker Toolbox; and Ubuntu Linux users can follow the Docker installation guide.

Once you have Docker set up, you should register on the contest site and wait for an administrator to approve your account. After registering you can view your profile and see your private Docker registry URL, username, and password. You can then log in with the docker command line tool:

export DOCKER_REGISTRY=<docker registry url>
docker login $DOCKER_REGISTRY \
    --username <docker registry username> \
    --password <docker registry password>

Here is a simple agent script that performs random actions but always holds down the "move right" button, simple-agent.py:

import gym_remote.exceptions as gre
import gym_remote.client as grc


def main():
    print('connecting to remote environment')
    env = grc.RemoteEnv('tmp/sock')
    print('starting episode')
    env.reset()
    while True:
        action = env.action_space.sample()
        action[7] = 1  # index 7 is the RIGHT button on the Genesis controller
        ob, reward, done, _ = env.step(action)
        if done:
            print('episode complete')
            env.reset()


if __name__ == '__main__':
    try:
        main()
    except gre.GymRemoteError as e:
        print('exception', e)

And a Dockerfile to build the agent image, simple-agent.docker:

FROM openai/retro-agent
ADD simple-agent.py .
CMD ["python", "-u", "/root/compo/simple-agent.py"]

To download these files and build the agent, try this:

mkdir simple-agent
cd simple-agent
curl -O https://contest.openai.com/static/simple-agent.py
curl -O https://contest.openai.com/static/simple-agent.docker
docker build -f simple-agent.docker -t $DOCKER_REGISTRY/simple-agent:v1 .

Now you can upload the Docker image:

docker push $DOCKER_REGISTRY/simple-agent:v1

You should now go to the Submit Job page and use simple-agent:v1 as the image name to submit a new job. You can also check the status of your job and check the leaderboard once it completes.

Congratulations! You have made your first submission and it should show up on the leaderboard after a couple of hours. See the Issues section if anything seems to be broken.

The default agent image includes the newest version of TensorFlow. If you want to use TensorFlow 1.4, pull openai/retro-agent:tensorflow-1.4; for PyTorch, pull openai/retro-agent:pytorch. Either can then be used as your base image.

Baselines

We've created a set of baseline implementations that you can tweak in the retro-baselines repo on GitHub. You can use these as starting points for making fancier algorithms or just for tweaking parameters on the existing ones.

Local Evaluations

Whenever you make some changes, you should test them locally before submitting, so you can make sure your code runs correctly and avoid waiting for the evaluation server to start your job only to find out you have a syntax error.

To run local evaluations you should pull the openai/retro-env Docker image and tag it as remote-env:

docker pull openai/retro-env
docker tag openai/retro-env remote-env
retro-contest run --agent $DOCKER_REGISTRY/simple-agent:v1 \
    --results-dir results --no-nv Airstriker-Genesis Level1

You can then look in the results directory to see what the output was. agent refers to your code, while remote is the remote-env evaluation server that your agent talks to. For other options to retro-contest, use "--help".

Once this works, you can run against a Sonic game. Assuming you have already imported the Sonic ROMs as described in the Quickstart section, you can run this:

retro-contest run --agent $DOCKER_REGISTRY/simple-agent:v1 \
    --results-dir results --no-nv --use-host-data \
    SonicTheHedgehog-Genesis GreenHillZone.Act1

If you look at the output in the results directory, you can see how your agent is performing. results/monitor.csv is particularly useful since it will record the reward, episode length and current timestamp for your run. Your agent will be evaluated based on these rewards when it is uploaded to the server.
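
For example, here is a small sketch for summarizing monitor.csv with the standard library. It assumes the usual Gym monitor layout (a '#'-prefixed header line followed by columns r, l, and t for reward, length, and time), so adjust if your file looks different:

import csv


# Minimal sketch for summarizing results/monitor.csv after a local evaluation.
# Assumes the standard Gym monitor layout: a '#'-prefixed header line, then
# columns r (episode reward), l (episode length), and t (elapsed time).
with open('results/monitor.csv') as f:
    reader = csv.DictReader(line for line in f if not line.startswith('#'))
    episodes = list(reader)

rewards = [float(row['r']) for row in episodes]
if rewards:
    print('episodes completed:', len(rewards))
    print('mean episode reward:', sum(rewards) / len(rewards))
    print('best episode reward:', max(rewards))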

That's it for running local evaluations. Make sure to use this before uploading so you can be confident that your code works before waiting for a job to start.


Environments

The environments being used for the training and test set are based on SEGA Genesis games that are run with a gym interface provided by Gym Retro. The training set is composed of levels from 3 Genesis games (Sonic the Hedgehog™, Sonic the Hedgehog™ 2, and Sonic the Hedgehog™ 3 & Knuckles), while the test set is composed of custom levels for those games. You are allowed to train on other games or datasets outside of the provided training set.

Each timestep advances the game by 4 frames, and each observation is the pixels on the screen for the current frame: a shape [224, 320, 3] array of uint8 values. Each action specifies which buttons to hold down until the next timestep: a shape [12] array of bool values, one for each button on the Genesis controller. Invalid button combinations (Up+Down, or combinations that access the start menu) are ignored.
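
As a quick sanity check, you can inspect these shapes directly. This is a minimal sketch assuming the Airstriker-Genesis ROM that ships with Gym Retro (used in the Quickstart above):

import retro


# Minimal sketch inspecting the observation and action formats described above,
# using the Airstriker-Genesis ROM bundled with Gym Retro.
env = retro.make(game='Airstriker-Genesis', state='Level1')
obs = env.reset()
print(obs.shape, obs.dtype)       # (224, 320, 3) uint8 screen pixels
print(env.action_space)           # 12 binary button states per action

action = env.action_space.sample()       # length-12 array of button presses
obs, rew, done, info = env.step(action)  # buttons held until the next timestep
env.close()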

The environment is stochastic in that it has sticky frameskip. While normal frameskip always repeats an action n times, sticky frameskip occasionally repeats an action n+1 times. When this happens, the following action is repeated one fewer times, since it is delayed by an extra frame. For the contest, sticky frameskip repeats an action an extra time with probability 0.25.
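
For intuition, here is a rough sketch of sticky frameskip written as a Gym wrapper. It is only an illustration of the mechanism described above, not the contest's actual implementation (the contest environment applies this for you):

import random

import gym


class StickyFrameskip(gym.Wrapper):
    """Illustrative sticky frameskip: repeat each action n times, but with
    probability p let the previous action "stick" for one extra frame, so the
    new action is delayed by a frame and applied one time fewer."""

    def __init__(self, env, n=4, p=0.25):
        super().__init__(env)
        self.n = n
        self.p = p
        self.last_action = None

    def reset(self, **kwargs):
        self.last_action = None
        return self.env.reset(**kwargs)

    def step(self, action):
        total_reward = 0.0
        obs, done, info = None, False, {}
        for i in range(self.n):
            # On the first frame of a step, occasionally keep holding the
            # previous action instead of switching to the new one.
            if i == 0 and self.last_action is not None and random.random() < self.p:
                frame_action = self.last_action
            else:
                frame_action = action
            obs, reward, done, info = self.env.step(frame_action)
            total_reward += reward
            if done:
                break
        self.last_action = action
        return obs, total_reward, done, info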

You are only given a single instance of the game and cannot instantiate multiple versions in parallel. In addition you are limited to 4,500 timesteps per episode, corresponding to 18,000 frames or 5 minutes of real time at 60 fps.

Each episode starts at the beginning of the level and ends when the agent dies, reaches the end of the level (defined as a specific horizontal offset), or when the timestep limit is hit.

During training you can access a few variables from the memory of the game through the info dictionary. During testing, these variables are not available.

The reward your agent receives is proportional to its progress toward a predefined horizontal offset within each level: positive for getting closer, negative for getting farther away. If you reach the offset, the sum of your rewards will be 9000. In addition, there is a time bonus that starts at 1000 and decreases linearly to 0 at the end of the time limit, so beating the level as quickly as possible is rewarded.
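
As a back-of-the-envelope illustration (not the contest's actual reward code; the constants come from the description above, but the per-step scaling and exact formula are assumptions made for illustration), the reward structure looks roughly like this:

# Rough illustration of the reward structure described above (not the actual
# contest code). Horizontal progress is scaled so that reaching the end-of-level
# offset sums to 9000, and a completion time bonus decays linearly from 1000 to 0.
MAX_TIMESTEPS = 4500
COMPLETION_REWARD = 9000.0
MAX_TIME_BONUS = 1000.0


def progress_reward(prev_x, new_x, end_x):
    """Per-step reward for horizontal progress toward end_x (negative if moving away)."""
    return (new_x - prev_x) / end_x * COMPLETION_REWARD


def time_bonus(timesteps_used):
    """Bonus for finishing the level, decaying linearly to 0 at the timestep limit."""
    return MAX_TIME_BONUS * (1.0 - timesteps_used / MAX_TIMESTEPS)


# An agent that finishes at timestep 2250 (half the limit) would earn roughly
# 9000 + 500 = 9500 total reward.
print(COMPLETION_REWARD + time_bonus(2250))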

One recommended training-validation split is as follows: sonic-train.csv, sonic-validation.csv. However, you are welcome to train on the levels in sonic-validation.csv since the final evaluation will use test levels.

You can only have one environment per process due to limitations of the libretro API. If you want more than that, you can create subprocesses. If you call env.close() when you're done with an environment, you can create a new one in the same process. env.render() also works if you want to see what is happening on the screen.
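
If you do want parallelism, one option is to give each worker process its own environment. Here is a minimal sketch using multiprocessing; the game and state names assume the Sonic ROMs imported in the Quickstart, and this is only a sketch, not contest infrastructure:

import multiprocessing as mp

import retro


def rollout(state, steps=1000):
    # Each worker process creates its own environment, sidestepping the
    # one-environment-per-process limitation of the libretro API.
    env = retro.make(game='SonicTheHedgehog-Genesis', state=state)
    env.reset()
    total = 0.0
    for _ in range(steps):
        _, rew, done, _ = env.step(env.action_space.sample())
        total += rew
        if done:
            env.reset()
    env.close()  # closing lets this process create another environment later
    return total


if __name__ == '__main__':
    states = ['GreenHillZone.Act1', 'LabyrinthZone.Act1']
    with mp.Pool(len(states)) as pool:
        print(pool.map(rollout, states))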

Evaluation

For evaluation on the test set, your agent is run on a VM with 6 E5-2690v3 cores, 56GB of RAM, and a single K80 GPU. The environment is synchronous with only one instance, meaning that with 12 hours of time you need to average ~43ms per timestep or better to reach 1 million timesteps within the limit. The environment runs at ~1000 frames per second on a single core with a random agent, i.e. 1ms per frame or 4ms per timestep, leaving you ~39ms per timestep for your own processing.

Your agent is allowed to learn (adjust its weights, use a replay buffer, etc) during test time, although a separate copy of the agent will be run on each test level.


Timeline

Submissions are allowed from April 5 to June 5 (2 months).

Rules

Only one account is allowed per team; do not make extra accounts to circumvent the per-team evaluation limit. You can share short code snippets or tutorial code with other teams, but no full or partial solutions can be shared. Each person is only allowed to be on one team at a time.

Categories

There are two award categories, "Best Score" and "Best Writeup". To be eligible to win you must release your submission as open source at the end of the contest. 1st, 2nd, and 3rd place winners from each category will receive a trophy. In addition there will be a single award for "Best Supporting Materials".

All winners will be invited to co-author a tech report with OpenAI about the contest.

Best Score

When your agent is being tested against the test set, all episode lengths and rewards are recorded. To get your performance on each environment in the test set, calculate the average reward per episode. If you then average these together, you get your overall score.

For example, if for a test set of two environments the rewards for each episode looked like:

    Test Environment 1: 10, 20, 30
    Test Environment 2: 40, 50

The score for this submission would be:

level_1_avg = (10 + 20 + 30) / 3
level_2_avg = (40 + 50) / 2
score = (level_1_avg + level_2_avg) / 2 = 32.5

The test set environments will have similar reward functions to those in the training set, but will be custom levels built for the games. The custom levels will not be released until after the contest has ended.

After submissions are closed, each team's latest submission will be run against the full test set and the final results will be collected to find the winners of the "Best Score" category. We reserve the right to change the specifics of the evaluation, for instance, if bugs are found in the training or test environments.

Leaderboard

For the leaderboard, your submission will be run against a set of levels similar to some from the training set. While this does not indicate how well your submission will do in the final test, it is useful for checking that you are not overfitting to the training set. You are limited to having a single submission running at any given time; for each individual level (1 million timesteps) you are given a maximum of 12 hours of wall clock time.

Best Writeup

For any team with a submission that is run on the final test set, if the team creates a writeup of how their algorithm works, they are eligible for the "Best Writeup" award. OpenAI researchers will read the writeups and choose winners based on the quality of the writeup and the novelty of the algorithm being described.

Best Supporting Materials

This award will go to whoever makes the best tutorials, libraries, or other supporting materials for the contest as judged by OpenAI researchers.


Issues

Check the GitHub issues or email us.

Legal

NO PURCHASE NECESSARY TO ENTER OR WIN. ALL FEDERAL, STATE, LOCAL, AND MUNICIPAL REGULATIONS APPLY. VOID WHERE PROHIBITED. Winners will be notified via email and will be listed in a subsequent blog post. This contest is sponsored exclusively by OpenAI, Inc.