All of the data used in my project is simulated. The reinforcement learning agent's network is trained by playing simulated games of Snake. At each time step, I push the tuple (s, a, s', r, d) into a replay buffer, where s is the current state, a is the action chosen by the agent, s' is the next state produced by the environment, r is the reward, and d is a Boolean indicating whether the game has ended. During training, the replay buffer is sampled in batches of size B, and these batches are used to train the model as described in the Model section.
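To make this concrete, here is a minimal sketch of the replay buffer just described. The class name, default capacity, and uniform random sampling are illustrative assumptions, not details taken from my implementation:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (s, a, s', r, d) transitions."""

    def __init__(self, capacity=100_000):  # capacity is an assumed default
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first

    def push(self, s, a, s_next, r, d):
        # Append one transition from a simulated time step.
        self.buffer.append((s, a, s_next, r, d))

    def sample(self, batch_size):
        # Uniformly sample a batch of size B for one training step.
        return random.sample(self.buffer, batch_size)
```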
To the right is an example of what the game state for Snake looks like at the beginning of the game. I have colored the head of the snake blue, its body green, the fruit red, and the walls black. This grid of RGB values is fed into the neural network, which proposes actions. The initial position of the snake is randomized between games to broaden the distribution of game states seen during training. When the snake's head intersects the fruit, I immediately move the fruit to a new, unoccupied position on the board and extend the snake by one segment.
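The sketch below shows one way the RGB state encoding and the fruit respawn could look. The colors follow the description above, but the function names, the white background, the border-wall layout, and the head-first snake representation are assumptions for illustration:

```python
import random
import numpy as np

# RGB color coding matching the description above (assumed exact values).
HEAD, BODY, FRUIT, WALL = (0, 0, 255), (0, 255, 0), (255, 0, 0), (0, 0, 0)

def encode_state(size, snake, fruit):
    """Render the board as a (size, size, 3) uint8 RGB grid for the network.

    `snake` is a head-first list of (row, col) cells; `fruit` is a (row, col) cell.
    """
    state = np.full((size, size, 3), 255, dtype=np.uint8)          # white background
    state[0, :] = state[-1, :] = state[:, 0] = state[:, -1] = WALL  # border walls
    state[fruit] = FRUIT
    for cell in snake[1:]:        # body segments
        state[cell] = BODY
    state[snake[0]] = HEAD        # head drawn last so it is never overwritten
    return state

def respawn_fruit(size, snake):
    """Move the fruit to a uniformly random unoccupied interior cell."""
    occupied = set(snake)
    free = [(r, c) for r in range(1, size - 1) for c in range(1, size - 1)
            if (r, c) not in occupied]
    return random.choice(free)
```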