I trained my reinforcement learning agent over the course of 2,000 full games. I split these games into 100 episodes of 20 games each and tracked the average loss, average final length, and maximum final length for each episode. The loss curve is shown on the left, while the length curves are below. The initial jump in the loss occurs because I use an epsilon-greedy policy for the first third of the training games, which forces the agent to pick a random action with probability epsilon. This exploration helps the model generalize to new situations by broadening the distribution of states it trains on.
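To make the exploration step concrete, here is a minimal sketch of what that epsilon-greedy selection can look like. The names `choose_action`, `q_values_fn`, `EPSILON`, and `NUM_ACTIONS` are my own placeholders rather than identifiers from the project, and the actual epsilon value and action space may differ.

```python
import random

# Minimal sketch of the epsilon-greedy selection described above.
# EPSILON and NUM_ACTIONS are assumed values, not the project's.
EPSILON = 0.1      # assumed probability of taking a random action
NUM_ACTIONS = 4    # assumed action space: up, down, left, right

def choose_action(q_values_fn, state, game_idx, total_games=2000):
    """Pick an action epsilon-greedily during the first third of
    training games, and purely greedily afterwards."""
    exploring = game_idx < total_games // 3
    if exploring and random.random() < EPSILON:
        return random.randrange(NUM_ACTIONS)   # random exploratory action
    q_values = q_values_fn(state)              # sequence of NUM_ACTIONS scores
    return max(range(NUM_ACTIONS), key=lambda a: q_values[a])  # greedy action
```

Once the exploration phase ends, the agent always takes the action with the highest predicted value, which is consistent with the loss settling down after the initial jump.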
The maximum final length tended to increase as the model learned to play. Early versions of the model rarely exceeded a length of 5, while later versions sometimes reached lengths of 9 or 10. While this is better than random guessing, it is a far cry from human performance, which can easily reach into the double digits.
The average final length curve tells a similar story. The final model performs better than the initial model, but only reaches an average final length of about 4. This indicates that the snake is, on average, only able to pick up a single fruit, much worse than a human player.
My code for this project is available on GitHub at: https://github.com/juco3900/CSCI5922-Project