My project demonstrated how deep Q-learning can be applied to games, in particular the classic arcade game Snake. I successfully built a model that performed better than random chance. Although its performance was nowhere near human level, my Snake model achieved a maximum length of 10 segments and was adept at avoiding both the walls and its own body. This project was a great learning experience for me, as it gave me the opportunity to study reinforcement learning algorithms.
One of the main difficulties I had during this project was incentivizing the agent to chase the fruit, rather than just avoid the walls. To address this, I added an inverse distance term to the reward function, which should incentivize the agent to move towards the fruit even when it is far away. My results were mixed: the model did improve after the change to the reward function, but its performance was still not great.
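As a rough sketch, an inverse-distance shaping term might look like the following. The specific reward magnitudes and the scaling constant `k` here are illustrative assumptions, not the exact values used in the project:

```python
def reward(snake_head, fruit, ate_fruit, died, k=0.1):
    """Sketch of a shaped reward: large terminal rewards, plus a small
    inverse-distance term that pulls the agent toward the fruit.

    Positions are (row, col) grid coordinates. The values -10, +10, and
    k=0.1 are illustrative assumptions, not the project's actual numbers.
    """
    if died:
        return -10.0
    if ate_fruit:
        return 10.0
    # Manhattan distance suits a grid world; the +1 avoids division by zero
    # when the head is adjacent to (or on) the fruit cell.
    dist = abs(snake_head[0] - fruit[0]) + abs(snake_head[1] - fruit[1])
    return k / (dist + 1)
```

Because the shaping term is small relative to the terminal rewards, it nudges the agent toward the fruit without overwhelming the actual objective of eating it and surviving.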
One of the main conclusions I drew from the project was the importance of the choice of reward function in deep Q-learning. While the reward function should dictate the agent's goal, it can help to add smaller rewards at intermediate stages to reduce reward sparsity and make learning smoother.
Another conclusion concerned the relative importance of training for longer versus adjusting the architecture. I found that changes to my architecture did not drastically affect my model's results. This could, however, be due to the small number of games I had the agent learn from, as without more data, increasing the model size is not a viable option. I might have achieved better results by increasing the number of simulated games by one or two orders of magnitude and increasing the model depth accordingly.
Finally, the project demonstrated both the strengths and weaknesses of deep Q-learning. Deep Q-learning is a very general family of algorithms, applicable to any environment that is (or can be approximated as) a Markov Decision Process. However, a strong reward function and a viable model architecture are required for the learning algorithm to work, and choosing and testing these seems to be the most difficult task in deep Q-learning. Automated reward function generation would therefore be an interesting direction for future research.
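That generality comes from the core update deep Q-learning performs: regressing Q(s, a) toward the one-step Bellman target, which depends only on the MDP's rewards and transitions, not on any game-specific structure. A minimal sketch of the target computation, with an illustrative discount factor:

```python
def q_target(reward, next_q_values, done, gamma=0.9):
    """One-step Bellman target for Q-learning:

        r + gamma * max_a' Q(s', a')

    with no bootstrapping on terminal states. In deep Q-learning the
    Q-network is trained to regress Q(s, a) toward this value;
    next_q_values stands in for the network's outputs at s', and
    gamma=0.9 is an illustrative choice.
    """
    if done:
        # Terminal state: no future value to bootstrap from.
        return reward
    return reward + gamma * max(next_q_values)
```

The same update applies unchanged whether the environment is Snake, another arcade game, or any other (approximate) MDP; only the state encoding and reward function are environment-specific.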