# Q-Learning Snake AI

## Overview
This project implements a reinforcement learning agent that plays the classic game of Snake using deep Q-learning. The agent learns to navigate the game environment, collect food, and avoid collisions through experience and reward-based learning.
## Technical Details

### Architecture
The implementation consists of several key components:
- Game Environment: A custom Snake game implementation that provides the state space and handles the game mechanics
  - Initially experimented with a simple state representation: the head position, the apple position, and a boolean safety flag for each cardinal direction
  - Later switched to a spatial state representation: a grid encoding of the full board marking the snake's body and the apple (both encodings are sketched after this list)
- Q-Learning Agent: A deep neural network that learns state-action values (CNN-based for the spatial representation)
- Experience Replay: A buffer that stores past transitions and samples them for training
- Reward System: +1 for eating an apple, -1 on death
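
A minimal sketch of the two state encodings and the reward scheme. The environment attributes (`head`, `body`, `apple`, `grid_size`, `is_safe`) and the three-channel grid layout are illustrative assumptions, not taken from the actual code:

```python
import numpy as np

def simple_state(env):
    """Simple representation: head position, apple position, and a
    safety flag for each cardinal direction (env attributes are assumed)."""
    hx, hy = env.head
    ax, ay = env.apple
    safe = [
        float(env.is_safe((hx, hy - 1))),  # up
        float(env.is_safe((hx, hy + 1))),  # down
        float(env.is_safe((hx - 1, hy))),  # left
        float(env.is_safe((hx + 1, hy))),  # right
    ]
    return np.array([hx, hy, ax, ay, *safe], dtype=np.float32)

def spatial_state(env):
    """Spatial representation: the whole board as a grid, with one
    channel each for the snake body, the snake head, and the apple."""
    grid = np.zeros((3, env.grid_size, env.grid_size), dtype=np.float32)
    for x, y in env.body:
        grid[0, y, x] = 1.0
    grid[1, env.head[1], env.head[0]] = 1.0
    grid[2, env.apple[1], env.apple[0]] = 1.0
    return grid

def reward(ate_apple: bool, died: bool) -> float:
    """Reward scheme from the README: +1 for an apple, -1 on death
    (0 for every other step is an assumption)."""
    if died:
        return -1.0
    return 1.0 if ate_apple else 0.0
```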
### Implementation Highlights
- Python-based implementation
- Deep Q-Network (DQN) architecture
  - MLP for the basic state representation
  - CNN for the spatial state representation
- Experience replay for stable learning
- Epsilon-greedy exploration strategy (the network, action selection, and replay buffer are sketched below)
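
A sketch of what the CNN-based Q-network and epsilon-greedy action selection could look like. PyTorch is assumed here (the README does not name a framework), and the layer sizes are illustrative:

```python
import random
import torch
import torch.nn as nn

class SnakeDQN(nn.Module):
    """CNN that maps the spatial grid state to one Q-value per action."""
    def __init__(self, grid_size: int, n_actions: int = 4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * grid_size * grid_size, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, x):
        return self.head(self.conv(x))

def select_action(net: SnakeDQN, state: torch.Tensor, epsilon: float,
                  n_actions: int = 4) -> int:
    """Epsilon-greedy: a random action with probability epsilon,
    otherwise the action with the highest predicted Q-value."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        q_values = net(state.unsqueeze(0))  # add a batch dimension
    return int(q_values.argmax(dim=1).item())
```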
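And a sketch of a uniform experience replay buffer with a single DQN training step. The target network, discount factor, and batch size are assumptions not stated in the README:

```python
import random
from collections import deque
import torch
import torch.nn.functional as F

class ReplayBuffer:
    """Fixed-size buffer of past transitions, sampled uniformly."""
    def __init__(self, capacity: int = 50_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        # states are assumed to be torch tensors of shape (3, H, W)
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return (torch.stack(states),
                torch.tensor(actions),
                torch.tensor(rewards, dtype=torch.float32),
                torch.stack(next_states),
                torch.tensor(dones, dtype=torch.float32))

    def __len__(self):
        return len(self.buffer)

def train_step(net, target_net, buffer, optimizer,
               batch_size: int = 64, gamma: float = 0.99):
    """One DQN update: fit Q(s, a) toward r + gamma * max_a' Q_target(s', a')."""
    if len(buffer) < batch_size:
        return
    states, actions, rewards, next_states, dones = buffer.sample(batch_size)
    q = net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        target = rewards + gamma * next_q * (1.0 - dones)
    loss = F.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```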
## Results
In the simple state space, the agent successfully learned to play Snake, demonstrating:
- Efficient path-finding to food
- Collision avoidance
In the spatial state space, the agent learned the same behaviors, but it could not execute them consistently and did not learn as efficiently. I hypothesize that this is due to the much larger state space and the fact that the agent is no longer directly told which directions are "safe" and "unsafe." It could also be due to "death" states being much rarer than "alive" states: each episode contains exactly one death but many alive states. In the future, I might experiment with prioritized experience replay to see if it helps with this imbalance.
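
A rough sketch of what that prioritized experience replay experiment might look like, using simple proportional prioritization (a plain list rather than a sum-tree, and without importance-sampling weights, for brevity). None of this exists in the current project:

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Hypothetical prioritized replay: transitions are sampled with
    probability proportional to their TD error, so rare, surprising
    transitions (such as deaths) are revisited more often."""
    def __init__(self, capacity: int = 50_000, alpha: float = 0.6):
        self.capacity = capacity
        self.alpha = alpha
        self.data = []
        self.priorities = []

    def push(self, transition, td_error: float = 1.0):
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append((abs(td_error) + 1e-5) ** self.alpha)

    def sample(self, batch_size: int):
        probs = np.array(self.priorities)
        probs = probs / probs.sum()
        idx = np.random.choice(len(self.data), size=batch_size, p=probs)
        return [self.data[i] for i in idx], idx

    def update_priorities(self, idx, td_errors):
        # refresh priorities after each training step
        for i, err in zip(idx, td_errors):
            self.priorities[i] = (abs(err) + 1e-5) ** self.alpha
```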
## Future Improvements
- Implement prioritized experience replay
- Experiment with other RL algorithms (policy-based, actor-critic, evolutionary); since they focus more on the action space than on state-action values, they might handle the larger state space better
- Optimize the network architecture for better learning
  - Deeper networks
  - Wider layers
  - Possibly better loss functions and activations