Add agents and maze environment for reinforcement learning #160
This commit introduces a new `agents.py` module containing three agent classes: `SimpleAgent`, `MemoryAgent`, and `RandomAgent`, designed for reinforcement learning tasks. The `MemoryAgent` enhances Q-learning with episodic memory using a `MemorySpace`. Additionally, a new `maze.py` module is added, providing a `MazeEnvironment` class that simulates a grid-based maze for agent navigation. The `main_demo.py` file is updated to utilize these new classes, facilitating experiments with memory-enabled and memory-disabled agents. Utility functions for converting NumPy types to Python types are also included in a new `util.py` module for JSON serialization.
Pull Request Overview
This PR introduces foundational components for a reinforcement learning setup by adding a utility for JSON-friendly NumPy conversions and a grid-based maze environment.
- Add `convert_numpy_to_python` in `util.py` to turn NumPy types into native Python types for serialization.
- Implement `MazeEnvironment` in `maze.py` with reset, observation, and step logic for a grid maze.
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| memory/utils/util.py | New helper to convert NumPy scalars and arrays to native Python. |
| maze.py | New MazeEnvironment class with configurable size, obstacles, and reward logic. |
Comments suppressed due to low confidence (1)
memory/utils/util.py:1
- This utility function introduces new behavior but lacks corresponding unit tests. Consider adding tests for various NumPy types (integer, floating, ndarray, dict, list, tuple) to prevent regressions.
    # Helper function to convert NumPy types to native Python types for JSON serialization


    def convert_numpy_to_python(obj):
        """Convert NumPy types to standard Python types for JSON serialization."""
        if isinstance(obj, np.integer):
Copilot AI, May 29, 2025
The converter currently handles integers and floats but omits NumPy boolean types (e.g., `np.bool_`). Consider adding `elif isinstance(obj, np.bool_): return bool(obj)` to ensure full coverage of common NumPy scalars.
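A minimal sketch of how the suggested branch might fit in. Only the first `isinstance(obj, np.integer)` check is visible in the diff, so the float, array, and container branches below are assumptions based on the overview's description, not the PR's actual code.

```python
import json

import numpy as np


def convert_numpy_to_python(obj):
    """Recursively convert NumPy types to native Python types (illustrative sketch)."""
    # Suggested addition: np.bool_ is not a subclass of np.integer,
    # so it needs its own branch.
    if isinstance(obj, np.bool_):
        return bool(obj)
    if isinstance(obj, np.integer):
        return int(obj)
    # Assumed branches: the rest of the function is not shown in the diff.
    if isinstance(obj, np.floating):
        return float(obj)
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    if isinstance(obj, dict):
        # Only values are converted here; keys are left untouched.
        return {key: convert_numpy_to_python(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        # Tuples become lists, which is how JSON would encode them anyway.
        return [convert_numpy_to_python(item) for item in obj]
    return obj


# Hypothetical episode record, made up for illustration.
record = {
    "done": np.bool_(True),
    "steps": np.int64(12),
    "path": np.array([[0, 0], [0, 1]]),
}
print(json.dumps(convert_numpy_to_python(record)))
```

Without the `np.bool_` branch, `json.dumps` would reject the `done` value, because the standard encoder does not recognize NumPy booleans.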
            reward (float): Reward for the action.
            done (bool): Whether the episode has ended.
        """
        # Actions: 0=up, 1=right, 2=down, 3=left
Copilot AI, May 29, 2025
The `step` method assumes `action` is always 0–3 and will raise an `IndexError` for invalid values. Add explicit input validation (e.g., `if action not in range(4): raise ValueError`) with a clear error message to improve API robustness.
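One way the validation could look, as a sketch only. The body of `step` is not shown in the diff, so the `ACTION_DELTAS` table and the `action_to_delta` helper are hypothetical names introduced here; only the action numbering comes from the code comment above.

```python
import operator

# Hypothetical move table following "0=up, 1=right, 2=down, 3=left",
# stored as (row_delta, col_delta). The PR's own representation is not shown.
ACTION_DELTAS = ((-1, 0), (0, 1), (1, 0), (0, -1))


def action_to_delta(action):
    """Return the (row, col) delta for an action, or raise ValueError."""
    try:
        # operator.index accepts Python ints and NumPy integer scalars
        # but rejects floats and strings.
        index = operator.index(action)
    except TypeError:
        raise ValueError(
            f"action must be an integer in 0..3, got {action!r}"
        ) from None
    # An explicit range check also rejects negative values, which would
    # otherwise wrap around when indexing a Python sequence.
    if not 0 <= index < len(ACTION_DELTAS):
        raise ValueError(f"action must be in 0..3, got {index}")
    return ACTION_DELTAS[index]
```

Normalizing through `operator.index` is a design choice for this sketch: agents often pick actions with NumPy, so integer scalars are accepted while `2.5` or `"up"` fail with a clear message.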
    )

    # Check if valid move
    if (
Copilot AI, May 29, 2025
[nitpick] Invalid moves (out of bounds or into obstacles) are silently ignored. If this is intended, clarify this behavior in the docstring (e.g., "Invalid actions result in no position change but still consume a step").
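A short sketch of the behavior described in this comment, with the suggested wording in the docstring. The `apply_move` helper, its parameters, and the square-grid assumption are hypothetical; the actual condition inside `step` is truncated in the diff.

```python
def apply_move(position, delta, size, obstacles):
    """Return the position after attempting a move on a size x size grid.

    Invalid moves (out of bounds or into an obstacle) result in no position
    change; the caller is expected to count the attempt as a step anyway.
    """
    row = position[0] + delta[0]
    col = position[1] + delta[1]
    in_bounds = 0 <= row < size and 0 <= col < size
    if in_bounds and (row, col) not in obstacles:
        return (row, col)
    # Blocked: stay where we are rather than raising an error.
    return position


# Hypothetical 5x5 grid with a single obstacle to the right of the start cell.
obstacles = {(0, 1)}
print(apply_move((0, 0), (-1, 0), 5, obstacles))  # moving up leaves the grid
print(apply_move((0, 0), (0, 1), 5, obstacles))   # moving right hits the obstacle
print(apply_move((0, 0), (1, 0), 5, obstacles))   # moving down is allowed
```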