This repository contains the source code for the paper "Learning the Value Systems of Societies with Multi-objective Preference-based Inverse Reinforcement Learning", to be submitted to AAMAS 2026. Our algorithm, SVSL-P, observes a given MOMDP environment, a set of value labels, and preference demonstrations from a (here, simulated) society of diverse agents with different value systems (multi-objective preference weights). It then simultaneously learns: a reward vector for the MOMDP that implements a value alignment specification for the given set of values; a set of up to L preference weights that describe the different preferences observed; a clustering of the agents into these value systems; and a weight-conditioned policy \Pi(s,a|W) that implements any given set of weights (in particular, those selected as the clusters of the society).
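As a rough orientation only, the sketch below (not the paper's code; all names, shapes, and placeholder values are illustrative assumptions) shows the four objects SVSL-P learns and how a weight vector scalarizes the reward vector:

```python
import numpy as np

# Illustrative dimensions (assumptions, not taken from the paper)
N_VALUES = 3   # one reward component per value label
L = 10         # maximum number of value systems (clusters)

def reward_vector(s, a):
    """Learned reward vector R(s, a) in R^N_VALUES (placeholder)."""
    return np.zeros(N_VALUES)

# Learned set of up to L preference weights, one candidate value system each
cluster_weights = np.random.dirichlet(np.ones(N_VALUES), size=L)

# Learned assignment of each observed agent to one value system (placeholder)
agent_to_cluster = {"agent_0": 0, "agent_1": 2}

def scalarized_reward(s, a, W):
    """The weight-conditioned policy Pi(s, a | W) should optimize this."""
    return float(W @ reward_vector(s, a))

print(scalarized_reward(None, None, cluster_weights[0]))
```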
The repository also includes the other algorithms used in the evaluation of the paper, namely Envelope Q-learning from MORL baselines, a modification of our previous algorithm Value Learning From Preferences, and a custom implementation of Preference-based Multi-Objective Reinforcement Learning.
### If not changing the full code

- Select an empty main folder and go inside it.
- Create a virtual environment with Python 3.13+ (we used 3.13.5 in the paper):

```sh
python3.13 -m venv .venv
source .venv/bin/activate
```

- Clone the following repositories inside the main folder:
  - MORL baselines fork.
  - Mushroom RL fork. Then make sure to select the branch "andres-dev":

```sh
cd mushroom-rl-kz
git checkout andres-dev
cd ..
```

- Clone this repository in the main folder:

```sh
git clone https://github.com/andresh26-uam/ValueLearningInMOMDP.git
cd ValueLearningInMOMDP
```

- Install the local packages:

```sh
pip install ../mushroom-rl-kz
pip install ../morl-baselines-reward
```

- Install the requirements:

```sh
pip install -r full_requirements.txt
```
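To sanity-check the installation (assuming the forks keep the upstream import names `mushroom_rl` and `morl_baselines`; adjust if they differ), you can run:

```sh
python -c "import mushroom_rl, morl_baselines; print('imports OK')"
```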
### If changing the full code

- Perform steps 1-3 from "If not changing the full code".
- Clone the following repositories in folder F (run `cd ..` first if needed to get to F):
  - Baraacuda. Then make sure to select the branch "andres-dev":

```sh
cd baraacuda
git checkout andres-dev
cd ..
```

  - Imitation fork.
- Requirements: remove or comment lines 89 and 95 in full_requirements.txt, then run:

```sh
pip install -r full_requirements.txt
```
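If you prefer to comment out those two lines from the command line, a minimal sketch with GNU sed (assumption: on macOS/BSD sed, use `sed -i ''` instead of `sed -i`):

```sh
# Prepend "# " to lines 89 and 95 of the requirements file, in place
sed -i -e '89s/^/# /' -e '95s/^/# /' full_requirements.txt
```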
### Generate preference datasets (and execute the Envelope Q-learning baseline)
- FF environment:

```sh
sh script.sh -ffmo -genrt -algo pc -L 10 -expol envelope -pol envelope
```

- MVC environment:

```sh
sh script.sh -mvc -genrt -algo pc -L 10 -expol envelope -pol envelope
```

### Training

The code is not memory efficient: you need at least 16GB of RAM (preferably 32GB), and you should run only one of these commands at a time.

- SVSL-P, FF:
```sh
sh script.sh -ffmo -train -algo cpbmorl -L 10 -prefix "repr" -pdata "" -seeds 25,26,27,28,29,30,31,32,33,34
```

- PbMORL, FF:

```sh
sh script.sh -ffmo -train -algo pbmorl -L 10 -prefix "repr" -pdata "" -seeds 25,26,27,28,29,30,31,32,33,34
```

- PC, FF:

```sh
sh script.sh -ffmo -train -algo pc -L 10 -prefix "repr" -pdata "" -seeds 25,26,27,28,29,30,31,32,33,34
```

- SVSL-P, MVC:

```sh
sh script.sh -mvc -train -algo cpbmorl -L 15 -prefix "repr" -pdata "" -seeds 26,27,28,29,30,31,32,33,35
```

- PbMORL, MVC:

```sh
sh script.sh -mvc -train -algo pbmorl -L 15 -prefix "repr" -pdata "" -seeds 26,27,28,29,30,31,32,33,35
```

- PC, MVC:

```sh
sh script.sh -mvc -train -algo pc -L 15 -prefix "repr" -pdata "" -seeds 26,27,28,29,30,31,32,33,35
```
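Since only one training run should be active at a time, a small shell loop (a sketch derived from the FF commands above; adjust the flags for MVC) can queue the three trainings sequentially:

```sh
# Run the three FF trainings one after another to respect the memory limit
for algo in cpbmorl pbmorl pc; do
  sh script.sh -ffmo -train -algo "$algo" -L 10 -prefix "repr" -pdata "" \
    -seeds 25,26,27,28,29,30,31,32,33,34
done
```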