This is a course project for 11-785: Introduction to Deep Learning (Fall 2025) at Carnegie Mellon University. The project focuses on training large language models for any-to-any generation tasks, including multimodal tasks involving images and speech. Please refer to the project report for full details.
## Project Structure

### Directories

- `external/`: External dependencies and third-party scripts or tools.
- `logs/`: Logs generated during evaluations.
- `MMMUResults/`: Evaluation results for MMMU tasks.
- `MMMUTokenized/`: Pre-tokenized data for MMMU tasks.
- `SpeechResults/`: Evaluation results for speech tasks.
- `SpeechTokenized/`: Pre-tokenized data for speech tasks.
- `SpeechTokenizer/`: Repository or module containing the speech-specific tokenization logic.
- `SpeechGenResults/`: Generation-based evaluation results for speech tasks.
### Evaluation Scripts

- `eval_mmmu.py`: Evaluates MMMU tasks in a constrained setting, supporting token-based evaluation of instruction-response tasks.
- `eval_mmmu_gen.py`: Performs generation-based evaluation of MMMU tasks, focusing on free-form responses.
- `eval_speech.py`: Evaluates speech tasks on pre-tokenized audio data in a constrained setting, using prompts tailored for speech-to-text evaluation.
- `eval_speech_gen.py`: Performs free-form, generation-based evaluation of speech tasks and handles tasks across multiple datasets dynamically.
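The constrained and generation-based scripts differ in how an answer is obtained. As a hedged sketch, assuming "constrained" means the model only chooses among a fixed set of answer options (as in multiple-choice MMMU questions), one approach scores each option by its summed token log-probability and takes the best one, while free-form evaluation has to parse an option letter out of generated text. The function names and the A-E option format below are hypothetical and are not taken from the repository's scripts.

```python
import re
from typing import Dict, List, Optional, Sequence

# Hypothetical helpers illustrating two evaluation styles; not the repository's code.


def score_option(token_logprobs: Sequence[float]) -> float:
    """Total log-probability the model assigns to one candidate answer's tokens."""
    return sum(token_logprobs)


def constrained_choice(option_logprobs: Dict[str, Sequence[float]]) -> str:
    """Constrained setting: pick the option label with the highest total log-probability."""
    return max(option_logprobs, key=lambda label: score_option(option_logprobs[label]))


def extract_choice(generated: str, labels: str = "ABCDE") -> Optional[str]:
    """Generation setting: return the first standalone option letter in free-form text."""
    match = re.search(rf"\b([{labels}])\b", generated)
    return match.group(1) if match else None


def accuracy(predictions: List[Optional[str]], answers: List[str]) -> float:
    """Fraction of predictions matching the gold answers; unparsed outputs count as wrong."""
    if not answers:
        return 0.0
    return sum(p == a for p, a in zip(predictions, answers)) / len(answers)


if __name__ == "__main__":
    scores = {"A": [-1.2, -0.3], "B": [-0.4], "C": [-2.0]}
    print(constrained_choice(scores))  # B
    print(extract_choice("The answer is (C) because ..."))  # C
    print(accuracy(["B", "C", None], ["B", "A", "D"]))
```

Summing log-probabilities favors shorter options; dividing by the number of tokens is a common alternative when options differ in length. The regex-based extraction is deliberately simple and can misfire on text such as "A cat ...", which is one reason constrained scoring is often easier to evaluate reliably.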
### Tokenization Scripts

- `speech_tokenization.py`: Tokenizes audio files for speech tasks and outputs tokenized representations for use in evaluations.
- `image_tokenization.py`: Tokenizes image data for image-based tasks, supporting multimodal evaluations.
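Both tokenizers turn continuous signals into sequences of discrete codes that a language model can read alongside text. As a minimal sketch, assuming each modality's codes are written as special tokens wrapped in start and end markers, the helper below turns a list of code indices into a prompt segment. The marker strings and the `<speech_N>` / `<image_N>` token naming are placeholders chosen for illustration; the vocabulary actually produced by `speech_tokenization.py` and `image_tokenization.py` may differ.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical start/end markers per modality; the real vocabulary may differ.
MARKERS: Dict[str, Tuple[str, str]] = {
    "speech": ("<speech_start>", "<speech_end>"),
    "image": ("<image_start>", "<image_end>"),
}


def codes_to_segment(codes: Sequence[int], modality: str) -> str:
    """Render discrete codec indices as special tokens wrapped in modality markers."""
    if modality not in MARKERS:
        raise ValueError(f"unknown modality: {modality!r}")
    start, end = MARKERS[modality]
    body = "".join(f"<{modality}_{c}>" for c in codes)
    return f"{start}{body}{end}"


def build_prompt(instruction: str, codes: Sequence[int], modality: str) -> str:
    """Place a tokenized speech or image segment after a text instruction."""
    return f"{instruction} {codes_to_segment(codes, modality)}"


if __name__ == "__main__":
    print(build_prompt("Transcribe this audio:", [12, 7, 301], "speech"))
    # Transcribe this audio: <speech_start><speech_12><speech_7><speech_301><speech_end>
```

Pre-tokenizing once and storing the result (as the `*Tokenized/` directories suggest) avoids re-running the audio and image encoders on every evaluation pass.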
### Shell Scripts

- `eval_mmmu.sh` / `eval_mmmu_gen.sh`: Shell scripts that run the MMMU evaluations.
- `eval_speech.sh` / `eval_speech_gen.sh`: Shell scripts that run the speech evaluations.
- `tokenize_image_audio.sh`: Script that tokenizes both image and audio data.
### Setup and Inference

- `inference.py`: General-purpose inference script for running models on various tasks.
- `anygpt_install.sh`: Script that installs dependencies and sets up the environment.
### Configuration and Documentation

- `speech_tasks.json`: JSON file containing configurations for the speech datasets.
- `README.md`: This file, providing an overview of the project.
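The exact schema of `speech_tasks.json` is not documented in this README. Purely as a hypothetical illustration, the sketch below assumes the file is a JSON list of task entries, each with `name`, `prompt`, and `data_path` fields (all assumed names, as is the example path), and shows how such a configuration could be loaded and validated before running several speech datasets in one pass.

```python
import json
from dataclasses import dataclass
from typing import List

# Hypothetical schema for speech_tasks.json; the field names and path are assumptions.
EXAMPLE_CONFIG = """
[
  {"name": "asr_example", "prompt": "Transcribe the speech.", "data_path": "SpeechTokenized/asr_example.json"}
]
"""

REQUIRED_FIELDS = {"name", "prompt", "data_path"}


@dataclass
class SpeechTask:
    name: str
    prompt: str
    data_path: str


def load_tasks(text: str) -> List[SpeechTask]:
    """Parse a JSON list of task entries, failing loudly on missing fields."""
    entries = json.loads(text)
    tasks = []
    for i, entry in enumerate(entries):
        missing = REQUIRED_FIELDS - set(entry)
        if missing:
            raise KeyError(f"task {i} is missing fields: {sorted(missing)}")
        tasks.append(SpeechTask(entry["name"], entry["prompt"], entry["data_path"]))
    return tasks


if __name__ == "__main__":
    for task in load_tasks(EXAMPLE_CONFIG):
        print(task.name, "->", task.data_path)
```

Validating every entry up front means a typo in one dataset's configuration fails immediately rather than partway through a long evaluation run.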
## Requirements

- Python 3.8 or higher
- Required Python packages, installed with the provided installation script:

  ```bash
  bash anygpt_install.sh
  ```