(ACL 2025 Findings) Exploring Multi-Modal Data with Tool-Augmented LLM Agents for Precise Causal Discovery

🔍 About | 🚀 Quick Start | 📊 Evaluation | 🔗 Citation

🔍 About

This is the official repository for the ACL 2025 (Findings) paper "Exploring Multi-Modal Data with Tool-Augmented LLM Agents for Precise Causal Discovery". The paper proposes MATMCD (Multi-Agent system with Tool-augmented LLMs for Multi-modal enhancement of Causal Discovery), a framework that improves causal discovery by integrating multi-modal data through tool-augmented large language model (LLM) agents.

🔧 Framework

Traditional causal discovery methods rely solely on statistical correlations in observational data, overlooking valuable semantic cues from external sources. MATMCD addresses this gap with a multi-agent system. It supports modular integration with statistical causal discovery (SCD) algorithms (e.g., PC, ES, DirectLiNGAM) and strengthens reasoning by combining symbolic causal graphs with unstructured textual data.

The architecture of MATMCD, illustrated in Figure 1, consists of the following key components.

Causal Graph Estimator: generates an initial causal graph by calling an SCD algorithm.
Data Augmentation Agent (DA-Agent): retrieves and summarizes semantic context (e.g., from web or log data) using search tools and LLMs.
Causal Constraint Agent (CC-Agent): integrates the augmented data with the initial causal graph to verify or refute causal links using a reasoning pipeline.
Causal Graph Refiner: reconstructs the final causal graph by combining LLM-inferred constraints with an SCD algorithm.
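
As a rough illustration of how edge-level constraints can reshape an initial graph, the sketch below applies "verified" and "refuted" edges to a toy edge set. It is a simplification under stated assumptions: the names `refine_graph`, `verified`, and `refuted` are hypothetical and not taken from this repository, and the actual Causal Graph Refiner combines the constraints with an SCD algorithm rather than editing edges directly.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]  # (cause, effect)


def refine_graph(initial: Iterable[Edge],
                 verified: Iterable[Edge] = (),
                 refuted: Iterable[Edge] = ()) -> Set[Edge]:
    """Apply edge-level constraints to an initial edge set.

    Hypothetical helper: verified edges are guaranteed to be present and
    refuted edges are removed. If an edge appears in both, refuted wins.
    """
    refuted_set = set(refuted)
    return (set(initial) | set(verified)) - refuted_set


# Toy example with made-up variable names.
initial_graph = {("weight", "mpg"), ("mpg", "horsepower")}
refined = refine_graph(
    initial_graph,
    verified=[("horsepower", "mpg")],
    refuted=[("mpg", "horsepower")],
)
print(sorted(refined))
```

Here the reversed edge suggested by the initial estimate is dropped and the direction supported by the constraints is kept, which is the kind of correction the agents are meant to provide.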


Figure 1: An illustration of the MATMCD framework: (a) an overview of the framework, (b) the inner workings of the DA-Agent, and (c) the inner workings of the CC-Agent.

🔑 Key Features

Multi-modal data: integrates time series data, metadata, web documents, and logs to enrich semantic context for causal discovery.
LLM reasoning: employs tool-augmented LLMs to reason over causal structures using external knowledge and contextual cues.
Modular design: features a modular architecture that allows easy swapping of LLMs and SCD algorithms for flexible adaptation.

🚀 Quick Start

Clone the Repository

git clone https://github.com/your_username/MATMCD.git
cd MATMCD

Set Up the Environment

We recommend using conda or virtualenv to create an isolated environment.

python3 -m venv venv
source venv/bin/activate  # or .\venv\Scripts\activate on Windows
pip install -r requirements.txt

Configure API Keys
- Add your API keys to the config.py file.
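
One common way to keep keys out of version control is to read them from environment variables. The sketch below assumes that pattern; the helper `load_api_key` and the variable name `LLM_API_KEY` are illustrative only, and the config.py in this repository may expect different names or values entered directly.

```python
import os


def load_api_key(name: str, default: str = "") -> str:
    """Return the API key stored in the environment variable `name`.

    Hypothetical helper; falls back to `default` when the variable is unset.
    """
    return os.environ.get(name, default)


# Illustrative variable name, not necessarily the one config.py uses.
LLM_API_KEY = load_api_key("LLM_API_KEY")
```
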
Download the datasets
- The original data can be downloaded from the AutoMPG, DWD Climate, Sachs, Asia, and Child sources, and the LEMMA_RCA datasets from LEMMA-RCA.
- The CSV versions of the AutoMPG, DWD Climate, and Sachs datasets can be downloaded from here. The Asia and Child datasets can be converted to CSV format with the script data/SampleFromBIF.py.
- Place the data in the data folder.
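
Once a CSV file is in the data folder, a quick check that it loads as expected can save a failed run. The following is a minimal standard-library sketch; the helper `load_columns` is hypothetical, and the file name in the usage comment is an assumption, so substitute the name of the file you downloaded.

```python
import csv
from pathlib import Path
from typing import Dict, List


def load_columns(csv_path: str) -> Dict[str, List[str]]:
    """Read a CSV file with a header row into a column-name -> values mapping."""
    with Path(csv_path).open(newline="") as handle:
        reader = csv.DictReader(handle)
        columns: Dict[str, List[str]] = {
            name: [] for name in (reader.fieldnames or [])
        }
        for row in reader:
            for name in columns:
                columns[name].append(row[name])
    return columns


# Usage (hypothetical file name):
# columns = load_columns("data/AutoMPG.csv")
# print(list(columns))  # variable names found in the header
```
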
Run the Application
- Make sure the environment, API keys, and datasets are set up correctly.
- Run python GTdatasets_experiment.py to start.
Run Experiments and Evaluate
- Run benchmark experiments on standard datasets:
```
python GTdatasets_experiment.py
```
- For root cause analysis on microservice datasets:
```
python LEMMA_experiment.py
```
- Results will be saved in the results/ folder.

📊 Evaluation

MATMCD is evaluated on:

Benchmark Datasets: AutoMPG, DWDClimate, SachsProtein, Asia, and Child — covering both time-series and sequence data.
AIOps Datasets: Product Review and Cloud Computing — large-scale multivariate time series with event logs.

Key results:

Up to 66.7% reduction of causal inference errors (in terms of NHD) over baseline methods.
Up to 83.3% improvement in root cause locating accuracy (in terms of MAP@10).
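
For readers unfamiliar with NHD (normalized Hamming distance), the sketch below uses a common definition: the fraction of entries that differ between the predicted and ground-truth adjacency matrices. This is an assumption about the metric, and the evaluation code in this repository may normalize differently. The function names and the toy graphs are illustrative only, and MAP@10 is not covered here.

```python
from typing import Sequence

Matrix = Sequence[Sequence[int]]


def normalized_hamming_distance(pred: Matrix, truth: Matrix) -> float:
    """Fraction of adjacency-matrix entries where pred and truth disagree."""
    m = len(truth)
    if m == 0:
        raise ValueError("graphs must have at least one node")
    if len(pred) != m or any(len(row) != m for row in list(pred) + list(truth)):
        raise ValueError("both inputs must be m x m matrices of the same size")
    mismatches = sum(
        pred[i][j] != truth[i][j] for i in range(m) for j in range(m)
    )
    return mismatches / (m * m)


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative error reduction of `improved` with respect to `baseline`."""
    if baseline == 0:
        raise ValueError("baseline error must be non-zero")
    return (baseline - improved) / baseline


# Toy 3-node example (made-up graphs, not results from the paper).
truth = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]      # 0 -> 1, 1 -> 2
baseline = [[0, 1, 1], [0, 0, 0], [1, 1, 0]]   # four wrong entries
refined = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]    # one wrong entry

nhd_baseline = normalized_hamming_distance(baseline, truth)
nhd_refined = normalized_hamming_distance(refined, truth)
print(nhd_baseline, nhd_refined, relative_reduction(nhd_baseline, nhd_refined))
```

A lower NHD is better, so a "reduction" in the results above means the refined graph disagrees with the ground truth in fewer places than the baseline graph does.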

🔗 Citation

@inproceedings{shen2025MATMCD,
  title={Exploring Multi-Modal Data with Tool-Augmented LLM Agents for Precise Causal Discovery},
  author={Shen, ChengAo and Chen, Zhengzhang and Luo, Dongsheng and Xu, Dongkuan and Chen, Haifeng and Ni, Jingchao},
  booktitle={ACL (Findings)},
  year={2025}
}

📧 Contact

If you have any questions or concerns, please contact us at cshen9 [at] uh [dot] edu or submit an issue.
