Anomaly detection in multivariate time series is essential across domains such as healthcare, cybersecurity, and industrial monitoring, yet remains fundamentally challenging due to high-dimensional dependencies, cross-correlations between time-dependent variables, and the scarcity of labeled anomalies. We introduce mTSBench, the largest benchmark to date for multivariate time series anomaly detection, consisting of 344 labeled time series from a wide range of domains. We comprehensively evaluate 24 anomaly detectors, including the only two publicly available large language model-based methods for multivariate time series. Consistent with prior findings, we observe that no single detector dominates across datasets, motivating the need for effective model selection. We benchmark three recent detector selection methods and find that even the strongest of them remains far from optimal.
We introduce mTSBench, the largest benchmark to date for multivariate time series anomaly detection. It includes:
- 344 labeled time series across 12 domains
- 24 anomaly detectors, including the only two publicly available large language model-based methods for multivariate time series
- 3 model selection methods
- 12 anomaly detection evaluation metrics and 3 model selection metrics
Step 1: Clone this repository using git and change into its root directory.

```bash
git clone https://github.com/PLAN-Lab/mTSBench.git
cd mTSBench
```

Step 2: Create and activate a conda environment named mTSB.

```bash
conda create -n mTSB python=3.11
conda activate mTSB
```

Step 3: Install the dependencies from requirements.txt:

```bash
pip install -r requirements.txt
```

To download the full dataset:
```bash
git lfs install
cd Datasets
git clone https://huggingface.co/datasets/PLAN-Lab/mTSBench
```

To use only part of the data, check the Datasets folder for instructions.
To run evaluation with a detector, follow the example below.
```python
import os
import time

import pandas as pd
from datasets import load_dataset
from filelock import FileLock

from Detectors.model_wrapper import run_Unsupervise_AD, run_Semisupervise_AD
from Detectors.evaluation.metrics import get_metrics

# Load just the CalIt2 dataset
data_name = "CalIt2"
calit2 = load_dataset("PLAN-Lab/mTSBench", data_dir=data_name)

# Convert to pandas
df = calit2["test"].to_pandas()

### Alternatively, if the above does not work, use the exact path
# url = "https://huggingface.co/datasets/PLAN-Lab/mTSBench/resolve/main/MSL/MSL_T-13_val.csv"
# df = pd.read_csv(url)

# Using KMeansAD
detector = 'KMeansAD'
data = df.iloc[:, 1:-1].values.astype(float)  # exclude first (timestamp) and last (label) columns
label = df['is_anomaly'].astype(int).to_numpy()

start = time.time()
output = run_Unsupervise_AD(detector, data)  # change this to run_Semisupervise_AD if the detector is semi-supervised
end = time.time()
runtime = end - start

metrics = get_metrics(output, label)
record = {
    "data_file": data_name,
    "model": detector,
    "runtime": runtime,
    **metrics
}

# Save results to results/<detector>_evaluation_results.csv
results_dir = "results"
os.makedirs(results_dir, exist_ok=True)
log_file = os.path.join(results_dir, f"{detector}_evaluation_results.csv")

with FileLock(log_file + ".lock"):
    df_log = pd.DataFrame([record])
    write_header = not os.path.exists(log_file)
    df_log.to_csv(log_file, mode='a', header=write_header, index=False)
```

mTSBench includes three model selection methods: MetaOD, FMMS, and Orthus. Although these selection methods are unsupervised, training them requires a performance matrix. Therefore, we hold out a validation time series from each of the 19 datasets.
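For intuition, such a performance matrix can be assembled from the per-detector CSVs written by the evaluation loop above. A minimal sketch; the metric column name `VUS-PR` is an assumption, substitute any of the 12 metrics that `get_metrics` reports:

```python
# Sketch: build a (time series x detector) performance matrix from the
# results/<detector>_evaluation_results.csv files produced above.
import glob
import pandas as pd

frames = [pd.read_csv(f) for f in glob.glob("results/*_evaluation_results.csv")]
results = pd.concat(frames, ignore_index=True)

# "VUS-PR" is a hypothetical column name; use any metric in the CSVs.
perf_matrix = results.pivot_table(index="data_file", columns="model", values="VUS-PR")
print(perf_matrix.head())
```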
Set up the environment, then test the pretrained MetaOD on any time series with the following code. You can replace CalIt2 with any other name listed in data_summary.csv under the Datasets folder.
```bash
cd Selectors/MetaOD
python metaod_example.py CalIt2
```

Check the instructions under the MetaOD folder for running the full experiments on all 344 time series.
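If you prefer a one-off sweep without the provided scripts, a loop over data_summary.csv is one option. A sketch under the assumption that the summary file has a column of time series names (the column name `data_name` here is hypothetical):

```python
# Sketch: invoke the MetaOD example once per time series listed in
# Datasets/data_summary.csv (path relative to Selectors/MetaOD).
import subprocess
import pandas as pd

summary = pd.read_csv("../../Datasets/data_summary.csv")
for name in summary["data_name"]:  # "data_name" is an assumed column name
    subprocess.run(["python", "metaod_example.py", str(name)], check=True)
```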
Set up the environment, then test the pretrained FMMS on any time series with the following code. You can replace CalIt2 with any other name listed in data_summary.csv under the Datasets folder.
```bash
cd Selectors/FMMS
python FMMS_example.py CalIt2
```

The original implementation of Orthus is in R. While we plan to provide a Python version, for now you need to install R and RStudio to use Orthus.
- Set up the environment with `main.R` under `Selectors/Orthus`; it installs all required libraries.
- Get meta-features on the training set with `get_train_meta_features.R`.
- Get recommendations with `get_rec_on_test.R`.
| Dataset | Domain | #TS | #Dims | Length | #AnomPts | #AnomSeqs |
|---|---|---|---|---|---|---|
| CalIt2 | Smart Building | 1 | 3 | >5K | 0 | 21 |
| CreditCard | Finance / Fraud Detection | 1 | 30 | >100K | 219 | 10 |
| Daphnet | Healthcare | 26 | 10 | >50K | 0 | 1–16 |
| Exathlon | Cloud Computing | 30 | 21 | >50K | 0–4 | 0–6 |
| GECCO | Water Quality Monitoring | 1 | 10 | >50K | 0 | 37 |
| GHL | Industrial Process | 14 | 17 | >100K | 0 | 1–4 |
| Genesis | Industrial Automation | 1 | 19 | >5K | 0 | 2 |
| GutenTAG | Synthetic Benchmark | 30 | 21 | >10K | 0 | 1–3 |
| MITDB | Healthcare | 47 | 3 | >500K | 0 | 1–720 |
| MSL | Spacecraft Telemetry | 26 | 56 | >5K | 0 | 1–3 |
| OPPORTUNITY | Human Activity Recognition | 13 | 33 | >25K | 0 | 1 |
| Occupancy | Smart Building | 2 | 6 | >5K | 1–3 | 9–13 |
| PSM | IT Infrastructure | 1 | 27 | >50K | 0 | 39 |
| SMAP | Spacecraft Telemetry | 48 | 26 | >5K | 0 | 1–3 |
| SMD | IT Infrastructure | 18 | 39 | >10K | 0 | 4–24 |
| SVDB | Healthcare | 78 | 3 | >100K | 0 | 2–678 |
| CIC-IDS-2017 | Cybersecurity | 5 | 73 | >100K | 0–8656 | 0–2546 |
| Metro | Transportation | 1 | 6 | >10K | 20 | 5 |
| SWAN-SF | Industrial Process | 1 | 39 | >50K | 5233 | 1382 |
| Learning | Anomaly Detection Method | Area | Method Family |
|---|---|---|---|
| Unsupervised | CBLOF | Outlier Detection | Distance |
| | COPOD | Outlier Detection | Distribution |
| | EIF | Classic ML | Tree |
| | HBOS | Classic ML | Distribution |
| | IForest | Outlier Detection | Tree |
| | KMeansAD | Classic ML | Distance |
| | KNN | Classic ML | Distance |
| | LOF | Outlier Detection | Distance |
| | PCA | Classic ML | Reconstruction |
| | RobustPCA | Classic ML | Reconstruction |
| Semi-supervised | AnomalyTransformer | Deep Learning | Forecasting |
| | AutoEncoder | Deep Learning | Reconstruction |
| | CNN | Deep Learning | Reconstruction |
| | Donut | Deep Learning | Reconstruction |
| | FITS | Deep Learning | Forecasting |
| | LSTMAD | Deep Learning | Forecasting |
| | MCD | Classic ML | Reconstruction |
| | OCSVM | Outlier Detection | Distribution |
| | OmniAnomaly | Deep Learning | Reconstruction |
| | TimesNet | Deep Learning | Forecasting |
| | TranAD | Deep Learning | Forecasting |
| | USAD | Deep Learning | Reconstruction |
| | ALLM4TS | LLM | Foundation Model |
| | OFA | LLM | Foundation Model |
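Since the wrapper function must match the detector's learning paradigm, a small dispatch helper can be convenient. A minimal sketch based on the table above; the set literal simply transcribes the Learning column:

```python
# Sketch: route a detector name to the matching wrapper from
# Detectors.model_wrapper, based on the learning paradigm above.
from Detectors.model_wrapper import run_Unsupervise_AD, run_Semisupervise_AD

UNSUPERVISED = {
    "CBLOF", "COPOD", "EIF", "HBOS", "IForest",
    "KMeansAD", "KNN", "LOF", "PCA", "RobustPCA",
}

def run_detector(detector: str, data):
    """Run the unsupervised or semi-supervised wrapper as appropriate."""
    if detector in UNSUPERVISED:
        return run_Unsupervise_AD(detector, data)
    return run_Semisupervise_AD(detector, data)
```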
If you have any questions or suggestions, feel free to contact:
- Xiaona Zhou (xiaonaz2@illinois.edu)
Alternatively, open a GitHub issue to describe it.
We are grateful to the following repositories for their valuable codebases:
- https://github.com/TheDatumOrg/TSB-AD
- https://github.com/yzhao062/MetaOD
- https://github.com/bettyzry/FMMS
- https://openreview.net/forum?id=7cUV9K3ns9Q
- https://github.com/yxbian23/aLLM4TS
- https://github.com/yzhao062/pyod
- https://github.com/TimeEval/TimeEval-algorithms
- https://github.com/thuml/Time-Series-Library/
- https://github.com/dawnvince/EasyTSAD
If you find mTSBench useful, please cite:

```bibtex
@article{zhou2025mtsbench,
  title={mTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale},
  author={Zhou, Xiaona and Brif, Constantin and Lourentzou, Ismini},
  journal={arXiv preprint arXiv:2506.21550},
  year={2025}
}
```
