Anomaly detection in multivariate time series is essential across domains such as healthcare, cybersecurity, and industrial monitoring, yet remains fundamentally challenging due to high-dimensional dependencies, cross-correlations between time-dependent variables, and the scarcity of labeled anomalies. We introduce mTSBench, the largest benchmark to date for multivariate time series anomaly detection, consisting of 344 labeled time series from a wide range of domains. We comprehensively evaluate 24 anomaly detectors, including the only two publicly available large language model-based methods for multivariate time series. Consistent with prior findings, we observe that no single detector dominates across datasets, motivating the need for effective model selection. We benchmark three recent detector selection methods and find that even the strongest of them remains far from optimal.
We introduce mTSBench, the largest benchmark to date for multivariate time series anomaly detection. It includes:
- 344 labeled time series across 12 domains
- 24 anomaly detectors, including the only two publicly available large language model-based methods for multivariate time series
- 3 model selection methods
- 12 anomaly detection evaluation metrics and 3 model selection metrics
Step 1: Clone this repository using git and change into its root directory.

```bash
git clone https://github.com/PLAN-Lab/mTSBench.git
cd mTSBench
```

Step 2: Create and activate a conda environment named mTSB.

```bash
conda create -n mTSB python=3.11
conda activate mTSB
```

Step 3: Install the dependencies from requirements.txt:

```bash
pip install -r requirements.txt
```

To download the full dataset:
```bash
git lfs install
cd Datasets
git clone https://huggingface.co/datasets/PLAN-Lab/mTSBench
```

To use only part of the data, check the Datasets folder for instructions.
To run evaluation with a detector, follow the example below.
```python
import os
import time

import pandas as pd
from datasets import load_dataset
from filelock import FileLock

from Detectors.model_wrapper import run_Unsupervise_AD, run_Semisupervise_AD
from Detectors.evaluation.metrics import get_metrics

# Load just the CalIt2 dataset
data_name = "CalIt2"
calit2 = load_dataset("PLAN-Lab/mTSBench", data_dir=data_name)

# Convert to pandas
df = calit2["test"].to_pandas()

### Alternatively, if the above does not work, use the exact path
# url = "https://huggingface.co/datasets/PLAN-Lab/mTSBench/resolve/main/MSL/MSL_T-13_val.csv"
# df = pd.read_csv(url)

# Using KMeansAD
detector = 'KMeansAD'
data = df.iloc[:, 1:-1].values.astype(float)  # exclude first (timestamp) and last (label) columns
label = df['is_anomaly'].astype(int).to_numpy()

start = time.time()
output = run_Unsupervise_AD(detector, data)  # change this to run_Semisupervise_AD if the detector is semi-supervised
end = time.time()
runtime = end - start

metrics = get_metrics(output, label)
record = {
    "data_file": data_name,
    "model": detector,
    "runtime": runtime,
    **metrics
}

# Save results to results/<detector>_evaluation_results.csv
results_dir = "results"
os.makedirs(results_dir, exist_ok=True)
log_file = os.path.join(results_dir, f"{detector}_evaluation_results.csv")

with FileLock(log_file + ".lock"):
    df_log = pd.DataFrame([record])
    write_header = not os.path.exists(log_file)
    df_log.to_csv(log_file, mode='a', header=write_header, index=False)
```

mTSBench includes three model selection methods: MetaOD, FMMS, and Orthus. Although these selection methods are unsupervised, training them requires a performance matrix. Therefore, we hold out a validation time series from each of the 19 datasets.
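For intuition, such a performance matrix can be assembled from the per-detector CSVs written by the evaluation loop above. A minimal sketch; the metric column name `VUS-PR` is an assumption, substitute any of the 12 metrics that `get_metrics` reports:

```python
# Sketch: build a (time series x detector) performance matrix from the
# results/<detector>_evaluation_results.csv files produced above.
import glob
import pandas as pd

frames = [pd.read_csv(f) for f in glob.glob("results/*_evaluation_results.csv")]
results = pd.concat(frames, ignore_index=True)

# "VUS-PR" is a hypothetical column name; use any metric in the CSVs.
perf_matrix = results.pivot_table(index="data_file", columns="model", values="VUS-PR")
print(perf_matrix.head())
```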
Set up the environment, then test the pretrained MetaOD on any time series with the following code. You can replace CalIt2 with any other name listed in data_summary.csv under the Datasets folder.
```bash
cd Selectors/MetaOD
python metaod_example.py CalIt2
```

Check the instructions under the MetaOD folder for running the full experiments on all 344 time series.
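If you prefer a one-off sweep without the provided scripts, a loop over data_summary.csv is one option. A sketch under the assumption that the summary file has a column of time series names (the column name `data_name` here is hypothetical):

```python
# Sketch: invoke the MetaOD example once per time series listed in
# Datasets/data_summary.csv (path relative to Selectors/MetaOD).
import subprocess
import pandas as pd

summary = pd.read_csv("../../Datasets/data_summary.csv")
for name in summary["data_name"]:  # "data_name" is an assumed column name
    subprocess.run(["python", "metaod_example.py", str(name)], check=True)
```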
Set up the environment, then test the pretrained FMMS on any time series with the following code. You can replace CalIt2 with any other name listed in data_summary.csv under the Datasets folder.
```bash
cd Selectors/FMMS
python FMMS_example.py CalIt2
```

The original implementation of Orthus is in R. While we plan to provide a Python version, for now you need to install R and RStudio to use Orthus.
- Set up the environment with `main.R` under `Selectors/Orthus`; it installs all required libraries.
- Get meta-features on the training set with `get_train_meta_features.R`.
- Get recommendations with `get_rec_on_test.R`.
| Dataset | Domain | #TS | #Dims | Length | #AnomPts | #AnomSeqs |
|---|---|---|---|---|---|---|
| CalIt2 | Smart Building | 1 | 3 | >5K | 0 | 21 |
| CreditCard | Finance / Fraud Detection | 1 | 30 | >100K | 219 | 10 |
| Daphnet | Healthcare | 26 | 10 | >50K | 0 | 1–16 |
| Exathlon | Cloud Computing | 30 | 21 | >50K | 0–4 | 0–6 |
| GECCO | Water Quality Monitoring | 1 | 10 | >50K | 0 | 37 |
| GHL | Industrial Process | 14 | 17 | >100K | 0 | 1–4 |
| Genesis | Industrial Automation | 1 | 19 | >5K | 0 | 2 |
| GutenTAG | Synthetic Benchmark | 30 | 21 | >10K | 0 | 1–3 |
| MITDB | Healthcare | 47 | 3 | >500K | 0 | 1–720 |
| MSL | Spacecraft Telemetry | 26 | 56 | >5K | 0 | 1–3 |
| OPPORTUNITY | Human Activity Recognition | 13 | 33 | >25K | 0 | 1 |
| Occupancy | Smart Building | 2 | 6 | >5K | 1–3 | 9–13 |
| PSM | IT Infrastructure | 1 | 27 | >50K | 0 | 39 |
| SMAP | Spacecraft Telemetry | 48 | 26 | >5K | 0 | 1–3 |
| SMD | IT Infrastructure | 18 | 39 | >10K | 0 | 4–24 |
| SVDB | Healthcare | 78 | 3 | >100K | 0 | 2–678 |
| CIC-IDS-2017 | Cybersecurity | 5 | 73 | >100K | 0–8656 | 0–2546 |
| Metro | Transportation | 1 | 6 | >10K | 20 | 5 |
| SWAN-SF | Industrial Process | 1 | 39 | >50K | 5233 | 1382 |
| Learning | Anomaly Detection Method | Area | Method Family |
|---|---|---|---|
| Unsupervised | CBLOF | Outlier Detection | Distance |
| | COPOD | Outlier Detection | Distribution |
| | EIF | Classic ML | Tree |
| | HBOS | Classic ML | Distribution |
| | IForest | Outlier Detection | Tree |
| | KMeansAD | Classic ML | Distance |
| | KNN | Classic ML | Distance |
| | LOF | Outlier Detection | Distance |
| | PCA | Classic ML | Reconstruction |
| | RobustPCA | Classic ML | Reconstruction |
| Semi-supervised | AnomalyTransformer | Deep Learning | Forecasting |
| | AutoEncoder | Deep Learning | Reconstruction |
| | CNN | Deep Learning | Reconstruction |
| | Donut | Deep Learning | Reconstruction |
| | FITS | Deep Learning | Forecasting |
| | LSTMAD | Deep Learning | Forecasting |
| | MCD | Classic ML | Reconstruction |
| | OCSVM | Outlier Detection | Distribution |
| | OmniAnomaly | Deep Learning | Reconstruction |
| | TimesNet | Deep Learning | Forecasting |
| | TranAD | Deep Learning | Forecasting |
| | USAD | Deep Learning | Reconstruction |
| | ALLM4TS | LLM | Foundation Model |
| | OFA | LLM | Foundation Model |
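Since the wrapper function must match the detector's learning paradigm, a small dispatch helper can be convenient. A minimal sketch based on the table above; the set literal simply transcribes the Learning column:

```python
# Sketch: route a detector name to the matching wrapper from
# Detectors.model_wrapper, based on the learning paradigm above.
from Detectors.model_wrapper import run_Unsupervise_AD, run_Semisupervise_AD

UNSUPERVISED = {
    "CBLOF", "COPOD", "EIF", "HBOS", "IForest",
    "KMeansAD", "KNN", "LOF", "PCA", "RobustPCA",
}

def run_detector(detector: str, data):
    """Run the unsupervised or semi-supervised wrapper as appropriate."""
    if detector in UNSUPERVISED:
        return run_Unsupervise_AD(detector, data)
    return run_Semisupervise_AD(detector, data)
```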
If you have any questions or suggestions, feel free to contact:
- Xiaona Zhou (xiaonaz2@illinois.edu)
Alternatively, open a GitHub issue to describe it.
We are grateful to the following repositories for their valuable codebases:
- https://github.com/TheDatumOrg/TSB-AD
- https://github.com/yzhao062/MetaOD
- https://github.com/bettyzry/FMMS
- https://openreview.net/forum?id=7cUV9K3ns9Q
- https://github.com/yxbian23/aLLM4TS
- https://github.com/yzhao062/pyod
- https://github.com/TimeEval/TimeEval-algorithms
- https://github.com/thuml/Time-Series-Library/
- https://github.com/dawnvince/EasyTSAD
If you find mTSBench useful, please cite:

```bibtex
@article{zhou2025mtsbench,
  title={mTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale},
  author={Zhou, Xiaona and Brif, Constantin and Lourentzou, Ismini},
  journal={arXiv preprint arXiv:2506.21550},
  year={2025}
}
```
