This repository accompanies the paper “NEPAL: Climbing Toward the Peak of Re-Identification in Privacy-Preserving Record Linkage”, which introduces the Neural Pattern Learning (NEPAL) Attack.
It provides documentation and resources for reproducing the experiments and analyses presented in the paper.
The NEPAL Attack models a machine learning–based adversary that performs re-identification in Privacy-Preserving Record Linkage (PPRL) systems based on known plaintext–encoding pairs.
Unlike traditional Pattern Mining Attacks (PMAs) that rely on scheme-specific heuristics, NEPAL formulates pattern mining as a general learning problem. It uses neural networks to learn correlations between encoded records and their underlying plaintext structures, enabling large-scale, scheme-agnostic plaintext reconstruction.
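As background, the sketch below shows how one of the attacked schemes, the Bloom Filter (BF), maps a record's q-grams into a bit array. The filter length `m`, hash count `k`, and the double-hashing construction are common defaults from the PPRL literature, not necessarily the parameters used in the paper:

```python
# Hedged background sketch: a Bloom Filter (BF) encoding hashes each of a
# record's q-grams into a fixed-length bit array. m, k, and the
# double-hashing construction are illustrative defaults only.
import hashlib

def qgrams(s, q=2):
    s = s.lower().replace(" ", "")
    return {s[i:i + q] for i in range(len(s) - q + 1)}

def bloom_encode(name, m=1024, k=20):
    bits = [0] * m
    for g in qgrams(name):
        h1 = int(hashlib.md5(g.encode()).hexdigest(), 16)
        h2 = int(hashlib.sha1(g.encode()).hexdigest(), 16)
        for i in range(k):               # double hashing: h1 + i * h2
            bits[(h1 + i * h2) % m] = 1
    return bits

print(sum(bloom_encode("peter")))  # number of set bits
```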
The attack consists of two major stages:
- Pattern Mining – A neural model learns mappings between encodings and their constituent q-grams (substrings of the original identifiers). This is framed as a multi-label classification task.
- Plaintext Reconstruction – The predicted q-grams are assembled into complete identifiers using a graph-based reconstruction algorithm.
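To make the multi-label framing of the first stage concrete, here is a toy sketch (the names and the q-gram vocabulary are made-up examples): each record is paired with a binary label vector marking which vocabulary q-grams occur in its plaintext, and a network (e.g., an MLP with sigmoid outputs and binary cross-entropy loss) would be trained to predict this vector from the encoding.

```python
# Toy sketch of the stage-1 framing: binary multi-label targets over a
# fixed q-gram vocabulary. Names and vocabulary are illustrative only.
def qgrams(s, q=2):
    return {s[i:i + q] for i in range(len(s) - q + 1)}

names = ["peter", "petra", "maria"]
vocab = sorted(set().union(*(qgrams(n) for n in names)))  # label space

def label_vector(name):
    present = qgrams(name)
    return [int(g in present) for g in vocab]

for n in names:
    print(n, label_vector(n))
```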
For a detailed description of the attacker model, theoretical background, and evaluation results, see the paper.
The simplest way to reproduce the NEPAL pipeline is to run the implementation repository inside Docker.
The following setup reproduces the environment used during paper preparation:
```bash
git clone <nepal-repository>
cd <nepal-repository>
git submodule update --init --recursive --remote
docker build -t nepal .
docker run --gpus all -it -v $(pwd):/usr/app nepal bash
```

Note: GPU access is optional but strongly recommended for hyperparameter optimization. The repository will be mounted inside the container at `/usr/app`.
A default configuration is provided for the NEPAL attack. Once inside the container, execute:
```bash
python3 main.py --config nepal_config.json
```

This command launches the complete NEPAL pipeline, including:
- data preprocessing,
- neural model training, and
- plaintext reconstruction.
Results are written to the experiment_results directory. See docs/parameters.md for a detailed explanation of configuration options and schema.
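The snippet below is purely illustrative: the keys shown are assumptions, not the repository's actual schema (see docs/parameters.md for that). It only demonstrates how one might write a custom configuration file.

```python
# Hypothetical configuration sketch: these keys are NOT the repository's
# actual schema; consult docs/parameters.md for the real options.
import json

config = {
    "dataset": "data/example.tsv",      # hypothetical path
    "encoding": "bloom_filter",         # hypothetical key/value
    "q": 2,                             # q-gram length (illustrative)
    "epochs": 50,                       # illustrative training budget
    "output_dir": "experiment_results",
}
with open("nepal_config.json", "w") as f:
    json.dump(config, f, indent=2)
```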
The code expects a tab-separated file with one record per row. The first row must be a header specifying the column names. Internally, the column values are concatenated in column order and normalized (converted to lowercase; whitespace and missing values removed). The last column must contain a unique ID.
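A minimal sketch of a conforming input file and of the normalization described above (column names and values are made up):

```python
# Write a tiny tab-separated example file: header row first, unique ID
# in the last column.
rows = [
    ("first_name", "last_name", "id"),  # header row
    ("Peter", "Miller", "0"),
    ("Petra", "Maier", "1"),
]
with open("example.tsv", "w") as f:
    for r in rows:
        f.write("\t".join(r) + "\n")

# All columns except the trailing unique ID are concatenated in column
# order, lowercased, and stripped of whitespace:
record = "".join(rows[1][:-1]).lower().replace(" ", "")
print(record)  # petermiller
```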
If you have data in .csv, .xls or .xlsx format, you may run `python preprocessing.py` for convenient conversion.
The script will guide you through the process.
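As a hedged, non-interactive alternative to the guided script, a one-off conversion could look like the following, assuming pandas is installed (file names are illustrative):

```python
# Convert .csv/.xlsx data to the expected tab-separated layout.
import pandas as pd

df = pd.read_csv("mydata.csv")     # or pd.read_excel("mydata.xlsx")
df["id"] = range(len(df))          # the unique ID must be the last column
df.to_csv("mydata.tsv", sep="\t", index=False)
```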
The data directory of this repository already provides datasets that can be used directly.
To run multiple experiments or reproduce the experiments from the paper, use the experiment script:
```bash
python3 experiment_setup.py
```

This script automatically runs multiple configurations to reproduce the results reported in the paper.
The analysis notebook analysis.ipynb reproduces the figures reported in the paper. First run extract_nepal_results.py, which consumes the results produced in experiment_results; then open the notebook and ensure that the output file from extract_nepal_results.py has been generated correctly.
NEPAL reframes cryptanalysis of similarity-preserving encodings as a supervised learning task, enabling the model to learn directly from encoding–plaintext pairs and generalize across multiple encoding schemes.
The attack proceeds in two stages: (1) Pattern Mining, which uses neural networks to predict constituent q-grams from encoded data, and (2) Plaintext Reconstruction, which assembles the predicted fragments into complete identifiers.
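To illustrate the second stage, here is a minimal greedy sketch under the assumption of perfectly predicted, non-repeating q-grams; the paper's actual algorithm operates on a graph and is more robust than this simplification.

```python
# Simplified stand-in for the graph-based reconstruction: chain q-grams
# whose (q-1)-character suffix and prefix overlap, starting from a
# q-gram that no other q-gram leads into.
def reconstruct(grams, q=2):
    grams = set(grams)
    suffixes = {g[1:] for g in grams}                 # (q-1)-char suffixes
    starts = sorted(g for g in grams if g[:q - 1] not in suffixes)
    word = starts[0] if starts else sorted(grams)[0]
    grams.discard(word)
    while True:
        tail = word[-(q - 1):]
        nxt = next((g for g in sorted(grams) if g[:q - 1] == tail), None)
        if nxt is None:
            return word
        word += nxt[q - 1:]                           # append new character
        grams.discard(nxt)

print(reconstruct({"ma", "ar", "ri", "ia"}))  # maria
```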
Experiments were conducted on eight datasets (including FakeName, Euro Person, and Titanic) across three encoding schemes: Bloom Filters (BF), Two-Step Hashing (TSH), and Tabulation MinHash (TMH).
- Achieved Dice coefficients up to 0.997, indicating near-perfect q-gram reconstruction (the metric is sketched below).
- Re-identified up to 33.05% of encoded records exactly.
- Demonstrated that TSH and BF are the most vulnerable encoding schemes, while TMH is more resilient.
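The Dice coefficient reported above compares the predicted q-gram set with the true one, Dice = 2|A ∩ B| / (|A| + |B|); the sets in this sketch are illustrative, not results from the paper.

```python
# Dice coefficient between predicted and true q-gram sets.
def dice(a, b):
    a, b = set(a), set(b)
    return 2 * len(a & b) / (len(a) + len(b))

true_qgrams = {"pe", "et", "te", "er"}   # q-grams of "peter"
pred_qgrams = {"pe", "et", "te", "ra"}   # one wrong prediction
print(round(dice(true_qgrams, pred_qgrams), 3))  # 0.75
```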
Additional information about noisy datasets, parameters, and reproduction can be found in the docs directory.
If you use this repository or reproduce results from the NEPAL paper, please cite:
(TBD)
For questions or clarifications regarding the implementation or replication of experiments, please refer to the code repository or contact the paper authors.
This code is licensed under GPLv3.