AnonimaData

Exam project by Pietro Coloretti, Leonardo Gennaioli and Iacopo Sbalchiero for the course Scalable and Reliable Services in the Computer Engineering Second Cycle Degree at the University of Bologna (Unibo), Academic Year 2024-2025.

Abstract

AnonimaData is a data management and analysis toolkit that streamlines data anonymization workflows for data scientists, analysts, and developers. The repository provides modular utilities for data ingestion, cleaning, transformation, visualization, and export. It is designed for scalability and ease of use, supports multiple data formats (CSV, Excel, JSON, and TXT), and integrates with popular data science libraries. The project aims to support reproducible research and efficient data-driven decision-making.

Repository Structure

AnonimaData/
├── backend/                # Python microservices for data anonymization workflows
│   ├── anonymizer/         # Implements anonymization algorithms (k-Anonymity, l-Diversity, etc.)
│   ├── formatter/          # Handles data formatting and preprocessing
│   └── orchestratore/      # Orchestrates workflow and service coordination
├── docs/                   # Project report and slides (in Italian)
├── frontend/               # React-based web application for user interaction
├── stressTests/            # Scripts for stress testing and performance evaluation
├── main.tf                 # Terraform configuration for infrastructure setup
├── README.md               # Project overview and instructions
└── variables.tf            # Terraform variables for deployment customization

General Information

Architecture

Backend: Python microservices for orchestrating anonymization workflows, formatting data, and applying privacy algorithms. Each service is containerized and deployed on Google Cloud Run.
Frontend: React application built with Vite, styled using Tailwind CSS, providing an intuitive interface for dataset upload, configuration, and result visualization.
Infrastructure: Managed via Terraform, with resources for Cloud Run services, Pub/Sub topics/subscriptions, VPC connectors, and service accounts.
Messaging: Google Pub/Sub is used for decoupled communication between services (Formatter, Orchestrator, Anonymizer).
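
The decoupling described above can be illustrated with a minimal sketch. It does not use the real Pub/Sub client: an in-memory queue stands in for a topic/subscription pair, and the `JobMessage` envelope with its field names (`job_id`, `stage`, `payload_uri`) is a hypothetical example, not the format used in this repository. It only shows that the producer and the consumer share a message schema rather than a direct call.

```python
import json
import queue
from dataclasses import dataclass, asdict


@dataclass
class JobMessage:
    """Hypothetical envelope passed between services; field names are assumptions."""
    job_id: str
    stage: str        # e.g. "formatted" or "anonymized"
    payload_uri: str  # where the dataset for this stage is stored


def encode(msg: JobMessage) -> bytes:
    # Pub/Sub message bodies are raw bytes, so serialize to UTF-8 JSON.
    return json.dumps(asdict(msg)).encode("utf-8")


def decode(data: bytes) -> JobMessage:
    # Rebuild the envelope on the consumer side.
    return JobMessage(**json.loads(data.decode("utf-8")))


# In-memory stand-in for a topic/subscription pair (not the real Pub/Sub client).
topic = queue.Queue()

# Producer side (e.g. the Formatter) publishes without knowing who consumes.
topic.put(encode(JobMessage("job-1", "formatted", "memory://job-1/formatted.csv")))

# Consumer side (e.g. the Orchestrator) pulls and decodes independently.
received = decode(topic.get())
print(received.stage)
```

With this shape, a new consumer can be added by subscribing to the same topic, without changing the producer.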

Key Features

Dataset Upload: Supports CSV, Excel, JSON, and TXT formats.
Column Configuration: Automatic detection of column types, with user selection of the Quasi-Identifiers (QI) and of the columns to anonymize.
Anonymization Methods: k-Anonymity, l-Diversity, Differential Privacy, with configurable parameters.
Job Management: Track status and download anonymized datasets and samples.
Extensibility: Easily add new anonymization algorithms or data processing modules.
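
As a rough illustration of two of the methods listed above, the sketch below checks whether a table satisfies k-anonymity and distinct l-diversity for a given choice of quasi-identifiers. It is not the code in `backend/anonymizer/`; the function names, column names, and sample rows are assumptions made for this example, and it only verifies the properties rather than generalizing data to achieve them.

```python
from collections import defaultdict


def equivalence_classes(rows, quasi_identifiers):
    """Group rows that share the same values on all quasi-identifier columns."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in quasi_identifiers)
        groups[key].append(row)
    return groups


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every equivalence class contains at least k rows."""
    groups = equivalence_classes(rows, quasi_identifiers)
    return all(len(group) >= k for group in groups.values())


def is_l_diverse(rows, quasi_identifiers, sensitive, l):
    """True if every equivalence class has at least l distinct sensitive values."""
    groups = equivalence_classes(rows, quasi_identifiers)
    return all(
        len({row[sensitive] for row in group}) >= l
        for group in groups.values()
    )


# Hypothetical, already-generalized sample data.
rows = [
    {"age": "30-39", "zip": "401**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "401**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "402**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "402**", "diagnosis": "flu"},
]
qi = ["age", "zip"]

print(is_k_anonymous(rows, qi, k=2))             # True: each class has 2 rows
print(is_l_diverse(rows, qi, "diagnosis", l=2))  # False: second class has only "flu"
```

The sample shows why the two parameters are configured separately: the table is 2-anonymous, yet the second group still reveals the diagnosis of everyone in it.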

Figure: The homepage of the web application.

Deployment

Docker: Each service and the frontend can be built and pushed as Docker images.
Terraform: Infrastructure as code for reproducible cloud deployments.
Cloud Run: Scalable, serverless execution of backend and frontend services.

Testing

Stress tests and performance scripts are available in 📂 stressTests.
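
The scripts in stressTests are not reproduced here. As a generic sketch of what a stress test measures, the example below fires a fixed number of calls through a thread pool and summarizes failures and latency. The `run_load` helper and the `fake_request` stand-in are hypothetical; in practice the callable would issue an HTTP request to a deployed service.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(call, total_requests, concurrency):
    """Invoke `call` total_requests times across a thread pool and time each call."""
    def timed(_):
        start = time.perf_counter()
        try:
            call()
            ok = True
        except Exception:
            ok = False
        return ok, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed, range(total_requests)))

    latencies = sorted(latency for _, latency in results)
    return {
        "requests": total_requests,
        "failures": sum(1 for ok, _ in results if not ok),
        "p50_s": latencies[len(latencies) // 2],  # median latency in seconds
        "max_s": latencies[-1],                   # slowest call in seconds
    }


def fake_request():
    # Stand-in for an HTTP call to a deployed service; replace with a real client.
    time.sleep(0.01)


summary = run_load(fake_request, total_requests=20, concurrency=5)
print(summary)
```

Raising `concurrency` while watching the failure count and the latency figures gives a first indication of how a service behaves under load.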

Authors

Name	GitHub Profile
Pietro Coloretti	GitHub
Leonardo Gennaioli	GitHub
Iacopo Sbalchiero	GitHub
