MendenLab/HelicoPredict-new-main

HelicoPredict-new

HelicoPredict-new is a research project that identifies antibiotic resistance biomarkers in Helicobacter pylori strains by applying machine learning (ML) methods to mutation data. Our results indicate that gene-wise aggregated association feature selection yields the most generalizable predictive performance and uncovers several loci with biological relevance to resistance.

Project Overview

Helicobacter pylori is a common bacterium linked to various gastrointestinal diseases. Understanding genetic mutations that confer antibiotic resistance is crucial for effective treatment. This project aims to:

  • Apply multiple machine learning algorithms to mutation data from H. pylori strains.
  • Systematically evaluate feature selection methods, with an emphasis on gene-wise aggregated association.
  • Identify biomarkers and loci associated with antibiotic resistance, providing biological and clinical insights.

Key Findings

  • Gene-wise Aggregated Association Feature Selection: This method outperforms other feature selection strategies in terms of generalizability and predictive accuracy.
  • Biologically Relevant Loci: The approach successfully identifies key loci associated with resistance, validated by biological evidence.
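
The README does not spell out how the gene-wise aggregation is computed, so the following is only a minimal sketch of one plausible reading: score each mutation by its association with the resistance label, aggregate the scores of all mutations in the same gene, and keep the top-ranked genes. The phi coefficient, the max aggregator, the function names, and the toy data are all assumptions made for illustration, not the project's actual method.

```python
import math
from collections import defaultdict


def phi_coefficient(x, y):
    """Phi coefficient between two binary vectors (0.0 if undefined)."""
    n11 = n10 = n01 = n00 = 0
    for a, b in zip(x, y):
        if a and b:
            n11 += 1
        elif a:
            n10 += 1
        elif b:
            n01 += 1
        else:
            n00 += 1
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom


def gene_scores(X, y, feature_genes, aggregate=max):
    """Aggregate absolute per-mutation association scores within each gene."""
    per_gene = defaultdict(list)
    for j, gene in enumerate(feature_genes):
        column = [row[j] for row in X]
        per_gene[gene].append(abs(phi_coefficient(column, y)))
    return {gene: aggregate(values) for gene, values in per_gene.items()}


def select_top_genes(scores, k):
    """Return the k genes with the highest aggregated score."""
    return sorted(scores, key=lambda g: (-scores[g], g))[:k]


# Toy data (hypothetical): 6 strains x 3 mutations, 1 = mutation present.
feature_genes = ["geneA", "geneA", "geneB"]  # gene of each column
X = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 0],
    [0, 0, 1],
    [0, 0, 0],
]
y = [1, 1, 1, 0, 0, 0]  # 1 = resistant, 0 = susceptible

scores = gene_scores(X, y, feature_genes)
top_genes = select_top_genes(scores, k=1)
print(top_genes)
```

Other aggregators (for example the mean, or a combined p-value from a per-mutation test) can be passed through the `aggregate` argument; which one the project uses is not stated here.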

Main Features

  • Data preprocessing and mutation encoding
  • Implementation of various ML models (e.g., Random Forest, SVM, XGBoost, Feed-forward neural network)
  • Proposed performance aggregation and ensemble models for more robust evaluation
  • Visualization and interpretation of ML results, e.g., with SHAP scores
  • Biological interpretation of putative resistance biomarkers
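
To make the ensemble and performance-aggregation idea concrete, here is a small sketch that averages predicted probabilities from several models (soft voting) and summarizes a metric across cross-validation folds. The helper names, the 0.5 threshold, and all numbers are hypothetical; the project's own aggregation scheme may differ.

```python
from statistics import mean, stdev


def soft_vote(prob_lists, threshold=0.5):
    """Average per-sample probabilities across models, then threshold."""
    averaged = [mean(probs) for probs in zip(*prob_lists)]
    return [1 if p >= threshold else 0 for p in averaged]


def accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def aggregate_scores(fold_scores):
    """Summarize one metric across folds as mean and standard deviation."""
    spread = stdev(fold_scores) if len(fold_scores) > 1 else 0.0
    return {"mean": mean(fold_scores), "std": spread}


# Hypothetical predicted resistance probabilities from three models
# for the same four strains.
model_probs = [
    [0.9, 0.4, 0.2, 0.7],
    [0.8, 0.6, 0.1, 0.4],
    [0.7, 0.7, 0.3, 0.3],
]
y_true = [1, 1, 0, 1]

ensemble_pred = soft_vote(model_probs)
ensemble_acc = accuracy(y_true, ensemble_pred)

# Hypothetical per-fold accuracies for one model.
summary = aggregate_scores([0.75, 0.80, 0.70])
print(ensemble_pred, ensemble_acc, summary)
```

Reporting the spread alongside the mean shows how stable a model is across folds, which is one reason an aggregated score can be more robust than a single train/test split.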

Getting Started

Prerequisites

  • Python 3.8+
  • Mutation data encoded into Category A (SNV), Category B (nonsynonymous amino acid mutation), and Category C (loss of function)
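
As a rough illustration of the encoding step, the sketch below turns a list of observed mutations into one binary presence/absence matrix per category. The record layout, the mutation labels, and the function name are assumptions for this example only; the format the pipeline actually expects is shown by the examples in the data/ directory.

```python
# Hypothetical input: one (strain, category, mutation) record per observation.
# Categories follow the README: "A" = SNV, "B" = amino acid mutation,
# "C" = loss of function. Mutation labels are placeholders, not real variants.
records = [
    ("strain1", "A", "geneX_pos10_C>T"),
    ("strain1", "B", "geneX_aa4_change"),
    ("strain2", "A", "geneX_pos10_C>T"),
    ("strain2", "C", "geneY_lof"),
    ("strain3", "A", "geneZ_pos55_G>A"),
]


def encode_presence_absence(records, category):
    """Return (strains, features, matrix) for one mutation category.

    matrix[i][j] is 1 if strain i carries feature j, otherwise 0.
    """
    strains = sorted({strain for strain, _, _ in records})
    features = sorted({mut for _, cat, mut in records if cat == category})
    row = {strain: i for i, strain in enumerate(strains)}
    col = {mut: j for j, mut in enumerate(features)}
    matrix = [[0] * len(features) for _ in strains]
    for strain, cat, mut in records:
        if cat == category:
            matrix[row[strain]][col[mut]] = 1
    return strains, features, matrix


strains, snv_features, snv_matrix = encode_presence_absence(records, "A")
print(strains)
print(snv_features)
print(snv_matrix)
```

Strains without any mutation in a category still get an all-zero row, so the matrices for the three categories stay aligned by strain.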

Installation

Clone this repository:

git clone https://github.com/DiyuanLu/HelicoPredict-new.git
cd HelicoPredict-new

Install required packages:

pip install -r requirements.txt

Usage

  1. Prepare your mutation data in the expected format (see data/ directory for examples).

  2. Run the main analysis script:

    python cluster_run_main.py

  3. Outputs and results will be saved in the results/ directory.

Project Structure

HelicoPredict-new/
├── data/               # Input data and data examples
├── src/                # Core scripts and modules
├── results/            # Output and result files
├── requirements
