
This project implements a threading algorithm based on double dynamic programming (DDP) for protein structure recognition and sequence–structure alignment.


gaelleloutfi/Threading-by-Double-Dynamic-Programming


bioinf

Threading-by-Double-Dynamic-Programming

Getting Started

Setup Environment

We use pixi to manage dependencies and the project software environment.
On Linux (or WSL for Windows) or macOS, Pixi can be installed by running the following command:

curl -fsSL https://pixi.sh/install.sh | sh

The pixi executable is installed in the directory ~/.pixi/bin/.
In order to be able to use pixi, you need to add this directory to the PATH environment variable by running the following command:

echo 'export PATH=$PATH:$HOME/.pixi/bin' >> ~/.bashrc

Finally, you can enable command auto-completion with:

echo 'eval "$(pixi completion --shell bash)"' >> ~/.bashrc
source ~/.bashrc

To verify that Pixi is installed correctly, run the following command:

pixi --version

Clone the GitHub repository

git clone https://github.com/gaelleloutfi/Threading-by-Double-Dynamic-Programming.git
cd Threading-by-Double-Dynamic-Programming

Structure

Threading-by-Double-Dynamic-Programming/
│── data/                  # contains the files .pdb, .fasta & dope.par
│ ├── dope.par             # contains the atomic distance-dependent statistical potentials 
│ ├── Analyse2_ss/         # contains the data for the analysis on secondary structure recognition
│ ├── data_Globin_vs_Phyco # contains the data used for the globin & phycocyanin tests
│ ├── Time Analysis        # contains the data used for the calibration
│── doc/                   # contains the Report Paper
│ ├── LOUTFI_rapport.pdf
│── src/                   # contains the codes
│ ├── ddp_threader.py      # main threader application
│ ├── clean_fasta.py       # for replacing ambiguous residues with glycine (G)
│ ├── calibrate_and_plot.py # to generate the time heatmap
│ ├── summarize_fastas.py  # to get a statistical overview of the data
│── results/             # contains the csv result files (energies + alignment)
│ ├── Analyse2_ss/
│ ├── Analyse2_ss_stats/
│ ├── Globin_vs_Phyco/
│ ├── data_globin_phyco_stats/
│ ├── Time_Analysis/  
│── README.md            # Project Documentation

Launch the Threading

Cleaning the files:

The FASTA files must be cleaned before threading so that the code does not crash: they must not contain any ambiguous residues.
clean_fasta.py was used to generate the files in the "cleaned" folders. If you want to use other files, clean them first by running:

pixi run python src/clean_fasta.py --dir data/input_dir --out-dir results/output_dir
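
To illustrate what the cleaning step does, here is a minimal sketch of replacing ambiguous residues with glycine (G). It is not the actual code of src/clean_fasta.py: the function names are hypothetical, and it assumes that any character outside the 20 standard amino acids counts as ambiguous and that sequences can be upper-cased.

```python
# Hypothetical sketch; NOT the actual implementation of src/clean_fasta.py.
# Assumption: anything outside the 20 standard amino acids is "ambiguous".
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(seq: str) -> str:
    """Replace every non-standard residue with glycine ('G')."""
    return "".join(c if c in STANDARD_RESIDUES else "G" for c in seq.upper())


def clean_fasta_text(text: str) -> str:
    """Clean sequence lines; header lines (starting with '>') are kept as-is."""
    cleaned = []
    for line in text.splitlines():
        if line.startswith(">"):
            cleaned.append(line)
        else:
            cleaned.append(clean_sequence(line.strip()))
    return "\n".join(cleaned) + "\n"
```

For example, under these assumptions clean_sequence("ACXBZ") gives "ACGGG".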

Threading:

There are two ways to run the code.
1- Graphical User Interface (GUI)

pixi run python src/ddp_threader.py --gui

In my experience this is a bit slower and less convenient for automation, but it is more user-friendly.
2- Command Line Interface (CLI)

pixi run python src/ddp_threader.py \
--outdir results \
--gap-ss 5.0 \
--gap-coil 0.5 \
--workers 12 \
data/*.pdb data/cleaned/*.fasta

This mode is better suited to batch runs and automation. I used 12 workers because my machine has 16 CPU cores; adapt --workers to your own hardware. The default is 1 worker, which is too slow on most machines.
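
For automation, the CLI call can be assembled from Python. The sketch below is only an illustration: the helper names and the rule of keeping 4 cores free (which gives 12 workers on a 16-core machine) are assumptions, not part of the project. Only the flags shown in the command above are used.

```python
import os


def suggest_workers(total_cores=None, reserve=4):
    """Leave `reserve` cores free, but always use at least one worker.

    The reserve of 4 is an assumption chosen to match 12 workers on 16 cores.
    """
    if total_cores is None:
        total_cores = os.cpu_count() or 1
    return max(1, total_cores - reserve)


def build_command(inputs, outdir="results", gap_ss=5.0, gap_coil=0.5, workers=None):
    """Build the ddp_threader.py command line as a list of arguments."""
    if workers is None:
        workers = suggest_workers()
    return [
        "pixi", "run", "python", "src/ddp_threader.py",
        "--outdir", outdir,
        "--gap-ss", str(gap_ss),
        "--gap-coil", str(gap_coil),
        "--workers", str(workers),
        *inputs,  # .pdb and .fasta files, already expanded (e.g. with glob)
    ]
```

The resulting list can be passed to subprocess.run(cmd, check=True), with the input files collected beforehand using glob.glob("data/*.pdb") and glob.glob("data/cleaned/*.fasta"), since wildcards are not expanded without a shell.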

Notes:

  • To run calibrate_and_plot.py:
pixi run python src/calibrate_and_plot.py

The script has hardcoded paths inside (data/Time Analysis/1IRO.pdb, data/Time Analysis/rcsb_pdb_5PTI.fasta). It does not take command-line arguments.

  • To run summarize_fastas.py:
pixi run python src/summarize_fastas.py --dir data/ --out-dir data_stats
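
Because calibrate_and_plot.py reads hardcoded paths, it can help to check that both files are present before launching it. This is a minimal sketch: the helper name is hypothetical, and it assumes the paths are resolved relative to the repository root.

```python
from pathlib import Path

# Paths quoted from the note on calibrate_and_plot.py.
# Assumption: they are relative to the repository root.
EXPECTED_INPUTS = (
    "data/Time Analysis/1IRO.pdb",
    "data/Time Analysis/rcsb_pdb_5PTI.fasta",
)


def missing_inputs(repo_root="."):
    """Return the expected input files that are not present under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_INPUTS if not (root / rel).is_file()]
```

An empty list from missing_inputs() means both files were found.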

Results:

The results of all our analyses can be found in the results/ directory. To store your own results there, specify the corresponding output path when running your analyses.

Contacts
