Analysis of protein domain architectures by combining the linguistic approach n-gram analysis with network theory.

NaegleLab/DANSy

Domain Architecture Network Syntax (DANSy)

DANSy applies the linguistic technique of n-gram analysis, together with network theory, to protein domain architectures. It represents the proteome as a network that abstracts the functional connections between proteins, and it can describe either proteome-wide features (base DANSy) or phenotype-specific changes from differential expression results (deDANSy).
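To make the idea concrete, here is a minimal sketch of n-gram extraction on domain architectures. It is not the DANSy API: the function names, the toy proteins, and the rule that links two n-grams when they occur in the same protein are all assumptions chosen for illustration. Please see the documentation for how DANSy actually builds its networks.

```python
from collections import defaultdict
from itertools import combinations


def domain_ngrams(architecture, max_n=None):
    """Return every contiguous n-gram (as a tuple) of an ordered domain list."""
    domains = list(architecture)
    max_n = len(domains) if max_n is None else min(max_n, len(domains))
    return [
        tuple(domains[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(domains) - n + 1)
    ]


def build_ngram_network(architectures):
    """Map each n-gram to the set of n-grams found in the same protein.

    Assumption for illustration only: two n-grams are linked when they
    co-occur in at least one architecture.
    """
    network = defaultdict(set)
    for domains in architectures.values():
        grams = set(domain_ngrams(domains))
        for gram in grams:
            network[gram]  # register the node even if it has no neighbors
        for a, b in combinations(grams, 2):
            network[a].add(b)
            network[b].add(a)
    return dict(network)


# Toy architectures (hypothetical, not taken from the reference file)
toy = {
    "protein_A": ["SH3", "SH2", "Kinase"],
    "protein_B": ["SH2", "Kinase"],
    "protein_C": ["PH"],
}
network = build_ngram_network(toy)
```

In this toy example, the single-domain protein contributes an isolated node, while the multi-domain proteins contribute a connected cluster of overlapping n-grams.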

How to cite: Please cite our bioRxiv paper, which contains further details and specific applications of the code provided here.

Documentation: https://naeglelab.github.io/DANSy/

Getting started

First clone the repo and then create a virtual environment containing all the dependencies for the analysis using the following code in a terminal.

conda env create -f dansy.yml

Activate the environment using conda activate dansy to run specific scripts, or select the dansy kernel for Jupyter notebooks.

Proteome Reference File

DANSy relies on reference files generated by CoDIAC. We have provided a reference file, which was generated on May 12th, 2025, and will be the default file used for analysis.

If you wish to generate the most up-to-date reference file to use for analysis, you will need to take the following steps. First, download the SwissProt ID list from Gencode and place it in the main directory of your local copy of this repo. Then, go to the whole_proteome_reference.py file and change the reference file suffix variable to the current date. Finally, run the following code in a terminal to establish the environment that includes CoDIAC and run the script, which will query UniProt and InterPro for the domain architectures. (Note: This can take up to 2 hours after a fresh install, as it will also establish a pybiomart sqlite database.)

conda env create -f codiac.yml
conda activate codiac-env
python scripts/whole_proteome_reference.py
conda deactivate
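For the date suffix step, one option is to generate the value rather than type it by hand. The snippet below is only a sketch: the variable name reference_file_suffix and the ISO date format are assumptions, so check whole_proteome_reference.py for the actual variable name and the format it expects.

```python
from datetime import date


def reference_suffix(day=None):
    """Return a date stamp such as '2025-05-12' (ISO format is an assumption)."""
    day = date.today() if day is None else day
    return day.isoformat()


# Hypothetical variable name; match it to the one used in the script.
reference_file_suffix = reference_suffix()
```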

DANSy Overview

Overview of the general workflow

deDANSy Overview

Overview of the deDANSy workflow

Example applications

For examples of how to get started, please visit the Examples in our documentation.

For specific applications of DANSy or deDANSy, please see our DANSy_Applications repo. There, you will find Jupyter notebooks covering applications to the whole proteome, the convergence of grammar during the evolution of reversible post-translational modification systems, cancer fusion genes, and differential gene expression from RNA-sequencing results (for deDANSy specifically).
