🧬 EffectorFisher links pangenome isoforms to disease traits to predict phenotype-associated effectors. 🌾🦠

ccdmb/EffectorFisher-core

EffectorFisher-core Module (Python Library)

The EffectorFisher module is a Python library that compares pangenome-derived protein isoform profiles with host virulence/disease phenotype data to predict candidate effectors with strong phenotypic association. EffectorFisher can be used to refine the output of Predector, which combines multiple methods to predict proteins with effector-like properties.
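
To make the idea of a phenotypic association concrete, the sketch below tests whether the presence of one isoform across isolates is associated with a high or low disease phenotype. It assumes a two-sided Fisher's exact test on a 2x2 table, which the tool's name suggests but which this README does not confirm. The function names and the toy data are hypothetical and are not part of EffectorFisher-core.

```python
from math import comb


def contingency(presence, phenotype):
    """Build 2x2 counts (a, b, c, d) from 0/1 presence calls and 'high'/'low' labels."""
    pairs = list(zip(presence, phenotype))
    a = sum(1 for p, y in pairs if p == 1 and y == "high")  # present, high
    b = sum(1 for p, y in pairs if p == 1 and y == "low")   # present, low
    c = sum(1 for p, y in pairs if p == 0 and y == "high")  # absent, high
    d = sum(1 for p, y in pairs if p == 0 and y == "low")   # absent, low
    return a, b, c, d


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# Toy data (hypothetical): 20 isolates, isoform present in the first 9.
presence = [1] * 9 + [0] * 11
phenotype = ["high"] * 8 + ["low"] + ["high"] * 2 + ["low"] * 9
p = fisher_exact_two_sided(*contingency(presence, phenotype))
print("p-value: %.5f" % p)
```

In this toy table the isoform is mostly present in high-disease isolates, so the p-value falls below the default 0.05 threshold listed under Options.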

EffectorFisher was developed at the Centre for Crop and Disease Management (CCDM) by Mohitul Hossain within RTP/GRDC-funded Ph.D. project CUR2301-006RSX, with additional support from the Western Australian Agricultural Research Collaboration (WAARC), under the supervision of Dr James Hane and co-supervision of Drs Huyen Phan and Kristina Gagalova. Assistance with code development was provided by Dr Kristina Gagalova and Mr Pavel Misiun, with testing performed by Ms Naomi Gray.

A manuscript describing this method is currently under review. If you use EffectorFisher, please check this space for citation details.

Installation

EffectorFisher-core is a command-line tool written in Python.

Requirements

  • Python 3.6 or newer. More details can be found here.
  • pip installed, with pip >= 21.0 recommended. More details about pip installation can be found here.
  • Internet connection to install from GitHub
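
The first two requirements can be checked from Python before installing. This is a minimal sketch; the function name check_environment is hypothetical, and it does not check the recommended pip version.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 6)):
    """Return a list of problems; an empty list means the basics are in place."""
    problems = []
    if sys.version_info < min_python:
        # %-formatting keeps this snippet parseable on older interpreters too.
        problems.append("Python %d.%d or newer is required" % min_python)
    if importlib.util.find_spec("pip") is None:
        problems.append("pip is not importable in this interpreter")
    return problems


for problem in check_environment():
    print("WARNING:", problem)
```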

Quick installation from GitHub

pip install git+https://github.com/ccdmb/EffectorFisher-core.git

This will:

  • Download the latest version of the tool
  • Install all required dependencies
  • Register the command-line tool effectorfisher_core.py

Manual Installation (From Cloned Repository)

If you prefer to work with the source:

git clone https://github.com/ccdmb/EffectorFisher-core.git
cd EffectorFisher-core
pip install .

To install in development mode (reflects source code changes automatically):

pip install -e .

Input Files

To run this module, you need to provide the following input files:

  1. Effector_variants_PAV_output.txt (required): The presence–absence variation (PAV) matrix of predicted effector candidates across isolates. It is generated by the EffectorFisher pipeline and written to the Final_PAV_result directory on successful execution. Detailed instructions for generating this file can be found in the "EffectorFisher" repo: https://github.com/muhitulh/EffectorFisher/tree/main. Note: Both the EffectorFisher and EffectorFisher-core modules are components of the associated manuscript.

  2. phenotype_data_quantitative.txt or phenotype_data_qualitative.txt:

    • phenotype_data_quantitative.txt: This file should contain numeric disease scores. You need to prepare this file as shown in the example.
    • phenotype_data_qualitative.txt: This file should contain disease severity levels (high or low). You need to prepare this file as shown in the example.
  3. predector_results.txt (required): This file must be generated by running the Predector tool. Predector, published in Scientific Reports (link), prioritizes candidate effector proteins based on a range of effector-like features.
    Installation and usage instructions are available in the Predector GitHub repository.

  4. known_effector.txt (optional): You can provide known effector IDs and names in this file, as shown in the example. If this file is not provided, the module will not include known effector ranking in the final output.

Important: Make sure your input files are named exactly as listed above and are located in the subdirectory 00_input_files within your working directory. To read the same file names from a different directory, use --input-dir. Passing individual input file paths as command-line arguments is still in development.

Directory Structure

Here's an example of the directory structure for running the EffectorFisher module:

working_directory/
├── 00_input_files/
│   ├── Effector_variants_PAV_output.txt
│   ├── phenotype_data_quantitative.txt (or phenotype_data_qualitative.txt)
│   ├── predector_results.txt
│   └── known_effector.txt (optional)
├── effectorfisher_core.py
└── ...

Make sure to place the input files in the 00_input_files directory within your working directory.
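
Before running the pipeline, it can help to confirm that the expected files are in place. The sketch below checks the layout described above. The helper name missing_inputs is hypothetical, and the check only looks at file names, not file contents.

```python
from pathlib import Path

# File names as listed under "Input Files".
REQUIRED = ["Effector_variants_PAV_output.txt", "predector_results.txt"]
PHENOTYPE = {
    "quantitative": "phenotype_data_quantitative.txt",
    "qualitative": "phenotype_data_qualitative.txt",
}


def missing_inputs(input_dir, data_type):
    """Return the required file names that are absent from input_dir."""
    names = REQUIRED + [PHENOTYPE[data_type]]
    return [name for name in names if not (Path(input_dir) / name).is_file()]


missing = missing_inputs("00_input_files", "quantitative")
if missing:
    print("Missing input files:", ", ".join(missing))
```

known_effector.txt is left out of the check because it is optional.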

Usage

Run the pipeline with:

effectorfisher_core.py --data-type <qualitative|quantitative> [options]

Basic example

effectorfisher_core.py --data-type quantitative --input-dir 00_input_files/ --save

This will:

  • Process input files
  • Apply default filters
  • Save both intermediate and final output files

Final Output Only (No --save)

effectorfisher_core.py --data-type quantitative --input-dir 00_input_files/
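
The same commands can be launched from a Python script, for example when processing several datasets in a loop. This is a sketch that assumes effectorfisher_core.py is on your PATH after installation. The helper names build_command and run_pipeline are hypothetical, and only flags shown in this README are used.

```python
import subprocess


def build_command(data_type, input_dir="00_input_files", output_dir=None, save=False):
    """Assemble the argument list for one effectorfisher_core.py run."""
    if data_type not in ("quantitative", "qualitative"):
        raise ValueError("data_type must be 'quantitative' or 'qualitative'")
    cmd = ["effectorfisher_core.py", "--data-type", data_type, "--input-dir", input_dir]
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]
    if save:
        cmd.append("--save")  # also keep intermediate files
    return cmd


def run_pipeline(**kwargs):
    """Run the tool and raise CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_command(**kwargs), check=True)


print(" ".join(build_command("quantitative", save=True)))
```

Passing the command as a list avoids shell quoting problems when directory names contain spaces.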

Options

effectorfisher_core.py --help

usage: effectorfisher_core.py [-h] [--data-type {quantitative,qualitative}]
                              [--input-dir INPUT_DIR] [--output-dir OUTPUT_DIR]
                              [--min-variant MIN_VARIANT] [--save]
                              [--cyst CYST] [--total-aa TOTAL_AA]
                              [--pred-score PRED_SCORE] [--p-value P_VALUE]

Process phenotype and variant data for EffectorFisher

optional arguments:
  -h, --help              Show help message and exit
  --data-type             Required. Either `quantitative` or `qualitative`
  --input-dir             Directory containing input files (default: `00_input_files`)
  --output-dir            Directory for output files (default: `output/`)
  --min-variant           Minimum isoform count (default: 5)
  --save                  Save all intermediate and final results
  --cyst                  Minimum cysteine count (default: 2)
  --total-aa              Maximum amino acid length (default: 300)
  --pred-score            Minimum prediction score (default: 2)
  --p-value               P-value threshold (default: 0.05)

Must include:

  • --data-type <data_type>: Specify the type of phenotypic data you have. Choose either qualitative or quantitative. See the examples in the input_files directory.

Important:

  • --min-variant <number>: Specify the minimum isoform count (default = 5).

Optional:

  • --cyst <number>: Specify the minimum cysteine count (default = 2).
  • --pred-score <number>: Specify the minimum prediction score (default = 2).
  • --total-aa <number>: Specify the maximum amino acid length (default = 300).
  • --p-value <number>: Specify the p-value threshold (default = 0.05).

Example

effectorfisher_core.py --data-type quantitative --min-variant 5 --cyst 2 --pred-score 2 --total-aa 300 --p-value 0.05
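
The threshold options can be read as a set of per-candidate checks. The sketch below illustrates that reading with the default values. The dictionary keys are hypothetical, and the inclusive comparisons (including keeping candidates whose p-value is at or below the threshold) are assumptions, not the module's actual column names or logic.

```python
def passes_filters(candidate, cyst=2, total_aa=300, pred_score=2, p_value=0.05):
    """Return True if a candidate meets all four thresholds (assumed inclusive)."""
    return (
        candidate["cysteine_count"] >= cyst           # minimum cysteine count
        and candidate["length_aa"] <= total_aa        # maximum amino acid length
        and candidate["predector_score"] >= pred_score  # minimum prediction score
        and candidate["p_value"] <= p_value           # association p-value threshold
    )


# Toy candidates (hypothetical values).
candidates = [
    {"id": "locus_1", "cysteine_count": 4, "length_aa": 180, "predector_score": 2.5, "p_value": 0.01},
    {"id": "locus_2", "cysteine_count": 6, "length_aa": 350, "predector_score": 3.0, "p_value": 0.01},
    {"id": "locus_3", "cysteine_count": 4, "length_aa": 120, "predector_score": 2.1, "p_value": 0.20},
]
kept = [c["id"] for c in candidates if passes_filters(c)]
print(kept)
```

Here locus_2 is dropped for exceeding the length limit and locus_3 for its p-value, so only locus_1 remains.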

Output

Main Output

File Name                    Description
complete_isoform_list.txt    Complete list of isoforms processed by the module.
complete_loci_list.txt       Complete list of loci processed by the module.

Additional Output

File Name                    Description
filtered_loci_list.txt       List of loci that pass the default or specified filters. Alternatively, you can apply your own filters to complete_loci_list.txt as required.
known_effectors_ranking.txt  Ranking of known effectors, produced only if you provide a known effector input file.

Additional results: Rank the known effectors after filtering.
