The EffectorFisher module is a Python library for comparing pangenome-derived protein isoform profiles with host virulence/disease phenotype data, in order to predict candidate effectors with strong phenotypic association. EffectorFisher can be used to refine the output of Predector, which combines multiple methods to predict proteins with effector-like properties.
EffectorFisher was developed at the Centre for Crop and Disease Management (CCDM) by Mohitul Hossain within RTP/GRDC-funded Ph.D. project CUR2301-006RSX, with additional support from the Western Australian Agricultural Research Collaboration (WAARC), under the supervision of Dr James Hane and co-supervision of Drs Huyen Phan and Kristina Gagalova. Assistance with code development was provided by Dr Kristina Gagalova and Mr Pavel Misiun, with testing performed by Ms Naomi Gray.
A manuscript describing this method is currently under review. If you use EffectorFisher, please check this space for citation details.
EffectorFisher-core is a command-line tool written in Python.
- Python 3.6 or newer. More details can be found here.
- `pip` installed, with `pip >= 21.0` recommended. More details about `pip` installation can be found here.
- Internet connection to install from GitHub
pip install git+https://github.com/ccdmb/EffectorFisher-core.git
This will:
- Download the latest version of the tool
- Install all required dependencies
- Register the command-line tool effectorfisher-core.py
If you prefer to work with the source:
git clone https://github.com/ccdmb/EffectorFisher-core.git
cd EffectorFisher-core
pip install .
To install in development mode (reflects source code changes automatically):
pip install -e .
To run this module, you need to provide the following input files:
- `Effector_variants_PAV_output.txt`: This file is a required input for EffectorFisher-core and must be generated by running the EffectorFisher tool. The file can be found in the `Final_PAV_result` directory upon successful execution of the EffectorFisher pipeline. It contains the presence–absence variation (PAV) matrix of predicted effector candidates across isolates. Detailed instructions for generating this file can be found in the "EffectorFisher" repo: https://github.com/muhitulh/EffectorFisher/tree/main. Note: Both the EffectorFisher and EffectorFisher-core modules are components of the associated manuscript.
- `phenotype_data_quantitative.txt` or `phenotype_data_qualitative.txt`:
  - `phenotype_data_quantitative.txt`: This file should contain numeric disease scores. You need to prepare this file as shown in the example.
  - `phenotype_data_qualitative.txt`: This file should contain disease severity levels (high or low). You need to prepare this file as shown in the example.
- `predector_results.txt`: This file is a required input for EffectorFisher-core and must be generated by running the Predector tool. Predector is a tool published in Scientific Reports (link) that prioritizes candidate effector proteins based on a range of effector-like features. Installation and usage instructions are available in the Predector GitHub repository.
- `known_effector.txt` (optional): You can provide known effector IDs and names in this file, as shown in the example. If this file is not provided, the module will not include known effector ranking in the final output.
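To illustrate how a presence–absence profile could be compared against qualitative (high/low) phenotypes, here is a minimal sketch of a two-sided Fisher's exact test on a 2x2 table. The choice of test is an assumption suggested by the tool's name and its p-value threshold; this section does not spell out the statistic that EffectorFisher-core actually applies. The function name and the counts are invented for illustration, and `math.comb` needs Python 3.8 or newer.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Hypothetical helper, not part of EffectorFisher-core.
    Rows: isoform present / absent. Columns: high / low disease.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum over every table that is no more probable than the observed one.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# Invented counts: isoform present in 8 high and 1 low isolates,
# absent in 2 high and 9 low isolates.
p = fisher_exact_two_sided(8, 1, 2, 9)
print(f"p = {p:.4f}")  # p = 0.0055
```

With these made-up counts the p-value falls below 0.05, so such an isoform would be treated as phenotype-associated under that threshold.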
Important: Input files must be named exactly as listed above and placed in the subdirectory 00_input_files within your working directory. Alternatively, you can provide the input file paths as command-line arguments (note: this option is still under development).
Here's an example of the directory structure for running the EffectorFisher module:
working_directory/
├── 00_input_files/
│   ├── Effector_variants_PAV_output.txt
│   ├── phenotype_data_quantitative.txt (or phenotype_data_qualitative.txt)
│   ├── predector_results.txt
│   └── known_effector.txt (optional)
├── effectorfisher_core.py
└── ...
Make sure to place the input files in the 00_input_files directory within your working directory.
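Before running the pipeline, it can help to confirm that the expected files are in place. The following is a minimal sketch of such a check; the file names come from the list above, while the helper `missing_inputs` is hypothetical and not part of EffectorFisher-core.

```python
from pathlib import Path

# File names as documented above; the helper itself is a hypothetical example.
REQUIRED = ("Effector_variants_PAV_output.txt", "predector_results.txt")
PHENOTYPE = {
    "quantitative": "phenotype_data_quantitative.txt",
    "qualitative": "phenotype_data_qualitative.txt",
}


def missing_inputs(input_dir, data_type):
    """Return the required file names that are absent from input_dir."""
    folder = Path(input_dir)
    expected = list(REQUIRED) + [PHENOTYPE[data_type]]
    return [name for name in expected if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_inputs("00_input_files", "quantitative")
    if missing:
        print("Missing input files:", ", ".join(missing))
    else:
        print("All required input files found.")
```

The optional `known_effector.txt` is deliberately left out of the check, since the module runs without it.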
Run the pipeline with:
effectorfisher_core.py --data-type <qualitative|quantitative> [options]
effectorfisher_core.py --data-type quantitative --input-dir 00_input_files/ --save
This will:
- Process input files
- Apply default filters
- Save both intermediate and final output files
effectorfisher_core.py --data-type quantitative --input-dir 00_input_files/
effectorfisher_core.py --help
usage: effectorfisher_core.py [-h] [--data-type {quantitative,qualitative}]
[--input-dir INPUT_DIR] [--output-dir OUTPUT_DIR]
[--min-variant MIN_VARIANT] [--save]
[--cyst CYST] [--total-aa TOTAL_AA]
[--pred-score PRED_SCORE] [--p-value P_VALUE]
Process phenotype and variant data for EffectorFisher
optional arguments:
-h, --help Show help message and exit
--data-type Required. Either `quantitative` or `qualitative`
--input-dir Directory containing input files (default: `00_input_files`)
--output-dir Directory for output files (default: `output/`)
--min-variant Minimum isoform count (default: 5)
--save Save all intermediate and final results
--cyst Minimum cysteine count (default: 2)
--total-aa Maximum amino acid length (default: 300)
--pred-score Minimum prediction score (default: 2)
--p-value P-value threshold (default: 0.05)
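For readers who want to wrap or script the tool, the documented options can be mirrored with `argparse`. This is a reconstruction from the help text above, not the tool's actual source: the option names and defaults are taken from the help output, while the argument types (`int`, `float`) and the function name `build_parser` are assumptions.

```python
import argparse


def build_parser():
    """Rebuild the documented options (a sketch, not the tool's own parser)."""
    parser = argparse.ArgumentParser(
        prog="effectorfisher_core.py",
        description="Process phenotype and variant data for EffectorFisher",
    )
    parser.add_argument("--data-type", required=True,
                        choices=["quantitative", "qualitative"],
                        help="Type of phenotype data")
    parser.add_argument("--input-dir", default="00_input_files",
                        help="Directory containing input files")
    parser.add_argument("--output-dir", default="output/",
                        help="Directory for output files")
    # Types below are assumed; the help text only shows the default values.
    parser.add_argument("--min-variant", type=int, default=5,
                        help="Minimum isoform count")
    parser.add_argument("--save", action="store_true",
                        help="Save all intermediate and final results")
    parser.add_argument("--cyst", type=int, default=2,
                        help="Minimum cysteine count")
    parser.add_argument("--total-aa", type=int, default=300,
                        help="Maximum amino acid length")
    parser.add_argument("--pred-score", type=float, default=2.0,
                        help="Minimum prediction score")
    parser.add_argument("--p-value", type=float, default=0.05,
                        help="P-value threshold")
    return parser


args = build_parser().parse_args(["--data-type", "quantitative", "--save"])
print(args.data_type, args.min_variant, args.p_value, args.save)
# quantitative 5 0.05 True
```

Note that `argparse` exposes hyphenated options as underscore attributes, for example `--min-variant` becomes `args.min_variant`.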
Must include:
- `--data-type <data_type>`: Specify the type of phenotypic data you have. Choose either `qualitative` or `quantitative`. See the examples in the `input_files` directory.
Important:
- `--min-variant <number>`: Specify the minimum isoform number (default = 5).
Optional:
- `--cyst <number>`: Specify the cysteine count threshold (default = 2).
- `--pred-score <number>`: Specify the prediction score threshold (default = 2).
- `--total-aa <number>`: Specify the total amino acid count threshold (default = 300).
- `--p-value <number>`: Specify the p-value threshold (default = 0.05).
effectorfisher_core.py --data-type quantitative --min-variant 5 --cyst 2 --pred-score 2 --total-aa 300 --p-value 0.05
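The combined effect of the default thresholds for `--cyst`, `--total-aa`, `--pred-score` and `--p-value` can be sketched as a simple row filter. The field names, the example loci and all values below are invented for illustration, and treating each threshold as inclusive is an assumption; the real column names and comparison rules in EffectorFisher-core may differ.

```python
# Hypothetical field names and made-up values, for illustration only.
def passes_filters(row, cyst=2, total_aa=300, pred_score=2.0, p_value=0.05):
    """Return True if a candidate meets every threshold (assumed inclusive)."""
    return (
        row["cysteines"] >= cyst          # minimum cysteine count
        and row["length"] <= total_aa     # maximum amino acid length
        and row["pred_score"] >= pred_score  # minimum prediction score
        and row["p_value"] <= p_value     # p-value threshold
    )


candidates = [
    {"locus": "locus_A", "cysteines": 6, "length": 180, "pred_score": 2.7, "p_value": 0.003},
    {"locus": "locus_B", "cysteines": 1, "length": 150, "pred_score": 3.1, "p_value": 0.010},
    {"locus": "locus_C", "cysteines": 4, "length": 420, "pred_score": 2.2, "p_value": 0.020},
]

kept = [row["locus"] for row in candidates if passes_filters(row)]
print(kept)  # ['locus_A']
```

Here `locus_B` is dropped for having too few cysteines and `locus_C` for exceeding the length limit, so only `locus_A` survives the defaults.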
| File Name | Description |
|---|---|
| complete_isoform_list.txt | Complete list of isoforms processed by the module. |
| complete_loci_list.txt | Complete list of loci processed by the module. |
| File Name | Description |
|---|---|
| filtered_loci_list.txt | List of loci that pass the default or user-specified filters. Alternatively, you can apply your own filters to complete_loci_list.txt as required. |
| known_effectors_ranking.txt | Ranking of known effectors, produced only if you provide a known effector input file. |
Additional results: Rank the known effectors after filtering.