InterPro is a database that brings together predictive information on protein function from multiple partner resources. It provides an integrated view of the families, domains and functional sites to which a given protein belongs.
InterProScan is the command‑line tool that allows you to scan protein or nucleotide sequences against the InterPro member‑database signatures in a single workflow. Researchers with novel sequences can use InterProScan to annotate their data with family classifications, domain architectures and site predictions.
Before you begin, install:
- Nextflow 25.04 or later
- A container runtime. Those currently supported are Docker, Singularity and Apptainer.
You don't need anything else: Nextflow downloads the workflow from GitHub, and the required data are downloaded automatically when you run InterProScan.
Important
Phobius, SignalP and DeepTMHMM require separate licenses and downloads. See Licensed analyses.
If you have Docker and Nextflow installed, you can quickly test InterProScan and download the required data by running:
nextflow run ebi-pf-team/interproscan6 \
-r 6.0.0 \
-profile docker,test \
--datadir data \
--interpro latest
Explanation of parameters:
- -r 6.0.0: Specifies the version of InterProScan to run. We strongly recommend always specifying a version to ensure consistent and reproducible results.
- -profile docker,test:
  - docker: Executes tasks in Docker containers.
  - test: Uses a small test FASTA file included in the workflow.
- --datadir data: Sets the data directory as the location to store InterPro and member database files. The directory will be created automatically if it doesn't exist, and required files will be downloaded into it.
- --interpro latest: Uses the latest available InterPro data release.
Note
While --interpro latest is the default, we strongly recommend pinning a specific version (e.g. --interpro 107.0) to ensure reproducibility.
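One way to keep both pins together is to store them in shell variables and build the command from them. The sketch below is only an illustration: the variable names and the `ips_command` helper are hypothetical, `107.0` is the example release from the note above, and the function prints the command rather than running it so that it can be reviewed or logged first.

```shell
# Pin the workflow version and the InterPro data release in one place.
IPS_VERSION="6.0.0"        # passed to -r
INTERPRO_RELEASE="107.0"   # passed to --interpro (example release)

# ips_command INPUT
# Hypothetical helper: prints the pinned command for the given FASTA file.
ips_command() {
  printf 'nextflow run ebi-pf-team/interproscan6 -r %s -profile docker --datadir data --interpro %s --input %s\n' \
    "$IPS_VERSION" "$INTERPRO_RELEASE" "$1"
}

ips_command /path/to/sequences.faa
```

Recording the printed command alongside the results makes it easier to repeat a run later with the same workflow and data versions.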
After the run completes, the following files will be created in your working directory:
- test.faa.gff3: Annotations in GFF3 format
- test.faa.json: Full annotations in JSON format
- test.faa.jsonl: Full annotations in JSON Lines format (one line for each input sequence)
- test.faa.tsv: Tabular summary of matches (TSV format)
- test.faa.xml: Full annotations in XML format
The JSON, JSON Lines, and XML outputs are more comprehensive, the TSV is a concise summary, and the GFF3 is a standard format suitable for genome browsers and annotation pipelines.
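A quick sanity check after a run is to confirm that all five files exist and that the JSON Lines file has one line per input sequence. The following is a minimal sketch under two assumptions that are not stated explicitly above: that output files are always named after the input file (as with test.faa), and that the input is a protein FASTA file. The `check_outputs` function is a hypothetical helper, not part of InterProScan.

```shell
# check_outputs FASTA [OUTDIR]
# Returns 0 if every expected output exists and the JSON Lines file has
# as many lines as the FASTA file has header lines; returns 1 otherwise.
check_outputs() {
  fasta="$1"
  outdir="${2:-.}"
  prefix="$outdir/$(basename "$fasta")"
  status=0
  for ext in gff3 json jsonl tsv xml; do
    if [ ! -f "$prefix.$ext" ]; then
      echo "missing: $prefix.$ext" >&2
      status=1
    fi
  done
  if [ -f "$prefix.jsonl" ]; then
    # grep -c exits non-zero when the count is 0, hence "|| true".
    n_seqs=$(grep -c '^>' "$fasta" || true)
    n_lines=$(grep -c '' "$prefix.jsonl" || true)
    if [ "$n_seqs" -ne "$n_lines" ]; then
      echo "sequence count ($n_seqs) differs from JSON Lines count ($n_lines)" >&2
      status=1
    fi
  fi
  return "$status"
}
```

For example, `check_outputs /path/to/sequences.faa` checks the current working directory.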
To annotate your own FASTA file of sequences, omit the test profile and specify --input:
nextflow run ebi-pf-team/interproscan6 \
-r 6.0.0 \
-profile docker \
--datadir data \
--input /path/to/sequences.faa
For nucleotide sequences, add --nucleic:
nextflow run ebi-pf-team/interproscan6 \
-r 6.0.0 \
-profile docker \
--datadir data \
--input /path/to/sequences.fna \
--nucleic
By default, only non-AI/ML analyses are enabled. DeepTMHMM, TMbed, and SignalP 6 are not executed unless explicitly requested. TMbed is bundled; DeepTMHMM and SignalP 6 require separate installation due to licensing.
Specific analyses can be selected using --applications. Example: run Pfam and MobiDB-lite only:
nextflow run ebi-pf-team/interproscan6 \
-r 6.0.0 \
-profile docker \
--datadir data \
--input /path/to/sequences.faa \
--applications Pfam,MobiDB-lite
Tip
Analysis names are case-insensitive, and hyphens and underscores are ignored: MobiDB-lite, mobidblite, and MOBIDB_LITE are all valid.
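The matching rule can be illustrated with a short shell function. This is only a sketch of the rule as described in the tip, not InterProScan's actual implementation, and `normalize_app` is a hypothetical name.

```shell
# normalize_app NAME
# Lowercases NAME and removes hyphens and underscores.
normalize_app() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -d '_-'
}

# All three spellings reduce to the same key.
for name in MobiDB-lite mobidblite MOBIDB_LITE; do
  normalize_app "$name"   # prints: mobidblite
done
```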
Note
Refer to the Available analyses section for descriptions and licensing details.
Specific analyses can be excluded with --skip-applications, e.g.:
nextflow run ebi-pf-team/interproscan6 \
-r 6.0.0 \
-profile docker \
--datadir data \
--input /path/to/sequences.faa \
--skip-applications CDD,NCBIFAM,SUPERFAMILY
AI/ML analyses are disabled by default because they are substantially more computationally expensive than traditional profile-based analyses.
Individual analyses may be enabled with --applications, or all ML-capable analyses may be enabled with --run-ml. AI/ML analyses run on CPU unless GPU execution is requested using --use-gpu:
nextflow run ebi-pf-team/interproscan6 \
-r 6.0.0 \
-profile docker \
--datadir data \
--input /path/to/sequences.faa \
--run-ml \
--use-gpu
Omit --use-gpu to run on CPU.
Add --goterms and --pathways to include Gene Ontology terms and pathway annotations in the output files:
nextflow run ebi-pf-team/interproscan6 \
-r 6.0.0 \
-profile docker \
--datadir data \
--input /path/to/sequences.faa \
--goterms \
--pathways
To run InterProScan on your institute's Slurm cluster, use the slurm profile. This ensures that each task in the pipeline is submitted as a job to the Slurm scheduler.
Most HPC systems do not support Docker, but they often support Singularity or Apptainer for containerized execution. Include the appropriate profile (singularity or apptainer).
nextflow run ebi-pf-team/interproscan6 \
-r 6.0.0 \
-profile singularity,slurm \
--datadir data \
--input /path/to/sequences.faa
Important
The directory specified by --datadir must be accessible from all cluster nodes. This usually means it should be located on a shared network file system (e.g. NFS or Lustre).
| Name | Description | Included by default |
|---|---|---|
| AntiFam | Identifies sequences likely to be spurious or misannotated | ✅ Yes |
| CATH-Gene3D | Identifies structural domains from the CATH classification | ✅ Yes |
| CATH-FunFam | Groups protein domains into functional families based on CATH | ✅ Yes |
| CDD | Detects conserved domains using position-specific scoring matrices from NCBI | ✅ Yes |
| COILS | Predicts coiled-coil regions based on sequence patterns | ✅ Yes |
| DeepTMHMM | Predicts transmembrane helices | ❌ No |
| HAMAP | Identifies high-confidence protein families in microbial and organellar proteomes | ✅ Yes |
| MobiDB-lite | Predicts intrinsically disordered regions | ✅ Yes |
| NCBIFAM | Matches proteins to curated HMMs from NCBI, including TIGRFAMs | ✅ Yes |
| PANTHER | Classifies proteins into families and subfamilies with curated GO terms | ✅ Yes |
| Pfam | Detects protein domains and families using HMMs built from multiple sequence alignments | ✅ Yes |
| Phobius | Predicts transmembrane topology and signal peptides | ❌ No |
| PIRSF | Classifies proteins into evolutionary families based on full-length sequence similarity | ✅ Yes |
| PIRSR | Identifies conserved residues using manually curated site rules | ✅ Yes |
| PRINTS | Detects protein families using groups of conserved motifs | ✅ Yes |
| PROSITE-patterns | Identifies protein features based on short sequence motifs | ✅ Yes |
| PROSITE-profiles | Detects protein families and domains using position-specific scoring profiles | ✅ Yes |
| SFLD | Classifies enzymes by relating sequence features to chemical function | ✅ Yes |
| SMART | Identifies signaling and extracellular domains | ✅ Yes |
| SUPERFAMILY | Assigns structural domains using HMMs based on the SCOP superfamily classification | ✅ Yes |
| SignalP-Euk | Predicts signal peptides in eukaryotic proteins | ❌ No |
| SignalP-Prok | Predicts signal peptides in prokaryotic proteins | ❌ No |
| TMbed | Predicts transmembrane helices, transmembrane strands, and signal peptides | ❌ No |
DeepTMHMM, Phobius and SignalP contain licensed components and are disabled by default.
Tip
You do not need to install all three. Only download and configure the tool(s) you intend to use (e.g. just SignalP or Phobius).
To enable and execute any of these analyses:
- Obtain a license for the tool.
- Download and extract the archive.
- Set the full path to the extracted directory in a Nextflow config file.
Request a standalone copy of DeepTMHMM 1.0 by sending an email to licensing@biolib.com. After receiving the package, extract it:
unzip -q DeepTMHMM-v1.0.zip
Then get the full path to the extracted directory:
echo "${PWD}/DeepTMHMM"
Important
Phobius does not support certain non-standard or ambiguous residues. Any sequence containing pyrrolysine (one-letter code O), Asx (Asp/Asn ambiguity, B), Glx (Glu/Gln ambiguity, Z) or Xle (Leu/Ile ambiguity, J) will be skipped by Phobius but will continue to be processed normally by all other applications.
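To see in advance which sequences would be affected, the input can be scanned for these four residue codes. The sketch below is a hypothetical helper (`phobius_skipped_ids`), not something InterProScan provides. It assumes a plain protein FASTA file, takes the text up to the first space of each header as the identifier, and treats lowercase residues the same as uppercase ones, which is an assumption.

```shell
# phobius_skipped_ids FASTA
# Prints the identifier of every sequence containing O, B, Z or J.
phobius_skipped_ids() {
  awk '
    /^>/ {
      # New record: report the previous one if it was flagged.
      if (id != "" && bad) print id
      id = substr($1, 2)
      bad = 0
      next
    }
    # Sequence line: flag the record if any unsupported residue appears.
    toupper($0) ~ /[OBZJ]/ { bad = 1 }
    END { if (id != "" && bad) print id }
  ' "$1"
}
```

Running `phobius_skipped_ids /path/to/sequences.faa` lists one identifier per line; an empty result means no sequence contains these residues.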
Download a copy of Phobius 1.01 from Erik Sonnhammer's website, then extract:
tar -zxf phobius101_linux.tgz
And get the full path:
echo "${PWD}/phobius"
SignalP 6.0 provides two model variants:
- Full (slow) model
- Distilled (fast) model, recommended for most users
A license is required to download either model. Licenses and model archives are available from the DTU website.
Extract the archive:
tar -zxf signalp-6.0i.fast.tar.gz
Then get the full path:
echo "${PWD}/signal6p_fast"
You must define the tool path(s) in a Nextflow config file, such as licensed.conf.
If you only want to run Phobius:
params {
    appsConfig {
        phobius {
            dir = "/full/path/to/phobius"
        }
    }
}
nextflow run ebi-pf-team/interproscan6 \
-r 6.0.0 \
-profile docker \
-c licensed.conf \
--input /path/to/sequences.faa \
--applications phobius
To configure multiple licensed tools in one file:
params {
    appsConfig {
        deeptmhmm {
            dir = "/full/path/to/DeepTMHMM"
        }
        phobius {
            dir = "/full/path/to/phobius"
        }
        signalp_euk {
            dir = "/full/path/to/signal6p_fast"
        }
        signalp_prok {
            dir = "/full/path/to/signal6p_fast"
        }
    }
}
And run with:
nextflow run ebi-pf-team/interproscan6 \
-r 6.0.0 \
-profile docker \
-c licensed.conf \
--input /path/to/sequences.faa \
--applications deeptmhmm,phobius,signalp_euk,signalp_prok \
--use-gpu
Note
Running both signalp_euk and signalp_prok will execute SignalP twice, once with eukaryotic post-processing and once without. Choose the mode best suited to your dataset.
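If you prefer to generate the config file rather than write it by hand, a small shell function can emit the same params / appsConfig structure shown in this section. This is a sketch only: `write_licensed_conf` is a hypothetical helper, and it assumes that directory paths contain no double quotes.

```shell
# write_licensed_conf OUTFILE NAME=DIR [NAME=DIR ...]
# Writes one appsConfig entry per NAME=DIR pair and warns about
# directories that do not exist.
write_licensed_conf() {
  outfile="$1"
  shift
  {
    printf '%s\n' 'params {' '    appsConfig {'
    for pair in "$@"; do
      name="${pair%%=*}"   # text before the first "="
      dir="${pair#*=}"     # text after the first "="
      if [ ! -d "$dir" ]; then
        echo "warning: $dir is not a directory" >&2
      fi
      printf '        %s {\n' "$name"
      printf '            dir = "%s"\n' "$dir"
      printf '        }\n'
    done
    printf '%s\n' '    }' '}'
  } > "$outfile"
}
```

For example, `write_licensed_conf licensed.conf phobius="${PWD}/phobius"` writes a file equivalent to the Phobius-only config above, which can then be passed to Nextflow with -c licensed.conf.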
For instructions on integrating InterProScan 6 into a Nextflow pipeline as a Git submodule, see the integration documentation.
Our full documentation is available on ReadTheDocs.
For further assistance, please create an issue or contact us.
If you use InterPro in your work, please cite the following publication:
Blum M, Andreeva A, Florentino LC, Chuguransky SR, Grego T, Hobbs E, Pinto BL, Orr A, Paysan-Lafosse T, Ponamareva I, Salazar GA, Bordin N, Bork P, Bridge A, Colwell L, Gough J, Haft DH, Letunic I, Llinares-López F, Marchler-Bauer A, Meng-Papaxanthos L, Mi H, Natale DA, Orengo CA, Pandurangan AP, Piovesan D, Rivoire C, Sigrist CJA, Thanki N, Thibaud-Nissen F, Thomas PD, Tosatto SCE, Wu CH, Bateman A. InterPro: the protein sequence classification resource in 2025. Nucleic Acids Res. 2025 Jan;53(D1):D444-D456. doi: 10.1093/nar/gkae1082.