metatranscriptomic growth classifier
# Download tool
git clone git@github.com:SushiLab/mTRAc.git
# Create environment
cd mTRAc
conda env create --name mTRAc --file=conda.yaml
conda activate mTRAc
$ python mtrac.py
Program: mTRAc - metatranscriptomic growth classifier
Version: 0.0.5
mTRAc <command> [options]
-- Database
extract Extract marker genes from a genome
and create index
combine Merge individual databases into one
comprehensive database and create
index
index Index a database
-- Quantification
align Align reads against a marker gene
database and quantify gene abundances
merge Merge multiple marker gene abundance files
into a single file
-- Prediction
predict Predict the growth state (Growth/No growth)
for each genome within each sample
Extract 129 MGs from a genome
$ python mtrac.py extract
Program: mTRAc - metatranscriptomic growth classifier
Version: 0.0.5
mTRAc extract [options]
Input options:
-f STR Input genome file. Can be gzipped
-db STR Name of Database - new database will
be stored in default database folder
Extracting 129 MGs from Arectalis
$ python mtrac.py extract -f databases/Arectalis/Arectalis.fasta.gz -db Arectalis_extraction_test
2025-11-04,08:18:57 INFO: mTRAc tool starting
2025-11-04,08:18:57 INFO: Calling genes
2025-11-04,08:18:59 INFO: Extracting markergenes
2025-11-04,08:19:03 INFO: Starting marker gene extraction from 1 protein files.
2025-11-04,08:19:04 INFO: Finished marker gene extraction.
2025-11-04,08:19:04 INFO: Found 128 / 129 markergenes
2025-11-04,08:19:04 INFO: Writing new database to mTRAc/databases
Combine existing databases into one database (e.g. Arectalis, Btheta and Ecoli into EAM).
Note that the Arectalis_extraction_test database generated in the extract step is now listed among the choices and can be used for downstream analysis:
$ python mtrac.py combine
2025-11-04,08:21:34 INFO: mTRAc tool starting
Program: mTRAc - metatranscriptomic growth classifier
Version: 0.0.5
mTRAc combine [options]
Input options:
-i STR [STR ...] Names of databases that should be
combined. Choices:
- EAM
- Arectalis_extraction_test
- Ecoli
- Arectalis
- Btheta
-db STR Name of Database - new database will
be stored in default database folder.
Combine the Ecoli and Arectalis databases into a new Ecoli_Arectalis database:
python mtrac.py combine -i Ecoli Arectalis -db Ecoli_Arectalis
2025-11-04,08:24:35 INFO: mTRAc tool starting
2025-11-04,08:24:35 INFO: Writing mapping file: mTRAc/databases/Ecoli_Arectalis/Ecoli_Arectalis.map.gz
2025-11-04,08:24:35 INFO: Writing gff3 file: mTRAc/databases/Ecoli_Arectalis/Ecoli_Arectalis.gff3.gz
2025-11-04,08:24:35 INFO: Writing fasta file: /mTRAc/databases/Ecoli_Arectalis/Ecoli_Arectalis.fasta.gz
Index an existing database. This step is needed only for downloaded databases.
$ python mtrac.py index
Program: mTRAc - metatranscriptomic growth classifier
Version: 0.0.5
mTRAc index [options]
Input options:
-db STR genome database to index. Choices:
- EAM
- Ecoli
- Arectalis
- Btheta
python mtrac.py index -db Arectalis
2025-11-22,17:41:32 INFO: mTRAc tool starting
2025-11-22,17:41:32 INFO: Database Arectalis exists but is not built yet. Start building:
[bwa_index] Pack FASTA... 0.01 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 0.40 seconds elapse.
[bwa_index] Update BWT... 0.01 sec
[bwa_index] Pack forward-only FASTA... 0.01 sec
[bwa_index] Construct SA from BWT and Occ... 0.12 sec
[main] Version: 0.7.18-r1243-dirty
[main] CMD: bwa index mTRAc/databases/Arectalis/Arectalis.fasta
[main] Real time: 0.556 sec; CPU: 0.562 sec
2025-11-22,17:41:33 INFO: mTRAc tool shutting down with exitcode 0
Align short-read sequencing data against a marker gene database and quantify marker gene abundances using featureCounts.
Note: the two databases created in the extract and combine sections (Arectalis_extraction_test and Ecoli_Arectalis) now appear among the choices:
$ python mtrac.py align
Program: mTRAc - metatranscriptomic growth classifier
Version: 0.0.5
mTRAc align [options]
Input options:
-f FILE[ FILE] input file(s) for reads in forward orientation, fastq(.gz)-formatted
-r FILE[ FILE] input file(s) for reads in reverse orientation, fastq(.gz)-formatted
-db STR genome database to use. Choices:
- EAM
- Arectalis_extraction_test
- Ecoli_Arectalis
- Ecoli
- Arectalis
- Btheta
Output options:
-o FILE output file prefix. Will create 3 files:
- prefix.bam
- prefix.fcnt
- prefix.fcnt.mgs
Algorithm options:
-t INT number of threads [1]
Example: Quantifying the 129 MGs of Arectalis using the dataset STAU23-2_Ere_Glu_5_9_37_Exp_3_ISOG.
python mTRAc/mtrac.py align -f STAU23-2_Ere_Glu_5_9_37_Exp_3_ISOG_subsample.1.fq.gz -r STAU23-2_Ere_Glu_5_9_37_Exp_3_ISOG_subsample.2.fq.gz -db Arectalis -o STAU23-2_Ere_Glu_5_9_37_Exp_3_ISOG_subsample.mtrac -t 4
2025-11-04,08:41:25 INFO: mTRAc tool starting
2025-11-04,08:41:25 INFO: Database Arectalis exists but is not built yet. Start building:
[bwa_index] Pack FASTA... 0.03 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 0.39 seconds elapse.
[bwa_index] Update BWT... 0.01 sec
[bwa_index] Pack forward-only FASTA... 0.01 sec
[bwa_index] Construct SA from BWT and Occ... 0.12 sec
[main] Version: 0.7.18-r1243-dirty
[main] CMD: bwa index mTRAc/databases/Arectalis/Arectalis.fasta
[main] Real time: 0.559 sec; CPU: 0.569 sec
2025-11-04,08:41:26 INFO: Start align command
2025-11-04,08:41:26 INFO: Start alignment
2025-11-04,08:41:26 INFO: Executing: bwa mem -a -t 4 mTRAc/databases/Arectalis/Arectalis.fasta STAU23-2_Ere_Glu_5_9_37_Exp_3_ISOG_subsample.1.fq.gz STAU23-2_Ere_Glu_5_9_37_Exp_3_ISOG_subsample.2.fq.gz
2025-11-04,08:42:27 INFO: Finished alignment. Start sorting.
2025-11-04,08:42:52 INFO: Finished sorting. Start featureCounts.
2025-11-04,08:42:52 INFO: Executing: featureCounts -O -M --fraction -t gene -a mTRAc/databases/Arectalis/Arectalis.gff3 -o STAU23-2_Ere_Glu_5_9_37_Exp_3_ISOG_subsample.mtrac.fcnt -F GTF -g locus_tag -p -B --verbose STAU23-2_Ere_Glu_5_9_37_Exp_3_ISOG_subsample.mtrac.bam --countReadPairs -T 4
========== _____ _ _ ____ _____ ______ _____
===== / ____| | | | _ \| __ \| ____| /\ | __ \
===== | (___ | | | | |_) | |__) | |__ / \ | | | |
==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
v2.1.1
//========================== featureCounts setting ===========================\\
|| ||
|| Input files : 1 BAM file ||
|| ||
|| STAU23-2_Ere_Glu_5_9_37_Exp_3_ISOG_subsample ... ||
|| ||
|| Output file : STAU23-2_Ere_Glu_5_9_37_Exp_3_ISOG_subsample ... ||
|| Summary : STAU23-2_Ere_Glu_5_9_37_Exp_3_ISOG_subsample ... ||
|| Paired-end : yes ||
|| Count read pairs : yes ||
|| Annotation : Arectalis.gff3 (GTF) ||
|| Dir for temp files : ./ ||
|| ||
|| Threads : 4 ||
|| Level : meta-feature level ||
|| Multimapping reads : counted (fractional) ||
|| Multi-overlapping reads : counted ||
|| Min overlapping bases : 1 ||
|| ||
\\============================================================================//
//================================= Running ==================================\\
|| ||
|| Load annotation file Arectalis.gff3 ... ||
|| Features : 3283 ||
|| Meta-features : 3283 ||
|| Chromosomes/contigs : 1 ||
|| ||
|| Process BAM file STAU23-2_Ere_Glu_5_9_37_Exp_3_ISOG_subsample.mtrac.ts ... ||
|| Paired-end reads are included. ||
|| Total alignments : 4619528 ||
|| Successfully assigned alignments : 4428247 (95.9%) ||
|| Running time : 0.05 minutes ||
|| ||
|| Write the final count table. ||
|| Write the read assignment summary. ||
|| ||
|| Summary of counting results can be found in file "STAU23-2_Ere_Glu_5_9_37 ||
|| _Exp_3_ISOG_subsample.mtrac.fcnt.summary" ||
|| ||
\\============================================================================//
2025-11-04,08:42:55 INFO: Finished featureCounts. Start marker gene extraction.
2025-11-04,08:42:55 INFO: Finished marker gene extraction.
2025-11-04,08:42:55 INFO: Output file: STAU23-2_Ere_Glu_5_9_37_Exp_3_ISOG_subsample.mtrac.mgs
2025-11-04,08:42:55 INFO: Finished align command
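When several samples have to be aligned against the same database, the align call shown above can be generated per sample. The following is a minimal sketch, not part of mTRAc: the helper name `build_align_command` and the sample prefixes are hypothetical, and it assumes that read files follow the `<sample>.1.fq.gz` / `<sample>.2.fq.gz` naming used in the example. Only the options listed in the help text (`-f`, `-r`, `-db`, `-o`, `-t`) are used.

```python
def build_align_command(sample, database, threads=1, script="mtrac.py"):
    """Return the argument list for one `mtrac.py align` call.

    Assumes paired reads are named <sample>.1.fq.gz and <sample>.2.fq.gz.
    """
    return [
        "python", script, "align",
        "-f", f"{sample}.1.fq.gz",
        "-r", f"{sample}.2.fq.gz",
        "-db", database,
        "-o", f"{sample}.mtrac",
        "-t", str(threads),
    ]


# Hypothetical sample prefixes, for illustration only.
samples = ["sampleA", "sampleB"]
commands = [build_align_command(s, "Arectalis", threads=4) for s in samples]

for cmd in commands:
    # Print the command; to run it, import subprocess and call
    # subprocess.run(cmd, check=True) instead.
    print(" ".join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems if a sample name contains unusual characters.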
The align command produces three files, of which the .mgs file is the primary output:
head -n 10 STAU23-2_Ere_Glu_5_9_37_Exp_3_ISOG_subsample.mtrac.mgs
#GENOME ANNOTATION GENE CHROMOSOME LENGTH COUNT
Arectalis TIGR00362 MIO91_00005 CP092643.1 1362 2111
Arectalis TIGR00663 MIO91_00010 CP092643.1 1113 1732.50
Arectalis TIGR01059 MIO91_00025 CP092643.1 1938 3078.83
Arectalis TIGR01063 MIO91_00030 CP092643.1 2508 4110.33
Arectalis TIGR01146 MIO91_00570 CP092643.1 861 1622.50
Arectalis TIGR01039 MIO91_00575 CP092643.1 1392 2406.83
Arectalis TIGR00755 MIO91_01105 CP092643.1 867 1555.67
Arectalis TIGR00420 MIO91_01410 CP092643.1 1089 1769.50
Arectalis TIGR02397 MIO91_01495 CP092643.1 1572 2724.00
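For downstream inspection, the .mgs table can be read with a few lines of Python. This is a minimal sketch that assumes the file is whitespace-delimited with a single header line starting with `#`, as in the excerpt above; the function name `read_mgs` is hypothetical and not part of mTRAc. Counts are parsed as floats because featureCounts is run with `--fraction`, so multi-mapping reads contribute fractional values.

```python
import io


def read_mgs(handle):
    """Parse a .mgs table into a list of dicts.

    LENGTH is converted to int and COUNT to float; all other
    columns are kept as strings.
    """
    rows = []
    header = None
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip empty lines
        if header is None:
            # First non-empty line is the header; drop the leading '#'.
            header = [name.lstrip("#") for name in fields]
            continue
        row = dict(zip(header, fields))
        row["LENGTH"] = int(row["LENGTH"])
        row["COUNT"] = float(row["COUNT"])
        rows.append(row)
    return rows


# Two rows copied from the example output above.
example = io.StringIO(
    "#GENOME\tANNOTATION\tGENE\tCHROMOSOME\tLENGTH\tCOUNT\n"
    "Arectalis\tTIGR00362\tMIO91_00005\tCP092643.1\t1362\t2111\n"
    "Arectalis\tTIGR00663\tMIO91_00010\tCP092643.1\t1113\t1732.50\n"
)
rows = read_mgs(example)
print(rows[0]["ANNOTATION"], rows[0]["COUNT"])
```

To read a real file, pass an open file handle instead of the `io.StringIO` example, e.g. `read_mgs(open(path))`.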
The merge command is used to combine multiple marker gene abundance files (.mgs), generated by the align command from different samples, into a single, comprehensive file. This merged file is the required input for the predict command.
$ python mtrac.py merge
Program: mTRAc - metatranscriptomic growth classifier
Version: 0.0.5
mTRAc merge [options]
Input options:
-f FILE[ FILE] input files for merging. Have to be at least 2 output files
([prefix].mgs) of predict step from two different runs against
the same database
Output options:
-o FILE output file
Example
$ python mtrac.py merge -f sample1.mgs sample2.mgs sample3.mgs -o merged_counts.tsv
Output Format Example:
#GENOME ANNOTATION GENE CHROMOSOME LENGTH sample1 sample2 sample3
Arectalis COG0012 MJ392_00005 CP092639.1 1404 1028 1502 987
Arectalis COG0016 MJ392_00006 CP092639.1 1000 500 800 1200
...
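The merged layout above can be illustrated with a small sketch. This is not the mTRAc implementation: the function name `merge_counts` is hypothetical, rows are assumed to be dictionaries with the six .mgs columns, and filling a gene that is missing from a sample with 0 is an assumption made here for illustration.

```python
KEY_COLUMNS = ("GENOME", "ANNOTATION", "GENE", "CHROMOSOME", "LENGTH")


def merge_counts(tables):
    """Merge per-sample rows into one table with one count column per sample.

    tables maps a sample name to a list of row dicts. Returns (header, rows).
    """
    samples = list(tables)
    merged = {}  # gene key -> {sample: count}; dicts keep insertion order
    for sample, rows in tables.items():
        for row in rows:
            key = tuple(row[col] for col in KEY_COLUMNS)
            merged.setdefault(key, {})[sample] = row["COUNT"]
    header = list(KEY_COLUMNS) + samples
    # Assumption: a gene absent from a sample is reported with a count of 0.
    body = [list(key) + [counts.get(s, 0) for s in samples]
            for key, counts in merged.items()]
    return header, body


def gene_row(count):
    # One marker gene taken from the output format example above.
    return {"GENOME": "Arectalis", "ANNOTATION": "COG0012",
            "GENE": "MJ392_00005", "CHROMOSOME": "CP092639.1",
            "LENGTH": 1404, "COUNT": count}


header, body = merge_counts({
    "sample1": [gene_row(1028)],
    "sample2": [gene_row(1502)],
    "sample3": [gene_row(987)],
})
print("\t".join(header))
for line in body:
    print("\t".join(str(value) for value in line))
```

Keying on all five descriptive columns only works when every input was produced against the same database, which matches the requirement stated in the help text.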
The predict command uses the merged marker gene count data to predict the growth state (Growth/No growth) for each genome within each sample, based on a pre-trained machine learning model.
$ python mtrac.py predict
Program: mTRAc - metatranscriptomic growth classifier
Version: 0.0.5
mTRAc predict [options]
Input options:
-i FILE Input CSV file with count(s) produced by extract/merge functions.
Output options:
-o STR Output prefix
Use the merged file from the previous example (merged_counts.tsv) and the default TPM normalisation method.
python mtrac.py predict -i merged_counts.tsv -o growth_predictions
Output File Format:
Sample Probability estimate Classification
sample1 0.85 Growth
sample2 0.12 No Growth
sample3 0.55 Growth
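The TPM normalisation mentioned for the predict step follows the standard definition: each count is divided by the gene length, and the resulting rates are scaled so that they sum to one million within a sample. The sketch below shows that general formula only. The function name `tpm` is hypothetical, and whether mTRAc normalises per sample, per genome, or with additional steps is not stated here, so treat this as an illustration rather than the tool's exact procedure.

```python
def tpm(counts, lengths):
    """Return transcripts-per-million values for one sample.

    counts and lengths are parallel sequences; lengths are in base pairs.
    """
    rates = [count / length for count, length in zip(counts, lengths)]
    total = sum(rates)
    if total == 0:
        # No reads assigned to any gene: return zeros instead of dividing by 0.
        return [0.0] * len(rates)
    return [rate / total * 1e6 for rate in rates]


# sample1 counts and gene lengths from the merged example table.
values = tpm([1028, 500], [1404, 1000])
print([round(value, 1) for value in values])
```

Because the values sum to one million within each sample, they can be compared across samples with different sequencing depths.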