SushiLab/mTRAc

mTRAc

metatranscriptomic growth classifier

Installation:

# Download tool
git clone git@github.com:SushiLab/mTRAc.git
# create environment
cd mTRAc
conda env create --name mTRAc --file=conda.yaml
conda activate mTRAc

Tool Interface


$ Program: mTRAc - metatranscriptomic growth classifier
    Version: 0.0.5


mTRAc <command> [options]

   	 -- Database
    
          extract   Extract marker genes from a genome
                    and create index

          combine   Merge individual databases into one
                    comprehensive database and create
                    index

          index     Index a database
                    
   	 -- Quantification
    
          align     Align reads against a marker gene 
                    database and quantify gene abundances

          merge     Merge multiple marker gene abundance files
                    into a single file

   	 -- Prediction

          predict    Predict the growth state (Growth/No growth)
                     for each genome within each sample





Extract

Extract the 129 marker genes (MGs) from a genome.


$ python mtrac.py extract


Program: mTRAc - metatranscriptomic growth classifier
        Version: 0.0.5


        mTRAc extract [options]

        Input options:
           -f   STR          Input genome file. Can be gzipped

           -db  STR          Name of Database - new database will
                             be stored in default database folder

Extract Example:

Extracting 129 MGs from Arectalis


$ python mtrac.py extract -f databases/Arectalis/Arectalis.fasta.gz -db Arectalis_extraction_test
2025-11-04,08:18:57 INFO: mTRAc tool starting
2025-11-04,08:18:57 INFO: Calling genes
2025-11-04,08:18:59 INFO: Extracting markergenes
2025-11-04,08:19:03 INFO: Starting marker gene extraction from 1 protein files.
2025-11-04,08:19:04 INFO: Finished marker gene extraction.
2025-11-04,08:19:04 INFO: Found 128 / 129 markergenes
2025-11-04,08:19:04 INFO: Writing new database to mTRAc/databases

Combine

Combine existing databases into one database (e.g. Arectalis, Btheta and Ecoli into EAM).

Note that the Arectalis_extraction_test database generated in the extract step is now listed among the choices and can be used for downstream analysis:

$ python mtrac.py combine
2025-11-04,08:21:34 INFO: mTRAc tool starting
Program: mTRAc - metatranscriptomic growth classifier
        Version: 0.0.5

        mTRAc combine [options]

        Input options:
           -i   STR [STR ...]   Names of databases that should be
                                combined. Choices:
				 - EAM
				 - Arectalis_extraction_test
				 - Ecoli
				 - Arectalis
				 - Btheta

           -db  STR             Name of Database - new database will
                                be stored in default database folder.

Combine Example:

Combine the Ecoli and Arectalis databases into a new Ecoli_Arectalis database:

$ python mtrac.py combine -i Ecoli Arectalis -db Ecoli_Arectalis
2025-11-04,08:24:35 INFO: mTRAc tool starting
2025-11-04,08:24:35 INFO: Writing mapping file: mTRAc/databases/Ecoli_Arectalis/Ecoli_Arectalis.map.gz
2025-11-04,08:24:35 INFO: Writing gff3 file: mTRAc/databases/Ecoli_Arectalis/Ecoli_Arectalis.gff3.gz
2025-11-04,08:24:35 INFO: Writing fasta file: /mTRAc/databases/Ecoli_Arectalis/Ecoli_Arectalis.fasta.gz

Index

Index an existing database. This is only needed for downloaded databases.

$ python mtrac.py index

Program: mTRAc - metatranscriptomic growth classifier
    Version: 0.0.5

    mTRAc index [options]

    Input options:
        -db  STR          genome database to index. Choices:
    			 - EAM
                 - Ecoli
                 - Arectalis
                 - Btheta


Index Example:

$ python mtrac.py index -db Arectalis
2025-11-22,17:41:32 INFO: mTRAc tool starting
2025-11-22,17:41:32 INFO: Database Arectalis exists but is not built yet. Start building:
[bwa_index] Pack FASTA... 0.01 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 0.40 seconds elapse.
[bwa_index] Update BWT... 0.01 sec
[bwa_index] Pack forward-only FASTA... 0.01 sec
[bwa_index] Construct SA from BWT and Occ... 0.12 sec
[main] Version: 0.7.18-r1243-dirty
[main] CMD: bwa index mTRAc/databases/Arectalis/Arectalis.fasta
[main] Real time: 0.556 sec; CPU: 0.562 sec
2025-11-22,17:41:33 INFO: mTRAc tool shutting down with exitcode 0

Align

Align short-read sequencing data against a marker gene database and quantify gene abundances using featureCounts.

Note: the two databases created in the extract and combine sections (Arectalis_extraction_test and Ecoli_Arectalis) are now listed among the choices:

$ python mtrac.py align

Program: mTRAc - metatranscriptomic growth classifier
    Version: 0.0.5


    mTRAc align [options]

    Input options:
       -f   FILE[ FILE]  input file(s) for reads in forward orientation, fastq(.gz)-formatted

       -r   FILE[ FILE]  input file(s) for reads in reverse orientation, fastq(.gz)-formatted

       -db  STR          genome database to use. Choices:
    			- EAM
			 	- Arectalis_extraction_test
			 	- Ecoli_Arectalis
			 	- Ecoli
			 	- Arectalis
			 	- Btheta

    Output options:
       -o   FILE         output file prefix. Will create 3 files:
                            - prefix.bam
                            - prefix.fcnt
                            - prefix.fcnt.mgs

    Algorithm options:
       -t   INT          number of threads [1]

Example: Quantifying the 129 MGs of Arectalis using the dataset STAU23-2_Ere_Glu_5_9_37_Exp_3_ISOG.



$ python mTRAc/mtrac.py align -f STAU23-2_Ere_Glu_5_9_37_Exp_3_ISOG_subsample.1.fq.gz -r STAU23-2_Ere_Glu_5_9_37_Exp_3_ISOG_subsample.2.fq.gz -db Arectalis -o STAU23-2_Ere_Glu_5_9_37_Exp_3_ISOG_subsample.mtrac -t 4
2025-11-04,08:41:25 INFO: mTRAc tool starting
2025-11-04,08:41:25 INFO: Database Arectalis exists but is not built yet. Start building:
[bwa_index] Pack FASTA... 0.03 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 0.39 seconds elapse.
[bwa_index] Update BWT... 0.01 sec
[bwa_index] Pack forward-only FASTA... 0.01 sec
[bwa_index] Construct SA from BWT and Occ... 0.12 sec
[main] Version: 0.7.18-r1243-dirty
[main] CMD: bwa index mTRAc/databases/Arectalis/Arectalis.fasta
[main] Real time: 0.559 sec; CPU: 0.569 sec
2025-11-04,08:41:26 INFO: Start align command
2025-11-04,08:41:26 INFO: 	Start alignment
2025-11-04,08:41:26 INFO: 		Executing: bwa mem -a -t 4 mTRAc/databases/Arectalis/Arectalis.fasta STAU23-2_Ere_Glu_5_9_37_Exp_3_ISOG_subsample.1.fq.gz STAU23-2_Ere_Glu_5_9_37_Exp_3_ISOG_subsample.2.fq.gz
2025-11-04,08:42:27 INFO: 	Finished alignment. Start sorting.
2025-11-04,08:42:52 INFO: 	Finished sorting. Start featureCounts.
2025-11-04,08:42:52 INFO: Executing: featureCounts -O -M --fraction -t gene -a mTRAc/databases/Arectalis/Arectalis.gff3 -o STAU23-2_Ere_Glu_5_9_37_Exp_3_ISOG_subsample.mtrac.fcnt -F GTF -g locus_tag -p -B --verbose STAU23-2_Ere_Glu_5_9_37_Exp_3_ISOG_subsample.mtrac.bam --countReadPairs -T 4

        ==========     _____ _    _ ____  _____  ______          _____
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
	  v2.1.1

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 1 BAM file                                       ||
||                                                                            ||
||                           STAU23-2_Ere_Glu_5_9_37_Exp_3_ISOG_subsample ... ||
||                                                                            ||
||             Output file : STAU23-2_Ere_Glu_5_9_37_Exp_3_ISOG_subsample ... ||
||                 Summary : STAU23-2_Ere_Glu_5_9_37_Exp_3_ISOG_subsample ... ||
||              Paired-end : yes                                              ||
||        Count read pairs : yes                                              ||
||              Annotation : Arectalis.gff3 (GTF)                             ||
||      Dir for temp files : ./                                               ||
||                                                                            ||
||                 Threads : 4                                                ||
||                   Level : meta-feature level                               ||
||      Multimapping reads : counted (fractional)                             ||
|| Multi-overlapping reads : counted                                          ||
||   Min overlapping bases : 1                                                ||
||                                                                            ||
\\============================================================================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file Arectalis.gff3 ...                                    ||
||    Features : 3283                                                         ||
||    Meta-features : 3283                                                    ||
||    Chromosomes/contigs : 1                                                 ||
||                                                                            ||
|| Process BAM file STAU23-2_Ere_Glu_5_9_37_Exp_3_ISOG_subsample.mtrac.ts ... ||
||    Paired-end reads are included.                                          ||
||    Total alignments : 4619528                                              ||
||    Successfully assigned alignments : 4428247 (95.9%)                      ||
||    Running time : 0.05 minutes                                             ||
||                                                                            ||
|| Write the final count table.                                               ||
|| Write the read assignment summary.                                         ||
||                                                                            ||
|| Summary of counting results can be found in file "STAU23-2_Ere_Glu_5_9_37  ||
|| _Exp_3_ISOG_subsample.mtrac.fcnt.summary"                              ||
||                                                                            ||
\\============================================================================//

2025-11-04,08:42:55 INFO: 	Finished featureCounts. Start marker gene extraction.
2025-11-04,08:42:55 INFO: 	Finished marker gene extraction.
2025-11-04,08:42:55 INFO: Output file: STAU23-2_Ere_Glu_5_9_37_Exp_3_ISOG_subsample.mtrac.mgs
2025-11-04,08:42:55 INFO: Finished align command

The align command produces three files, of which the .mgs file is the primary output:


head -n 10 STAU23-2_Ere_Glu_5_9_37_Exp_3_ISOG_subsample.mtrac.mgs
#GENOME	ANNOTATION	GENE	CHROMOSOME	LENGTH	COUNT
Arectalis	TIGR00362	MIO91_00005	CP092643.1	1362	2111
Arectalis	TIGR00663	MIO91_00010	CP092643.1	1113	1732.50
Arectalis	TIGR01059	MIO91_00025	CP092643.1	1938	3078.83
Arectalis	TIGR01063	MIO91_00030	CP092643.1	2508	4110.33
Arectalis	TIGR01146	MIO91_00570	CP092643.1	861	1622.50
Arectalis	TIGR01039	MIO91_00575	CP092643.1	1392	2406.83
Arectalis	TIGR00755	MIO91_01105	CP092643.1	867	1555.67
Arectalis	TIGR00420	MIO91_01410	CP092643.1	1089	1769.50
Arectalis	TIGR02397	MIO91_01495	CP092643.1	1572	2724.00
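
For downstream inspection, the .mgs table shown above can be read with a few lines of Python. This is a minimal sketch that assumes the file is tab-separated with a single header line starting with "#GENOME", as in the excerpt; the function name read_mgs is hypothetical and not part of mTRAc.

```python
import csv
import io


def read_mgs(handle):
    """Parse a tab-separated .mgs table into a list of dicts.

    Assumes the layout shown in the README excerpt:
    #GENOME, ANNOTATION, GENE, CHROMOSOME, LENGTH, COUNT.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    header[0] = header[0].lstrip("#")  # "#GENOME" -> "GENOME"
    rows = []
    for fields in reader:
        if not fields:
            continue  # skip empty lines
        row = dict(zip(header, (f.strip() for f in fields)))
        row["LENGTH"] = int(row["LENGTH"])
        row["COUNT"] = float(row["COUNT"])  # counts can be fractional
        rows.append(row)
    return rows


# Two rows copied from the example output above.
example = (
    "#GENOME\tANNOTATION\tGENE\tCHROMOSOME\tLENGTH\tCOUNT\n"
    "Arectalis\tTIGR00362\tMIO91_00005\tCP092643.1\t1362\t2111\n"
    "Arectalis\tTIGR00663\tMIO91_00010\tCP092643.1\t1113\t1732.50\n"
)
rows = read_mgs(io.StringIO(example))
total = sum(r["COUNT"] for r in rows)
print(len(rows), total)
```

For a real file, pass an open file handle (for example open("sample.mgs", newline="")) instead of the in-memory string.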

Merge

The merge command is used to combine multiple marker gene abundance files (.mgs), generated by the align command from different samples, into a single, comprehensive file. This merged file is the required input for the predict command.
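
The flow described above (align each sample, merge the resulting .mgs files, then predict) can be sketched as a small command builder. It only uses flags that appear in the help texts in this README. The sample names, read file names, and the function build_pipeline are hypothetical. The sketch also assumes that align writes its marker gene table to "<prefix>.mgs", as the align log reports; the align help text lists "prefix.fcnt.mgs" instead, so check the actual file name before relying on this.

```python
import shlex


def build_pipeline(samples, database, merged="merged_counts.tsv",
                   prediction_prefix="growth_predictions",
                   threads=4, script="mtrac.py"):
    """Return the align/merge/predict commands as argument lists.

    samples maps a sample name to a (forward_reads, reverse_reads) tuple.
    """
    if len(samples) < 2:
        # The merge help text asks for at least 2 input files.
        raise ValueError("merge needs at least two samples")
    commands = []
    mgs_files = []
    for name, (forward, reverse) in samples.items():
        prefix = f"{name}.mtrac"
        commands.append(["python", script, "align",
                         "-f", forward, "-r", reverse,
                         "-db", database, "-o", prefix,
                         "-t", str(threads)])
        # Assumption: align writes "<prefix>.mgs" (taken from the align log).
        mgs_files.append(prefix + ".mgs")
    commands.append(["python", script, "merge", "-f", *mgs_files, "-o", merged])
    commands.append(["python", script, "predict",
                     "-i", merged, "-o", prediction_prefix])
    return commands


# Hypothetical sample sheet.
samples = {
    "sample1": ("sample1.1.fq.gz", "sample1.2.fq.gz"),
    "sample2": ("sample2.1.fq.gz", "sample2.2.fq.gz"),
}
commands = build_pipeline(samples, "Arectalis")
for command in commands:
    print(shlex.join(command))
```

The commands are only printed here; each list could be passed to subprocess.run(command, check=True) to execute the steps in order.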


$ python mtrac.py merge

Program: mTRAc - metatranscriptomic growth classifier
        Version: 0.0.5


        mTRAc merge [options]

        Input options:
           -f   FILE[ FILE]  input files for merging. Have to be at least 2 output files
                             ([prefix].mgs) of predict step from two different runs against
                             the same database

        Output options:
           -o   FILE         output file

Example

$ python mtrac.py merge -f sample1.mgs sample2.mgs sample3.mgs -o merged_counts.tsv

Output Format Example:

#GENOME	ANNOTATION	GENE	CHROMOSOME	LENGTH	sample1	sample2	sample3
Arectalis	COG0012	MJ392_00005	CP092639.1	1404	1028	1502	987
Arectalis	COG0016	MJ392_00006	CP092639.1	1000	500	800	1200
...
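
To make the merged layout concrete, the sketch below joins per-sample rows on the five descriptive columns and emits one count column per sample. It illustrates the output format only and is not mTRAc's implementation. The function name merge_counts is hypothetical, and filling a gene missing from a sample with 0 is an assumption.

```python
KEY_COLUMNS = ("GENOME", "ANNOTATION", "GENE", "CHROMOSOME", "LENGTH")


def merge_counts(samples):
    """Join per-sample rows into one table with a count column per sample.

    samples maps a sample name to a list of row dicts with the
    KEY_COLUMNS plus a COUNT entry.
    """
    merged = {}
    for name, rows in samples.items():
        for row in rows:
            key = tuple(row[column] for column in KEY_COLUMNS)
            merged.setdefault(key, {})[name] = row["COUNT"]
    names = list(samples)
    header = list(KEY_COLUMNS) + names
    # Assumption: a gene absent from a sample gets a count of 0.
    table = [list(key) + [counts.get(name, 0) for name in names]
             for key, counts in merged.items()]
    return header, table


def gene_row(count):
    # Helper for the example: one gene from the output format above.
    return {"GENOME": "Arectalis", "ANNOTATION": "COG0012",
            "GENE": "MJ392_00005", "CHROMOSOME": "CP092639.1",
            "LENGTH": 1404, "COUNT": count}


samples = {
    "sample1": [gene_row(1028)],
    "sample2": [gene_row(1502)],
    "sample3": [gene_row(987)],
}
header, table = merge_counts(samples)
print("#" + "\t".join(header))
for line in table:
    print("\t".join(str(value) for value in line))
```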

Predict

The predict command uses the merged marker gene count data to predict the growth state (Growth/No growth) for each genome within each sample, based on a pre-trained machine learning model.

Usage

$ python mtrac.py predict

Program: mTRAc - metatranscriptomic growth classifier
        Version: 0.0.5


        mTRAc predict [options]

        Input options:
           -i   FILE        Input CSV file with count(s) produced by extract/merge functions.

        Output options:
           -o  STR          Output prefix

Example

Use the merged file from the previous example (merged_counts.tsv) and the default TPM normalisation method.

$ python mtrac.py predict -i merged_counts.tsv -o growth_predictions

Output File Format:

 Sample	Probability estimate	Classification
sample1	                0.85	        Growth
sample2	                0.12	     No Growth
sample3	                0.55	        Growth
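
The predict example mentions TPM normalisation. This README does not show how mTRAc computes it, so the sketch below uses the standard transcripts-per-million definition: each count is divided by the gene length, and the resulting rates are rescaled to sum to one million. The function name tpm is hypothetical, and whether mTRAc normalises per genome or per sample is not stated here.

```python
def tpm(counts, lengths):
    """Standard TPM: (count / length) scaled so all values sum to 1e6."""
    if len(counts) != len(lengths):
        raise ValueError("counts and lengths must have the same size")
    rates = [count / length for count, length in zip(counts, lengths)]
    total = sum(rates)
    if total == 0:
        raise ValueError("all counts are zero; TPM is undefined")
    return [rate / total * 1e6 for rate in rates]


# Counts and lengths of the first three genes in the align example output.
counts = [2111, 1732.50, 3078.83]
lengths = [1362, 1113, 1938]
values = tpm(counts, lengths)
for length, value in zip(lengths, values):
    print(length, round(value, 2))
```

Because of the length division, two genes with equal counts do not get equal TPM values: the shorter gene receives the higher value.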

About

MetaTRAnscriptomic growth Classifier