zoey-rw/microbialForecasts

Microbial Forecasts: Spatio-temporal Prediction of Soil Microbiomes

Overview

This project uses state-space hierarchical Bayesian models to explore spatio-temporal variation and predictability across soil microbiomes of the United States. The models forecast relative abundance patterns for both taxonomic and functional groups using environmental and temporal predictors.

  • Relative abundance forecasting for taxonomic groups
  • Relative abundance forecasting for functional groups
  • Three different linear model structures
  • Separate models for fungi (ITS sequences) and bacteria (16S sequences)
  • Plot-level analysis (20m × 20m) with soil core replicates
  • ~2000 soil cores from 18 NEON sites across the United States
  • Multiple sampling periods per year per site

Installation

Required R Packages

The project uses the following core packages:

# Core dependencies
install.packages(c(
  "tidyverse", "here", "nimble", "coda", "lubridate", 
  "reshape2", "dplyr", "pacman", "plyr", "tibble",
  "doParallel", "data.table", "Rfast", "moments",
  "scoringRules", "Metrics", "ggpubr"
))
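
To avoid reinstalling packages that are already present, a small base-R helper can install only what is missing. This is a sketch, not part of the repository: `find_missing()` and `install_missing()` are hypothetical names. `pacman` (already listed above) offers similar behaviour through `pacman::p_load()`.

```r
# Hypothetical helpers (not part of the repository)
core_pkgs <- c(
  "tidyverse", "here", "nimble", "coda", "lubridate",
  "reshape2", "dplyr", "pacman", "plyr", "tibble",
  "doParallel", "data.table", "Rfast", "moments",
  "scoringRules", "Metrics", "ggpubr"
)

# Return the packages in `pkgs` that are not installed in any library on .libPaths()
find_missing <- function(pkgs) {
  setdiff(pkgs, rownames(utils::installed.packages()))
}

# Install only the missing packages; invisibly returns the names it tried to install
install_missing <- function(pkgs) {
  missing <- find_missing(pkgs)
  if (length(missing) > 0) utils::install.packages(missing)
  invisible(missing)
}
```

Calling `install_missing(core_pkgs)` installs whatever is absent and does nothing when everything is already present.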

Package Installation

  1. Clone the repository:

    git clone https://github.com/zoey-rw/microbialForecasts.git
    cd microbialForecasts
  2. Install the microbialForecast package:

    # From the local source directory (run from the repository root)
    install.packages("microbialForecast", repos = NULL, type = "source")
  3. Load the environment:

    source("source.R")

Basic Setup

# Load the environment and packages
source("source.R")

# Use 'here' for relative file paths
library(here)

Running Analysis

# Example: Prepare taxonomic data for modeling
rank_data <- quick_get_rank_df(k = 1, 
                               min.date = "20151101", 
                               max.date = "20200101")

# Fit a model: see the scripts in analysis/model_analysis/ for worked examples

Project Structure

microbialForecasts/
├── analysis/            # Main analysis scripts
│   ├── model_analysis/  # Model fitting and analysis (scripts 0-11)
│   └── create_figs/     # Figure generation scripts
├── data_construction/   # Data preparation scripts
│   ├── covariate_prep/  # Environmental covariate processing
│   └── microbe/         # Microbial data processing
├── microbialForecast/   # R package source code
├── data/                # Data directory (see .gitignore)
│   ├── clean/           # Processed model inputs
│   ├── model_outputs/   # Model results
│   └── summary/         # Summary statistics
├── figures/             # Output figures
└── shinyapp/            # Interactive visualization app

Key Directories

  • analysis/: Core analysis scripts for fitting and evaluating Bayesian state-space models

    • Scripts 0-4: Model creation and output processing
    • Scripts 5-8: Forecast creation and evaluation
  • data_construction/: Scripts for downloading and preparing data from NEON and other sources

  • microbialForecast/: R package containing functions for model fitting and evaluation

Data Sources

  • Soil microbiome data: NEON (National Ecological Observatory Network)
  • Environmental covariates:
    • Soil temperature and moisture (NEON sensors, DAYMET, SMOS)
    • Plant diversity and leaf area index, LAI (MODIS, NEON plant sampling)
    • Soil chemistry (NEON soil characterization)

Model Types

1. Taxonomic Models

  • Target: Relative abundance of taxonomic groups
  • Approach: Hierarchical Bayesian state-space models
  • Variants: Dirichlet and Beta regression formulations
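
For readers unfamiliar with the two likelihoods: a Beta regression models one group's relative abundance in (0, 1), while a Dirichlet models a whole composition that sums to 1. The sketch below assumes the common mean-precision parameterization of the Beta (shape1 = mu * phi, shape2 = (1 - mu) * phi) with a logit link. This is an illustration only; the repository's NIMBLE model code may parameterize things differently, and the function names are hypothetical.

```r
# Map a mean `mu` in (0, 1) and precision `phi` > 0 to Beta shape parameters
beta_shapes <- function(mu, phi) {
  stopifnot(all(mu > 0 & mu < 1), all(phi > 0))
  list(shape1 = mu * phi, shape2 = (1 - mu) * phi)
}

# Log-likelihood of relative abundances `y` under a Beta regression
# with a logit link: logit(mu) = X %*% beta
beta_reg_loglik <- function(y, X, beta, phi) {
  mu <- plogis(drop(X %*% beta))
  s <- beta_shapes(mu, phi)
  sum(dbeta(y, s$shape1, s$shape2, log = TRUE))
}

# Log-density of one composition `y` (positive, sums to 1) under Dirichlet(alpha)
ddirichlet_log <- function(y, alpha) {
  stopifnot(length(y) == length(alpha), isTRUE(all.equal(sum(y), 1)))
  lgamma(sum(alpha)) - sum(lgamma(alpha)) + sum((alpha - 1) * log(y))
}
```

With two categories the Dirichlet reduces to the Beta: `ddirichlet_log(c(0.3, 0.7), c(2, 5))` equals `dbeta(0.3, 2, 5, log = TRUE)`.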

2. Functional Group Models

  • Target: Relative abundance of functional categories
  • Categories: Based on literature review, genomic pathways, and experimental enrichment
  • Kingdoms: Separate bacterial and fungal functional classifications

Model Structures

  • Environmental predictors: Soil conditions, plant diversity, climate
  • Seasonality: Cyclical temporal components
  • Environmental + Seasonality: Combined model structure
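
One common way to encode seasonality is with sine and cosine terms of day of year, so that 31 December and 1 January end up close together. The sketch below assumes that encoding and builds a design matrix for each of the three structures. The labels `"env"`, `"cycl"` and `"env_cycl"`, the helper names, and the example covariates are hypothetical; the covariates used in the real models are set by the repository's data-preparation scripts.

```r
# Sine/cosine of day of year: a smooth annual cycle with no jump at New Year
seasonal_terms <- function(dates) {
  doy <- as.numeric(format(as.Date(dates), "%j"))
  angle <- 2 * pi * doy / 365.25
  cbind(sin = sin(angle), cos = cos(angle))
}

# Intercept-plus-predictors design matrix for one (hypothetically labelled) structure
build_design <- function(dates, env, type = c("env", "cycl", "env_cycl")) {
  type <- match.arg(type)
  env <- as.matrix(env)
  switch(type,
    env      = cbind(intercept = 1, env),
    cycl     = cbind(intercept = 1, seasonal_terms(dates)),
    env_cycl = cbind(intercept = 1, env, seasonal_terms(dates))
  )
}

# Example with made-up covariate values
dates <- c("2017-01-15", "2017-07-15")
env <- data.frame(temperature = c(2.1, 18.4), moisture = c(0.31, 0.22))
build_design(dates, env, "env_cycl")
```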

Usage Examples

Fitting a Single Taxon Model

# Load required functions
source("source.R")

# Prepare data for a specific taxonomic rank
model_data <- prepTaxonomicData(rank.df = your_data, 
                                min.prev = 3,
                                min.date = "2015-11-01",
                                max.date = "2020-01-01")
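
`prepTaxonomicData()` handles filtering internally, and its exact rules should be checked in the package source. Purely as an illustration, assuming `min.prev` is the minimum number of samples in which a taxon must be detected, the sketch below filters a samples-by-taxa count matrix on that basis, converts counts to relative abundances, and lumps the dropped taxa into an `other` column so each sample still sums to 1. `filter_and_normalize()` is a hypothetical helper, not a package function.

```r
# Hypothetical illustration: `counts` is a samples x taxa matrix of read counts
filter_and_normalize <- function(counts, min_prev = 3) {
  rel <- counts / rowSums(counts)          # relative abundance within each sample
  keep <- colSums(counts > 0) >= min_prev  # taxa detected in >= min_prev samples
  out <- rel[, keep, drop = FALSE]
  # Lump the rarer taxa into "other" so each row still sums to 1
  cbind(out, other = rowSums(rel[, !keep, drop = FALSE]))
}
```

Samples with zero total reads would produce `NaN` here and should be dropped beforehand.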

Creating Forecasts

# Generate hindcast predictions
# See analysis/model_analysis/06_createHindcasts.r for examples
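
Conceptually, a hindcast from a state-space model pushes posterior draws of the latent state forward in time, adds process noise at each step, and summarises the spread of the resulting trajectories. The sketch below does this for a hypothetical logit-scale AR(1) latent state with made-up parameter values; the actual model structure and forecasting code are in the script referenced above.

```r
# Forward-simulate a logit-scale AR(1) latent state for each posterior draw.
# x0, intercept, rho and sigma may be scalars or vectors with one value per draw.
simulate_hindcast <- function(x0, intercept, rho, sigma, n_steps) {
  n_draws <- length(x0)
  out <- matrix(NA_real_, nrow = n_draws, ncol = n_steps)
  x <- x0  # latent state on the logit scale
  for (t in seq_len(n_steps)) {
    x <- intercept + rho * x + rnorm(n_draws, mean = 0, sd = sigma)
    out[, t] <- plogis(x)  # store as relative abundance in (0, 1)
  }
  out
}

# Example with made-up posterior draws
set.seed(42)
draws <- 500
pred <- simulate_hindcast(
  x0 = rnorm(draws, -2, 0.2), intercept = -0.6,
  rho = 0.7, sigma = 0.3, n_steps = 6
)
# Median and 95% credible interval at each forecast horizon
apply(pred, 2, quantile, probs = c(0.025, 0.5, 0.975))
```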

Evaluating Model Performance

# Calculate scoring metrics  
# See analysis/model_analysis/08_calculateScoringMetrics.r
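
The continuous ranked probability score (CRPS) is a standard proper scoring rule for probabilistic forecasts, and `scoringRules` (listed in the dependencies) provides a sample-based version as `crps_sample()`. For intuition, the sketch below computes the same kind of estimate, CRPS = E|X - y| - 0.5 E|X - X'|, directly from forecast draws, plus a simple interval-coverage check. The helper names are hypothetical, and which metrics `add_scoring_metrics()` actually reports should be checked in the package source.

```r
# Sample-based CRPS for one observation `y` and a vector of forecast draws.
# Lower is better; 0 means every draw equals the observation.
crps_empirical <- function(y, draws) {
  term1 <- mean(abs(draws - y))
  term2 <- mean(abs(outer(draws, draws, "-")))
  term1 - 0.5 * term2
}

# Does the central `level` interval of the draws contain the observation?
in_interval <- function(y, draws, level = 0.95) {
  bounds <- quantile(draws, probs = c((1 - level) / 2, (1 + level) / 2))
  y >= bounds[[1]] && y <= bounds[[2]]
}
```

The `outer()` call grows quadratically in memory with the number of draws, so for thousands of draws `scoringRules::crps_sample()` is the more practical choice.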

Model Workflow

  1. Data Preparation (data_construction/)

    • Download and clean NEON data
    • Process environmental covariates
    • Prepare microbial abundance matrices
  2. Model Fitting (analysis/workflows/)

    • Fit Bayesian models using NIMBLE
    • Assess convergence
    • Combine MCMC chains
  3. Forecasting (analysis/model_analysis/)

    • Generate hindcast predictions
    • Calculate forecast horizons
    • Evaluate forecast accuracy
  4. Analysis and Visualization (analysis/create_figs/)

    • Create publication figures
    • Analyze model performance
    • Explore spatio-temporal patterns
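
The convergence check in step 2 is commonly done with the Gelman-Rubin potential scale reduction factor (R-hat), which `coda` provides as `gelman.diag()`. The base-R sketch below computes the basic, unsplit form from a matrix with one column per chain and then pools the chains into a single posterior sample. It is an illustration only: its values can differ slightly from `coda::gelman.diag()`, which applies additional corrections, and the 1.1 threshold in the comments is a common rule of thumb rather than a value taken from this repository.

```r
# Basic (unsplit) Gelman-Rubin R-hat for one parameter.
# `chains`: matrix with one row per iteration and one column per chain.
rhat_basic <- function(chains) {
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))    # mean within-chain variance
  B <- n * var(colMeans(chains))      # between-chain variance
  var_hat <- (n - 1) / n * W + B / n  # pooled posterior variance estimate
  sqrt(var_hat / W)
}

# Example: three well-mixed chains vs. one chain stuck elsewhere
set.seed(1)
good <- matrix(rnorm(3000), ncol = 3)
bad <- cbind(good[, 1:2], rnorm(1000, mean = 3))
rhat_basic(good)  # close to 1
rhat_basic(bad)   # well above the common 1.1 rule of thumb

# Once converged, pool the chains into one vector of posterior draws
posterior <- as.vector(good)
```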

Functions

The microbialForecast package includes:

  • prepTaxonomicData(): Prepare taxonomic abundance data for modeling
  • prepFunctionalData(): Prepare functional group data
  • prepDiversityData(): Prepare diversity data
  • run_MCMC_*(): Functions for running MCMC sampling
  • summarize_*_model(): Model summary functions
  • add_scoring_metrics(): Forecast evaluation metrics

Computing Requirements

  • Memory: 8GB+ RAM recommended for full model fitting
  • Storage: ~50GB for complete data and model outputs
  • Compute: Models designed for HPC clusters but can run locally
  • Time: individual model fits take from ~15 minutes to several hours, depending on model complexity

Citation

If you use this code or approach in your research, please cite:

Werbin et al. 2024 preprint [link]

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact

Zoey Werbin
Email: zoeywerbin@gmail.com

Acknowledgments

  • NEON (National Ecological Observatory Network) for providing soil microbiome and environmental data
  • NIMBLE development team for Bayesian modeling framework
  • Very patient mentors and collaborators

About

Creating and validating forecasts of microbial abundances at NEON sites.
