This project uses state-space hierarchical Bayesian models to explore spatio-temporal variation and predictability across soil microbiomes of the United States. The models forecast relative abundance patterns for both taxonomic and functional groups using environmental and temporal predictors.
- Relative abundance forecasting for taxonomic groups
- Relative abundance forecasting for functional groups
- Three different linear model structures
- Separate models for fungi (ITS sequences) and bacteria (16S sequences)
- Plot-level analysis (20m × 20m) with soil core replicates
- ~2000 soil cores from 18 NEON sites across the United States
- Multiple sampling periods per year per site
The project uses the following core packages:
# Core dependencies
install.packages(c(
"tidyverse", "here", "nimble", "coda", "lubridate",
"reshape2", "dplyr", "pacman", "plyr", "tibble",
"doParallel", "data.table", "Rfast", "moments",
"scoringRules", "Metrics", "ggpubr"
))
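If some of these packages are already present, it may be convenient to install only the missing ones. The following is a minimal base-R sketch; the helper name `missing_packages` is hypothetical and is not part of this project.

```r
# Hypothetical helper (not part of microbialForecast): return the subset of
# `pkgs` that is not yet installed in the current library paths.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

core_pkgs <- c(
  "tidyverse", "here", "nimble", "coda", "lubridate",
  "reshape2", "dplyr", "pacman", "plyr", "tibble",
  "doParallel", "data.table", "Rfast", "moments",
  "scoringRules", "Metrics", "ggpubr"
)

to_install <- missing_packages(core_pkgs)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # Uncomment to install them:
  # install.packages(to_install)
}
```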
- Clone the repository:
  git clone https://github.com/yourusername/microbialForecasts.git
  cd microbialForecasts
- Install the microbialForecast package:
  # From source
  install.packages("microbialForecast", repos = NULL, type = "source")
- Load the environment:
  source("source.R")
# Load the environment and packages
source("source.R")
# Use 'here' for relative file paths
library(here)
# Example: Prepare taxonomic data for modeling
rank_data <- quick_get_rank_df(k = 1,
                               min.date = "20151101",
                               max.date = "20200101")
# Fit a model (example)
microbialForecasts/
├── analysis/              # Main analysis scripts
│   ├── model_analysis/    # Model fitting and analysis (scripts 0-11)
│   └── create_figs/       # Figure generation scripts
├── data_construction/     # Data preparation scripts
│   ├── covariate_prep/    # Environmental covariate processing
│   └── microbe/           # Microbial data processing
├── microbialForecast/     # R package source code
├── data/                  # Data directory (see .gitignore)
│   ├── clean/             # Processed model inputs
│   ├── model_outputs/     # Model results
│   └── summary/           # Summary statistics
├── figures/               # Output figures
└── shinyapp/              # Interactive visualization app
- analysis/: Core analysis scripts for fitting and evaluating Bayesian state-space models
  - Scripts 0-4: Model creation and output processing
  - Scripts 5-8: Forecast creation and evaluation
- data_construction/: Scripts for downloading and preparing data from NEON and other sources
- microbialForecast/: R package containing functions for model fitting and evaluation
- Soil microbiome data: NEON (National Ecological Observatory Network)
- Environmental covariates:
  - Soil temperature and moisture (NEON sensors, DAYMET, SMOS)
  - Plant diversity and LAI (MODIS, NEON plant sampling)
  - Soil chemistry (NEON soil characterization)
- Target: Relative abundance of taxonomic groups
- Approach: Hierarchical Bayesian state-space models
- Variants: Dirichlet and Beta regression formulations
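The Dirichlet and Beta formulations are closely related: when a vector of relative abundances follows a Dirichlet distribution, the abundance of any single group is marginally Beta-distributed. The base-R sketch below illustrates this with simulated values. The concentration parameters are arbitrary and the helper `rdirichlet_sketch` is hypothetical; this is not the project's model code.

```r
set.seed(1)

# Hypothetical helper: draw n samples from Dirichlet(alpha) by normalizing
# independent gamma draws (column j uses shape alpha[j]).
rdirichlet_sketch <- function(n, alpha) {
  k <- length(alpha)
  g <- matrix(rgamma(n * k, shape = rep(alpha, each = n)), nrow = n, ncol = k)
  g / rowSums(g)
}

alpha <- c(2, 5, 3)   # arbitrary concentration parameters (assumption)
abund <- rdirichlet_sketch(5000, alpha)

# Each row is a composition: relative abundances sum to 1.
stopifnot(all(abs(rowSums(abund) - 1) < 1e-12))

# The marginal of group 1 is Beta(alpha[1], sum(alpha) - alpha[1]).
beta_draws <- rbeta(5000, alpha[1], sum(alpha) - alpha[1])
c(dirichlet_mean = mean(abund[, 1]),
  beta_mean      = mean(beta_draws),
  theoretical    = alpha[1] / sum(alpha))
```

Modeling one group at a time with a Beta likelihood therefore targets the same marginal quantity as the Dirichlet, while dropping the constraint that all groups are modeled jointly.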
- Target: Relative abundance of functional categories
- Categories: Based on literature review, genomic pathways, and experimental enrichment
- Kingdoms: Separate bacterial and fungal functional classifications
- Environmental predictors: Soil conditions, plant diversity, climate
- Seasonality: Cyclical temporal components
- Environmental + Seasonality: Combined model structure
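One way to picture the three structures is as three linear predictors built from the same data. In the sketch below, the covariate names (`temperature`, `moisture`) and the sine/cosine encoding of month are illustrative assumptions, not the project's exact specification.

```r
set.seed(42)

# Toy monthly data for one plot (simulated; column names are assumptions).
dat <- data.frame(
  month       = rep(1:12, times = 3),
  temperature = rnorm(36),
  moisture    = rnorm(36)
)

# Cyclical encoding: a 12-month period mapped onto sine and cosine terms,
# so that December and January end up adjacent.
dat$sin_mo <- sin(2 * pi * dat$month / 12)
dat$cos_mo <- cos(2 * pi * dat$month / 12)

# Design matrices for the three linear model structures.
X_env  <- model.matrix(~ temperature + moisture, data = dat)
X_cycl <- model.matrix(~ sin_mo + cos_mo, data = dat)
X_all  <- model.matrix(~ temperature + moisture + sin_mo + cos_mo, data = dat)

# Number of coefficients (including the intercept) in each structure.
sapply(list(env = X_env, cycl = X_cycl, env_cycl = X_all), ncol)
```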
# Load required functions
source("source.R")
# Prepare data for a specific taxonomic rank
model_data <- prepTaxonomicData(rank.df = your_data,
                                min.prev = 3,
                                min.date = "2015-11-01",
                                max.date = "2020-01-01")
# Generate hindcast predictions
# See analysis/model_analysis/06_createHindcasts.r for examples
# Calculate scoring metrics
# See analysis/model_analysis/08_calculateScoringMetrics.r
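To give a sense of what forecast scoring involves, here is a minimal base-R sketch of two common metrics: a sample-based continuous ranked probability score (CRPS) and root mean squared error (RMSE). The function names are hypothetical and the inputs are simulated; the scripts referenced above may compute a different set of metrics. The scoringRules package listed among the dependencies provides `crps_sample()` for a comparable sample-based score.

```r
# Sample-based CRPS for one observation `y` and an ensemble of draws `x`:
# CRPS = E|X - y| - 0.5 * E|X - X'|  (lower is better).
crps_sample_sketch <- function(y, x) {
  mean(abs(x - y)) - 0.5 * mean(abs(outer(x, x, "-")))
}

# Root mean squared error of point predictions.
rmse_sketch <- function(obs, pred) sqrt(mean((obs - pred)^2))

set.seed(7)
obs   <- 0.30                 # observed relative abundance (toy value)
draws <- rbeta(1000, 3, 7)    # stand-in for posterior predictive draws

crps_sample_sketch(obs, draws)
rmse_sketch(c(0.30, 0.25), c(0.28, 0.27))
```

CRPS rewards forecasts whose whole predictive distribution sits close to the observation, whereas RMSE only evaluates a point prediction.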
- Data Preparation (data_construction/)
  - Download and clean NEON data
  - Process environmental covariates
  - Prepare microbial abundance matrices
- Model Fitting (analysis/workflows/)
  - Fit Bayesian models using NIMBLE
  - Assess convergence
  - Combine MCMC chains
- Forecasting (analysis/model_analysis/)
  - Generate hindcast predictions
  - Calculate forecast horizons
  - Evaluate forecast accuracy
- Analysis and Visualization (analysis/create_figs/)
  - Create publication figures
  - Analyze model performance
  - Explore spatio-temporal patterns
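The convergence and chain-combining steps in the model fitting stage can be sketched in base R as follows. This assumes equal-length chains for a single parameter and uses the basic Gelman-Rubin potential scale reduction factor; the function name `rhat_sketch`, the simulated chains, and the 1.1 threshold are illustrative assumptions rather than the project's actual procedure. The coda package listed among the dependencies offers `gelman.diag()` for a closely related diagnostic on real MCMC output.

```r
set.seed(123)

# Three simulated chains of 1000 draws for one parameter
# (stand-ins for real MCMC output).
chains <- replicate(3, rnorm(1000, mean = 0.5, sd = 0.1), simplify = FALSE)

# Basic Gelman-Rubin potential scale reduction factor for equal-length chains.
rhat_sketch <- function(chains) {
  n <- length(chains[[1]])
  chain_means <- sapply(chains, mean)
  W <- mean(sapply(chains, var))        # mean within-chain variance
  B <- n * var(chain_means)             # between-chain variance
  var_hat <- (n - 1) / n * W + B / n    # pooled variance estimate
  sqrt(var_hat / W)
}

rhat <- rhat_sketch(chains)

# Combine chains only if they appear to have converged
# (1.1 is a common rule of thumb, used here as an assumption).
if (rhat < 1.1) {
  combined <- unlist(chains)
  print(quantile(combined, c(0.025, 0.5, 0.975)))
}
```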
The microbialForecast package includes:
- prepTaxonomicData(): Prepare taxonomic abundance data for modeling
- prepFunctionalData(): Prepare functional group data
- prepDiversityData(): Prepare diversity data
- run_MCMC_*(): Functions for running MCMC sampling
- summarize_*_model(): Model summary functions
- add_scoring_metrics(): Forecast evaluation metrics
- Memory: 8GB+ RAM recommended for full model fitting
- Storage: ~50GB for complete data and model outputs
- Compute: Models are designed for HPC clusters but can run locally
- Time: Individual model fits take from ~15 minutes to hours, depending on complexity
If you use this code or approach in your research, please cite:
Werbin et al. 2024 preprint [link]
This project is licensed under the MIT License - see the LICENSE file for details.
Zoey Werbin
Email: zoeywerbin@gmail.com
- NEON (National Ecological Observatory Network) for providing soil microbiome and environmental data
- NIMBLE development team for Bayesian modeling framework
- Very patient mentors and collaborators