Skip to content

sacrevert/msiAssess

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

msiAssess

msiAssess allows the user to fit a set of Bayesian models for multi-species (biodiversity) indicators to both simulated and empirical data. The models will be explained in detail elsewhere, but include a version of Freeman’s “generic method” (S. N. Freeman et al., 2021) that allows for variably observed species’ index standard errors, and a Bayesian version of Soldaat’s Monte Carlo method (L. L. Soldaat et al., 2017) with user-controlled in-model imputation of missing data. Two novel Bayesian model types (“no pooling” and “partial pooling” models of growth-rates with common annual species-level “shocks”) are also included for comparative purposes.

The simulation framework gives the user control over a range of parameters, and allows for the inclusion of missingness patterns that are “Missing Not At Random” (MNAR) as a test of how the different MSIs perform when their assumptions (e.g. species Missing At Random [MAR] conditional on the model structure) are violated to varying degrees, as is quite common when heterogeneous data sources are combined in MSIs, and/or when species time-series exhibit strong time-varying biases (e.g. R. J. Boyd et al. (2025)).

Basic usage

A wrapper function run_full_analysis() is provided for convenience, although the constituent parts can also be run separately. For simulated data, it is most convenient to specify the simulation settings via a list that is then passed to run_full_analysis().

## Specify simulation settings
sim_args <- list(n_species = 40, n_years = 30, seed = 232680,
                 ## Growth rate data-generating process (DGP)
                 dgp_mode = "spline", # "spline","rw","ar1","changepoint","seasonal","mixture"
                 ## DGP options (sub-options apply to specific DGPs as stated)
                 # spline options
                 df_mu = 6, # default
                 # RW / AR1
                 sigma_eta = 0.05, phi = 0.7,
                 # changepoint
                 n_cp = 2, jump_sd = 0.15, piecewise_linear = FALSE,
                 # seasonal
                 A = 0.15, period = 8, phi0 = runif(1, 0, 2*pi),
                 # mixture of random walks
                 K_guilds = 5, # number of groups
                 # time-series entry mode
                 entry_mode = c("none"),
                 ## Simulate species/observation options (cross-DGP)
                 # species/state variation
                 sigma_sp = 0.05, sigma_delta = 0.05, sd_alpha0 = 0.4,
                 innov_dist = "normal", df_u = 3, # "normal" or "student_t" (df_u for t only)
                 sp_trend = "none", # "none" or "random_slope"
                 sigma_gamma = 0.05, log_sd_se = 0.35, 
                 use_delta = FALSE, # cross-species annual shocks?
                 # observation model
                 sigma_obs_mean = 0.02, prop_missing = 0,
                 # Inclusion MNAR (additional to prop_missing)
                 inclusion_bias = list(enabled = FALSE, # include inclusion bias?
                                       a0 = -0.4, # parameters for Bernoulli model of MNAR bias
                                       a1 = 3.0,
                                       rho1 = 1.2,
                                       rho0 = 0.2,
                                       p_init = 1))

These will be described in more detail when the code is incorporated into an R package, although inspection of the generate_mu() and simulate_species_data() functions’ code should make it clear what each option is doing.

The other options passed to run_full_analysis() are briefly described in code comments below.

out <- run_full_analysis(data_source = "simulate", # simulated data or empirical?
                         sim_args = sim_args, # as above
                         # which models to run?
                         fit_models = c("partial","freeman", "nopool","bayes_geomean"),
                         ## Model-specific settings
                         jags_partial = list(df_mu=6, obs_var_model=2, n_iter=500, n_burn=100),
                         jags_nopool = list(df_mu=6, obs_var_model=2, n_iter=500, n_burn=100),
                         jags_freeman = list(obs_var_model=2, n_iter=500, n_burnin=100),
                         jags_bayes_geomean = list(obs_var_model=2, n_iter=500, n_burnin=100),
                         # smoothed version of Bayesian geomean?
                         smooth_geomean = list(enable = TRUE, prefer_freeman_basis = FALSE),
                         plot_geomean = TRUE, # plot unsmoothed Bayes geomean MSI
                         # impute missing spp/year combinations
                         impute_all_geomean = T, # if false, then under MNAR this estimand differs from other models
                         plot_MNAR = TRUE, # plot sampled-only truth alongside *actual* truth
                         ## Cross-model settings
                         # legacy seFromData overridden by model-specific obs_var_model settings, seFromData = T (or F) sets obs_var_model = 2 (or 4
                         # but only if obs_var_model not specified
                         brc_opts = list(num_knots=12, seFromData=TRUE, Y1perfect=TRUE, m.scale = "loge"),
                         ## Growth-rate presentation scale (model returns both anyway, this is for plot)
                         growth_scale = "log",
                         quiet = TRUE) # suppress JAGS output here 

“Perfect” data scenario (no missingness)

In this example we are using a spline-based data-generating process (DGP) for the underlying growth rates. This is congenial to the actual models (i.e. the DGP assumed by the models is actually true). We are not including any MNAR missingness, or any missingness at all. In general we expect all models to index “truth” very closely in this scenario. Indeed, all models overlie the “truth” for all metrics with high confidence. (Note that these examples are all run with very short MCMC chains.)

# Check convergence (preview only)
head(out$checks$freeman$bad) # all models are included at out$checks level
NULL
# Indicator plots
print(out$plots$MSI) # M_prime equivalents

print(out$plots$CumLog) # M equivalents where applicable

print(out$plots$Growth) # annual growth on scale specified in run_full_analysis()

# Inclusion diagnostics (e.g. for comparing MNAR strength to empirical examples)
evaluate_inclusion_process(out$sim) # Not informative for perfect sample with no drop-out
$transition
$transition$p11
[1] 1

$transition$p00
[1] NA

$transition$pi
[1] 1

$transition$Ein
[1] Inf

$transition$Eout
[1] NA


$run_lengths_in
 [1] 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30
[26] 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30

$run_lengths_out
integer(0)

$burstiness_in
[1] -1

$burstiness_out
[1] NA

$entry_stats
$entry_stats$FY
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[39] 1 1

$entry_stats$LY
 [1] 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30
[26] 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30

$entry_stats$duration
 [1] 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30
[26] 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30

$entry_stats$prop_never_observed
[1] 0

$entry_stats$prop_enter_after_1
[1] 0

$entry_stats$mean_FY
[1] 1

$entry_stats$median_FY
[1] 1

$entry_stats$mean_duration
[1] 30

$entry_stats$median_duration
[1] 30


$trend_entry
NULL

$cor_I_r_by_year
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[26] NA NA NA NA

$diagnostics
   year n_pairs sd_I sd_r            reason
1     2      40    0    0 no_variation_both
2     3      40    0    0 no_variation_both
3     4      40    0    0 no_variation_both
4     5      40    0    0 no_variation_both
5     6      40    0    0 no_variation_both
6     7      40    0    0 no_variation_both
7     8      40    0    0 no_variation_both
8     9      40    0    0 no_variation_both
9    10      40    0    0 no_variation_both
10   11      40    0    0 no_variation_both
11   12      40    0    0 no_variation_both
12   13      40    0    0 no_variation_both
13   14      40    0    0 no_variation_both
14   15      40    0    0 no_variation_both
15   16      40    0    0 no_variation_both
16   17      40    0    0 no_variation_both
17   18      40    0    0 no_variation_both
18   19      40    0    0 no_variation_both
19   20      40    0    0 no_variation_both
20   21      40    0    0 no_variation_both
21   22      40    0    0 no_variation_both
22   23      40    0    0 no_variation_both
23   24      40    0    0 no_variation_both
24   25      40    0    0 no_variation_both
25   26      40    0    0 no_variation_both
26   27      40    0    0 no_variation_both
27   28      40    0    0 no_variation_both
28   29      40    0    0 no_variation_both
29   30      40    0    0 no_variation_both

$note
[1] "All annual selection–growth correlations are NA for T-1 = 29. This is expected when inclusion or growth lacks cross-species variation (e.g., cohort entry with all zeros early on). Reason summary: [no_variation_bothx29]. Source of r: mu_true + delta_true (+ gamma_s)."
## Spp trends (grey line = latent process; blue dot = noisy obs with missingness
# Plot 12 randomly chosen species on log scale
plot_species_trends(out, n_sample = 12, scale = "log")

# Plot specific species (by index) as baseline-1 indices
plot_species_trends(out, species = c(1, 5, 9), scale = "index")

“Imperfect” data scenario (MNAR missingness)

Note that here missing species/year index combinations are being imputed for the Bayesian geomean (impute_all_geomean = T). If we switched this to FALSE, then the estimand is no longer comparable to the other methods.

sim_args <- list(n_species = 100, n_years = 30, seed = 232680,
                 ## Growth rate DGP
                 dgp_mode = "mixture",
                 ## DGP options for mixture
                 # RW / AR1
                 sigma_eta = 0.05, phi = 0.7,
                 # mixture of random walks
                 K = 5, # number of groups
                 # time-series entry mode
                 entry_mode = c("none"),
                 ## Simulate species options
                 # species/state variation
                 sigma_sp = 0.05, sigma_delta = 0.05, sd_alpha0 = 0.4,
                 innov_dist = "normal", df_u = 3, # "normal" or "student_t" (df_u for t)
                 sp_trend = "random_slope", sigma_gamma = 0.05, log_sd_se = 0.35, # random slopes this time
                 use_delta = FALSE, # cross-species annual shocks?
                 # observation model
                 sigma_obs_mean = 0.05, prop_missing = 0.2,
                 # Inclusion MNAR (additional to prop_missing)
                 inclusion_bias = list(enabled = TRUE, # include inclusion bias?
                                       a0 = -0.5, 
                                       a1 = 3.0,
                                       rho1 = 1.5,
                                       rho0 = 0.1,
                                       p_init = 1))
out <- run_full_analysis(data_source = "simulate",
                         sim_args = sim_args,
                         fit_models = c("partial","freeman", "nopool", "bayes_geomean"),
                         ## Model-specific settings (ovm = 2)
                         jags_partial = list(df_mu=6, obs_var_model=4, n_iter=500, n_burn=100),
                         jags_nopool = list(df_mu=6, obs_var_model=4, n_iter=500, n_burn=100),
                         jags_freeman = list(obs_var_model=4, n_iter=500, n_burnin=100),
                         jags_bayes_geomean = list(obs_var_model=4, n_iter=500, n_burnin=100),
                         # smoothed version of Bayesian geomean?
                         smooth_geomean = list(enable = TRUE, prefer_freeman_basis = FALSE),
                         plot_geomean = TRUE, # plot unsmoothed Bayes geomean MSI
                         # impute missing spp/year combinations
                         impute_all_geomean = T, # if false, then under MNAR this estimand differs from other models
                         ## Cross-model settings (seFromData not really needed here as ovm = 2 above)
                         brc_opts = list(num_knots=12, seFromData=TRUE, Y1perfect=FALSE, m.scale = "loge"),
                         ## Growth-rate presentation scale
                         growth_scale = "log",
                         quiet = TRUE) # suppress JAGS output to console

# Indicator plots
print(out$plots$MSI) # M_prime equivalents

print(out$plots$CumLog) # M equivalents where applicable

print(out$plots$Growth) # annual growth on scale specified in run_full_analysis()

# Inclusion diagnostics (e.g. for comparing MNAR strength to empirical examples)
evaluate_inclusion_process(out$sim)
$transition
$transition$p11
[1] 0.59

$transition$p00
[1] 0.652

$transition$pi
[1] 0.463

$transition$Ein
[1] 2.44

$transition$Eout
[1] 2.87


$run_lengths_in
  [1] 11  1  3  1  3  1  1  4  1  1  3  1  1  2  1  2  3  4  2  1  7  2  3  1  2
 [26] 10  1  2  1  4  8  3  1  2  3  2  2  3  6  3  2  3  3  2  4  2  2  1  1  4
 [51]  1  1  3  2  1  2  1  6  2  1  1  2  1  1  3  1  1  2  2  3  1  1  1  1  2
 [76]  2  1  1  3  1  2  1  1  1  5  2  1  2  1  1  1  1  3  2  1  4  4  3  6  5
[101]  8  2  4  1  1  1  2  7  1  1  1  1  9  1  2  1  1  4  8  2  1  1  1  1  2
[126]  1  2  2  1  2  1  5  1  1  2  1  1  1  3  3  3  1  2  1  2  5  2  2  1  1
[151]  1  1  5  4  3  2  1  1  1  1  1  5  2  3  1 12  8  3  3  1  1 14  5  2  4
[176]  1  2  1  1  8  1  1  2  4  1  1  2  1  2  3  2  1  1  5  1  2  3  3  7  2
[201]  2  3  2  2  1  1  1  3  1  1  1  1  1  1  1  1  1  1  2  1  2  5  1  2  2
[226]  6  7  2  1  2  1  6  1  3  2  2  2  6  4  1  1  1  3  3  3  1  1  1  1  1
[251]  1  1  1  2  1  1  2  5  2  3  3 12  2  9  2  1  1  3  2  3  1  2  3  3  2
[276]  6  3  2  1  2  5  2  1  1  4  3  3  3  1  3  3  1  7  1  2  2  1  4  2  4
[301]  1  1  2  1  2  4  1  1  2  5  4  1  1  1  7  2  4 10  1  1  1  1  1  4  2
[326]  1  1  2  1  1  9  1  1  3  1  1  1  2  1  3  1  1  1  5  3  6  5  1  1  1
[351]  1  5  1  1  1  1  3  1  1  2  1  1  4  1  1  2  2  2  1  2  2  2  4  2  2
[376]  2  1  2  5  1  3  3  3  1  5  1  9  1  2  1  1  1  2  2  7  2  1  1  2  1
[401]  1  1  1  3  2  3  1  2  2  2  2  1  4  6  1  1  1  1  2  1  1  1  1  2  5
[426]  1  1  3  1  1  1  7  1  2  1  1  1 16  1  7  1  1  1  1  2  4  3  4  2  5
[451]  1  2  2  4  1  2  1  1  1  1  1  1  1  3  1  1  1  5  2  1  1  1  1  1  6
[476]  1  3  1  1  2  7  3  2  3  1  5  1  2  1  2  1  1  3  3  1  2  2  3  2  3
[501]  1  1  1  1  5  2  5  2  3  2  1  1  1  1  3  1  2  2  4  2  4  2  1  3  2
[526]  2  1  4 10  3  4  1  3  2  3  1  1  1  1  8  1  1  2  4  4  4  1  1  2  1
[551]  5  1  1  1  2  6  1  9  1  2  3  2  4  1  7  1  3  3  2  1  1  2  1  3  2
[576]  1  1  1  7  2  4  1  2  1  2  3  2  2  1  3  2  3  5  2  1  1  3  1  3  1
[601]  1  3  3  1  1  3  3  3  2  1

$run_lengths_out
  [1]  3  2  3  1  1  1  2  3  4 13  2  4  6  2  4  2  1  1  3  3  1  2  1  1  8
 [26]  3  2  5  1  1  1  1  2  2  1  2  2  3  5  4  1  2  5  2  3  2  1  3  2  2
 [51]  3  2  1  2  2  4  3  1  9  2  1  1  1  3  1  1  1  1  1  1  1  1  1  1  4
 [76]  1 20  3  2  1  1  1  1  3  2  2  1  1 13  5  1  2  1  7  1  3  1  1  3  1
[101]  1  3  3  5  6  3  1  4  2  5  1  2  2  4  1  2  2  7  2  3  5  2  6  2  2
[126]  1  2  4  6  1  1  8  3  4  1  1  1  7  1  3  4  5  1  4  1  1  1  1  2  4
[151]  1  2  1  2  3  2  4  1  1  4  9  3  2  3  1  6  8  3  1  6  2  1  7  3  1
[176]  9  2  1  2  2  1  3  2  1 10  5  1  1  1  1  1  2  2  3  1  4  3  1  1  9
[201]  2  1  3  1  1  5  2  3  1  6  2  1  4  5  1  3  4  1  1  1  3  1  2  1  2
[226]  2  1  1  4  1  2  1  1  1  1 16  3  1  2  5  6  1  2  1  1  1  3  4  1  1
[251]  1  1  1  4  3  3  2  1  2  1  5  3  2  4  1  1  2  1  7  1  2  2  4 14  1
[276]  1  1  7  7  5  3  1  3  1  3  4  2  1  1  2  3  1  1  4  4  3  4  1  4  1
[301]  2  1  1  2  3  3  4  1  5  4  2  2  1  1  3  1  1  4  6  1  1  7  1  2  3
[326]  1  1  1  1  6  3  1  3  7  8  2  1  5  7  1  2  3  2  1  1  4  1  1  3 10
[351]  5  4  1  1  2  1  6  6  2  1  1  3  1  2  3  1  1  6  2  1  7  2  1 15  1
[376]  8  2  1  1  1  1  3  2  1  4  1  2  2  1  3  2  1  1  1  1  1  1  1  2  3
[401]  1  2 13  6  2  3  1  1  3  1  3  3  1  2  1  3  3  4  8  4  3  1  1  2  8
[426]  1  1  2  1  3  1  1 10  1  1  1  1  2  1  2  5  1  4  6  2  7  7  1  3  1
[451]  4  2  1  1  2  2  2  1  1  4  1  1  1  1  3  5  3  1  4  1  4  6  2  2 11
[476]  3  6  3  4  1  6  2  1  5  6  8  2 13  3  1  1  3 13  1  2  6  4  2  1  1
[501]  1  1  3  3  1  1  2  1  1  1  1  1  4  1  1  1  1  7  5  1  2  4  2  1  2
[526]  1  2  4  2  1  4  1  1  2  3  2  3  3  4  1  1  1  4  1  9  1  2  4  6  1
[551]  2  7  1  1  5  2  1  2  1  1  1  2  2  3 13  5  1  1  3  2  1  2  2  3  2
[576]  1  1  1  1  1  2  2  5  4  1  2

$burstiness_in
[1] -0.079

$burstiness_out
[1] -0.045

$entry_stats
$entry_stats$FY
  [1] 1 2 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2
 [38] 1 2 2 1 1 1 1 1 3 1 1 1 1 1 1 1 1 5 1 1 2 1 1 2 1 3 1 1 1 1 1 2 7 4 1 5 1
 [75] 1 1 1 1 2 1 1 1 1 9 1 1 1 1 1 1 2 1 1 1 1 2 2 1 1 2

$entry_stats$LY
  [1] 30 17 28 28 30 30 27 25 28 30 30 29 10 30 25 29 30 24 30 30 28 30 27 30 30
 [26] 30 30 30 28 22 23 30 30 30 30 30 30 25 27 26 14 29 27 30 30 26 30 16 23 27
 [51] 29 26 30 30 29 30 29 23 23 30 26 24 27 30 15 30 28 29 17 29 30 30 22 30 28
 [76] 28 30 29 27 26 30 30 24 17 30 29 30 29 23 30 30 30 30 28 29 29 17 30 30 28

$entry_stats$duration
  [1] 30 16 28 28 30 28 27 25 28 30 30 29 10 30 25 29 30 24 30 30 28 30 27 30 30
 [26] 30 30 30 28 22 23 30 30 30 30 30 29 25 26 25 14 29 27 30 30 24 30 16 23 27
 [51] 29 26 30 30 25 30 29 22 23 30 25 24 25 30 15 30 28 29 16 23 27 30 18 30 28
 [76] 28 30 29 26 26 30 30 24  9 30 29 30 29 23 30 29 30 30 28 29 28 16 30 30 27

$entry_stats$prop_never_observed
[1] 0

$entry_stats$prop_enter_after_1
[1] 0.2

$entry_stats$mean_FY
[1] 1.43

$entry_stats$median_FY
[1] 1

$entry_stats$mean_duration
[1] 26.82

$entry_stats$median_duration
[1] 29


$trend_entry
NULL

$cor_I_r_by_year
 [1] 0.037 0.195 0.247 0.147 0.127 0.125 0.050 0.221 0.034 0.145 0.210 0.397
[13] 0.156 0.227 0.165 0.247 0.175 0.359 0.457 0.363 0.459 0.320 0.338 0.361
[25] 0.394 0.461 0.455 0.355 0.319

$diagnostics
   year n_pairs sd_I sd_r reason
1     2     100 0.50 0.07     ok
2     3     100 0.50 0.08     ok
3     4     100 0.49 0.10     ok
4     5     100 0.50 0.10     ok
5     6     100 0.50 0.08     ok
6     7     100 0.50 0.09     ok
7     8     100 0.50 0.08     ok
8     9     100 0.50 0.12     ok
9    10     100 0.50 0.12     ok
10   11     100 0.50 0.12     ok
11   12     100 0.50 0.18     ok
12   13     100 0.50 0.18     ok
13   14     100 0.50 0.14     ok
14   15     100 0.50 0.11     ok
15   16     100 0.50 0.15     ok
16   17     100 0.50 0.17     ok
17   18     100 0.50 0.18     ok
18   19     100 0.50 0.23     ok
19   20     100 0.50 0.21     ok
20   21     100 0.50 0.23     ok
21   22     100 0.50 0.21     ok
22   23     100 0.50 0.26     ok
23   24     100 0.50 0.26     ok
24   25     100 0.50 0.27     ok
25   26     100 0.50 0.29     ok
26   27     100 0.50 0.27     ok
27   28     100 0.50 0.26     ok
28   29     100 0.47 0.26     ok
29   30     100 0.50 0.27     ok

$note
NULL
## Spp trends (grey line = latent process; blue dot = noisy obs with missingness
# Plot 12 randomly chosen species on log scale
plot_species_trends(out, n_sample = 12, scale = "log")

# Plot specific species (by index) as baseline-1 indices
plot_species_trends(out, species = c(1, 5, 9), scale = "index")

And the same model run without the imputation for the Bayesian geomean:

out2 <- run_full_analysis(data_source = "simulate",
                         sim_args = sim_args,
                         fit_models = c("partial","freeman", "nopool", "bayes_geomean"),
                         ## Model-specific settings (ovm = 2)
                         jags_partial = list(df_mu=6, obs_var_model=4, n_iter=500, n_burn=100),
                         jags_nopool = list(df_mu=6, obs_var_model=4, n_iter=500, n_burn=100),
                         jags_freeman = list(obs_var_model=4, n_iter=500, n_burnin=100),
                         jags_bayes_geomean = list(obs_var_model=4, n_iter=500, n_burnin=100),
                         # smoothed version of Bayesian geomean?
                         smooth_geomean = list(enable = TRUE, prefer_freeman_basis = FALSE),
                         plot_geomean = TRUE, # plot unsmoothed Bayes geomean MSI
                         # impute missing spp/year combinations
                         impute_all_geomean = FALSE, # if false, then under MNAR this estimand differs from other models
                         ## Cross-model settings (seFromData not really needed here as ovm = 2 above)
                         brc_opts = list(num_knots=12, seFromData=TRUE, Y1perfect=FALSE, m.scale = "loge"),
                         ## Growth-rate presentation scale
                         growth_scale = "log",
                         quiet = TRUE) # suppress JAGS output to console

# Indicator plots
print(out2$plots$MSI) # M_prime equivalents

print(out2$plots$CumLog) # M equivalents where applicable

print(out2$plots$Growth) # annual growth on scale specified in run_full_analysis()

Comparing msiAssess to BRCIndicators Freeman implementation

Here we take the simulated MNAR data above and run both the Freeman model from BRCindicators and the msiAssess implementation with the same settings. Allowing for minor differences in implementation (largely those designed to provide msiAssess with more flexibility to handle variably missing SEs and species’ time-series entry points), the msiAssess method essentially returns the same result (note that ordinary Monte Carlo error will also be evident in short MCMC runs like those given here).

# assume out2$sim$y  is S × T  matrix of log indices
Y <- out2$sim$y
years <- out2$sim$years
species <- rownames(Y) %||% paste0("sp", seq_len(nrow(Y)))
colnames(Y) <- paste0("year.", years)
df_wide <- data.frame(species = species, Y)
df_long <- reshape(df_wide,
                   direction = "long",
                   varying = list(names(df_wide)[-1]),
                   v.names = "index",
                   timevar = "year",
                   times = years)
row.names(df_long) <- NULL
head(df_long, n = 3); tail(df_long, n = 3)
  species year       index id
1     sp1    1 -0.04473837  1
2     sp2    1          NA  2
3     sp3    1  0.24794130  3

     species year     index  id
2998    sp98   30  1.284345  98
2999    sp99   30 -3.132135  99
3000   sp100   30        NA 100
# run data through BRCindicators
mod <- BRCindicators::bma(data = df_long, 
                          num.knots = 12, m.scale = "loge",
                          parallel = TRUE,
                          seFromData = FALSE,
                          Y1perfect = TRUE,
                          errorY1 = TRUE,
                          plot = FALSE,
                          n.iter = 1000, seed = 232680)
Processing function input....... 

Done. 
 
Beginning parallel processing using 11 cores. Console output will be suppressed.

Parallel processing completed.

Calculating statistics....... 

Done. 
# Rerun msiAssess for comparability
out3 <- run_full_analysis(data_source = "simulate",
                          # same settings as used for out2 model above
                         sim_args = sim_args,
                         fit_models = "freeman",
                         # for comparability with BRCindicators settings (burn in = floor(n.iter/2))
                         jags_freeman = list(obs_var_model=4, n_iter=1000, n_burnin=500),
                         brc_opts = list(num_knots=12, seFromData=FALSE, Y1perfect=TRUE, m.scale = "loge"),
                         quiet = TRUE)

## Extract msiAssess Freeman Mprime summary (log scale, posterior_summary structure)
mpr <- out3$results$freeman$Mprime

# Convert to index scale, baseline = 100 at first year
mpr_med <- as.numeric(mpr$median)
mpr_lower <- as.numeric(mpr$lower)
mpr_upper <- as.numeric(mpr$upper)
idx_freeman_med <- (exp(mpr_med   - mpr_med[1]))*100
idx_freeman_lower <- (exp(mpr_lower - mpr_med[1]))*100
idx_freeman_upper <- (exp(mpr_upper - mpr_med[1]))*100

## Base plot: BRCIndicators Mprime index
yr <- mod$Year
plot(yr, mod$Index.Mprime,
     type = "l", lwd = 2,
     xlab = "Year", ylab = "Index (baseline = 100)",
     ylim = range(c(mod$Index.Mprime, mod$lowerCI.Mprime, mod$upperCI.Mprime, idx_freeman_lower, idx_freeman_upper), na.rm = TRUE))
legend("topright",
       legend = c("BRCIndicators Freeman", "BRCIndicators Freeman (95% CI)", "msiAssess Freeman (median)", "msiAssess Freeman (95% CI)"),
       col    = c("black", "black", "red", "red"),
       lwd    = c(2, 1, 2, 1),
       lty    = c(1, 2, 1, 2),
       bty    = "n")
## Add BRCIndicator CIs and Freeman msiAssess curve + 95% bands
lines(yr, mod$lowerCI.Mprime, col = "black", lty = 2)
lines(yr, mod$upperCI.Mprime, col = "black", lty = 2)
lines(yr, idx_freeman_med, col = "red", lwd = 2)
lines(yr, idx_freeman_lower, col = "red", lty = 2)
lines(yr, idx_freeman_upper, col = "red", lty = 2)

## Also look at Rhats for theta
head(out3$checks$freeman$table[,c(8:10)], n = 5)
             Rhat n.eff    param
deviance 3.187187     4 deviance
theta    3.134380     4    theta
b[11]    1.200830   300    b[11]
b[10]    1.195896   300    b[10]
b[1]     1.185482   150     b[1]

Note also the Rhat stats for msiAssess model run (these are not available for BRCIndicators: see issue here ). The global SE parameter, theta, is far from converging. The same behaviour has been observed with an empirical dataset with longer chains, namely n_iter = 1000.

References

R. J. Boyd et al. (2025).Using causal diagrams and superpopulation models to correct geographic biases in biodiversity monitoring data. Methods in Ecology and Evolution. 16, 2, 332–344. doi:10.1111/2041-210X.14492

S. N. Freeman et al. (March 2021).A Generic Method for Estimating and Smoothing Multispecies Biodiversity Indicators Using Intermittent Data. Journal of Agricultural, Biological and Environmental Statistics. 26, 1, 71–89. doi:10.1007/s13253-020-00410-6

L. L. Soldaat et al. (October 2017).A Monte Carlo method to account for sampling error in multi-species indicators. Ecological Indicators. 81, 340–347. doi:10.1016/j.ecolind.2017.05.033

About

Multi-species indicator assessment and fitting tools

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages