diff --git a/DESCRIPTION b/DESCRIPTION index f8b381f..78ceea5 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,6 +1,6 @@ Package: stdmod Title: Standardized Moderation Effect and Its Confidence Interval -Version: 0.2.11 +Version: 0.2.11.1 Authors@R: c(person(given = "Shu Fai", family = "Cheung", @@ -23,7 +23,7 @@ License: GPL-3 Encoding: UTF-8 LazyData: true Roxygen: list(markdown = TRUE) -RoxygenNote: 7.3.2 +RoxygenNote: 7.3.3 Suggests: testthat, knitr, diff --git a/NEWS.md b/NEWS.md index d7a131d..4335ae0 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,3 +1,13 @@ +# stdmod 0.2.11.1 + +## Miscellaneous + +- Updated vignettes to use precomputed + files, instead of using stored objects, + when bootstrapping is used. This also + addressed Issue 127 on GitHub. + (0.2.11.1) + # stdmod 0.2.11 ## Miscellaneous diff --git a/README.md b/README.md index e5fc295..4ad534f 100644 --- a/README.md +++ b/README.md @@ -13,7 +13,7 @@ # stdmod: Standardized Moderation -(Version 0.2.11, updated on 2024-09-22, [release history](https://sfcheung.github.io/stdmod/news/index.html)) +(Version 0.2.11.1, updated on 2026-01-04, [release history](https://sfcheung.github.io/stdmod/news/index.html)) (Important changes since 0.2.0.0: Bootstrap confidence intervals and variance-covariance matrix of estimates are the defaults of `confint()` diff --git a/rebuild_vignettes.R b/rebuild_vignettes.R index 2c3cfce..8ea81d8 100644 --- a/rebuild_vignettes.R +++ b/rebuild_vignettes.R @@ -4,6 +4,8 @@ base_dir <- getwd() setwd("vignettes/") knitr::knit("cond_effect.Rmd.original", output = "cond_effect.Rmd") -# knitr::knit("moderation.Rmd.original", output = "moderation.Rmd") -# knitr::knit("std_selected.Rmd.original", output = "std_selected.Rmd") +knitr::knit("moderation.Rmd.original", output = "moderation.Rmd") +knitr::knit("std_selected.Rmd.original", output = "std_selected.Rmd") +knitr::knit("stdmod_lavaan.Rmd.original", output = "stdmod_lavaan.Rmd") +knitr::knit("stdmod.Rmd.original", output = "stdmod.Rmd") 
setwd(base_dir) diff --git a/vignettes/eg2_lm_xwy_std_ci.rds b/vignettes/eg2_lm_xwy_std_ci.rds deleted file mode 100644 index 356426d..0000000 Binary files a/vignettes/eg2_lm_xwy_std_ci.rds and /dev/null differ diff --git a/vignettes/eg_lm_xwy_std_ci.rds b/vignettes/eg_lm_xwy_std_ci.rds deleted file mode 100644 index ba72ed4..0000000 Binary files a/vignettes/eg_lm_xwy_std_ci.rds and /dev/null differ diff --git a/vignettes/egl_lavaan_boot.rds b/vignettes/egl_lavaan_boot.rds deleted file mode 100644 index 9a9ae9d..0000000 Binary files a/vignettes/egl_lavaan_boot.rds and /dev/null differ diff --git a/vignettes/mod_reg-1.png b/vignettes/mod_reg-1.png new file mode 100644 index 0000000..68f7bca Binary files /dev/null and b/vignettes/mod_reg-1.png differ diff --git a/vignettes/mod_reg_stdall-1.png b/vignettes/mod_reg_stdall-1.png new file mode 100644 index 0000000..64e5e04 Binary files /dev/null and b/vignettes/mod_reg_stdall-1.png differ diff --git a/vignettes/moderation.Rmd b/vignettes/moderation.Rmd index 3a835ca..c5f7c62 100644 --- a/vignettes/moderation.Rmd +++ b/vignettes/moderation.Rmd @@ -1,244 +1,410 @@ ---- -title: "Standardized Moderation Effect by std_selected()" -author: "Shu Fai Cheung and David Weng Ngai Vong" -date: "`r Sys.Date()`" -output: rmarkdown::html_vignette -vignette: > - %\VignetteIndexEntry{Standardized Moderation Effect by std_selected()} - %\VignetteEngine{knitr::rmarkdown} - %\VignetteEncoding{UTF-8} ---- - -```{r, include = FALSE} -knitr::opts_chunk$set( - collapse = TRUE, - comment = "#>", - fig.width = 6, - fig.height = 4, - fig.align = "center" -) -``` - -# Purpose - -This document demonstrates how to use `std_selected()` from -the `stdmod` package to compute the correct -standardized solution of moderated regression. -More about this package can be found -in `vignette("stdmod", package = "stdmod")` -or at [https://sfcheung.github.io/stdmod/](https://sfcheung.github.io/stdmod/). 
- -# Setup the Environment - -```{r setup} -library(stdmod) # For computing the standardized moderation effect conveniently -``` - -# Load the Dataset - -```{r load_dataset} -data(sleep_emo_con) -head(sleep_emo_con, 3) -``` - -This data set has 500 cases of data. The variables are sleep duration, age, gender, -and the scores from two personality scales, emotional stability and -conscientiousness of the IPIP Big Five markers. Please refer to -(citation to be added) for the detail of the data set. - -The names of some variables are shortened for readability: - -```{r} -colnames(sleep_emo_con)[3:4] <- c("cons", "emot") -head(sleep_emo_con, 3) -``` - - -# Moderated Regression - -Suppose we are interested in predicting sleep duration by emotional stability, -after controlling for gender and age. However, we suspect that the effect of -emotional stability, if any, may be moderated by conscientiousness. Therefore, -we conduct a moderated regression as follow: - -```{r mod_reg} -lm_out <- lm(sleep_duration ~ age + gender + emot * cons, - data = sleep_emo_con) -summary(lm_out) -plotmod(lm_out, - x = "emot", - w = "cons", - x_label = "Emotional Stability", - w_label = "Conscientiousness", - y_label = "Sleep Duration") -``` - -The results show that conscientiousness significantly moderates the effect of -emotional stability on sleep duration. - -# Standardized Moderation Effect - -To get the correct standardized solution of the moderated regression, with the -product term formed *after* standardization, we can use `std_selected()`. - -- The first argument is the regression output from `lm()`. - -- The argument `to_center` specifies variables to be mean - centered. - -- The argument `to_scale` specifies variables to be rescaled - by their standard deviations after centering. - -- In `stdmod` 0.2.6.3, the argument `to_standardize` was introduced - as a shortcut. Listing a variable in `to_standardize` is - equivalent to listing it in `to_center` and `to_scale`. 
- -If we want to standardize or mean center all variables, we can use `~ .` as a -shortcut. Note that `std_selected()` will automatically skip categorical -variables (i.e., factors or string variables in the regression model of `lm()`). - -```{r} -lm_stdall <- std_selected(lm_out, - to_standardize = ~ .) -``` - -Before 0.2.6.3, to standardize all variables except for -categorical variables, we need to use both `to_center = ~ .` -and `to_scale = ~ .`. Since 0.2.6.3, -we can just use `to_standardize = ~ .`, as shown above. -If `to_standardize = ~ .` does not work, just use -`to_center` and `to_scale` as shown below: - -```r -lm_stdall <- std_selected(lm_out, - to_center = ~ ., - to_scale = ~ .) -``` - -A summary of the results of `std_selected()` can be -generated by `summary()`: - -```{r} -summary(lm_stdall) -``` - -The coefficient in this solution, -`r round(coef(lm_stdall)["emot:cons"], 5)`, -can be interpreted as the change in the standardized effect of -emotional stability for each one standard deviation increase of -conscientiousness. Naturally, this can be called the -*standardized moderation effect* of conscientiousness -([Cheung, Cheung, Lau, Hui, & Vong, 2022](https://doi.org/10.1037/hea0001188)). - -The output of `std_selected()` can be passed to other functions that accept the -output of `lm()`. This package also has a simple function, -`plotmod()`, for generating a typical plot of the moderation effect: - -```{r mod_reg_stdall} -plotmod(lm_stdall, - x = "emot", - w = "cons", - x_label = "Emotional Stability", - w_label = "Conscientiousness", - y_label = "Sleep Duration") -``` - -The function `plotmod()` also prints the conditional effects of the predictor -(focal variable), emotional stability in this example. - -# The Common (Incorrect) Standardized Solution - -For comparison, this is the results of standardizing all variables, including -the product term and the categorical variable. 
- -```{r} -library(lm.beta) # For generating the typical standardized solution -packageVersion("lm.beta") -lm_beta <- lm.beta(lm_out) -summary(lm_beta) -``` - -The coefficient of the *standardized* product term is -`r round(coef(lm_beta)["emot:cons"], 5)`, which -*cannot* be interpreted as the change in the standardized effect of -emotional stability for each one standard deviation increase of -conscientiousness because the product term is standardized and can no longer -be interpreted as the product of two variables in the model. - -# Improved Confidence Intervals - -It has been shown (e.g., [Yuan & Chan, 2011](https://doi.org/10.1007/s11336-011-9224-6)) -that the standard errors of -standardized regression coefficients computed just by standardizing the variables -are biased, and consequently the confidence intervals are also invalid. The -function `std_selected_boot()` is a wrapper of `std_selected()` that also -forms the confidence interval of the regression coefficients when standardizing -is conducted, using nonparametric bootstrapping as suggested by -Cheung, Cheung, Lau, Hui, and Vong (2022). - -We use the same example above that standardizes all variables except for -categorical variables to illustrate this function. The argument `nboot` -specifies the number of nonparametric bootstrap samples. -The level of confidence is set by `conf`. The default is .95, denoting 95% -confidence intervals. If this is the desired level, this argument can be -omitted. 
- -```{r echo = FALSE, eval = TRUE} -if (file.exists("eg2_lm_xwy_std_ci.rds")) { - lm_xwy_std_ci <- readRDS("eg2_lm_xwy_std_ci.rds") - } else { - set.seed(649017) - lm_xwy_std_ci <- std_selected_boot(lm_out, to_center = ~ ., - to_scale = ~ ., - nboot = 2000) - saveRDS(lm_xwy_std_ci, "eg2_lm_xwy_std_ci.rds", compress = "xz") - } -``` - -```r -set.seed(649017) -lm_xwy_std_ci <- std_selected_boot(lm_out, - to_standardize = ~ ., - nboot = 2000) -``` - -If the default options are acceptable, the only additional argument is `nboot`. - -```{r} -summary(lm_xwy_std_ci) -``` - -```{r echo = FALSE} -tmp <- summary(lm_xwy_std_ci)$coefficients -``` - -The standardized moderation effect is -`r formatC(tmp["emot:cons", "Estimate"], 4, format = "f")`, -and the 95% nonparametric bootstrap confidence interval is -`r formatC(tmp["emot:cons", "CI Lower"], 4, format = "f")` to -`r formatC(tmp["emot:cons", "CI Upper"], 4, format = "f")`. - -Note: As a side product, the nonparametric bootstrap percentile confidence of the other -coefficients are also reported. They can be used for other variables that are -standardized in the same model, whether they are involved in the moderation or not. - -# Further Information - -`vignette("plotmod", package = "stdmod")` illustrates how to use `plotmod()` to plot a moderation -effect. If variables are standardized by `std_selected()`, `plotmod()` can -indicate this in the plot. - -`vignette("cond_effect", package = "stdmod")` illustrates how to use `cond_effect()` to compute -conditional effects, the effect of a predictor (focal variable) for selected -levels of the moderator. -`cond_effect()` supports outputs from `std_selected()`. - -# Reference(s) - -Cheung, S. F., Cheung, S.-H., Lau, E. Y. Y., Hui, C. H., & Vong, W. N. (2022) -Improving an old way to measure moderation effect in standardized units. -*Health Psychology*, *41*(7), 502-505. https://doi.org/10.1037/hea0001188. - -Yuan, K.-H., & Chan, W. (2011). 
Biases and standard errors of standardized -regression coefficients. *Psychometrika, 76*(4), 670-690. -https://doi.org/10.1007/s11336-011-9224-6 +--- +title: "Standardized Moderation Effect by std_selected()" +author: "Shu Fai Cheung and David Weng Ngai Vong" +date: "2026-01-04" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Standardized Moderation Effect by std_selected()} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + + + +# Purpose + +This document demonstrates how to use `std_selected()` from +the `stdmod` package to compute the correct +standardized solution of moderated regression. +More about this package can be found +in `vignette("stdmod", package = "stdmod")` +or at [https://sfcheung.github.io/stdmod/](https://sfcheung.github.io/stdmod/). + +# Setup the Environment + + +``` r +library(stdmod) # For computing the standardized moderation effect conveniently +``` + +# Load the Dataset + + +``` r +data(sleep_emo_con) +head(sleep_emo_con, 3) +#> # A tibble: 3 × 6 +#> case_id sleep_duration conscientiousness emotional_stability age gender +#> +#> 1 1 6 3.6 3.6 20 female +#> 2 2 4 3.8 2.4 20 female +#> 3 3 7 4.3 2.7 20 female +``` + +This data set has 500 cases of data. The variables are sleep duration, age, gender, +and the scores from two personality scales, emotional stability and +conscientiousness of the IPIP Big Five markers. Please refer to +(citation to be added) for the detail of the data set. + +The names of some variables are shortened for readability: + + +``` r +colnames(sleep_emo_con)[3:4] <- c("cons", "emot") +head(sleep_emo_con, 3) +#> # A tibble: 3 × 6 +#> case_id sleep_duration cons emot age gender +#> +#> 1 1 6 3.6 3.6 20 female +#> 2 2 4 3.8 2.4 20 female +#> 3 3 7 4.3 2.7 20 female +``` + + +# Moderated Regression + +Suppose we are interested in predicting sleep duration by emotional stability, +after controlling for gender and age. 
However, we suspect that the effect of +emotional stability, if any, may be moderated by conscientiousness. Therefore, +we conduct a moderated regression as follow: + + +``` r +lm_out <- lm(sleep_duration ~ age + gender + emot * cons, + data = sleep_emo_con) +summary(lm_out) +#> +#> Call: +#> lm(formula = sleep_duration ~ age + gender + emot * cons, data = sleep_emo_con) +#> +#> Residuals: +#> Min 1Q Median 3Q Max +#> -6.0841 -0.7882 0.0089 0.9440 6.1189 +#> +#> Coefficients: +#> Estimate Std. Error t value Pr(>|t|) +#> (Intercept) 1.85154 1.35224 1.369 0.17155 +#> age 0.01789 0.02133 0.838 0.40221 +#> gendermale -0.26127 0.16579 -1.576 0.11570 +#> emot 1.32151 0.45039 2.934 0.00350 ** +#> cons 1.20385 0.37062 3.248 0.00124 ** +#> emot:cons -0.33140 0.13273 -2.497 0.01286 * +#> --- +#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 +#> +#> Residual standard error: 1.384 on 494 degrees of freedom +#> Multiple R-squared: 0.0548, Adjusted R-squared: 0.04523 +#> F-statistic: 5.728 on 5 and 494 DF, p-value: 3.768e-05 +plotmod(lm_out, + x = "emot", + w = "cons", + x_label = "Emotional Stability", + w_label = "Conscientiousness", + y_label = "Sleep Duration") +``` + +
+[Figure: plot of chunk mod_reg]
+ +The results show that conscientiousness significantly moderates the effect of +emotional stability on sleep duration. + +# Standardized Moderation Effect + +To get the correct standardized solution of the moderated regression, with the +product term formed *after* standardization, we can use `std_selected()`. + +- The first argument is the regression output from `lm()`. + +- The argument `to_center` specifies variables to be mean + centered. + +- The argument `to_scale` specifies variables to be rescaled + by their standard deviations after centering. + +- In `stdmod` 0.2.6.3, the argument `to_standardize` was introduced + as a shortcut. Listing a variable in `to_standardize` is + equivalent to listing it in `to_center` and `to_scale`. + +If we want to standardize or mean center all variables, we can use `~ .` as a +shortcut. Note that `std_selected()` will automatically skip categorical +variables (i.e., factors or string variables in the regression model of `lm()`). + + +``` r +lm_stdall <- std_selected(lm_out, + to_standardize = ~ .) +``` + +Before 0.2.6.3, to standardize all variables except for +categorical variables, we need to use both `to_center = ~ .` +and `to_scale = ~ .`. Since 0.2.6.3, +we can just use `to_standardize = ~ .`, as shown above. +If `to_standardize = ~ .` does not work, just use +`to_center` and `to_scale` as shown below: + +```r +lm_stdall <- std_selected(lm_out, + to_center = ~ ., + to_scale = ~ .) +``` + +A summary of the results of `std_selected()` can be +generated by `summary()`: + + +``` r +summary(lm_stdall) +#> +#> Call to std_selected(): +#> std_selected(lm_out = lm_out, to_standardize = ~.) 
+#> +#> Selected variable(s) are centered by mean and/or scaled by SD +#> - Variable(s) centered: sleep_duration age gender emot cons +#> - Variable(s) scaled: sleep_duration age gender emot cons +#> +#> centered_by scaled_by Note +#> sleep_duration 6.776333 1.4168291 Standardized (mean = 0, SD = 1) +#> age 22.274000 2.9407857 Standardized (mean = 0, SD = 1) +#> gender NA NA Nonnumeric +#> emot 2.713200 0.7629613 Standardized (mean = 0, SD = 1) +#> cons 3.343200 0.6068198 Standardized (mean = 0, SD = 1) +#> +#> Note: +#> - Categorical variables will not be centered or scaled even if +#> requested. +#> +#> Call: +#> lm(formula = sleep_duration ~ age + gender + emot * cons, data = dat_mod) +#> +#> Residuals: +#> Min 1Q Median 3Q Max +#> -4.2941 -0.5563 0.0063 0.6663 4.3187 +#> +#> Coefficients: +#> Estimate Std. Error t value Pr(>|t|) +#> (Intercept) 0.0549 0.0488 1.1248 0.26124 +#> age 0.0371 0.0443 0.8384 0.40221 +#> gendermale -0.1844 0.1170 -1.5759 0.11570 +#> emot 0.1150 0.0449 2.5600 0.01076 * +#> cons 0.1305 0.0452 2.8893 0.00403 ** +#> emot:cons -0.1083 0.0434 -2.4967 0.01286 * +#> --- +#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 +#> +#> Residual standard error: 0.9771 on 494 degrees of freedom +#> +#> R-squared : 0.0548 +#> Adjusted R-squared : 0.0452 +#> ANOVA test of R-squared : F(5, 494) = 5.7277, p < 0.001 +#> +#> = Test the highest order term = +#> The highest order term : emot:cons +#> R-squared increase adding this term: 0.0119 +#> F test of R-squared increase : F(1, 494) = 6.2335, p = 0.013 +#> +#> Note: +#> - Estimates and their statistics are based on the data after +#> mean-centering, scaling, or standardization. +#> - One or more variables are scaled by SD or standardized. OLS standard +#> errors and confidence intervals may be biased for their coefficients. +#> Please use `std_selected_boot()`. 
+``` + +The coefficient in this solution, +-0.10829, +can be interpreted as the change in the standardized effect of +emotional stability for each one standard deviation increase of +conscientiousness. Naturally, this can be called the +*standardized moderation effect* of conscientiousness +([Cheung, Cheung, Lau, Hui, & Vong, 2022](https://doi.org/10.1037/hea0001188)). + +The output of `std_selected()` can be passed to other functions that accept the +output of `lm()`. This package also has a simple function, +`plotmod()`, for generating a typical plot of the moderation effect: + + +``` r +plotmod(lm_stdall, + x = "emot", + w = "cons", + x_label = "Emotional Stability", + w_label = "Conscientiousness", + y_label = "Sleep Duration") +``` + +
+[Figure: plot of chunk mod_reg_stdall]
+ +The function `plotmod()` also prints the conditional effects of the predictor +(focal variable), emotional stability in this example. + +# The Common (Incorrect) Standardized Solution + +For comparison, this is the results of standardizing all variables, including +the product term and the categorical variable. + + +``` r +library(lm.beta) # For generating the typical standardized solution +packageVersion("lm.beta") +#> [1] '1.7.3' +lm_beta <- lm.beta(lm_out) +summary(lm_beta) +#> +#> Call: +#> lm(formula = sleep_duration ~ age + gender + emot * cons, data = sleep_emo_con) +#> +#> Residuals: +#> Min 1Q Median 3Q Max +#> -6.0841 -0.7882 0.0089 0.9440 6.1189 +#> +#> Coefficients: +#> Estimate Standardized Std. Error t value Pr(>|t|) +#> (Intercept) 1.85154 NA 1.35224 1.369 0.17155 +#> age 0.01789 0.03712 0.02133 0.838 0.40221 +#> gendermale -0.26127 -0.06934 0.16579 -1.576 0.11570 +#> emot 1.32151 0.71163 0.45039 2.934 0.00350 ** +#> cons 1.20385 0.51560 0.37062 3.248 0.00124 ** +#> emot:cons -0.33140 -0.78201 0.13273 -2.497 0.01286 * +#> --- +#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 +#> +#> Residual standard error: 1.384 on 494 degrees of freedom +#> Multiple R-squared: 0.0548, Adjusted R-squared: 0.04523 +#> F-statistic: 5.728 on 5 and 494 DF, p-value: 3.768e-05 +``` + +The coefficient of the *standardized* product term is +-0.78201, which +*cannot* be interpreted as the change in the standardized effect of +emotional stability for each one standard deviation increase of +conscientiousness because the product term is standardized and can no longer +be interpreted as the product of two variables in the model. + +# Improved Confidence Intervals + +It has been shown (e.g., [Yuan & Chan, 2011](https://doi.org/10.1007/s11336-011-9224-6)) +that the standard errors of +standardized regression coefficients computed just by standardizing the variables +are biased, and consequently the confidence intervals are also invalid. 
The +function `std_selected_boot()` is a wrapper of `std_selected()` that also +forms the confidence interval of the regression coefficients when standardizing +is conducted, using nonparametric bootstrapping as suggested by +Cheung, Cheung, Lau, Hui, and Vong (2022). + +We use the same example above that standardizes all variables except for +categorical variables to illustrate this function. The argument `nboot` +specifies the number of nonparametric bootstrap samples. +The level of confidence is set by `conf`. The default is .95, denoting 95% +confidence intervals. If this is the desired level, this argument can be +omitted. + + + + +``` r +set.seed(649017) +lm_xwy_std_ci <- std_selected_boot(lm_out, + to_standardize = ~ ., + nboot = 2000) +``` + +If the default options are acceptable, the only additional argument is `nboot`. + + +``` r +summary(lm_xwy_std_ci) +#> +#> Call to std_selected_boot(): +#> std_selected_boot(lm_out = lm_out, to_standardize = ~., nboot = 2000) +#> +#> Selected variable(s) are centered by mean and/or scaled by SD +#> - Variable(s) centered: sleep_duration age gender emot cons +#> - Variable(s) scaled: sleep_duration age gender emot cons +#> +#> centered_by scaled_by Note +#> sleep_duration 6.776333 1.4168291 Standardized (mean = 0, SD = 1) +#> age 22.274000 2.9407857 Standardized (mean = 0, SD = 1) +#> gender NA NA Nonnumeric +#> emot 2.713200 0.7629613 Standardized (mean = 0, SD = 1) +#> cons 3.343200 0.6068198 Standardized (mean = 0, SD = 1) +#> +#> Note: +#> - Categorical variables will not be centered or scaled even if +#> requested. +#> - Nonparametric bootstrapping 95% confidence intervals computed. +#> - The number of bootstrap samples is 2000. +#> +#> Call: +#> lm(formula = sleep_duration ~ age + gender + emot * cons, data = dat_mod) +#> +#> Residuals: +#> Min 1Q Median 3Q Max +#> -4.2941 -0.5563 0.0063 0.6663 4.3187 +#> +#> Coefficients: +#> Estimate CI Lower CI Upper Std. 
Error t value Pr(>|t|) +#> (Intercept) 0.0549 0.0030 0.1043 0.0488 1.1248 0.26124 +#> age 0.0371 -0.0363 0.1036 0.0443 0.8384 0.40221 +#> gendermale -0.1844 -0.4389 0.0876 0.1170 -1.5759 0.11570 +#> emot 0.1150 0.0236 0.2024 0.0449 2.5600 0.01076 * +#> cons 0.1305 0.0324 0.2242 0.0452 2.8893 0.00403 ** +#> emot:cons -0.1083 -0.2040 -0.0097 0.0434 -2.4967 0.01286 * +#> --- +#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 +#> +#> Residual standard error: 0.9771 on 494 degrees of freedom +#> +#> R-squared : 0.0548 +#> Adjusted R-squared : 0.0452 +#> ANOVA test of R-squared : F(5, 494) = 5.7277, p < 0.001 +#> +#> = Test the highest order term = +#> The highest order term : emot:cons +#> R-squared increase adding this term: 0.0119 +#> F test of R-squared increase : F(1, 494) = 6.2335, p = 0.013 +#> +#> Note: +#> - Estimates and their statistics are based on the data after +#> mean-centering, scaling, or standardization. +#> - [CI Lower, CI Upper] are bootstrap percentile confidence intervals. +#> - Std. Error are not bootstrap SEs. +``` + + + +The standardized moderation effect is +-0.1083, +and the 95% nonparametric bootstrap confidence interval is +-0.2040 to +-0.0097. + +Note: As a side product, the nonparametric bootstrap percentile confidence of the other +coefficients are also reported. They can be used for other variables that are +standardized in the same model, whether they are involved in the moderation or not. + +# Further Information + +`vignette("plotmod", package = "stdmod")` illustrates how to use `plotmod()` to plot a moderation +effect. If variables are standardized by `std_selected()`, `plotmod()` can +indicate this in the plot. + +`vignette("cond_effect", package = "stdmod")` illustrates how to use `cond_effect()` to compute +conditional effects, the effect of a predictor (focal variable) for selected +levels of the moderator. +`cond_effect()` supports outputs from `std_selected()`. + +# Reference(s) + +Cheung, S. 
F., Cheung, S.-H., Lau, E. Y. Y., Hui, C. H., & Vong, W. N. (2022) +Improving an old way to measure moderation effect in standardized units. +*Health Psychology*, *41*(7), 502-505. https://doi.org/10.1037/hea0001188. + +Yuan, K.-H., & Chan, W. (2011). Biases and standard errors of standardized +regression coefficients. *Psychometrika, 76*(4), 670-690. +https://doi.org/10.1007/s11336-011-9224-6 diff --git a/vignettes/moderation.Rmd.original b/vignettes/moderation.Rmd.original new file mode 100644 index 0000000..168f603 --- /dev/null +++ b/vignettes/moderation.Rmd.original @@ -0,0 +1,245 @@ +--- +title: "Standardized Moderation Effect by std_selected()" +author: "Shu Fai Cheung and David Weng Ngai Vong" +date: "`r Sys.Date()`" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Standardized Moderation Effect by std_selected()} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +```{r, include = FALSE} +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>", + fig.width = 6, + fig.height = 4, + fig.align = "center", + fig.path = "" +) +``` + +# Purpose + +This document demonstrates how to use `std_selected()` from +the `stdmod` package to compute the correct +standardized solution of moderated regression. +More about this package can be found +in `vignette("stdmod", package = "stdmod")` +or at [https://sfcheung.github.io/stdmod/](https://sfcheung.github.io/stdmod/). + +# Setup the Environment + +```{r setup} +library(stdmod) # For computing the standardized moderation effect conveniently +``` + +# Load the Dataset + +```{r load_dataset} +data(sleep_emo_con) +head(sleep_emo_con, 3) +``` + +This data set has 500 cases of data. The variables are sleep duration, age, gender, +and the scores from two personality scales, emotional stability and +conscientiousness of the IPIP Big Five markers. Please refer to +(citation to be added) for the detail of the data set. 
+ +The names of some variables are shortened for readability: + +```{r} +colnames(sleep_emo_con)[3:4] <- c("cons", "emot") +head(sleep_emo_con, 3) +``` + + +# Moderated Regression + +Suppose we are interested in predicting sleep duration by emotional stability, +after controlling for gender and age. However, we suspect that the effect of +emotional stability, if any, may be moderated by conscientiousness. Therefore, +we conduct a moderated regression as follow: + +```{r mod_reg} +lm_out <- lm(sleep_duration ~ age + gender + emot * cons, + data = sleep_emo_con) +summary(lm_out) +plotmod(lm_out, + x = "emot", + w = "cons", + x_label = "Emotional Stability", + w_label = "Conscientiousness", + y_label = "Sleep Duration") +``` + +The results show that conscientiousness significantly moderates the effect of +emotional stability on sleep duration. + +# Standardized Moderation Effect + +To get the correct standardized solution of the moderated regression, with the +product term formed *after* standardization, we can use `std_selected()`. + +- The first argument is the regression output from `lm()`. + +- The argument `to_center` specifies variables to be mean + centered. + +- The argument `to_scale` specifies variables to be rescaled + by their standard deviations after centering. + +- In `stdmod` 0.2.6.3, the argument `to_standardize` was introduced + as a shortcut. Listing a variable in `to_standardize` is + equivalent to listing it in `to_center` and `to_scale`. + +If we want to standardize or mean center all variables, we can use `~ .` as a +shortcut. Note that `std_selected()` will automatically skip categorical +variables (i.e., factors or string variables in the regression model of `lm()`). + +```{r} +lm_stdall <- std_selected(lm_out, + to_standardize = ~ .) +``` + +Before 0.2.6.3, to standardize all variables except for +categorical variables, we need to use both `to_center = ~ .` +and `to_scale = ~ .`. 
Since 0.2.6.3, +we can just use `to_standardize = ~ .`, as shown above. +If `to_standardize = ~ .` does not work, just use +`to_center` and `to_scale` as shown below: + +```r +lm_stdall <- std_selected(lm_out, + to_center = ~ ., + to_scale = ~ .) +``` + +A summary of the results of `std_selected()` can be +generated by `summary()`: + +```{r} +summary(lm_stdall) +``` + +The coefficient in this solution, +`r round(coef(lm_stdall)["emot:cons"], 5)`, +can be interpreted as the change in the standardized effect of +emotional stability for each one standard deviation increase of +conscientiousness. Naturally, this can be called the +*standardized moderation effect* of conscientiousness +([Cheung, Cheung, Lau, Hui, & Vong, 2022](https://doi.org/10.1037/hea0001188)). + +The output of `std_selected()` can be passed to other functions that accept the +output of `lm()`. This package also has a simple function, +`plotmod()`, for generating a typical plot of the moderation effect: + +```{r mod_reg_stdall} +plotmod(lm_stdall, + x = "emot", + w = "cons", + x_label = "Emotional Stability", + w_label = "Conscientiousness", + y_label = "Sleep Duration") +``` + +The function `plotmod()` also prints the conditional effects of the predictor +(focal variable), emotional stability in this example. + +# The Common (Incorrect) Standardized Solution + +For comparison, this is the results of standardizing all variables, including +the product term and the categorical variable. 
+ +```{r} +library(lm.beta) # For generating the typical standardized solution +packageVersion("lm.beta") +lm_beta <- lm.beta(lm_out) +summary(lm_beta) +``` + +The coefficient of the *standardized* product term is +`r round(coef(lm_beta)["emot:cons"], 5)`, which +*cannot* be interpreted as the change in the standardized effect of +emotional stability for each one standard deviation increase of +conscientiousness because the product term is standardized and can no longer +be interpreted as the product of two variables in the model. + +# Improved Confidence Intervals + +It has been shown (e.g., [Yuan & Chan, 2011](https://doi.org/10.1007/s11336-011-9224-6)) +that the standard errors of +standardized regression coefficients computed just by standardizing the variables +are biased, and consequently the confidence intervals are also invalid. The +function `std_selected_boot()` is a wrapper of `std_selected()` that also +forms the confidence interval of the regression coefficients when standardizing +is conducted, using nonparametric bootstrapping as suggested by +Cheung, Cheung, Lau, Hui, and Vong (2022). + +We use the same example above that standardizes all variables except for +categorical variables to illustrate this function. The argument `nboot` +specifies the number of nonparametric bootstrap samples. +The level of confidence is set by `conf`. The default is .95, denoting 95% +confidence intervals. If this is the desired level, this argument can be +omitted. 
+ +```{r echo = FALSE, eval = FALSE} +if (file.exists("eg2_lm_xwy_std_ci.rds")) { + lm_xwy_std_ci <- readRDS("eg2_lm_xwy_std_ci.rds") + } else { + set.seed(649017) + lm_xwy_std_ci <- std_selected_boot(lm_out, to_center = ~ ., + to_scale = ~ ., + nboot = 2000) + saveRDS(lm_xwy_std_ci, "eg2_lm_xwy_std_ci.rds", compress = "xz") + } +``` + +```{r} +set.seed(649017) +lm_xwy_std_ci <- std_selected_boot(lm_out, + to_standardize = ~ ., + nboot = 2000) +``` + +If the default options are acceptable, the only additional argument is `nboot`. + +```{r} +summary(lm_xwy_std_ci) +``` + +```{r echo = FALSE} +tmp <- summary(lm_xwy_std_ci)$coefficients +``` + +The standardized moderation effect is +`r formatC(tmp["emot:cons", "Estimate"], 4, format = "f")`, +and the 95% nonparametric bootstrap confidence interval is +`r formatC(tmp["emot:cons", "CI Lower"], 4, format = "f")` to +`r formatC(tmp["emot:cons", "CI Upper"], 4, format = "f")`. + +Note: As a side product, the nonparametric bootstrap percentile confidence of the other +coefficients are also reported. They can be used for other variables that are +standardized in the same model, whether they are involved in the moderation or not. + +# Further Information + +`vignette("plotmod", package = "stdmod")` illustrates how to use `plotmod()` to plot a moderation +effect. If variables are standardized by `std_selected()`, `plotmod()` can +indicate this in the plot. + +`vignette("cond_effect", package = "stdmod")` illustrates how to use `cond_effect()` to compute +conditional effects, the effect of a predictor (focal variable) for selected +levels of the moderator. +`cond_effect()` supports outputs from `std_selected()`. + +# Reference(s) + +Cheung, S. F., Cheung, S.-H., Lau, E. Y. Y., Hui, C. H., & Vong, W. N. (2022) +Improving an old way to measure moderation effect in standardized units. +*Health Psychology*, *41*(7), 502-505. https://doi.org/10.1037/hea0001188. + +Yuan, K.-H., & Chan, W. (2011). 
Biases and standard errors of standardized +regression coefficients. *Psychometrika, 76*(4), 670-690. +https://doi.org/10.1007/s11336-011-9224-6 diff --git a/vignettes/std_selected.Rmd b/vignettes/std_selected.Rmd index 8e79747..b402167 100644 --- a/vignettes/std_selected.Rmd +++ b/vignettes/std_selected.Rmd @@ -1,326 +1,481 @@ ---- -title: "Mean Center and Standardize Selected Variable by std_selected()" -author: "Shu Fai Cheung and David Weng Ngai Vong" -date: "`r Sys.Date()`" -output: rmarkdown::html_vignette -vignette: > - %\VignetteIndexEntry{Mean Center and Standardize Selected Variable by std_selected()} - %\VignetteEngine{knitr::rmarkdown} - %\VignetteEncoding{UTF-8} ---- - -```{r, include = FALSE} -knitr::opts_chunk$set( - collapse = TRUE, - comment = "#>", - fig.width = 6, - fig.height = 4, - fig.align = "center" -) -``` - -# Purpose - -Instead of standardizing all variables, even variables that (a) are categorical -and should not be standardized, or (b) measured on meaningful unites and do -not need to be standardized, `std_selected()` from the package -`stdmod` allows users to have more -control on how standardization is to be conducted. - -A moderated regression model is used as an example but it can also be used for -regression models without interaction terms. - -More about this package can be found -in `vignette("stdmod", package = "stdmod")` -or at [https://sfcheung.github.io/stdmod/](https://sfcheung.github.io/stdmod/). - -# Setup the Environment - -```{r setup} -library(stdmod) -``` - -# Load the Dataset - -```{r load_dataset} -data(sleep_emo_con) -head(sleep_emo_con, 3) -``` - -This data set has 500 cases of data. The variables are sleep duration, -age, gender, and the scores from two personality scales, emotional stability -and conscientiousness of the IPIP Big Five markers. Please refer to -(citation to be included) for the detail of the data set. 
- -The names of some variables are shortened for readability: - -```{r shorten_names} -colnames(sleep_emo_con)[3:4] <- c("cons", "emot") -head(sleep_emo_con, 3) -``` - -# Moderated Regression - -Suppose we are interested in predicting sleep duration by emotional -stability, after controlling for gender and age. However, we suspect that the -effect of emotional stability, if any, may be moderated by conscientiousness. -Therefore, we conduct a moderated regression as follow: - -```{r mod_reg} -lm_raw <- lm(sleep_duration ~ age + gender + emot * cons, - data = sleep_emo_con) -summary(lm_raw) -``` - -The results show that conscientiousness significantly moderates the effect of -emotional stability on sleep duration. - -This package has a simple function, `plotmod()`, for generating a typical plot -of the moderation effect: - -```{r std_selected_lm_raw_plot} -plotmod(lm_raw, - x = "emot", - w = "cons", - x_label = "Emotional Stability", - w_label = "Conscientiousness", - y_label = "Sleep Duration") -``` - -The function `plotmod()` also prints the conditional effects of the predictor, -emotional stability in this example. - -# Mean Center the Moderator - -To know the effect of emotional stability when conscientiousness is equal to its -mean, we can center conscientiousness by its mean in the data and redo the -moderated regression. Instead of creating the new variable and rerun the -regression, we can pass the `lm()` output to `std_selected()` and specify the -variables to be mean centered: - - -```{r lm_w_centered} -lm_w_centered <- std_selected(lm_raw, - to_center = ~ cons) -printCoefmat(summary(lm_w_centered)$coefficients, digits = 3) -``` - -The argument for meaning centering is `to_center`. The variable is specified -in the formula form, placing them on the right hand side of the formula. - -In this example, when conscientiousness -is at mean level, the effect of emotional stability is -`r formatC(coef(lm_w_centered)["emot"], 4, format = "f")`. 
- -# Mean Center The Moderator and the Focal Variable - -This example demonstrates centering more than one variable. In the following -model, both emotional stability and conscientiousness are centered. They are -placed after `~` and joined by `+`. - -```{r lm_xw_centered} -lm_xw_centered <- std_selected(lm_raw, - to_center = ~ emot + cons) -printCoefmat(summary(lm_xw_centered)$coefficients, digits = 3) -``` - -# Standardize The Moderator and The Focal Variable - -To standardize a variable we first mean center it and then scale it by its -standard deviation. Scaling is done by listing the variable on `to_scale`. -The input format is identical to that of `to_center`. - -```r -lm_xw_std <- std_selected(lm_raw, - to_center = ~ emot + cons, - to_scale = ~ emot + cons) -``` - -Since 0.2.6.3 of `stdmod`, `to_standardize` can be used as a shortcut. -Listing a variable on `to_standardize` is equivalent to listing it -on both `to_center` and `to_scale`. Therefore, the following -call can also be used: - -```{r lm_xw_std} -lm_xw_std <- std_selected(lm_raw, - to_standardize = ~ emot + cons) -``` - -```{r lm_xw_std_coef} -printCoefmat(summary(lm_xw_std)$coefficients, digits = 3) -``` - -In this example, when conscientiousness -is at mean level, for each one standard deviation increase of -emotional stability, the predicted sleep duration increases by -`r formatC(coef(lm_xw_std)["emot"], 4, format = "f")` hour. - -```{r std_selected_lm_xw_std_plot} -plotmod(lm_xw_std, - x = "emot", - w = "cons", - x_label = "Emotional Stability", - w_label = "Conscientiousness", - y_label = "Sleep Duration") -``` - -The function `plotmod()` automatically checks whether a variable is -standardized. If yes, it will report this in the plot as table note on the -bottom. - -The pattern of -the plot does not change. However, the conditional effects reported -in the graph are now -based on the model with emotional stability and conscientiousness -standardized. 
- -# Standardize The Moderator, The Focal Variable, and the Outcome Variable - -We can also mean center or standardize the outcome variable (dependent -variable). We -just add the variable to the right hand side of `~` in `to_center` and -`to_scale` as appropriate. - -```r -lm_xwy_std <- std_selected(lm_raw, - to_center = ~ emot + cons + sleep_duration, - to_scale = ~ emot + cons + sleep_duration) -``` - -Since 0.2.6.3, `to_standardize` can be used as a shortcut: - -```{r lm_xwy_std} -lm_xwy_std <- std_selected(lm_raw, - to_standardize = ~ emot + cons + sleep_duration) -printCoefmat(summary(lm_xwy_std)$coefficients, digits = 3) -``` - -In this example, when conscientiousness -is at mean level, the standardized moderation effect of -emotional stability on sleep duration is -`r formatC(coef(lm_xwy_std)["emot"], 4, format = "f")`. - -```{r std_selected_lm_xwy_std_plot} -plotmod(lm_xwy_std, - x = "emot", - w = "cons", - x_label = "Emotional Stability", - w_label = "Conscientiousness", - y_label = "Sleep Duration") -``` - -Again, the pattern of -the plot does not change, but the conditional effects reported -in the graph are now -based on the model with emotional stability, conscientiousness, -and sleep duration standardized. - -# Standardize All Variables - -If we want to standardize all variables except for categorical variables, if any, -we can use `~ .` as a shortcut. `std_selected()` will automatically -skip categorical variables (i.e., factors or string variables in the -regression model of `lm()`). - -```r -lm_all_std <- std_selected(lm_raw, - to_center = ~ ., - to_scale = ~ .) -``` - -Since 0.2.6.3, `to_standardize` can be used as a shortcut: - -```{r lm_all_std} -lm_all_std <- std_selected(lm_raw, - to_standardize = ~ .) -printCoefmat(summary(lm_all_std)$coefficients, digits = 3) -``` - -# The Usual Standardized Solution - -For comparison, this is the results of standardizing all variables, including -the product term and the categorical variable. 
- -```{r lm_beta} -library(lm.beta) # For generating the typical standardized solution -packageVersion("lm.beta") -lm_usual_std <- lm.beta(lm_raw) -printCoefmat(summary(lm_usual_std)$coefficients, digits = 3) -``` - -In moderated regression, the coefficient of *standardized* product term, -`r formatC(coef(lm_usual_std)["emot:cons"], 4, format = "f")`, -is not interpretable. The coefficient of *standardized* gender, -`r formatC(coef(lm_usual_std)["gendermale"], 4, format = "f")`, is also -difficult to interpret. - -# Improved Confidence Interval For "Betas" - -It has been shown (e.g., [Yuan & Chan, 2011](https://doi.org/10.1007/s11336-011-9224-6)) that the standard errors of -standardized regression coefficients (betas) computed just by standardizing -the variables -are biased, and consequently the confidence intervals are also invalid. The -function `std_selected_boot()` is a wrapper of `std_selected()` that also -forms the confidence interval of the regression coefficients when standardization -is conducted, -using nonparametric bootstrapping as suggested by -[Cheung, Cheung, Lau, Hui, and Vong (2022)](https://doi.org/10.1037/hea0001188). - -We use the same example above that standardizes emotional stability, -conscientiousness, and sleep duration, to illustrate this function. -The argument `nboot` specifies the number of nonparametric bootstrap samples. -The level of confidence set by `conf`. The default is .95, denoting 95% -confidence intervals. If this is the desired level, this argument can be omitted. 
- -```{r echo = FALSE, eval = TRUE} -if (file.exists("eg_lm_xwy_std_ci.rds")) { - lm_xwy_std_ci <- readRDS("eg_lm_xwy_std_ci.rds") - } else { - set.seed(58702) - lm_xwy_std_ci <- std_selected_boot(lm_raw, - to_center = ~ emot + cons + sleep_duration, - to_scale = ~ emot + cons + sleep_duration, - nboot = 2000) - saveRDS(lm_xwy_std_ci, "eg_lm_xwy_std_ci.rds", compress = "xz") - } -``` - -```r -set.seed(58702) -lm_xwy_std_ci <- std_selected_boot(lm_raw, - to_standardize = ~ emot + cons + sleep_duration, - nboot = 2000) -``` - -```{r lm_xwy_std_ci_summary} -summary(lm_xwy_std_ci) -``` - -```{r echo = FALSE} -tmp <- summary(lm_xwy_std_ci)$coefficients -``` - -The standardized moderation effect is `r formatC(tmp["emot:cons", "Estimate"], 4, format = "f")` -, and the 95% nonparametric bootstrap percentile confidence interval is -`r formatC(tmp["emot:cons", "CI Lower"], 4, format = "f")` to -`r formatC(tmp["emot:cons", "CI Upper"], 4, format = "f")`. - -Note: As a side product, the nonparametric bootstrap confidence of the -other coefficients are also reported. They can be used for other variables that -are standardized in the same model, whether they are involved in the moderation or not. - -# Further Information - -Further information on the functions can be found in their help pages -(`std_selected()` and `std_selected_boot()`). For example, parallel computation -can be used when doing bootstrapping, if the number of bootstrapping samples -requested is large. - -# Reference - -Cheung, S. F., Cheung, S.-H., Lau, E. Y. Y., Hui, C. H., & Vong, W. N. (2022) -Improving an old way to measure moderation effect in standardized units. -*Health Psychology*, *41*(7), 502-505. https://doi.org/10.1037/hea0001188. - -Yuan, K.-H., & Chan, W. (2011). Biases and standard errors of standardized -regression coefficients. *Psychometrika, 76*(4), 670-690. 
https://doi.org/10.1007/s11336-011-9224-6 +--- +title: "Mean Center and Standardize Selected Variable by std_selected()" +author: "Shu Fai Cheung and David Weng Ngai Vong" +date: "2026-01-04" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Mean Center and Standardize Selected Variable by std_selected()} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + + + +# Purpose + +Instead of standardizing all variables, even variables that (a) are categorical +and should not be standardized, or (b) measured on meaningful unites and do +not need to be standardized, `std_selected()` from the package +`stdmod` allows users to have more +control on how standardization is to be conducted. + +A moderated regression model is used as an example but it can also be used for +regression models without interaction terms. + +More about this package can be found +in `vignette("stdmod", package = "stdmod")` +or at [https://sfcheung.github.io/stdmod/](https://sfcheung.github.io/stdmod/). + +# Setup the Environment + + +``` r +library(stdmod) +``` + +# Load the Dataset + + +``` r +data(sleep_emo_con) +head(sleep_emo_con, 3) +#> # A tibble: 3 × 6 +#> case_id sleep_duration conscientiousness emotional_stability age gender +#> +#> 1 1 6 3.6 3.6 20 female +#> 2 2 4 3.8 2.4 20 female +#> 3 3 7 4.3 2.7 20 female +``` + +This data set has 500 cases of data. The variables are sleep duration, +age, gender, and the scores from two personality scales, emotional stability +and conscientiousness of the IPIP Big Five markers. Please refer to +(citation to be included) for the detail of the data set. 
+ +The names of some variables are shortened for readability: + + +``` r +colnames(sleep_emo_con)[3:4] <- c("cons", "emot") +head(sleep_emo_con, 3) +#> # A tibble: 3 × 6 +#> case_id sleep_duration cons emot age gender +#> +#> 1 1 6 3.6 3.6 20 female +#> 2 2 4 3.8 2.4 20 female +#> 3 3 7 4.3 2.7 20 female +``` + +# Moderated Regression + +Suppose we are interested in predicting sleep duration by emotional +stability, after controlling for gender and age. However, we suspect that the +effect of emotional stability, if any, may be moderated by conscientiousness. +Therefore, we conduct a moderated regression as follow: + + +``` r +lm_raw <- lm(sleep_duration ~ age + gender + emot * cons, + data = sleep_emo_con) +summary(lm_raw) +#> +#> Call: +#> lm(formula = sleep_duration ~ age + gender + emot * cons, data = sleep_emo_con) +#> +#> Residuals: +#> Min 1Q Median 3Q Max +#> -6.0841 -0.7882 0.0089 0.9440 6.1189 +#> +#> Coefficients: +#> Estimate Std. Error t value Pr(>|t|) +#> (Intercept) 1.85154 1.35224 1.369 0.17155 +#> age 0.01789 0.02133 0.838 0.40221 +#> gendermale -0.26127 0.16579 -1.576 0.11570 +#> emot 1.32151 0.45039 2.934 0.00350 ** +#> cons 1.20385 0.37062 3.248 0.00124 ** +#> emot:cons -0.33140 0.13273 -2.497 0.01286 * +#> --- +#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 +#> +#> Residual standard error: 1.384 on 494 degrees of freedom +#> Multiple R-squared: 0.0548, Adjusted R-squared: 0.04523 +#> F-statistic: 5.728 on 5 and 494 DF, p-value: 3.768e-05 +``` + +The results show that conscientiousness significantly moderates the effect of +emotional stability on sleep duration. + +This package has a simple function, `plotmod()`, for generating a typical plot +of the moderation effect: + + +``` r +plotmod(lm_raw, + x = "emot", + w = "cons", + x_label = "Emotional Stability", + w_label = "Conscientiousness", + y_label = "Sleep Duration") +``` + +
+[Figure: plot of chunk std_selected_lm_raw_plot]
+ +The function `plotmod()` also prints the conditional effects of the predictor, +emotional stability in this example. + +# Mean Center the Moderator + +To know the effect of emotional stability when conscientiousness is equal to its +mean, we can center conscientiousness by its mean in the data and redo the +moderated regression. Instead of creating the new variable and rerun the +regression, we can pass the `lm()` output to `std_selected()` and specify the +variables to be mean centered: + + + +``` r +lm_w_centered <- std_selected(lm_raw, + to_center = ~ cons) +printCoefmat(summary(lm_w_centered)$coefficients, digits = 3) +#> Estimate Std. Error t value Pr(>|t|) +#> (Intercept) 5.8763 0.5170 11.37 <2e-16 *** +#> age 0.0179 0.0213 0.84 0.4022 +#> gendermale -0.2613 0.1658 -1.58 0.1157 +#> emot 0.2136 0.0834 2.56 0.0108 * +#> cons 1.2039 0.3706 3.25 0.0012 ** +#> emot:cons -0.3314 0.1327 -2.50 0.0129 * +#> --- +#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 +``` + +The argument for meaning centering is `to_center`. The variable is specified +in the formula form, placing them on the right hand side of the formula. + +In this example, when conscientiousness +is at mean level, the effect of emotional stability is +0.2136. + +# Mean Center The Moderator and the Focal Variable + +This example demonstrates centering more than one variable. In the following +model, both emotional stability and conscientiousness are centered. They are +placed after `~` and joined by `+`. + + +``` r +lm_xw_centered <- std_selected(lm_raw, + to_center = ~ emot + cons) +printCoefmat(summary(lm_xw_centered)$coefficients, digits = 3) +#> Estimate Std. Error t value Pr(>|t|) +#> (Intercept) 6.4557 0.4783 13.50 <2e-16 *** +#> age 0.0179 0.0213 0.84 0.402 +#> gendermale -0.2613 0.1658 -1.58 0.116 +#> emot 0.2136 0.0834 2.56 0.011 * +#> cons 0.3047 0.1055 2.89 0.004 ** +#> emot:cons -0.3314 0.1327 -2.50 0.013 * +#> --- +#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1 +``` + +# Standardize The Moderator and The Focal Variable + +To standardize a variable we first mean center it and then scale it by its +standard deviation. Scaling is done by listing the variable on `to_scale`. +The input format is identical to that of `to_center`. + +```r +lm_xw_std <- std_selected(lm_raw, + to_center = ~ emot + cons, + to_scale = ~ emot + cons) +``` + +Since 0.2.6.3 of `stdmod`, `to_standardize` can be used as a shortcut. +Listing a variable on `to_standardize` is equivalent to listing it +on both `to_center` and `to_scale`. Therefore, the following +call can also be used: + + +``` r +lm_xw_std <- std_selected(lm_raw, + to_standardize = ~ emot + cons) +``` + + +``` r +printCoefmat(summary(lm_xw_std)$coefficients, digits = 3) +#> Estimate Std. Error t value Pr(>|t|) +#> (Intercept) 6.4557 0.4783 13.50 <2e-16 *** +#> age 0.0179 0.0213 0.84 0.402 +#> gendermale -0.2613 0.1658 -1.58 0.116 +#> emot 0.1630 0.0637 2.56 0.011 * +#> cons 0.1849 0.0640 2.89 0.004 ** +#> emot:cons -0.1534 0.0615 -2.50 0.013 * +#> --- +#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 +``` + +In this example, when conscientiousness +is at mean level, for each one standard deviation increase of +emotional stability, the predicted sleep duration increases by +0.1630 hour. + + +``` r +plotmod(lm_xw_std, + x = "emot", + w = "cons", + x_label = "Emotional Stability", + w_label = "Conscientiousness", + y_label = "Sleep Duration") +``` + +
+[Figure: plot of chunk std_selected_lm_xw_std_plot]
+ +The function `plotmod()` automatically checks whether a variable is +standardized. If yes, it will report this in the plot as table note on the +bottom. + +The pattern of +the plot does not change. However, the conditional effects reported +in the graph are now +based on the model with emotional stability and conscientiousness +standardized. + +# Standardize The Moderator, The Focal Variable, and the Outcome Variable + +We can also mean center or standardize the outcome variable (dependent +variable). We +just add the variable to the right hand side of `~` in `to_center` and +`to_scale` as appropriate. + +```r +lm_xwy_std <- std_selected(lm_raw, + to_center = ~ emot + cons + sleep_duration, + to_scale = ~ emot + cons + sleep_duration) +``` + +Since 0.2.6.3, `to_standardize` can be used as a shortcut: + + +``` r +lm_xwy_std <- std_selected(lm_raw, + to_standardize = ~ emot + cons + sleep_duration) +printCoefmat(summary(lm_xwy_std)$coefficients, digits = 3) +#> Estimate Std. Error t value Pr(>|t|) +#> (Intercept) -0.2263 0.3376 -0.67 0.503 +#> age 0.0126 0.0151 0.84 0.402 +#> gendermale -0.1844 0.1170 -1.58 0.116 +#> emot 0.1150 0.0449 2.56 0.011 * +#> cons 0.1305 0.0452 2.89 0.004 ** +#> emot:cons -0.1083 0.0434 -2.50 0.013 * +#> --- +#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 +``` + +In this example, when conscientiousness +is at mean level, the standardized moderation effect of +emotional stability on sleep duration is +0.1150. + + +``` r +plotmod(lm_xwy_std, + x = "emot", + w = "cons", + x_label = "Emotional Stability", + w_label = "Conscientiousness", + y_label = "Sleep Duration") +``` + +
+[Figure: plot of chunk std_selected_lm_xwy_std_plot]
+ +Again, the pattern of +the plot does not change, but the conditional effects reported +in the graph are now +based on the model with emotional stability, conscientiousness, +and sleep duration standardized. + +# Standardize All Variables + +If we want to standardize all variables except for categorical variables, if any, +we can use `~ .` as a shortcut. `std_selected()` will automatically +skip categorical variables (i.e., factors or string variables in the +regression model of `lm()`). + +```r +lm_all_std <- std_selected(lm_raw, + to_center = ~ ., + to_scale = ~ .) +``` + +Since 0.2.6.3, `to_standardize` can be used as a shortcut: + + +``` r +lm_all_std <- std_selected(lm_raw, + to_standardize = ~ .) +printCoefmat(summary(lm_all_std)$coefficients, digits = 3) +#> Estimate Std. Error t value Pr(>|t|) +#> (Intercept) 0.0549 0.0488 1.12 0.261 +#> age 0.0371 0.0443 0.84 0.402 +#> gendermale -0.1844 0.1170 -1.58 0.116 +#> emot 0.1150 0.0449 2.56 0.011 * +#> cons 0.1305 0.0452 2.89 0.004 ** +#> emot:cons -0.1083 0.0434 -2.50 0.013 * +#> --- +#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 +``` + +# The Usual Standardized Solution + +For comparison, this is the results of standardizing all variables, including +the product term and the categorical variable. + + +``` r +library(lm.beta) # For generating the typical standardized solution +packageVersion("lm.beta") +#> [1] '1.7.3' +lm_usual_std <- lm.beta(lm_raw) +printCoefmat(summary(lm_usual_std)$coefficients, digits = 3) +#> Estimate Standardized Std. Error t value Pr(>|t|) +#> (Intercept) 1.8515 NA 1.3522 1.37 0.1715 +#> age 0.0179 0.0371 0.0213 0.84 0.4022 +#> gendermale -0.2613 -0.0693 0.1658 -1.58 0.1157 +#> emot 1.3215 0.7116 0.4504 2.93 0.0035 ** +#> cons 1.2039 0.5156 0.3706 3.25 0.0012 ** +#> emot:cons -0.3314 -0.7820 0.1327 -2.50 0.0129 * +#> --- +#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1 +``` + +In moderated regression, the coefficient of *standardized* product term, +-0.7820, +is not interpretable. The coefficient of *standardized* gender, +-0.0693, is also +difficult to interpret. + +# Improved Confidence Interval For "Betas" + +It has been shown (e.g., [Yuan & Chan, 2011](https://doi.org/10.1007/s11336-011-9224-6)) that the standard errors of +standardized regression coefficients (betas) computed just by standardizing +the variables +are biased, and consequently the confidence intervals are also invalid. The +function `std_selected_boot()` is a wrapper of `std_selected()` that also +forms the confidence interval of the regression coefficients when standardization +is conducted, +using nonparametric bootstrapping as suggested by +[Cheung, Cheung, Lau, Hui, and Vong (2022)](https://doi.org/10.1037/hea0001188). + +We use the same example above that standardizes emotional stability, +conscientiousness, and sleep duration, to illustrate this function. +The argument `nboot` specifies the number of nonparametric bootstrap samples. +The level of confidence set by `conf`. The default is .95, denoting 95% +confidence intervals. If this is the desired level, this argument can be omitted. 
+ + + + +``` r +set.seed(58702) +lm_xwy_std_ci <- std_selected_boot(lm_raw, + to_standardize = ~ emot + cons + sleep_duration, + nboot = 2000) +``` + + +``` r +summary(lm_xwy_std_ci) +#> +#> Call to std_selected_boot(): +#> std_selected_boot(lm_out = lm_raw, to_standardize = ~emot + cons + +#> sleep_duration, nboot = 2000) +#> +#> Selected variable(s) are centered by mean and/or scaled by SD +#> - Variable(s) centered: emot cons sleep_duration +#> - Variable(s) scaled: emot cons sleep_duration +#> +#> centered_by scaled_by Note +#> sleep_duration 6.776333 1.4168291 Standardized (mean = 0, SD = 1) +#> age 0.000000 1.0000000 +#> gender NA NA Nonnumeric +#> emot 2.713200 0.7629613 Standardized (mean = 0, SD = 1) +#> cons 3.343200 0.6068198 Standardized (mean = 0, SD = 1) +#> +#> Note: +#> - Categorical variables will not be centered or scaled even if +#> requested. +#> - Nonparametric bootstrapping 95% confidence intervals computed. +#> - The number of bootstrap samples is 2000. +#> +#> Call: +#> lm(formula = sleep_duration ~ age + gender + emot * cons, data = dat_mod) +#> +#> Residuals: +#> Min 1Q Median 3Q Max +#> -4.2941 -0.5563 0.0063 0.6663 4.3187 +#> +#> Coefficients: +#> Estimate CI Lower CI Upper Std. Error t value Pr(>|t|) +#> (Intercept) -0.2263 -0.8315 0.3503 0.3376 -0.6703 0.50298 +#> age 0.0126 -0.0128 0.0397 0.0151 0.8384 0.40221 +#> gendermale -0.1844 -0.4484 0.0723 0.1170 -1.5759 0.11570 +#> emot 0.1150 0.0256 0.2001 0.0449 2.5600 0.01076 * +#> cons 0.1305 0.0289 0.2323 0.0452 2.8893 0.00403 ** +#> emot:cons -0.1083 -0.2005 -0.0077 0.0434 -2.4967 0.01286 * +#> --- +#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1 +#> +#> Residual standard error: 0.9771 on 494 degrees of freedom +#> +#> R-squared : 0.0548 +#> Adjusted R-squared : 0.0452 +#> ANOVA test of R-squared : F(5, 494) = 5.7277, p < 0.001 +#> +#> = Test the highest order term = +#> The highest order term : emot:cons +#> R-squared increase adding this term: 0.0119 +#> F test of R-squared increase : F(1, 494) = 6.2335, p = 0.013 +#> +#> Note: +#> - Estimates and their statistics are based on the data after +#> mean-centering, scaling, or standardization. +#> - [CI Lower, CI Upper] are bootstrap percentile confidence intervals. +#> - Std. Error are not bootstrap SEs. +``` + + + +The standardized moderation effect is -0.1083 +, and the 95% nonparametric bootstrap percentile confidence interval is +-0.2005 to +-0.0077. + +Note: As a side product, the nonparametric bootstrap confidence of the +other coefficients are also reported. They can be used for other variables that +are standardized in the same model, whether they are involved in the moderation or not. + +# Further Information + +Further information on the functions can be found in their help pages +(`std_selected()` and `std_selected_boot()`). For example, parallel computation +can be used when doing bootstrapping, if the number of bootstrapping samples +requested is large. + +# Reference + +Cheung, S. F., Cheung, S.-H., Lau, E. Y. Y., Hui, C. H., & Vong, W. N. (2022) +Improving an old way to measure moderation effect in standardized units. +*Health Psychology*, *41*(7), 502-505. https://doi.org/10.1037/hea0001188. + +Yuan, K.-H., & Chan, W. (2011). Biases and standard errors of standardized +regression coefficients. *Psychometrika, 76*(4), 670-690. 
https://doi.org/10.1007/s11336-011-9224-6 diff --git a/vignettes/std_selected.Rmd.original b/vignettes/std_selected.Rmd.original new file mode 100644 index 0000000..6d11eac --- /dev/null +++ b/vignettes/std_selected.Rmd.original @@ -0,0 +1,327 @@ +--- +title: "Mean Center and Standardize Selected Variable by std_selected()" +author: "Shu Fai Cheung and David Weng Ngai Vong" +date: "`r Sys.Date()`" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Mean Center and Standardize Selected Variable by std_selected()} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +```{r, include = FALSE} +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>", + fig.width = 6, + fig.height = 4, + fig.align = "center", + fig.path = "" +) +``` + +# Purpose + +Instead of standardizing all variables, even variables that (a) are categorical +and should not be standardized, or (b) measured on meaningful unites and do +not need to be standardized, `std_selected()` from the package +`stdmod` allows users to have more +control on how standardization is to be conducted. + +A moderated regression model is used as an example but it can also be used for +regression models without interaction terms. + +More about this package can be found +in `vignette("stdmod", package = "stdmod")` +or at [https://sfcheung.github.io/stdmod/](https://sfcheung.github.io/stdmod/). + +# Setup the Environment + +```{r setup} +library(stdmod) +``` + +# Load the Dataset + +```{r load_dataset} +data(sleep_emo_con) +head(sleep_emo_con, 3) +``` + +This data set has 500 cases of data. The variables are sleep duration, +age, gender, and the scores from two personality scales, emotional stability +and conscientiousness of the IPIP Big Five markers. Please refer to +(citation to be included) for the detail of the data set. 
+ +The names of some variables are shortened for readability: + +```{r shorten_names} +colnames(sleep_emo_con)[3:4] <- c("cons", "emot") +head(sleep_emo_con, 3) +``` + +# Moderated Regression + +Suppose we are interested in predicting sleep duration by emotional +stability, after controlling for gender and age. However, we suspect that the +effect of emotional stability, if any, may be moderated by conscientiousness. +Therefore, we conduct a moderated regression as follow: + +```{r mod_reg} +lm_raw <- lm(sleep_duration ~ age + gender + emot * cons, + data = sleep_emo_con) +summary(lm_raw) +``` + +The results show that conscientiousness significantly moderates the effect of +emotional stability on sleep duration. + +This package has a simple function, `plotmod()`, for generating a typical plot +of the moderation effect: + +```{r std_selected_lm_raw_plot} +plotmod(lm_raw, + x = "emot", + w = "cons", + x_label = "Emotional Stability", + w_label = "Conscientiousness", + y_label = "Sleep Duration") +``` + +The function `plotmod()` also prints the conditional effects of the predictor, +emotional stability in this example. + +# Mean Center the Moderator + +To know the effect of emotional stability when conscientiousness is equal to its +mean, we can center conscientiousness by its mean in the data and redo the +moderated regression. Instead of creating the new variable and rerun the +regression, we can pass the `lm()` output to `std_selected()` and specify the +variables to be mean centered: + + +```{r lm_w_centered} +lm_w_centered <- std_selected(lm_raw, + to_center = ~ cons) +printCoefmat(summary(lm_w_centered)$coefficients, digits = 3) +``` + +The argument for meaning centering is `to_center`. The variable is specified +in the formula form, placing them on the right hand side of the formula. + +In this example, when conscientiousness +is at mean level, the effect of emotional stability is +`r formatC(coef(lm_w_centered)["emot"], 4, format = "f")`. 
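Reviewer note: the claim that centering the moderator makes the coefficient of the focal variable its effect at the mean of the moderator can be checked by hand. The sketch below uses base R only, with simulated data; `x`, `w`, and `y` are hypothetical names, not the vignette's variables, and no `stdmod` function is called.

```r
# Simulated data with a nonzero mean for the moderator w
set.seed(1)
n <- 200
x <- rnorm(n, mean = 3, sd = 1)
w <- rnorm(n, mean = 5, sd = 2)
y <- 1 + 0.5 * x + 0.3 * w - 0.2 * x * w + rnorm(n)
dat <- data.frame(x, w, y)

# Moderated regression on the raw variables
fit_raw <- lm(y ~ x * w, data = dat)

# Mean center the moderator by hand and refit the same model
dat_c <- transform(dat, w = w - mean(w))
fit_c <- lm(y ~ x * w, data = dat_c)

# Effect of x when w is at its mean, computed from the raw model:
# b_x + b_xw * mean(w)
b <- coef(fit_raw)
effect_at_mean_w <- unname(b["x"] + b["x:w"] * mean(dat$w))

# The two quantities agree up to numerical tolerance, and the
# coefficient of the product term is unchanged by centering.
all.equal(effect_at_mean_w, unname(coef(fit_c)["x"]))
all.equal(unname(b["x:w"]), unname(coef(fit_c)["x:w"]))
```

Both models span the same column space, so the centered fit is a reparametrization of the raw fit rather than a different model.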
+ +# Mean Center The Moderator and the Focal Variable + +This example demonstrates centering more than one variable. In the following +model, both emotional stability and conscientiousness are centered. They are +placed after `~` and joined by `+`. + +```{r lm_xw_centered} +lm_xw_centered <- std_selected(lm_raw, + to_center = ~ emot + cons) +printCoefmat(summary(lm_xw_centered)$coefficients, digits = 3) +``` + +# Standardize The Moderator and The Focal Variable + +To standardize a variable we first mean center it and then scale it by its +standard deviation. Scaling is done by listing the variable on `to_scale`. +The input format is identical to that of `to_center`. + +```r +lm_xw_std <- std_selected(lm_raw, + to_center = ~ emot + cons, + to_scale = ~ emot + cons) +``` + +Since 0.2.6.3 of `stdmod`, `to_standardize` can be used as a shortcut. +Listing a variable on `to_standardize` is equivalent to listing it +on both `to_center` and `to_scale`. Therefore, the following +call can also be used: + +```{r lm_xw_std} +lm_xw_std <- std_selected(lm_raw, + to_standardize = ~ emot + cons) +``` + +```{r lm_xw_std_coef} +printCoefmat(summary(lm_xw_std)$coefficients, digits = 3) +``` + +In this example, when conscientiousness +is at mean level, for each one standard deviation increase of +emotional stability, the predicted sleep duration increases by +`r formatC(coef(lm_xw_std)["emot"], 4, format = "f")` hour. + +```{r std_selected_lm_xw_std_plot} +plotmod(lm_xw_std, + x = "emot", + w = "cons", + x_label = "Emotional Stability", + w_label = "Conscientiousness", + y_label = "Sleep Duration") +``` + +The function `plotmod()` automatically checks whether a variable is +standardized. If yes, it will report this in the plot as table note on the +bottom. + +The pattern of +the plot does not change. However, the conditional effects reported +in the graph are now +based on the model with emotional stability and conscientiousness +standardized. 
+ +# Standardize The Moderator, The Focal Variable, and the Outcome Variable + +We can also mean center or standardize the outcome variable (dependent +variable). We +just add the variable to the right-hand side of `~` in `to_center` and +`to_scale` as appropriate. + +```r +lm_xwy_std <- std_selected(lm_raw, + to_center = ~ emot + cons + sleep_duration, + to_scale = ~ emot + cons + sleep_duration) +``` + +Since 0.2.6.3, `to_standardize` can be used as a shortcut: + +```{r lm_xwy_std} +lm_xwy_std <- std_selected(lm_raw, + to_standardize = ~ emot + cons + sleep_duration) +printCoefmat(summary(lm_xwy_std)$coefficients, digits = 3) +``` + +In this example, when conscientiousness +is at its mean level, the standardized effect of +emotional stability on sleep duration is +`r formatC(coef(lm_xwy_std)["emot"], 4, format = "f")`. + +```{r std_selected_lm_xwy_std_plot} +plotmod(lm_xwy_std, + x = "emot", + w = "cons", + x_label = "Emotional Stability", + w_label = "Conscientiousness", + y_label = "Sleep Duration") +``` + +Again, the pattern of +the plot does not change, but the conditional effects reported +in the graph are now +based on the model with emotional stability, conscientiousness, +and sleep duration standardized. + +# Standardize All Variables + +If we want to standardize all variables except for categorical variables, if any, +we can use `~ .` as a shortcut. `std_selected()` will automatically +skip categorical variables (i.e., factors or string variables in the +regression model of `lm()`). + +```r +lm_all_std <- std_selected(lm_raw, + to_center = ~ ., + to_scale = ~ .) +``` + +Since 0.2.6.3, `to_standardize` can be used as a shortcut: + +```{r lm_all_std} +lm_all_std <- std_selected(lm_raw, + to_standardize = ~ .) +printCoefmat(summary(lm_all_std)$coefficients, digits = 3) +``` + +# The Usual Standardized Solution + +For comparison, these are the results of standardizing all variables, including +the product term and the categorical variable.
+ +```{r lm_beta} +library(lm.beta) # For generating the typical standardized solution +packageVersion("lm.beta") +lm_usual_std <- lm.beta(lm_raw) +printCoefmat(summary(lm_usual_std)$coefficients, digits = 3) +``` + +In moderated regression, the coefficient of the *standardized* product term, +`r formatC(coef(lm_usual_std)["emot:cons"], 4, format = "f")`, +is not interpretable. The coefficient of *standardized* gender, +`r formatC(coef(lm_usual_std)["gendermale"], 4, format = "f")`, is also +difficult to interpret. + +# Improved Confidence Interval For "Betas" + +It has been shown (e.g., [Yuan & Chan, 2011](https://doi.org/10.1007/s11336-011-9224-6)) that the standard errors of +standardized regression coefficients (betas) computed just by standardizing +the variables +are biased, and consequently the confidence intervals are also invalid. The +function `std_selected_boot()` is a wrapper of `std_selected()` that also +forms the confidence interval of the regression coefficients when standardization +is conducted, +using nonparametric bootstrapping as suggested by +[Cheung, Cheung, Lau, Hui, and Vong (2022)](https://doi.org/10.1037/hea0001188). + +We use the same example above that standardizes emotional stability, +conscientiousness, and sleep duration, to illustrate this function. +The argument `nboot` specifies the number of nonparametric bootstrap samples. +The level of confidence is set by `conf`. The default is .95, denoting 95% +confidence intervals. If this is the desired level, this argument can be omitted.
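For readers who want to see the idea behind the procedure, the following is a rough base-R sketch of a nonparametric bootstrap percentile interval in which standardization is repeated in every bootstrap sample. It is not the implementation used by `std_selected_boot()`; the simulated data, the helper `std_mod_effect()`, and the small number of resamples are assumptions made only for illustration.

```r
# Simulated data; every name below is hypothetical and not part of stdmod
set.seed(2468)
n <- 150
x <- rnorm(n)
w <- rnorm(n)
y <- 0.3 * x + 0.2 * w - 0.25 * x * w + rnorm(n)
dat_sim <- data.frame(x, w, y)

# Standardize x, w, and y in the supplied data, fit the moderated
# regression, and return the coefficient of the product term
std_mod_effect <- function(d) {
  d_std <- as.data.frame(scale(d))
  unname(coef(lm(y ~ x * w, data = d_std))["x:w"])
}

# Point estimate from the original sample
est <- std_mod_effect(dat_sim)

# Resample cases with replacement; standardization is redone
# inside each bootstrap sample
nboot <- 500  # kept small so that the sketch runs quickly
boot_est <- replicate(nboot, {
  i <- sample.int(n, replace = TRUE)
  std_mod_effect(dat_sim[i, ])
})

# Percentile confidence interval at the requested level
conf <- 0.95
ci <- quantile(boot_est, probs = c((1 - conf) / 2, 1 - (1 - conf) / 2))
est
ci
```

Because the means and standard deviations are recomputed in each resample, their sampling variation is reflected in the spread of `boot_est`, which is the point of the approach described above.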
+ +```{r echo = FALSE, eval = FALSE} +if (file.exists("eg_lm_xwy_std_ci.rds")) { + lm_xwy_std_ci <- readRDS("eg_lm_xwy_std_ci.rds") + } else { + set.seed(58702) + lm_xwy_std_ci <- std_selected_boot(lm_raw, + to_center = ~ emot + cons + sleep_duration, + to_scale = ~ emot + cons + sleep_duration, + nboot = 2000) + saveRDS(lm_xwy_std_ci, "eg_lm_xwy_std_ci.rds", compress = "xz") + } +``` + +```{r} +set.seed(58702) +lm_xwy_std_ci <- std_selected_boot(lm_raw, + to_standardize = ~ emot + cons + sleep_duration, + nboot = 2000) +``` + +```{r lm_xwy_std_ci_summary} +summary(lm_xwy_std_ci) +``` + +```{r echo = FALSE} +tmp <- summary(lm_xwy_std_ci)$coefficients +``` + +The standardized moderation effect is `r formatC(tmp["emot:cons", "Estimate"], 4, format = "f")`, +and the 95% nonparametric bootstrap percentile confidence interval is +`r formatC(tmp["emot:cons", "CI Lower"], 4, format = "f")` to +`r formatC(tmp["emot:cons", "CI Upper"], 4, format = "f")`. + +Note: As a by-product, the nonparametric bootstrap confidence intervals of the +other coefficients are also reported. They can be used for other variables that +are standardized in the same model, whether they are involved in the moderation or not. + +# Further Information + +Further information on the functions can be found in their help pages +(`std_selected()` and `std_selected_boot()`). For example, parallel computation +can be used when doing bootstrapping, if the number of bootstrapping samples +requested is large. + +# Reference + +Cheung, S. F., Cheung, S.-H., Lau, E. Y. Y., Hui, C. H., & Vong, W. N. (2022). +Improving an old way to measure moderation effect in standardized units. +*Health Psychology*, *41*(7), 502-505. https://doi.org/10.1037/hea0001188. + +Yuan, K.-H., & Chan, W. (2011). Biases and standard errors of standardized +regression coefficients. *Psychometrika, 76*(4), 670-690.
https://doi.org/10.1007/s11336-011-9224-6 diff --git a/vignettes/std_selected_lm_raw_plot-1.png b/vignettes/std_selected_lm_raw_plot-1.png new file mode 100644 index 0000000..68f7bca Binary files /dev/null and b/vignettes/std_selected_lm_raw_plot-1.png differ diff --git a/vignettes/std_selected_lm_xw_std_plot-1.png b/vignettes/std_selected_lm_xw_std_plot-1.png new file mode 100644 index 0000000..0d992a7 Binary files /dev/null and b/vignettes/std_selected_lm_xw_std_plot-1.png differ diff --git a/vignettes/std_selected_lm_xwy_std_plot-1.png b/vignettes/std_selected_lm_xwy_std_plot-1.png new file mode 100644 index 0000000..64e5e04 Binary files /dev/null and b/vignettes/std_selected_lm_xwy_std_plot-1.png differ diff --git a/vignettes/stdmod.Rmd b/vignettes/stdmod.Rmd index 43bf284..25095ae 100644 --- a/vignettes/stdmod.Rmd +++ b/vignettes/stdmod.Rmd @@ -1,316 +1,547 @@ ---- -title: "A Quick Start Guide on Using std_selected()" -author: "Shu Fai Cheung" -date: "`r Sys.Date()`" -output: rmarkdown::html_vignette -vignette: > - %\VignetteIndexEntry{A Quick Start Guide on Using std_selected()} - %\VignetteEngine{knitr::rmarkdown} - %\VignetteEncoding{UTF-8} ---- - -```{r, include = FALSE} -knitr::opts_chunk$set( - collapse = TRUE, - comment = "#>", - fig.width = 6, - fig.height = 4, - fig.align = "center" -) -``` - -# Introduction - -This vignette illustrates how to use -`std_selected()`, the main function from -the `stdmod` package. -More about this package can be found -in `vignette("stdmod", package = "stdmod")` -or at [https://sfcheung.github.io/stdmod/](https://sfcheung.github.io/stdmod/). - - -# This Guide Shows to use `std_selected()` to: - -- get the correct standardized regression coefficients of a moderated -regression model, and - -- form the valid confidence intervals of the standardized regression -coefficients using nonparametric bootstrapping that takes into account the -sampling variation due to standardization. 
- -# Sample Dataset - -```{r} -library(stdmod) -dat <- sleep_emo_con -head(dat, 3) -``` - -This dataset has 500 cases, with sleep duration -(measured in average hours), -conscientiousness, emotional stability, age, and gender (a -`"female"` and `"male"`). - -The names of some variables are shortened for readability: - -```{r} -colnames(dat)[2:4] <- c("sleep", "cons", "emot") -head(dat, 3) -``` - -# Model - -Suppose this is the moderated regression model: - -- Dependent variable (Outcome Variable): sleep duration (`sleep`) - -- Independent variable (Predictor / Focal Variable): emotional stability (`emot`) - -- Moderator: conscientiousness (`cons`) - -- Control variables: `age` and `gender` - -`lm()` can be used to fit this model: - -```{r} -lm_out <- lm(sleep ~ age + gender + emot * cons, - dat = dat) -summary(lm_out) -``` - -The unstandardized moderation effect is significant, B = -`r formatC(coef(lm_out)["emot:cons"], 4, format = "f")`. -For each one unit increase of conscientiousness score, the effect of emotional -stability decreases by `r formatC(-1 * coef(lm_out)["emot:cons"], 4, format = "f")`. - -# Correct Standardization For the Moderated Regression - -Suppose we want to find the correct standardized solution for the moderated -regression, that is, all variables -except for categorical variables are standardized. In a moderated regression model, -the product term should be formed *after* standardization. - -Instead of doing the standardization ourselves before calling `lm()`, we can pass -the `lm()` output to `std_selected()`, and use `~ .` for -the arguments `to_scale` and `to_center`. - -```r -lm_stdall <- std_selected(lm_out, - to_scale = ~ ., - to_center = ~ .) -``` - -Since 0.2.6.3, `to_standardize` can be used as a shortcut: - -```{r} -lm_stdall <- std_selected(lm_out, - to_standardize = ~ .) 
-``` - -```{r} -summary(lm_stdall) -``` - - - -In this example, the coefficient of the product term, which naturally can -be called the -**standardized moderation effect**, is significant, B = -`r formatC(coef(lm_stdall)["emot:cons"], 4, format = "f")`. -For each one *standard deviation* increase of conscientiousness score, the -**standardized effect** of emotional stability decreases by -`r formatC(-1 * coef(lm_stdall)["emot:cons"], 4, format = "f")`. - -## The Arguments - -Standardization is equivalent to centering by mean and then scaling by -(dividing by) standard deviation. -The argument `to_center` specifies the variables to be centered -by their means, and the argument `to_scale` specifies the variables to be scaled by -their standard deviations. The formula interface of `lm()` is used in these two -arguments, -with the variables on the right hand side being the variables to be -centered and/or scaled. - -The "`.`" on the right hand side represents all variables in the model, -including the outcome variable (sleep duration in this example). - -`std_selected()` will also skip categorical variables automatically skipped -because standardizing them will make their coefficients not easy to interpret. - -Since 0.2.6.3, `to_standardize` is added as a shortcut. Listing a variable -on `to_standardize` is equivalent to listing this variable -on both `to_center` and `to_scale`. - -## Advantage - -Using `std_selected` minimizes impact on the workflow. Do regression -as usual. Get the correct standardized coefficients only when we need to -interpret them. - -## Nonparametric Bootstrap Confidence Intervals - -There is one problem with standardized coefficients. The confidence intervals -based on ordinary least squares (OLS) fitted to the standardized -variables do not take into account the sampling variation of -the sample means and standard deviations ([Yuan & Chan, 2011](https://doi.org/10.1007/s11336-011-9224-6)). 
-[Cheung, Cheung, Lau, Hui, and Vong (2022)](https://doi.org/10.1037/hea0001188) -suggest using nonparametric bootstrapping, with standardization conducted -in each bootstrap sample. - -This can be done by `std_selected_boot()`, a wrapper of `std_selected()`: - -```{r echo = FALSE} -if (file.exists("stdmod_lm_stdall_boot.rds")) { - lm_stdall_boot <- readRDS("stdmod_lm_stdall_boot.rds") - } else { - set.seed(870432) - lm_stdall_boot <- std_selected_boot(lm_out, - to_scale = ~ ., - to_center = ~ ., - nboot = 5000) - saveRDS(lm_stdall_boot, "stdmod_lm_stdall_boot.rds", compress = "xz") - } -``` - -```{r eval = FALSE} -set.seed(870432) -lm_stdall_boot <- std_selected_boot(lm_out, - to_scale = ~ ., - to_center = ~ ., - nboot = 5000) -``` - -Since 0.2.6.3, `to_standardize` can be used as a shortcut: - -```r -lm_stdall_boot <- std_selected_boot(lm_out, - to_standardize = ~ . - nboot = 5000) -``` - - -The minimum additional argument is `nboot`, the number of bootstrap samples. - -```{r} -summary(lm_stdall_boot) -``` - -The output is similar to that of `std_selected()`, with additional information -on the bootstrapping process. - -```{r echo = FALSE} -tmp <- summary(lm_stdall_boot)$coefficients -``` - -The 95% bootstrap percentile confidence interval of the standardized -moderation effect is `r formatC(tmp["emot:cons", "CI Lower"], 4, format = "f")` to -`r formatC(tmp["emot:cons", "CI Upper"], 4, format = "f")`. - -# Standardize Independent Variable (Focal Variable) and Moderator - -`std_selected()` and `std_selected_boot()` can also be used to standardize only -selected variables. There are cases in which we do not want to standardize -some continuous variables because they are measured on interpretable units, -such as hours. - -Suppose we want to standardize only emotional stability and conscientiousness, -and do not standardize sleep -duration. 
We just list `emot` and `cons` on -`to_center` and `to_scale`: - -```r -lm_std1 <- std_selected(lm_out, - to_scale = ~ emot + cons, - to_center = ~ emot + cons) -``` - -Since 0.2.6.3, `to_standardize` can be used a shortuct: - -```{r} -lm_std1 <- std_selected(lm_out, - to_standardize = ~ emot + cons) -``` - -```{r} -summary(lm_std1) -``` - -The *partially* standardized moderation effect is -`r formatC(coef(lm_std1)["emot:cons"], 4, format = "f")`. -For each one *standard deviation* increase of conscientiousness score, the -*partially* standardized effect of emotional stability decreases by -`r formatC(-1 * coef(lm_std1)["emot:cons"], 4, format = "f")`. - -## Nonparametric Bootstrap Confidence Intervals - -The function `std_selected_boot()` can also be used to form the nonparametric -bootstrap confidence interval when only some of the variables are standardized: - -```{r echo = FALSE} -if (file.exists("stdmod_lm_std1_boot.rds")) { - lm_std1_boot <- readRDS("stdmod_lm_std1_boot.rds") - } else { - set.seed(870432) - lm_std1_boot <- std_selected_boot(lm_out, - to_scale = ~ emot + cons, - to_center = ~ emot + cons, - nboot = 5000) - saveRDS(lm_std1_boot, "stdmod_lm_std1_boot.rds", compress = "xz") - } -``` - -```{r eval = FALSE} -set.seed(870432) -lm_std1_boot <- std_selected_boot(lm_out, - to_scale = ~ emot + cons, - to_center = ~ emot + cons, - nboot = 5000) -``` - -Since 0.2.6.3, `to_standardize` can be used as a shortcut: - -```r -lm_std1_boot <- std_selected_boot(lm_out, - to_standardize = ~ emot + cons, - nboot = 5000) -``` - -Again, the only additional argument is `nboot`. - -```{r} -summary(lm_std1_boot) -``` - -```{r echo = FALSE} -tmp <- summary(lm_std1_boot)$coefficients -``` - -The 95% bootstrap percentile confidence interval of the partially standardized -moderation effect is `r formatC(tmp["emot:cons", "CI Lower"], 4, format = "f")` to -`r formatC(tmp["emot:cons", "CI Upper"], 4, format = "f")`. 
- -# Further Information - -A more detailed illustration can be found at -`vignette("moderation", package = "stdmod")`. - -`vignette("std_selected", package = "stdmod")` illustrates how `std_selected()` can be used -to form nonparametric bootstrap percentile confidence interval for -standardized regression coefficients ("betas") for regression models -without a product term. - -Further information on the functions can be found in their help pages -(`std_selected()` and `std_selected_boot()`). For example, parallel computation -can be used when doing bootstrapping, if the number of bootstrapping samples -request is large. - -# Reference(s) - -Cheung, S. F., Cheung, S.-H., Lau, E. Y. Y., Hui, C. H., & Vong, W. N. (2022) -Improving an old way to measure moderation effect in standardized units. -*Health Psychology*, *41*(7), 502-505. https://doi.org/10.1037/hea0001188. - -Yuan, K.-H., & Chan, W. (2011). Biases and standard errors of standardized -regression coefficients. *Psychometrika, 76*(4), 670-690. -https://doi.org/10.1007/s11336-011-9224-6 \ No newline at end of file +--- +title: "A Quick Start Guide on Using std_selected()" +author: "Shu Fai Cheung" +date: "2026-01-04" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{A Quick Start Guide on Using std_selected()} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + + + +# Introduction + +This vignette illustrates how to use +`std_selected()`, the main function from +the `stdmod` package. +More about this package can be found +in `vignette("stdmod", package = "stdmod")` +or at [https://sfcheung.github.io/stdmod/](https://sfcheung.github.io/stdmod/). 
+ + +# This Guide Shows How to Use `std_selected()` to: + +- get the correct standardized regression coefficients of a moderated +regression model, and + +- form the valid confidence intervals of the standardized regression +coefficients using nonparametric bootstrapping that takes into account the +sampling variation due to standardization. + +# Sample Dataset + + +``` r +library(stdmod) +dat <- sleep_emo_con +head(dat, 3) +#> # A tibble: 3 × 6 +#> case_id sleep_duration cons emot age gender +#> +#> 1 1 6 3.6 3.6 20 female +#> 2 2 4 3.8 2.4 20 female +#> 3 3 7 4.3 2.7 20 female +``` + +This dataset has 500 cases, with sleep duration +(measured in average hours), +conscientiousness, emotional stability, age, and gender +(`"female"` and `"male"`). + +The names of some variables are shortened for readability: + + +``` r +colnames(dat)[2:4] <- c("sleep", "cons", "emot") +head(dat, 3) +#> # A tibble: 3 × 6 +#> case_id sleep cons emot age gender +#> +#> 1 1 6 3.6 3.6 20 female +#> 2 2 4 3.8 2.4 20 female +#> 3 3 7 4.3 2.7 20 female +``` + +# Model + +Suppose this is the moderated regression model: + +- Dependent variable (Outcome Variable): sleep duration (`sleep`) + +- Independent variable (Predictor / Focal Variable): emotional stability (`emot`) + +- Moderator: conscientiousness (`cons`) + +- Control variables: `age` and `gender` + +`lm()` can be used to fit this model: + + +``` r +lm_out <- lm(sleep ~ age + gender + emot * cons, + dat = dat) +summary(lm_out) +#> +#> Call: +#> lm(formula = sleep ~ age + gender + emot * cons, data = dat) +#> +#> Residuals: +#> Min 1Q Median 3Q Max +#> -6.0841 -0.7882 0.0089 0.9440 6.1189 +#> +#> Coefficients: +#> Estimate Std. Error t value Pr(>|t|) +#> (Intercept) 1.85154 1.35224 1.369 0.17155 +#> age 0.01789 0.02133 0.838 0.40221 +#> gendermale -0.26127 0.16579 -1.576 0.11570 +#> emot 1.32151 0.45039 2.934 0.00350 ** +#> cons 1.20385 0.37062 3.248 0.00124 ** +#> emot:cons -0.33140 0.13273 -2.497 0.01286 * +#> --- +#> Signif.
codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 +#> +#> Residual standard error: 1.384 on 494 degrees of freedom +#> Multiple R-squared: 0.0548, Adjusted R-squared: 0.04523 +#> F-statistic: 5.728 on 5 and 494 DF, p-value: 3.768e-05 +``` + +The unstandardized moderation effect is significant, B = +-0.3314. +For each one unit increase of conscientiousness score, the effect of emotional +stability decreases by 0.3314. + +# Correct Standardization For the Moderated Regression + +Suppose we want to find the correct standardized solution for the moderated +regression, that is, all variables +except for categorical variables are standardized. In a moderated regression model, +the product term should be formed *after* standardization. + +Instead of doing the standardization ourselves before calling `lm()`, we can pass +the `lm()` output to `std_selected()`, and use `~ .` for +the arguments `to_scale` and `to_center`. + +```r +lm_stdall <- std_selected(lm_out, + to_scale = ~ ., + to_center = ~ .) +``` + +Since 0.2.6.3, `to_standardize` can be used as a shortcut: + + +``` r +lm_stdall <- std_selected(lm_out, + to_standardize = ~ .) +``` + + +``` r +summary(lm_stdall) +#> +#> Call to std_selected(): +#> std_selected(lm_out = lm_out, to_standardize = ~.) +#> +#> Selected variable(s) are centered by mean and/or scaled by SD +#> - Variable(s) centered: sleep age gender emot cons +#> - Variable(s) scaled: sleep age gender emot cons +#> +#> centered_by scaled_by Note +#> sleep 6.776333 1.4168291 Standardized (mean = 0, SD = 1) +#> age 22.274000 2.9407857 Standardized (mean = 0, SD = 1) +#> gender NA NA Nonnumeric +#> emot 2.713200 0.7629613 Standardized (mean = 0, SD = 1) +#> cons 3.343200 0.6068198 Standardized (mean = 0, SD = 1) +#> +#> Note: +#> - Categorical variables will not be centered or scaled even if +#> requested. 
+#> +#> Call: +#> lm(formula = sleep ~ age + gender + emot * cons, data = dat_mod) +#> +#> Residuals: +#> Min 1Q Median 3Q Max +#> -4.2941 -0.5563 0.0063 0.6663 4.3187 +#> +#> Coefficients: +#> Estimate Std. Error t value Pr(>|t|) +#> (Intercept) 0.0549 0.0488 1.1248 0.26124 +#> age 0.0371 0.0443 0.8384 0.40221 +#> gendermale -0.1844 0.1170 -1.5759 0.11570 +#> emot 0.1150 0.0449 2.5600 0.01076 * +#> cons 0.1305 0.0452 2.8893 0.00403 ** +#> emot:cons -0.1083 0.0434 -2.4967 0.01286 * +#> --- +#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 +#> +#> Residual standard error: 0.9771 on 494 degrees of freedom +#> +#> R-squared : 0.0548 +#> Adjusted R-squared : 0.0452 +#> ANOVA test of R-squared : F(5, 494) = 5.7277, p < 0.001 +#> +#> = Test the highest order term = +#> The highest order term : emot:cons +#> R-squared increase adding this term: 0.0119 +#> F test of R-squared increase : F(1, 494) = 6.2335, p = 0.013 +#> +#> Note: +#> - Estimates and their statistics are based on the data after +#> mean-centering, scaling, or standardization. +#> - One or more variables are scaled by SD or standardized. OLS standard +#> errors and confidence intervals may be biased for their coefficients. +#> Please use `std_selected_boot()`. +``` + + + +In this example, the coefficient of the product term, which naturally can +be called the +**standardized moderation effect**, is significant, B = +-0.1083. +For each one *standard deviation* increase of conscientiousness score, the +**standardized effect** of emotional stability decreases by +0.1083. + +## The Arguments + +Standardization is equivalent to centering by mean and then scaling by +(dividing by) standard deviation. +The argument `to_center` specifies the variables to be centered +by their means, and the argument `to_scale` specifies the variables to be scaled by +their standard deviations. 
The formula interface of `lm()` is used in these two +arguments, +with the variables on the right-hand side being the variables to be +centered and/or scaled. + +The "`.`" on the right-hand side represents all variables in the model, +including the outcome variable (sleep duration in this example). + +`std_selected()` will automatically skip categorical variables +because standardizing them would make their coefficients difficult to interpret. + +Since 0.2.6.3, `to_standardize` is added as a shortcut. Listing a variable +on `to_standardize` is equivalent to listing this variable +on both `to_center` and `to_scale`. + +## Advantage + +Using `std_selected()` minimizes the impact on the workflow. Do regression +as usual. Get the correct standardized coefficients only when we need to +interpret them. + +## Nonparametric Bootstrap Confidence Intervals + +There is one problem with standardized coefficients. The confidence intervals +based on ordinary least squares (OLS) fitted to the standardized +variables do not take into account the sampling variation of +the sample means and standard deviations ([Yuan & Chan, 2011](https://doi.org/10.1007/s11336-011-9224-6)). +[Cheung, Cheung, Lau, Hui, and Vong (2022)](https://doi.org/10.1037/hea0001188) +suggest using nonparametric bootstrapping, with standardization conducted +in each bootstrap sample. + +This can be done by `std_selected_boot()`, a wrapper of `std_selected()`: + + + + +``` r +set.seed(870432) +lm_stdall_boot <- std_selected_boot(lm_out, + to_scale = ~ ., + to_center = ~ ., + nboot = 5000) +``` + +Since 0.2.6.3, `to_standardize` can be used as a shortcut: + +```r +lm_stdall_boot <- std_selected_boot(lm_out, + to_standardize = ~ ., + nboot = 5000) +``` + + +The minimum additional argument is `nboot`, the number of bootstrap samples.
+ + +``` r +summary(lm_stdall_boot) +#> +#> Call to std_selected_boot(): +#> std_selected_boot(lm_out = lm_out, to_scale = ~., to_center = ~., +#> nboot = 5000) +#> +#> Selected variable(s) are centered by mean and/or scaled by SD +#> - Variable(s) centered: sleep age gender emot cons +#> - Variable(s) scaled: sleep age gender emot cons +#> +#> centered_by scaled_by Note +#> sleep 6.776333 1.4168291 Standardized (mean = 0, SD = 1) +#> age 22.274000 2.9407857 Standardized (mean = 0, SD = 1) +#> gender NA NA Nonnumeric +#> emot 2.713200 0.7629613 Standardized (mean = 0, SD = 1) +#> cons 3.343200 0.6068198 Standardized (mean = 0, SD = 1) +#> +#> Note: +#> - Categorical variables will not be centered or scaled even if +#> requested. +#> - Nonparametric bootstrapping 95% confidence intervals computed. +#> - The number of bootstrap samples is 5000. +#> +#> Call: +#> lm(formula = sleep ~ age + gender + emot * cons, data = dat_mod) +#> +#> Residuals: +#> Min 1Q Median 3Q Max +#> -4.2941 -0.5563 0.0063 0.6663 4.3187 +#> +#> Coefficients: +#> Estimate CI Lower CI Upper Std. Error t value Pr(>|t|) +#> (Intercept) 0.0549 0.0072 0.1045 0.0488 1.1248 0.26124 +#> age 0.0371 -0.0347 0.1072 0.0443 0.8384 0.40221 +#> gendermale -0.1844 -0.4392 0.0783 0.1170 -1.5759 0.11570 +#> emot 0.1150 0.0291 0.2012 0.0449 2.5600 0.01076 * +#> cons 0.1305 0.0288 0.2265 0.0452 2.8893 0.00403 ** +#> emot:cons -0.1083 -0.2043 -0.0090 0.0434 -2.4967 0.01286 * +#> --- +#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1 +#> +#> Residual standard error: 0.9771 on 494 degrees of freedom +#> +#> R-squared : 0.0548 +#> Adjusted R-squared : 0.0452 +#> ANOVA test of R-squared : F(5, 494) = 5.7277, p < 0.001 +#> +#> = Test the highest order term = +#> The highest order term : emot:cons +#> R-squared increase adding this term: 0.0119 +#> F test of R-squared increase : F(1, 494) = 6.2335, p = 0.013 +#> +#> Note: +#> - Estimates and their statistics are based on the data after +#> mean-centering, scaling, or standardization. +#> - [CI Lower, CI Upper] are bootstrap percentile confidence intervals. +#> - Std. Error are not bootstrap SEs. +``` + +The output is similar to that of `std_selected()`, with additional information +on the bootstrapping process. + + + +The 95% bootstrap percentile confidence interval of the standardized +moderation effect is -0.2043 to +-0.0090. + +# Standardize Independent Variable (Focal Variable) and Moderator + +`std_selected()` and `std_selected_boot()` can also be used to standardize only +selected variables. There are cases in which we do not want to standardize +some continuous variables because they are measured on interpretable units, +such as hours. + +Suppose we want to standardize only emotional stability and conscientiousness, +and do not standardize sleep +duration. 
We just list `emot` and `cons` on +`to_center` and `to_scale`: + +```r +lm_std1 <- std_selected(lm_out, + to_scale = ~ emot + cons, + to_center = ~ emot + cons) +``` + +Since 0.2.6.3, `to_standardize` can be used as a shortcut: + + +``` r +lm_std1 <- std_selected(lm_out, + to_standardize = ~ emot + cons) +``` + + +``` r +summary(lm_std1) +#> +#> Call to std_selected(): +#> std_selected(lm_out = lm_out, to_standardize = ~emot + cons) +#> +#> Selected variable(s) are centered by mean and/or scaled by SD +#> - Variable(s) centered: emot cons +#> - Variable(s) scaled: emot cons +#> +#> centered_by scaled_by Note +#> sleep 0.0000 1.0000000 +#> age 0.0000 1.0000000 +#> gender NA NA Nonnumeric +#> emot 2.7132 0.7629613 Standardized (mean = 0, SD = 1) +#> cons 3.3432 0.6068198 Standardized (mean = 0, SD = 1) +#> +#> Note: +#> - Categorical variables will not be centered or scaled even if +#> requested. +#> +#> Call: +#> lm(formula = sleep ~ age + gender + emot * cons, data = dat_mod) +#> +#> Residuals: +#> Min 1Q Median 3Q Max +#> -6.0841 -0.7882 0.0089 0.9440 6.1189 +#> +#> Coefficients: +#> Estimate Std. Error t value Pr(>|t|) +#> (Intercept) 6.4557 0.4783 13.4979 < 0.001 *** +#> age 0.0179 0.0213 0.8384 0.40221 +#> gendermale -0.2613 0.1658 -1.5759 0.11570 +#> emot 0.1630 0.0637 2.5600 0.01076 * +#> cons 0.1849 0.0640 2.8893 0.00403 ** +#> emot:cons -0.1534 0.0615 -2.4967 0.01286 * +#> --- +#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 +#> +#> Residual standard error: 1.384 on 494 degrees of freedom +#> +#> R-squared : 0.0548 +#> Adjusted R-squared : 0.0452 +#> ANOVA test of R-squared : F(5, 494) = 5.7277, p < 0.001 +#> +#> = Test the highest order term = +#> The highest order term : emot:cons +#> R-squared increase adding this term: 0.0119 +#> F test of R-squared increase : F(1, 494) = 6.2335, p = 0.013 +#> +#> Note: +#> - Estimates and their statistics are based on the data after +#> mean-centering, scaling, or standardization.
+#> - One or more variables are scaled by SD or standardized. OLS standard +#> errors and confidence intervals may be biased for their coefficients. +#> Please use `std_selected_boot()`. +``` + +The *partially* standardized moderation effect is +-0.1534. +For each one *standard deviation* increase of conscientiousness score, the +*partially* standardized effect of emotional stability decreases by +0.1534. + +## Nonparametric Bootstrap Confidence Intervals + +The function `std_selected_boot()` can also be used to form the nonparametric +bootstrap confidence interval when only some of the variables are standardized: + + + + +``` r +set.seed(870432) +lm_std1_boot <- std_selected_boot(lm_out, + to_scale = ~ emot + cons, + to_center = ~ emot + cons, + nboot = 5000) +``` + +Since 0.2.6.3, `to_standardize` can be used as a shortcut: + +```r +lm_std1_boot <- std_selected_boot(lm_out, + to_standardize = ~ emot + cons, + nboot = 5000) +``` + +Again, the only additional argument is `nboot`. + + +``` r +summary(lm_std1_boot) +#> +#> Call to std_selected_boot(): +#> std_selected_boot(lm_out = lm_out, to_scale = ~emot + cons, to_center = ~emot + +#> cons, nboot = 5000) +#> +#> Selected variable(s) are centered by mean and/or scaled by SD +#> - Variable(s) centered: emot cons +#> - Variable(s) scaled: emot cons +#> +#> centered_by scaled_by Note +#> sleep 0.0000 1.0000000 +#> age 0.0000 1.0000000 +#> gender NA NA Nonnumeric +#> emot 2.7132 0.7629613 Standardized (mean = 0, SD = 1) +#> cons 3.3432 0.6068198 Standardized (mean = 0, SD = 1) +#> +#> Note: +#> - Categorical variables will not be centered or scaled even if +#> requested. +#> - Nonparametric bootstrapping 95% confidence intervals computed. +#> - The number of bootstrap samples is 5000. +#> +#> Call: +#> lm(formula = sleep ~ age + gender + emot * cons, data = dat_mod) +#> +#> Residuals: +#> Min 1Q Median 3Q Max +#> -6.0841 -0.7882 0.0089 0.9440 6.1189 +#> +#> Coefficients: +#> Estimate CI Lower CI Upper Std. 
Error t value Pr(>|t|) +#> (Intercept) 6.4557 5.6487 7.2735 0.4783 13.4979 < 0.001 *** +#> age 0.0179 -0.0184 0.0544 0.0213 0.8384 0.40221 +#> gendermale -0.2613 -0.6233 0.1105 0.1658 -1.5759 0.11570 +#> emot 0.1630 0.0405 0.2893 0.0637 2.5600 0.01076 * +#> cons 0.1849 0.0415 0.3229 0.0640 2.8893 0.00403 ** +#> emot:cons -0.1534 -0.2915 -0.0124 0.0615 -2.4967 0.01286 * +#> --- +#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 +#> +#> Residual standard error: 1.384 on 494 degrees of freedom +#> +#> R-squared : 0.0548 +#> Adjusted R-squared : 0.0452 +#> ANOVA test of R-squared : F(5, 494) = 5.7277, p < 0.001 +#> +#> = Test the highest order term = +#> The highest order term : emot:cons +#> R-squared increase adding this term: 0.0119 +#> F test of R-squared increase : F(1, 494) = 6.2335, p = 0.013 +#> +#> Note: +#> - Estimates and their statistics are based on the data after +#> mean-centering, scaling, or standardization. +#> - [CI Lower, CI Upper] are bootstrap percentile confidence intervals. +#> - Std. Error are not bootstrap SEs. +``` + + + +The 95% bootstrap percentile confidence interval of the partially standardized +moderation effect is -0.2915 to +-0.0124. + +# Further Information + +A more detailed illustration can be found at +`vignette("moderation", package = "stdmod")`. + +`vignette("std_selected", package = "stdmod")` illustrates how `std_selected()` can be used +to form nonparametric bootstrap percentile confidence intervals for +standardized regression coefficients ("betas") for regression models +without a product term. + +Further information on the functions can be found in their help pages +(`std_selected()` and `std_selected_boot()`). For example, parallel computation +can be used when doing bootstrapping, if the number of bootstrapping samples +requested is large. + +# Reference(s) + +Cheung, S. F., Cheung, S.-H., Lau, E. Y. Y., Hui, C. H., & Vong, W. N. (2022). +Improving an old way to measure moderation effect in standardized units.
+*Health Psychology*, *41*(7), 502-505. https://doi.org/10.1037/hea0001188. + +Yuan, K.-H., & Chan, W. (2011). Biases and standard errors of standardized +regression coefficients. *Psychometrika, 76*(4), 670-690. +https://doi.org/10.1007/s11336-011-9224-6 diff --git a/vignettes/stdmod.Rmd.original b/vignettes/stdmod.Rmd.original new file mode 100644 index 0000000..0fbf805 --- /dev/null +++ b/vignettes/stdmod.Rmd.original @@ -0,0 +1,317 @@ +--- +title: "A Quick Start Guide on Using std_selected()" +author: "Shu Fai Cheung" +date: "`r Sys.Date()`" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{A Quick Start Guide on Using std_selected()} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +```{r, include = FALSE} +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>", + fig.width = 6, + fig.height = 4, + fig.align = "center", + fig.path = "" +) +``` + +# Introduction + +This vignette illustrates how to use +`std_selected()`, the main function from +the `stdmod` package. +More about this package can be found +in `vignette("stdmod", package = "stdmod")` +or at [https://sfcheung.github.io/stdmod/](https://sfcheung.github.io/stdmod/). + + +# This Guide Shows How to Use `std_selected()` to: + +- get the correct standardized regression coefficients of a moderated +regression model, and + +- form the valid confidence intervals of the standardized regression +coefficients using nonparametric bootstrapping that takes into account the +sampling variation due to standardization. + +# Sample Dataset + +```{r} +library(stdmod) +dat <- sleep_emo_con +head(dat, 3) +``` + +This dataset has 500 cases, with sleep duration +(measured in average hours), +conscientiousness, emotional stability, age, and gender +(`"female"` and `"male"`). 
+ +The names of some variables are shortened for readability: + +```{r} +colnames(dat)[2:4] <- c("sleep", "cons", "emot") +head(dat, 3) +``` + +# Model + +Suppose this is the moderated regression model: + +- Dependent variable (Outcome Variable): sleep duration (`sleep`) + +- Independent variable (Predictor / Focal Variable): emotional stability (`emot`) + +- Moderator: conscientiousness (`cons`) + +- Control variables: `age` and `gender` + +`lm()` can be used to fit this model: + +```{r} +lm_out <- lm(sleep ~ age + gender + emot * cons, + dat = dat) +summary(lm_out) +``` + +The unstandardized moderation effect is significant, B = +`r formatC(coef(lm_out)["emot:cons"], 4, format = "f")`. +For each one unit increase of conscientiousness score, the effect of emotional +stability decreases by `r formatC(-1 * coef(lm_out)["emot:cons"], 4, format = "f")`. + +# Correct Standardization For the Moderated Regression + +Suppose we want to find the correct standardized solution for the moderated +regression, that is, all variables +except for categorical variables are standardized. In a moderated regression model, +the product term should be formed *after* standardization. + +Instead of doing the standardization ourselves before calling `lm()`, we can pass +the `lm()` output to `std_selected()`, and use `~ .` for +the arguments `to_scale` and `to_center`. + +```r +lm_stdall <- std_selected(lm_out, + to_scale = ~ ., + to_center = ~ .) +``` + +Since 0.2.6.3, `to_standardize` can be used as a shortcut: + +```{r} +lm_stdall <- std_selected(lm_out, + to_standardize = ~ .) +``` + +```{r} +summary(lm_stdall) +``` + + + +In this example, the coefficient of the product term, which naturally can +be called the +**standardized moderation effect**, is significant, B = +`r formatC(coef(lm_stdall)["emot:cons"], 4, format = "f")`. 
+For each one *standard deviation* increase of conscientiousness score, the +**standardized effect** of emotional stability decreases by +`r formatC(-1 * coef(lm_stdall)["emot:cons"], 4, format = "f")`. + +## The Arguments + +Standardization is equivalent to centering by mean and then scaling by +(dividing by) standard deviation. +The argument `to_center` specifies the variables to be centered +by their means, and the argument `to_scale` specifies the variables to be scaled by +their standard deviations. The formula interface of `lm()` is used in these two +arguments, +with the variables on the right hand side being the variables to be +centered and/or scaled. + +The "`.`" on the right hand side represents all variables in the model, +including the outcome variable (sleep duration in this example). + +`std_selected()` automatically skips categorical variables +because standardizing them would make their coefficients difficult to interpret. + +Since 0.2.6.3, `to_standardize` is added as a shortcut. Listing a variable +on `to_standardize` is equivalent to listing this variable +on both `to_center` and `to_scale`. + +## Advantage + +Using `std_selected()` minimizes impact on the workflow. Do regression +as usual. Get the correct standardized coefficients only when we need to +interpret them. + +## Nonparametric Bootstrap Confidence Intervals + +There is one problem with standardized coefficients. The confidence intervals +based on ordinary least squares (OLS) fitted to the standardized +variables do not take into account the sampling variation of +the sample means and standard deviations ([Yuan & Chan, 2011](https://doi.org/10.1007/s11336-011-9224-6)). +[Cheung, Cheung, Lau, Hui, and Vong (2022)](https://doi.org/10.1037/hea0001188) +suggest using nonparametric bootstrapping, with standardization conducted +in each bootstrap sample. 
+ +This can be done by `std_selected_boot()`, a wrapper of `std_selected()`: + +```{r echo = FALSE, eval = FALSE} +if (file.exists("stdmod_lm_stdall_boot.rds")) { + lm_stdall_boot <- readRDS("stdmod_lm_stdall_boot.rds") + } else { + set.seed(870432) + lm_stdall_boot <- std_selected_boot(lm_out, + to_scale = ~ ., + to_center = ~ ., + nboot = 5000) + saveRDS(lm_stdall_boot, "stdmod_lm_stdall_boot.rds", compress = "xz") + } +``` + +```{r eval = TRUE} +set.seed(870432) +lm_stdall_boot <- std_selected_boot(lm_out, + to_scale = ~ ., + to_center = ~ ., + nboot = 5000) +``` + +Since 0.2.6.3, `to_standardize` can be used as a shortcut: + +```r +lm_stdall_boot <- std_selected_boot(lm_out, + to_standardize = ~ ., + nboot = 5000) +``` + + +The only additional argument needed is `nboot`, the number of bootstrap samples. + +```{r} +summary(lm_stdall_boot) +``` + +The output is similar to that of `std_selected()`, with additional information +on the bootstrapping process. + +```{r echo = FALSE} +tmp <- summary(lm_stdall_boot)$coefficients +``` + +The 95% bootstrap percentile confidence interval of the standardized +moderation effect is `r formatC(tmp["emot:cons", "CI Lower"], 4, format = "f")` to +`r formatC(tmp["emot:cons", "CI Upper"], 4, format = "f")`. + +# Standardize Independent Variable (Focal Variable) and Moderator + +`std_selected()` and `std_selected_boot()` can also be used to standardize only +selected variables. There are cases in which we do not want to standardize +some continuous variables because they are measured in interpretable units, +such as hours. + +Suppose we want to standardize only emotional stability and conscientiousness, +and do not standardize sleep +duration. 
We just list `emot` and `cons` on +`to_center` and `to_scale`: + +```r +lm_std1 <- std_selected(lm_out, + to_scale = ~ emot + cons, + to_center = ~ emot + cons) +``` + +Since 0.2.6.3, `to_standardize` can be used as a shortcut: + +```{r} +lm_std1 <- std_selected(lm_out, + to_standardize = ~ emot + cons) +``` + +```{r} +summary(lm_std1) +``` + +The *partially* standardized moderation effect is +`r formatC(coef(lm_std1)["emot:cons"], 4, format = "f")`. +For each one *standard deviation* increase of conscientiousness score, the +*partially* standardized effect of emotional stability decreases by +`r formatC(-1 * coef(lm_std1)["emot:cons"], 4, format = "f")`. + +## Nonparametric Bootstrap Confidence Intervals + +The function `std_selected_boot()` can also be used to form the nonparametric +bootstrap confidence interval when only some of the variables are standardized: + +```{r echo = FALSE, eval = FALSE} +if (file.exists("stdmod_lm_std1_boot.rds")) { + lm_std1_boot <- readRDS("stdmod_lm_std1_boot.rds") + } else { + set.seed(870432) + lm_std1_boot <- std_selected_boot(lm_out, + to_scale = ~ emot + cons, + to_center = ~ emot + cons, + nboot = 5000) + saveRDS(lm_std1_boot, "stdmod_lm_std1_boot.rds", compress = "xz") + } +``` + +```{r eval = TRUE} +set.seed(870432) +lm_std1_boot <- std_selected_boot(lm_out, + to_scale = ~ emot + cons, + to_center = ~ emot + cons, + nboot = 5000) +``` + +Since 0.2.6.3, `to_standardize` can be used as a shortcut: + +```r +lm_std1_boot <- std_selected_boot(lm_out, + to_standardize = ~ emot + cons, + nboot = 5000) +``` + +Again, the only additional argument is `nboot`. + +```{r} +summary(lm_std1_boot) +``` + +```{r echo = FALSE} +tmp <- summary(lm_std1_boot)$coefficients +``` + +The 95% bootstrap percentile confidence interval of the partially standardized +moderation effect is `r formatC(tmp["emot:cons", "CI Lower"], 4, format = "f")` to +`r formatC(tmp["emot:cons", "CI Upper"], 4, format = "f")`. 
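To make the idea behind these percentile intervals concrete, here is a minimal base-R sketch. It is not the internal code of `std_selected_boot()`. It resamples cases, redoes the standardization inside each bootstrap sample, refits the model, and takes percentiles of the product-term coefficient. The simulated data, the helper `std_xw()`, and the small `nboot` are assumptions made purely for illustration.

```r
# Illustrative sketch only: simulated data, not sleep_emo_con.
set.seed(5678)
n <- 200
sim <- data.frame(x = rnorm(n, mean = 5, sd = 2),
                  w = rnorm(n, mean = 3, sd = 1.5))
sim$y <- 1 + 0.4 * sim$x + 0.3 * sim$w + 0.2 * sim$x * sim$w + rnorm(n)

# Hypothetical helper: standardize x, w, and y first, then fit the
# moderated regression, so the product term is formed from the
# standardized variables.
std_xw <- function(d) {
  z <- as.data.frame(scale(d[, c("x", "w", "y")]))
  unname(coef(lm(y ~ x * w, data = z))["x:w"])
}

# Standardization is redone inside every bootstrap sample, so the
# sampling variation of the means and SDs is reflected in the estimates.
nboot <- 500
boot_est <- replicate(nboot, {
  i <- sample.int(nrow(sim), replace = TRUE)
  std_xw(sim[i, ])
})

# 95% percentile confidence interval of the standardized moderation effect
boot_ci <- quantile(boot_est, probs = c(.025, .975))
boot_ci
```

In practice `std_selected_boot()` handles these steps, along with categorical variables and selective standardization, so a hand-written loop like this is useful only for understanding what is being resampled.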
+ +# Further Information + +A more detailed illustration can be found at +`vignette("moderation", package = "stdmod")`. + +`vignette("std_selected", package = "stdmod")` illustrates how `std_selected()` can be used +to form nonparametric bootstrap percentile confidence intervals for +standardized regression coefficients ("betas") for regression models +without a product term. + +Further information on the functions can be found in their help pages +(`std_selected()` and `std_selected_boot()`). For example, parallel computation +can be used when doing bootstrapping, if the number of bootstrap samples +requested is large. + +# Reference(s) + +Cheung, S. F., Cheung, S.-H., Lau, E. Y. Y., Hui, C. H., & Vong, W. N. (2022). +Improving an old way to measure moderation effect in standardized units. +*Health Psychology*, *41*(7), 502-505. https://doi.org/10.1037/hea0001188. + +Yuan, K.-H., & Chan, W. (2011). Biases and standard errors of standardized +regression coefficients. *Psychometrika, 76*(4), 670-690. +https://doi.org/10.1007/s11336-011-9224-6 \ No newline at end of file diff --git a/vignettes/stdmod_lavaan.Rmd b/vignettes/stdmod_lavaan.Rmd index f80b553..cff2183 100644 --- a/vignettes/stdmod_lavaan.Rmd +++ b/vignettes/stdmod_lavaan.Rmd @@ -1,182 +1,272 @@ ---- -title: "Standardized Moderation Effect in a Path Model by stdmod_lavaan()" -author: "Shu Fai Cheung and David Weng Ngai Vong" -date: "`r Sys.Date()`" -output: rmarkdown::html_vignette -vignette: > - %\VignetteIndexEntry{Standardized Moderation Effect in a Path Model by stdmod_lavaan()} - %\VignetteEngine{knitr::rmarkdown} - %\VignetteEncoding{UTF-8} ---- - -```{r, include = FALSE} -knitr::opts_chunk$set( - collapse = TRUE, - comment = "#>", - fig.width = 6, - fig.height = 4, - fig.align = "center" -) -``` - -# Purpose - -This document demonstrates how to use `stdmod_lavaan()` from -the package `stdmod` to compute the -standardized moderation effect in a path model fitted by `lavaan::sem()`. 
- -More about this package can be found -in `vignette("stdmod", package = "stdmod")` -or at [https://sfcheung.github.io/stdmod/](https://sfcheung.github.io/stdmod/). - -# Setup the Environment - -```{r setup} -library(stdmod) # For computing the standardized moderation effect conveniently -library(lavaan) # For doing path analysis in lavaan. -``` - -# Load the Dataset - -```{r load_dataset} -data(test_mod1) -round(head(test_mod1, 3), 3) -``` - -This test data set has 300 cases, six variables, all continuous. - - -# Fit the Model by `lavaan::sem()` - -The product term can be formed manually or by the colon operator, `:`. -`stdmod_lavaan()` will work in both cases. - -This is the model to be tested: - -```{r mod_sem} -mod <- -" -med ~ iv + mod + iv:mod + cov1 -dv ~ med + cov2 -" -fit <- sem(mod, test_mod1, fixed.x = FALSE) -summary(fit) -``` - -The results show that `mod` significantly moderates the effect of -`iv` on `med`. - -# Compute the Standardized Moderation Effect - -As in the case of regression, the coefficient of `iv:mod` in -the standardized solution is not the desired standardized coefficient because -it standardizes the product term. - -```{r} -standardizedSolution(fit)[3, ] -``` - -After fitting the path model by `lavaan::lavaan()`, we can use `stdmod_lavaan()` -to compute the standardized moderation effect using the standard deviations -of the focal variable, the moderator, and the outcome variable -[(Cheung, Cheung, Lau, Hui, & Vong, 2022)](https://doi.org/10.1037/hea0001188). - -The minimal arguments are: - -- `fit`: The output from `lavaan::lavaan()` and its wrappers, such - as `lavaan::sem()`. -- `x`: The focal variable, the variable with its effect on the - outcome variable being moderated. -- `y`: The outcome variable. -- `w`: The moderator. -- `x_w`: The product term. 
- -```{r} -fit_iv_mod_std <- stdmod_lavaan(fit = fit, - x = "iv", - y = "med", - w = "mod", - x_w = "iv:mod") -fit_iv_mod_std -``` - -The standardized moderation effect of `mod` on the `iv`-`med` path is -`r formatC(coef(fit_iv_mod_std), 3, format = "f")`. - -# Form Bootstrap Confidence Interval - -`stdmod_lavaan()` can also be used to form nonparametric bootstrap -confidence interval for the standardized moderation effect. - -There are two approaches to do this. First, if bootstrap -confidence intervals was requested when fitting the model, -the stored bootstrap estimates will be used. This is -efficient because there is no need to do bootstrapping -again. - -We fit the model again, with bootstrapping: - -```{r echo = FALSE} -if (file.exists("egl_lavaan_boot.rds")) { - fit <- readRDS("egl_lavaan_boot.rds") - } else { - fit <- sem(mod, test_mod1, fixed.x = FALSE, - se = "boot", - bootstrap = 2000, - iseed = 987543) - saveRDS(fit, "egl_lavaan_boot.rds") - } -``` - -```{r eval = FALSE} -fit <- sem(mod, test_mod1, fixed.x = FALSE, - se = "boot", - bootstrap = 2000, - iseed = 987543) -``` - -If bootstrapping has been done when fitting the model, -just adding `boot_ci = TRUE` is enough to request -nonparametric percentile bootstrap confidence interval: - -```{r} -fit_iv_mod_std_ci <- stdmod_lavaan(fit = fit, - x = "iv", - y = "med", - w = "mod", - x_w = "iv:mod", - boot_ci = TRUE) -fit_iv_mod_std_ci -``` - -The 95% confidence interval of the standardized moderation effect is -`r formatC(confint(fit_iv_mod_std_ci)[1], 3, format = "f")` to -`r formatC(confint(fit_iv_mod_std_ci)[2], 3, format = "f")`. - -The second approach, not covered here, uses -[`do_boot()`](https://sfcheung.github.io/manymome/articles/do_boot.html) -from -the [`manymome`](https://sfcheung.github.io/manymome/index.html) package. -to generate bootstrap estimates. To use the stored bootstrap -estimates, set `boot_out` to the output of `do_boot()`. -The stored bootstrap estimates will then be used. 
This method -can be used when non-bootstrapping confidence intervals are -needed when fitting the model. - -# Remarks - -The function `stdmod_lavaan()` can be used for more complicated path models. -The computation of the standardized moderation effect in a path model depends -only on the standard deviations of the three variables involved -(`x`, `w`, and `y`). - -# Reference(s) - -The computation of the standardized moderation effect is based on the simple -formula presented in the following manuscript, using the standard deviations of -the outcome variable, focal variable, and the moderator: - -Cheung, S. F., Cheung, S.-H., Lau, E. Y. Y., Hui, C. H., & Vong, W. N. (2022) -Improving an old way to measure moderation effect in standardized units. -*Health Psychology*, *41*(7), 502-505. https://doi.org/10.1037/hea0001188. - +--- +title: "Standardized Moderation Effect in a Path Model by stdmod_lavaan()" +author: "Shu Fai Cheung and David Weng Ngai Vong" +date: "2026-01-04" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Standardized Moderation Effect in a Path Model by stdmod_lavaan()} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + + + +# Purpose + +This document demonstrates how to use `stdmod_lavaan()` from +the package `stdmod` to compute the +standardized moderation effect in a path model fitted by `lavaan::sem()`. + +More about this package can be found +in `vignette("stdmod", package = "stdmod")` +or at [https://sfcheung.github.io/stdmod/](https://sfcheung.github.io/stdmod/). + +# Setup the Environment + + +``` r +library(stdmod) # For computing the standardized moderation effect conveniently +library(lavaan) # For doing path analysis in lavaan. 
+``` + +# Load the Dataset + + +``` r +data(test_mod1) +round(head(test_mod1, 3), 3) +#> dv iv mod med cov1 cov2 +#> 1 23.879 -0.133 -0.544 10.310 -0.511 -0.574 +#> 2 23.096 1.456 1.539 11.384 0.094 -0.264 +#> 3 23.201 0.319 1.774 9.615 -0.172 0.488 +``` + +This test data set has 300 cases, six variables, all continuous. + + +# Fit the Model by `lavaan::sem()` + +The product term can be formed manually or by the colon operator, `:`. +`stdmod_lavaan()` will work in both cases. + +This is the model to be tested: + + +``` r +mod <- +" +med ~ iv + mod + iv:mod + cov1 +dv ~ med + cov2 +" +fit <- sem(mod, test_mod1, fixed.x = FALSE) +summary(fit) +#> lavaan 0.6-21.2434 ended normally after 1 iteration +#> +#> Estimator ML +#> Optimization method NLMINB +#> Number of model parameters 23 +#> +#> Number of observations 300 +#> +#> Model Test User Model: +#> +#> Test statistic 1.058 +#> Degrees of freedom 5 +#> P-value (Chi-square) 0.958 +#> +#> Parameter Estimates: +#> +#> Standard errors Standard +#> Information Expected +#> Information saturated (h1) model Structured +#> +#> Regressions: +#> Estimate Std.Err z-value P(>|z|) +#> med ~ +#> iv 0.221 0.030 7.264 0.000 +#> mod 0.104 0.030 3.489 0.000 +#> iv:mod 0.257 0.025 10.169 0.000 +#> cov1 0.104 0.025 4.099 0.000 +#> dv ~ +#> med 0.246 0.041 5.962 0.000 +#> cov2 0.191 0.023 8.324 0.000 +#> +#> Covariances: +#> Estimate Std.Err z-value P(>|z|) +#> iv ~~ +#> mod 0.481 0.063 7.606 0.000 +#> iv:mod -0.149 0.059 -2.501 0.012 +#> cov1 -0.033 0.058 -0.575 0.565 +#> cov2 -0.071 0.059 -1.216 0.224 +#> mod ~~ +#> iv:mod -0.180 0.062 -2.923 0.003 +#> cov1 -0.060 0.059 -1.010 0.313 +#> cov2 -0.107 0.061 -1.763 0.078 +#> iv:mod ~~ +#> cov1 -0.051 0.061 -0.837 0.403 +#> cov2 0.063 0.063 1.001 0.317 +#> cov1 ~~ +#> cov2 0.071 0.061 1.158 0.247 +#> +#> Variances: +#> Estimate Std.Err z-value P(>|z|) +#> .med 0.201 0.016 12.247 0.000 +#> .dv 0.169 0.014 12.247 0.000 +#> iv 0.954 0.078 12.247 0.000 +#> mod 1.017 0.083 12.247 0.000 +#> 
iv:mod 1.088 0.089 12.247 0.000 +#> cov1 1.039 0.085 12.247 0.000 +#> cov2 1.076 0.088 12.247 0.000 +``` + +The results show that `mod` significantly moderates the effect of +`iv` on `med`. + +# Compute the Standardized Moderation Effect + +As in the case of regression, the coefficient of `iv:mod` in +the standardized solution is not the desired standardized coefficient because +it standardizes the product term. + + +``` r +standardizedSolution(fit)[3, ] +#> lhs op rhs est.std se z pvalue ci.lower ci.upper +#> 3 med ~ iv:mod 0.466 0.043 10.842 0 0.382 0.55 +``` + +After fitting the path model by `lavaan::lavaan()`, we can use `stdmod_lavaan()` +to compute the standardized moderation effect using the standard deviations +of the focal variable, the moderator, and the outcome variable +[(Cheung, Cheung, Lau, Hui, & Vong, 2022)](https://doi.org/10.1037/hea0001188). + +The minimal arguments are: + +- `fit`: The output from `lavaan::lavaan()` and its wrappers, such + as `lavaan::sem()`. +- `x`: The focal variable, the variable with its effect on the + outcome variable being moderated. +- `y`: The outcome variable. +- `w`: The moderator. +- `x_w`: The product term. + + +``` r +fit_iv_mod_std <- stdmod_lavaan(fit = fit, + x = "iv", + y = "med", + w = "mod", + x_w = "iv:mod") +fit_iv_mod_std +#> +#> Call: +#> stdmod_lavaan(fit = fit, x = "iv", y = "med", w = "mod", x_w = "iv:mod") +#> +#> Variable +#> Focal Variable iv +#> Moderator mod +#> Outcome Variable med +#> Product Term iv:mod +#> +#> lhs op rhs est se z pvalue ci.lower ci.upper +#> Original med ~ iv:mod 0.257 0.025 10.169 0 0.208 0.307 +#> Standardized med ~ iv:mod 0.440 NA NA NA NA NA +``` + +The standardized moderation effect of `mod` on the `iv`-`med` path is +0.440. + +# Form Bootstrap Confidence Interval + +`stdmod_lavaan()` can also be used to form nonparametric bootstrap +confidence interval for the standardized moderation effect. + +There are two approaches to do this. 
First, if bootstrap +confidence intervals were requested when fitting the model, +the stored bootstrap estimates will be used. This is +efficient because there is no need to do bootstrapping +again. + +We fit the model again, with bootstrapping: + + + + +``` r +fit <- sem(mod, test_mod1, fixed.x = FALSE, + se = "boot", + bootstrap = 2000, + iseed = 987543) +``` + +If bootstrapping has been done when fitting the model, +just adding `boot_ci = TRUE` is enough to request a +nonparametric percentile bootstrap confidence interval: + + +``` r +fit_iv_mod_std_ci <- stdmod_lavaan(fit = fit, + x = "iv", + y = "med", + w = "mod", + x_w = "iv:mod", + boot_ci = TRUE) +fit_iv_mod_std_ci +#> +#> Call: +#> stdmod_lavaan(fit = fit, x = "iv", y = "med", w = "mod", x_w = "iv:mod", +#> boot_ci = TRUE) +#> +#> Variable +#> Focal Variable iv +#> Moderator mod +#> Outcome Variable med +#> Product Term iv:mod +#> +#> lhs op rhs est se z pvalue ci.lower ci.upper +#> Original med ~ iv:mod 0.257 0.035 7.298 0 0.184 0.322 +#> Standardized med ~ iv:mod 0.440 NA NA NA 0.322 0.539 +#> +#> Confidence interval of standardized moderation effect: +#> - Level of confidence: 95% +#> - Bootstrapping Method: Nonparametric +#> - Type: Percentile +#> - Number of bootstrap samples requests: +#> - Number of bootstrap samples with valid results: 2000 +#> +#> NOTE: Bootstrapping conducted by the method in 0.2.7.5 or later. To use +#> the method in the older versions for reproducing previous results, set +#> 'use_old_version' to 'TRUE'. +``` + +The 95% confidence interval of the standardized moderation effect is +0.322 to +0.539. + +The second approach, not covered here, uses +[`do_boot()`](https://sfcheung.github.io/manymome/articles/do_boot.html) +from +the [`manymome`](https://sfcheung.github.io/manymome/index.html) package +to generate bootstrap estimates. To use the stored bootstrap +estimates, set `boot_out` to the output of `do_boot()`. +The stored bootstrap estimates will then be used. 
This method +can be used when non-bootstrapping confidence intervals are +needed when fitting the model. + +# Remarks + +The function `stdmod_lavaan()` can be used for more complicated path models. +The computation of the standardized moderation effect in a path model depends +only on the standard deviations of the three variables involved +(`x`, `w`, and `y`). + +# Reference(s) + +The computation of the standardized moderation effect is based on the simple +formula presented in the following manuscript, using the standard deviations of +the outcome variable, focal variable, and the moderator: + +Cheung, S. F., Cheung, S.-H., Lau, E. Y. Y., Hui, C. H., & Vong, W. N. (2022) +Improving an old way to measure moderation effect in standardized units. +*Health Psychology*, *41*(7), 502-505. https://doi.org/10.1037/hea0001188. + diff --git a/vignettes/stdmod_lavaan.Rmd.original b/vignettes/stdmod_lavaan.Rmd.original new file mode 100644 index 0000000..e0bf1a2 --- /dev/null +++ b/vignettes/stdmod_lavaan.Rmd.original @@ -0,0 +1,183 @@ +--- +title: "Standardized Moderation Effect in a Path Model by stdmod_lavaan()" +author: "Shu Fai Cheung and David Weng Ngai Vong" +date: "`r Sys.Date()`" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Standardized Moderation Effect in a Path Model by stdmod_lavaan()} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +```{r, include = FALSE} +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>", + fig.width = 6, + fig.height = 4, + fig.align = "center", + fig.path = "" +) +``` + +# Purpose + +This document demonstrates how to use `stdmod_lavaan()` from +the package `stdmod` to compute the +standardized moderation effect in a path model fitted by `lavaan::sem()`. + +More about this package can be found +in `vignette("stdmod", package = "stdmod")` +or at [https://sfcheung.github.io/stdmod/](https://sfcheung.github.io/stdmod/). 
+ +# Setup the Environment + +```{r setup} +library(stdmod) # For computing the standardized moderation effect conveniently +library(lavaan) # For doing path analysis in lavaan. +``` + +# Load the Dataset + +```{r load_dataset} +data(test_mod1) +round(head(test_mod1, 3), 3) +``` + +This test data set has 300 cases, six variables, all continuous. + + +# Fit the Model by `lavaan::sem()` + +The product term can be formed manually or by the colon operator, `:`. +`stdmod_lavaan()` will work in both cases. + +This is the model to be tested: + +```{r mod_sem} +mod <- +" +med ~ iv + mod + iv:mod + cov1 +dv ~ med + cov2 +" +fit <- sem(mod, test_mod1, fixed.x = FALSE) +summary(fit) +``` + +The results show that `mod` significantly moderates the effect of +`iv` on `med`. + +# Compute the Standardized Moderation Effect + +As in the case of regression, the coefficient of `iv:mod` in +the standardized solution is not the desired standardized coefficient because +it standardizes the product term. + +```{r} +standardizedSolution(fit)[3, ] +``` + +After fitting the path model by `lavaan::lavaan()`, we can use `stdmod_lavaan()` +to compute the standardized moderation effect using the standard deviations +of the focal variable, the moderator, and the outcome variable +[(Cheung, Cheung, Lau, Hui, & Vong, 2022)](https://doi.org/10.1037/hea0001188). + +The minimal arguments are: + +- `fit`: The output from `lavaan::lavaan()` and its wrappers, such + as `lavaan::sem()`. +- `x`: The focal variable, the variable with its effect on the + outcome variable being moderated. +- `y`: The outcome variable. +- `w`: The moderator. +- `x_w`: The product term. + +```{r} +fit_iv_mod_std <- stdmod_lavaan(fit = fit, + x = "iv", + y = "med", + w = "mod", + x_w = "iv:mod") +fit_iv_mod_std +``` + +The standardized moderation effect of `mod` on the `iv`-`med` path is +`r formatC(coef(fit_iv_mod_std), 3, format = "f")`. 
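The rescaling behind this number can be checked in an ordinary regression. The sketch below uses simulated data, and all object names in it are illustrative assumptions; it is not the internal code of `stdmod_lavaan()`. It multiplies the unstandardized product-term coefficient by the standard deviations of the focal variable and the moderator, divides by the standard deviation of the outcome variable, and compares the result with refitting the model to standardized variables.

```r
# Illustrative sketch only: simulated data, not test_mod1.
set.seed(1234)
n <- 200
x <- rnorm(n, mean = 5, sd = 2)      # focal variable
w <- rnorm(n, mean = 3, sd = 1.5)    # moderator
y <- 1 + 0.4 * x + 0.3 * w + 0.2 * x * w + rnorm(n)

# Unstandardized moderated regression
fit_raw <- lm(y ~ x * w)
b_xw <- coef(fit_raw)["x:w"]

# Rescale the product-term coefficient by the three standard deviations
std_by_formula <- unname(b_xw * sd(x) * sd(w) / sd(y))

# Alternative: standardize first, then form the product term
zx <- as.numeric(scale(x))
zw <- as.numeric(scale(w))
zy <- as.numeric(scale(y))
fit_z <- lm(zy ~ zx * zw)
std_by_refit <- unname(coef(fit_z)["zx:zw"])

# The two approaches should agree up to numerical precision
all.equal(std_by_formula, std_by_refit)
```

The agreement holds because the two regressions are linear reparameterizations of each other, which is why only the three standard deviations are needed. How `stdmod_lavaan()` obtains those standard deviations from the fitted model is not shown here.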
+ +# Form Bootstrap Confidence Interval + +`stdmod_lavaan()` can also be used to form a nonparametric bootstrap +confidence interval for the standardized moderation effect. + +There are two approaches to do this. First, if bootstrap +confidence intervals were requested when fitting the model, +the stored bootstrap estimates will be used. This is +efficient because there is no need to do bootstrapping +again. + +We fit the model again, with bootstrapping: + +```{r echo = FALSE, eval = FALSE} +if (file.exists("egl_lavaan_boot.rds")) { + fit <- readRDS("egl_lavaan_boot.rds") + } else { + fit <- sem(mod, test_mod1, fixed.x = FALSE, + se = "boot", + bootstrap = 2000, + iseed = 987543) + saveRDS(fit, "egl_lavaan_boot.rds") + } +``` + +```{r} +fit <- sem(mod, test_mod1, fixed.x = FALSE, + se = "boot", + bootstrap = 2000, + iseed = 987543) +``` + +If bootstrapping has been done when fitting the model, +just adding `boot_ci = TRUE` is enough to request a +nonparametric percentile bootstrap confidence interval: + +```{r} +fit_iv_mod_std_ci <- stdmod_lavaan(fit = fit, + x = "iv", + y = "med", + w = "mod", + x_w = "iv:mod", + boot_ci = TRUE) +fit_iv_mod_std_ci +``` + +The 95% confidence interval of the standardized moderation effect is +`r formatC(confint(fit_iv_mod_std_ci)[1], 3, format = "f")` to +`r formatC(confint(fit_iv_mod_std_ci)[2], 3, format = "f")`. + +The second approach, not covered here, uses +[`do_boot()`](https://sfcheung.github.io/manymome/articles/do_boot.html) +from +the [`manymome`](https://sfcheung.github.io/manymome/index.html) package +to generate bootstrap estimates. To use the stored bootstrap +estimates, set `boot_out` to the output of `do_boot()`. +The stored bootstrap estimates will then be used. This method +can be used when non-bootstrapping confidence intervals are +needed when fitting the model. + +# Remarks + +The function `stdmod_lavaan()` can be used for more complicated path models. 
+The computation of the standardized moderation effect in a path model depends +only on the standard deviations of the three variables involved +(`x`, `w`, and `y`). + +# Reference(s) + +The computation of the standardized moderation effect is based on the simple +formula presented in the following manuscript, using the standard deviations of +the outcome variable, focal variable, and the moderator: + +Cheung, S. F., Cheung, S.-H., Lau, E. Y. Y., Hui, C. H., & Vong, W. N. (2022) +Improving an old way to measure moderation effect in standardized units. +*Health Psychology*, *41*(7), 502-505. https://doi.org/10.1037/hea0001188. + diff --git a/vignettes/stdmod_lm_std1_boot.rds b/vignettes/stdmod_lm_std1_boot.rds deleted file mode 100644 index e81a789..0000000 Binary files a/vignettes/stdmod_lm_std1_boot.rds and /dev/null differ diff --git a/vignettes/stdmod_lm_stdall_boot.rds b/vignettes/stdmod_lm_stdall_boot.rds deleted file mode 100644 index 4ae5a12..0000000 Binary files a/vignettes/stdmod_lm_stdall_boot.rds and /dev/null differ