Description
While splitting columns by a discrete subject-level variable is straightforward, splitting columns by a value type (e.g., raw value, absolute change, percent change) generates tables with incorrect summaries. Can we add a feature that allows this kind of column-splitting?
For reproducibility, we generate the following two data frames:
library(rtables)
library(tidyr)
library(dplyr)
data("ex_adsl")
# One row per patient per visit (Baseline, Year 1, Year 2)
df_wide <- ex_adsl |>
  select(USUBJID, ARM, BMRKR1) |>
  mutate(Baseline = BMRKR1,
         `Year 1` = Baseline + rnorm(n(), 10, 2),
         `Year 2` = Baseline + rnorm(n(), 20, 2)) |>
  pivot_longer(cols = c(Baseline, `Year 1`, `Year 2`), names_to = "AVISIT", values_to = "VALUE") |>
  mutate(AVISIT = as.factor(AVISIT)) |>
  group_by(USUBJID) |>
  # change and percent change from the previous visit (NA at Baseline)
  mutate(CHG = VALUE - lag(VALUE),
         PCHG = ((VALUE - lag(VALUE)) / lag(VALUE)) * 100) |>
  ungroup()
# One row per patient per visit per value type (VALUE, CHG, PCHG)
df_long <- df_wide |>
  pivot_longer(cols = c(VALUE, CHG, PCHG), names_to = "VALTYPE", values_to = "VALUE") |>
  mutate(VALTYPE = factor(VALTYPE, levels = c("VALUE", "CHG", "PCHG")))
We also define a summary function for rtables::analyze():
s_summary_num <- function(x) {
  in_rows(
    "Mean (SD)" = rcell(c(mean(x, na.rm = TRUE), sd(x, na.rm = TRUE)), format = "xx.xx (xx.xx)"),
    "Median" = rcell(median(x, na.rm = TRUE), format = "xx.xx")
  )
}
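Because CHG and PCHG are computed with lag() within each subject, each follow-up visit is compared with the previous visit rather than with Baseline, and Baseline itself has no previous value, which is why the Baseline CHG and PCHG summaries come out as NA. A small base-R sketch of the same arithmetic for one hypothetical subject (the values 5, 15 and 25 are made up for illustration):

```r
# Hypothetical values for one subject, ordered Baseline, Year 1, Year 2
value <- c(5, 15, 25)

# Base-R equivalent of dplyr::lag(value): the previous visit's value, NA at Baseline
prev <- c(NA, head(value, -1))

chg  <- value - prev                 # change from the previous visit: NA, 10, 10
pchg <- (value - prev) / prev * 100  # percent change from the previous visit: NA, 200, ~66.67

print(chg)
print(pchg)
```

Since Year 2 is compared with Year 1, a constant step of about 10 per visit gives a CHG of about 10 at both follow-up visits, which is consistent with the CHG column shown further down.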
In the simple case, where rows and columns are each split by a factor variable (here, follow-up visit for rows and treatment arm for columns), the row summaries generated by summarize_row_groups() are correct, but the column counts generated by add_colcounts() are not: they count records instead of patients.
tbl_wide <- basic_table() %>%
  split_cols_by("ARM") %>%
  add_colcounts() %>%
  split_rows_by("AVISIT") %>%
  summarize_row_groups() %>%
  analyze(vars = "VALUE", afun = s_summary_num) %>%
  build_table(df = df_wide)
tbl_wide
                A: Drug X     B: Placebo   C: Combination
                 (N=402)        (N=402)        (N=396)
——————————————————————————————————————————————————————————
Baseline       134 (33.3%)    134 (33.3%)    132 (33.3%)
  Mean (SD)    5.97 (3.55)    5.70 (3.31)    5.62 (3.49)
  Median          5.39           4.81           4.61
Year 1         134 (33.3%)    134 (33.3%)    132 (33.3%)
  Mean (SD)   15.79 (4.01)   15.52 (3.70)   15.68 (4.05)
  Median          15.44          14.79          15.22
Year 2         134 (33.3%)    134 (33.3%)    132 (33.3%)
  Mean (SD)   25.90 (4.39)   25.94 (3.89)   25.52 (3.99)
  Median          25.08          25.56          24.82
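The header counts are inflated by exactly the number of visits: df_wide has one row per patient per visit, so add_colcounts() reports 3 × 134 = 402 and 3 × 132 = 396. One possible workaround, sketched here under assumptions rather than offered as the intended fix, is the alt_counts_df argument of build_table(), which takes column counts from a separate data set; passing ex_adsl (one row per patient) should put patient counts in the header. The set.seed() call is an arbitrary addition so the sketch is reproducible.

```r
library(rtables)
library(tidyr)
library(dplyr)

data("ex_adsl")
set.seed(1)  # arbitrary seed, added only so this sketch is reproducible

# Same construction as above: one row per patient per visit
df_wide <- ex_adsl |>
  select(USUBJID, ARM, BMRKR1) |>
  mutate(Baseline = BMRKR1,
         `Year 1` = Baseline + rnorm(n(), 10, 2),
         `Year 2` = Baseline + rnorm(n(), 20, 2)) |>
  pivot_longer(cols = c(Baseline, `Year 1`, `Year 2`),
               names_to = "AVISIT", values_to = "VALUE") |>
  mutate(AVISIT = as.factor(AVISIT))

s_summary_num <- function(x) {
  in_rows(
    "Mean (SD)" = rcell(c(mean(x, na.rm = TRUE), sd(x, na.rm = TRUE)), format = "xx.xx (xx.xx)"),
    "Median" = rcell(median(x, na.rm = TRUE), format = "xx.xx")
  )
}

lyt <- basic_table() |>
  split_cols_by("ARM") |>
  add_colcounts() |>
  split_rows_by("AVISIT") |>
  summarize_row_groups() |>
  analyze(vars = "VALUE", afun = s_summary_num)

# Column counts come from ex_adsl (one row per patient) instead of df_wide
tbl_wide_pts <- build_table(lyt, df = df_wide, alt_counts_df = ex_adsl)
tbl_wide_pts
```

The cell values are still computed from df_wide; only the column counts should change.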
However, this simple approach cannot split columns by value type. To do that, we must pivot the data even longer, so that each value type becomes a level of a factor (VALTYPE in df_long). When we then build the table the same way, the column counts generated by add_colcounts() are still incorrect, and now the row summaries generated by summarize_row_groups() are incorrect as well (although the percentages are still correct, because the column and row counts are both scaled by 3).
tbl_long <- basic_table() %>%
  split_cols_by("VALTYPE") %>%
  add_colcounts() %>%
  split_rows_by("AVISIT") %>%
  summarize_row_groups() %>%
  analyze(vars = "VALUE", afun = s_summary_num) %>%
  build_table(df = df_long)
tbl_long
                   VALUE             CHG             PCHG
                 (N=1200)         (N=1200)         (N=1200)
———————————————————————————————————————————————————————————
Baseline        400 (33.3%)      400 (33.3%)      400 (33.3%)
  Mean (SD)     5.76 (3.45)          NA               NA
  Median           4.84              NA               NA
Year 1          400 (33.3%)      400 (33.3%)      400 (33.3%)
  Mean (SD)    15.74 (3.94)      9.98 (2.06)    267.11 (320.90)
  Median           15.20            9.99            196.79
Year 2          400 (33.3%)      400 (33.3%)      400 (33.3%)
  Mean (SD)    25.78 (4.07)     10.04 (2.98)     69.80 (32.88)
  Median           25.14            10.20            63.54
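For the value-type split itself, rtables provides split_cols_by_multivar(), which creates one column per variable, together with analyze_colvars(), which applies the analysis function to each column's own variable. A minimal sketch, assuming these behave as documented, works directly from df_wide (which already stores VALUE, CHG and PCHG side by side), so no further pivot is needed, and nests the value types within each arm. It does not address the column-count problem: add_colcounts() is left out here, and whether alt_counts_df combines cleanly with a multivar column split is not checked. The set.seed() call is an arbitrary addition for reproducibility.

```r
library(rtables)
library(tidyr)
library(dplyr)

data("ex_adsl")
set.seed(1)  # arbitrary seed, added only so this sketch is reproducible

# Same construction as above, keeping VALUE, CHG and PCHG as separate columns
df_wide <- ex_adsl |>
  select(USUBJID, ARM, BMRKR1) |>
  mutate(Baseline = BMRKR1,
         `Year 1` = Baseline + rnorm(n(), 10, 2),
         `Year 2` = Baseline + rnorm(n(), 20, 2)) |>
  pivot_longer(cols = c(Baseline, `Year 1`, `Year 2`),
               names_to = "AVISIT", values_to = "VALUE") |>
  mutate(AVISIT = as.factor(AVISIT)) |>
  group_by(USUBJID) |>
  mutate(CHG = VALUE - lag(VALUE),
         PCHG = (VALUE - lag(VALUE)) / lag(VALUE) * 100) |>
  ungroup()

s_summary_num <- function(x) {
  in_rows(
    "Mean (SD)" = rcell(c(mean(x, na.rm = TRUE), sd(x, na.rm = TRUE)), format = "xx.xx (xx.xx)"),
    "Median" = rcell(median(x, na.rm = TRUE), format = "xx.xx")
  )
}

# One column per value type (VALUE, CHG, PCHG), nested within each arm
lyt <- basic_table() |>
  split_cols_by("ARM") |>
  split_cols_by_multivar(vars = c("VALUE", "CHG", "PCHG")) |>
  split_rows_by("AVISIT") |>
  analyze_colvars(afun = s_summary_num)

tbl_valtype <- build_table(lyt, df = df_wide)
tbl_valtype
```

This yields 3 arms × 3 value types = 9 columns from data with one row per patient per visit.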