Can't split by column by value type #1021

@danleibovitz

While splitting columns by a discrete subject-level variable is straightforward, splitting columns by a value type (e.g., raw value, absolute change, percent change) generates tables with incorrect summaries. Can we add a feature that allows this kind of column-splitting?

For reproducibility, we generate the following two data frames:

library(rtables)
library(tidyr)
library(dplyr)

data("ex_adsl")


df_wide <- ex_adsl |> 
  select(USUBJID, ARM, BMRKR1) |>
  mutate(Baseline = BMRKR1,
         'Year 1' = Baseline + rnorm(n(), 10, 2),
         'Year 2' = Baseline + rnorm(n(), 20, 2)) |>
  pivot_longer(cols = c(Baseline, 'Year 1', 'Year 2'), names_to = "AVISIT", values_to = "VALUE") |>
  mutate(AVISIT = as.factor(AVISIT)) |>
  group_by(USUBJID) |>
  mutate(CHG = VALUE - lag(VALUE),
         PCHG = ((VALUE - lag(VALUE))/lag(VALUE))*100)

df_long <- df_wide |> 
  pivot_longer(cols = c(VALUE, CHG, PCHG), names_to = "VALTYPE", values_to = "VALUE" ) |> 
  mutate(VALTYPE = factor(VALTYPE, levels = c("VALUE", "CHG", "PCHG")))

We also define a summary function for rtables::analyze():

s_summary_num <- function(x) {
  in_rows(
    "Mean (SD)" = rcell(c(mean(x, na.rm = TRUE), sd(x, na.rm = TRUE)), format = "xx.xx (xx.xx)"),
    "Median" = rcell(median(x, na.rm = TRUE), format = "xx.xx")
  )
}

In the simple case, where rows and columns are each split by a factor variable (here, follow-up visit for rows and treatment arm for columns), the row summaries generated by summarize_row_groups() are correct, but the column counts generated by add_colcounts() are not: they count records rather than patients, so each N is three times the number of patients in that arm (e.g., 402 instead of 134).

tbl_wide <-  basic_table() %>%
  split_cols_by("ARM") %>% 
  add_colcounts() %>%
  split_rows_by("AVISIT") %>% 
  summarize_row_groups() %>%
  analyze(vars = "VALUE", afun = s_summary_num) %>%
  build_table(df = df_wide)

tbl_wide
               A: Drug X      B: Placebo    C: Combination
                (N=402)        (N=402)         (N=396)    
——————————————————————————————————————————————————————————
Baseline      134 (33.3%)    134 (33.3%)     132 (33.3%)  
  Mean (SD)   5.97 (3.55)    5.70 (3.31)     5.62 (3.49)  
  Median          5.39           4.81            4.61     
Year 1        134 (33.3%)    134 (33.3%)     132 (33.3%)  
  Mean (SD)   15.79 (4.01)   15.52 (3.70)    15.68 (4.05) 
  Median         15.44          14.79           15.22     
Year 2        134 (33.3%)    134 (33.3%)     132 (33.3%)  
  Mean (SD)   25.90 (4.39)   25.94 (3.89)    25.52 (3.99) 
  Median         25.08          25.56           24.82     

However, with this simple approach, we are not able to split columns by value type. To do that, we must pivot the data longer still, so that each value type becomes a factor level, as in df_long. When we then try a similar table-building approach, the column counts generated by add_colcounts() are still incorrect, and now the row summaries generated by summarize_row_groups() are also incorrect (although the percentages are still correct, because the column and row counts are both scaled by 3).

tbl_long <-  basic_table() %>%
  split_cols_by("VALTYPE") %>% 
  add_colcounts() %>%
  split_rows_by("AVISIT") %>% 
  summarize_row_groups() %>%
  analyze(vars = "VALUE", afun = s_summary_num) %>%
  build_table(df = df_long)

tbl_long
                 VALUE           CHG             PCHG      
                (N=1200)       (N=1200)        (N=1200)    
———————————————————————————————————————————————————————————
Baseline      400 (33.3%)    400 (33.3%)      400 (33.3%)  
  Mean (SD)   5.76 (3.45)         NA              NA       
  Median          4.84            NA              NA       
Year 1        400 (33.3%)    400 (33.3%)      400 (33.3%)  
  Mean (SD)   15.74 (3.94)   9.98 (2.06)    267.11 (320.90)
  Median         15.20           9.99           196.79     
Year 2        400 (33.3%)    400 (33.3%)      400 (33.3%)  
  Mean (SD)   25.78 (4.07)   10.04 (2.98)    69.80 (32.88) 
  Median         25.14          10.20            63.54  
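For comparison, rtables already has split_cols_by_multivar() and analyze_colvars(), which put one variable per column without pivoting the data longer. The sketch below is an assumption about how they might be applied here, not a confirmed solution to this request. It uses a hypothetical, deterministic data frame df_visits (one row per subject per visit, with VALUE, CHG, and PCHG as separate columns) in place of the randomly generated data above. It deliberately leaves out add_colcounts(), because we have not checked how column counts are computed under this kind of split.

```r
library(rtables)
library(dplyr)

data("ex_adsl")

# Hypothetical stand-in data: one row per subject per visit,
# with each value type kept in its own column.
visits <- c("Baseline", "Year 1", "Year 2")
df_visits <- merge(
  ex_adsl[, c("USUBJID", "ARM", "BMRKR1")],
  data.frame(AVISIT = factor(visits, levels = visits))
) |>
  arrange(USUBJID, AVISIT) |>
  group_by(USUBJID) |>
  mutate(
    VALUE = BMRKR1 + 10 * (as.integer(AVISIT) - 1),
    CHG = VALUE - lag(VALUE),          # change from the previous visit
    PCHG = 100 * CHG / lag(VALUE)      # percent change from the previous visit
  ) |>
  ungroup()

s_summary_num <- function(x) {
  in_rows(
    "Mean (SD)" = rcell(c(mean(x, na.rm = TRUE), sd(x, na.rm = TRUE)), format = "xx.xx (xx.xx)"),
    "Median" = rcell(median(x, na.rm = TRUE), format = "xx.xx")
  )
}

# One column per variable; analyze_colvars() applies the summary
# function to the variable that belongs to each column.
lyt_multivar <- basic_table() |>
  split_cols_by_multivar(vars = c("VALUE", "CHG", "PCHG")) |>
  split_rows_by("AVISIT") |>
  summarize_row_groups() |>
  analyze_colvars(afun = s_summary_num)

tbl_multivar <- build_table(lyt_multivar, df = df_visits)
tbl_multivar
```

Because the data keep one row per subject per visit, each row group holds one record per patient. Whether the default percentages and any header counts come out as patient-based under this split would still need to be verified.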
