Can't split columns by value type #1021

While splitting columns by a discrete subject-level variable is straightforward, splitting columns by a value type (e.g., raw value, absolute change, percent change) produces tables with incorrect count summaries. Can we add a feature that supports this kind of column split?

For reproducibility, we generate the following two data frames:

```r
library(rtables)
library(tidyr)
library(dplyr)

data("ex_adsl")

set.seed(1)  # arbitrary seed so that reruns give identical simulated values

# One row per patient per visit: raw value plus visit-to-visit changes
df_wide <- ex_adsl |>
  select(USUBJID, ARM, BMRKR1) |>
  mutate(Baseline = BMRKR1,
         `Year 1` = Baseline + rnorm(n(), 10, 2),
         `Year 2` = Baseline + rnorm(n(), 20, 2)) |>
  pivot_longer(cols = c(Baseline, `Year 1`, `Year 2`), names_to = "AVISIT", values_to = "VALUE") |>
  mutate(AVISIT = as.factor(AVISIT)) |>
  group_by(USUBJID) |>
  mutate(CHG = VALUE - lag(VALUE),                          # change from previous visit
         PCHG = ((VALUE - lag(VALUE)) / lag(VALUE)) * 100) |> # percent change from previous visit
  ungroup()

# One row per patient per visit per value type
df_long <- df_wide |>
  pivot_longer(cols = c(VALUE, CHG, PCHG), names_to = "VALTYPE", values_to = "VALUE") |>
  mutate(VALTYPE = factor(VALTYPE, levels = c("VALUE", "CHG", "PCHG")))
```
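Because `lag()` returns the previous row within each `USUBJID` group, `CHG` and `PCHG` as written are changes from the *previous visit*, not from baseline (which is consistent with the `CHG` means of roughly 10 at both Year 1 and Year 2 in the output further down). A minimal base-R sketch for one hypothetical patient, with made-up values 5, 15 and 25, shows the difference; if change from baseline is intended, `VALUE - first(VALUE)` would be the dplyr equivalent of `chg_base`:

```r
# Hypothetical values for one patient at Baseline, Year 1 and Year 2
value <- c(5, 15, 25)
prev  <- c(NA, head(value, -1))            # same as dplyr::lag(value)

# Change relative to the previous visit (what df_wide computes)
chg_prev  <- value - prev
pchg_prev <- (value - prev) / prev * 100

# Change relative to baseline, for comparison
chg_base  <- value - value[1]
pchg_base <- (value - value[1]) / value[1] * 100

chg_prev   # NA 10 10
chg_base   # 0 10 20
```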

We also define a summary function for `rtables::analyze()`:

```r
# Analysis function: mean (SD) and median of the analysis variable, ignoring NAs
s_summary_num <- function(x) {
  in_rows(
    "Mean (SD)" = rcell(c(mean(x, na.rm = TRUE), sd(x, na.rm = TRUE)), format = "xx.xx (xx.xx)"),
    "Median" = rcell(median(x, na.rm = TRUE), format = "xx.xx")
  )
}
```

In the simple case, where we split rows and columns each by a factor variable (here, follow-up visit for rows and treatment arm for columns), the row-group counts generated by `summarize_row_groups()` are correct, but the column counts generated by `add_colcounts()` are not: they count records instead of patients (e.g., N=402 for `A: Drug X`, i.e., 134 patients × 3 visits).

```r
tbl_wide <- basic_table() %>%
  split_cols_by("ARM") %>% 
  add_colcounts() %>%
  split_rows_by("AVISIT") %>% 
  summarize_row_groups() %>%
  analyze(vars = "VALUE", afun = s_summary_num) %>%
  build_table(df = df_wide)

tbl_wide
```

```
               A: Drug X      B: Placebo    C: Combination
                (N=402)        (N=402)         (N=396)    
——————————————————————————————————————————————————————————
Baseline      134 (33.3%)    134 (33.3%)     132 (33.3%)  
  Mean (SD)   5.97 (3.55)    5.70 (3.31)     5.62 (3.49)  
  Median          5.39           4.81            4.61     
Year 1        134 (33.3%)    134 (33.3%)     132 (33.3%)  
  Mean (SD)   15.79 (4.01)   15.52 (3.70)    15.68 (4.05) 
  Median         15.44          14.79           15.22     
Year 2        134 (33.3%)    134 (33.3%)     132 (33.3%)  
  Mean (SD)   25.90 (4.39)   25.94 (3.89)    25.52 (3.99) 
  Median         25.08          25.56           24.82     
```
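For this simple case, `build_table()` in rtables accepts an `alt_counts_df` argument, and column counts are then computed from that data frame instead of from `df`. A minimal sketch, assuming patient-level Ns are what the header should show and that `ex_adsl` (one row per patient, and containing the column-split variable `ARM`) is the right denominator. The data preparation is repeated so the block runs on its own; the seed is arbitrary and `tbl_wide_pts` is just an illustrative name:

```r
library(rtables)
library(dplyr)
library(tidyr)

data("ex_adsl")
set.seed(1)  # arbitrary seed for the simulated follow-up values

# Same construction as df_wide above (only VALUE is needed here)
df_wide <- ex_adsl |>
  select(USUBJID, ARM, BMRKR1) |>
  mutate(Baseline = BMRKR1,
         `Year 1` = Baseline + rnorm(n(), 10, 2),
         `Year 2` = Baseline + rnorm(n(), 20, 2)) |>
  pivot_longer(cols = c(Baseline, `Year 1`, `Year 2`),
               names_to = "AVISIT", values_to = "VALUE") |>
  mutate(AVISIT = as.factor(AVISIT))

s_summary_num <- function(x) {
  in_rows(
    "Mean (SD)" = rcell(c(mean(x, na.rm = TRUE), sd(x, na.rm = TRUE)), format = "xx.xx (xx.xx)"),
    "Median" = rcell(median(x, na.rm = TRUE), format = "xx.xx")
  )
}

tbl_wide_pts <- basic_table() %>%
  split_cols_by("ARM") %>%
  add_colcounts() %>%
  split_rows_by("AVISIT") %>%
  summarize_row_groups() %>%
  analyze(vars = "VALUE", afun = s_summary_num) %>%
  # Column counts are taken from alt_counts_df (one row per patient), not df_wide
  build_table(df = df_wide, alt_counts_df = ex_adsl)

col_counts(tbl_wide_pts)  # patients per arm instead of records per arm
```

This only changes where the header Ns come from; the cell contents are still computed from `df_wide`.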

However, this simple approach cannot split columns by value _type_. To do that, we must pivot the data even longer so that each value type becomes a level of a factor, as in `df_long`. When we then build a similar table, the column counts generated by `add_colcounts()` are still record counts (N=1200 per value type, i.e., 400 patients × 3 visits), and the row-group summaries generated by `summarize_row_groups()` are now also incorrect: for example, the Baseline row reports 400 records in the `CHG` and `PCHG` columns even though every baseline change is `NA`. The percentages stay at 33.3% only because each visit contributes one third of the records in every column.

```r
tbl_long <- basic_table() %>%
  split_cols_by("VALTYPE") %>% 
  add_colcounts() %>%
  split_rows_by("AVISIT") %>% 
  summarize_row_groups() %>%
  analyze(vars = "VALUE", afun = s_summary_num) %>%
  build_table(df = df_long)

tbl_long
```

```
                 VALUE           CHG             PCHG      
                (N=1200)       (N=1200)        (N=1200)    
———————————————————————————————————————————————————————————
Baseline      400 (33.3%)    400 (33.3%)      400 (33.3%)  
  Mean (SD)   5.76 (3.45)         NA              NA       
  Median          4.84            NA              NA       
Year 1        400 (33.3%)    400 (33.3%)      400 (33.3%)  
  Mean (SD)   15.74 (3.94)   9.98 (2.06)    267.11 (320.90)
  Median         15.20           9.99           196.79     
Year 2        400 (33.3%)    400 (33.3%)      400 (33.3%)  
  Mean (SD)   25.78 (4.07)   10.04 (2.98)    69.80 (32.88) 
  Median         25.14          10.20            63.54  
```
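Until a dedicated feature exists, one possible interim workaround for the header Ns is again `alt_counts_df`, this time with a helper data frame that has one row per patient per value type, so that splitting it by `VALTYPE` yields patient counts. This is a sketch under stated assumptions: `alt_df` and `tbl_long_pts` are hypothetical names, 400 patients per value type is assumed to be the desired N, and the data preparation is repeated (with an arbitrary seed) so the block is self-contained. It does not fix the row-group counts, which still include the records whose `CHG`/`PCHG` value is `NA`:

```r
library(rtables)
library(dplyr)
library(tidyr)

data("ex_adsl")
set.seed(1)  # arbitrary seed for the simulated follow-up values

vt_levels <- c("VALUE", "CHG", "PCHG")

# Same construction as df_long above
df_long <- ex_adsl |>
  select(USUBJID, ARM, BMRKR1) |>
  mutate(Baseline = BMRKR1,
         `Year 1` = Baseline + rnorm(n(), 10, 2),
         `Year 2` = Baseline + rnorm(n(), 20, 2)) |>
  pivot_longer(cols = c(Baseline, `Year 1`, `Year 2`),
               names_to = "AVISIT", values_to = "VALUE") |>
  mutate(AVISIT = as.factor(AVISIT)) |>
  group_by(USUBJID) |>
  mutate(CHG = VALUE - dplyr::lag(VALUE),
         PCHG = (VALUE - dplyr::lag(VALUE)) / dplyr::lag(VALUE) * 100) |>
  ungroup() |>
  pivot_longer(cols = c(VALUE, CHG, PCHG), names_to = "VALTYPE", values_to = "VALUE") |>
  mutate(VALTYPE = factor(VALTYPE, levels = vt_levels))

# Hypothetical helper: one row per patient per value type (400 x 3 rows)
alt_df <- expand_grid(USUBJID = unique(ex_adsl$USUBJID),
                      VALTYPE = factor(vt_levels, levels = vt_levels))

s_summary_num <- function(x) {
  in_rows(
    "Mean (SD)" = rcell(c(mean(x, na.rm = TRUE), sd(x, na.rm = TRUE)), format = "xx.xx (xx.xx)"),
    "Median" = rcell(median(x, na.rm = TRUE), format = "xx.xx")
  )
}

tbl_long_pts <- basic_table() %>%
  split_cols_by("VALTYPE") %>%
  add_colcounts() %>%
  split_rows_by("AVISIT") %>%
  summarize_row_groups() %>%
  analyze(vars = "VALUE", afun = s_summary_num) %>%
  # Header Ns come from alt_df, split by VALTYPE like the data itself
  build_table(df = df_long, alt_counts_df = alt_df)

col_counts(tbl_long_pts)  # one patient count per value type
```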
