Technical issues implementing powerscaling procedure on distribution of estimated differences

I am having trouble implementing the procedure for powerscaling research quantities of interest outlined in Chapter 14 of @avehtari’s notebook on comparing models with different outcomes.

The model I would like to run seeks to estimate differences in days of heroin use in the previous 28 days (heroin28, integer: 0–28) based on days of amphetamine use in the 28 days before people commence opioid agonist treatment (ats_baseline, integer: 0–28). I would like to estimate pairwise differences at each of five candidate levels of ats_baseline (0, 7, 14, 21, 28 days of use) at each of five timepoints within the first year of treatment (0, 0.25, 0.5, 0.75, and 1 years from start of treatment: variable yearsFromStart). Blocks of treatment (or ‘encounters’) are defined by the encID variable. To be included in the dataset, an encounter needed an observation at assessment (yearsFromStart = 0) and at least one other during the first year, though there can be more than two.

Here is the data

atsOTP.RData (17.3 KB)

And here is the model: a univariate beta-binomial model with thin-plate regression splines to estimate trajectories over time at each of the candidate values of ats_baseline. You’ll need the brms and priorsense packages to run the code below. And apologies, the model takes a while to run.

library(brms)
library(priorsense)

fit_betabinomial_heroin_s <- brm(formula = heroin28 | trials(set) ~ s(yearsFromStart, by = ats_baseline) + (1 | encID),
                                 data = d,
                                 family = beta_binomial(),
                                 prior = prior(prior = normal(0.1, 4),
                                               class = b),
                                 save_pars = save_pars(all = TRUE),
                                 seed = 1234,
                                 refresh = 0,
                                 chains = 4,
                                 warmup = 1e3,
                                 iter = 2e3)

# add criterion to the object so we can compare models using cross-validation later
fit_betabinomial_heroin_s <- add_criterion(fit_betabinomial_heroin_s,
                                           criterion="loo",
                                           save_psis = TRUE)
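(As an aside, once add_criterion() has run, the stored PSIS-LOO result can be inspected before moving on to powerscaling. This is a standard brms call, not part of the original post; fit_betabinomial_heroin_s$criteria$loo is where brms stores the criterion added above.)

```r
# inspect the stored LOO result; high Pareto-k values flag observations
# where the importance-sampling approximation is unreliable
print(fit_betabinomial_heroin_s$criteria$loo)
```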

Now to implement the powerscaling procedure on the distribution of estimated differences between each of the candidate levels of ats_baseline at each of the candidate timepoints.

I have adapted the code in Chapter 14 to my spline model as best I can, but when I get to the bind_draws(log_prior_draws(fit_betabinomial_heroin_s)) line, it throws an error.

# packages used below: modelr (data_grid), tidybayes (add_epred_draws,
# compare_levels), dplyr/tidyr (select, left_join, pivot_wider),
# posterior (as_draws_df, bind_draws), priorsense (log_lik_draws, log_prior_draws)
d %>%
  data_grid(yearsFromStart = seq(0, 1, by = 0.25),
            ats_baseline = c(0, 7, 14, 21, 28),
            encID,
            set) %>%
  add_epred_draws(object = fit_betabinomial_heroin_s,
                  re_formula = NA,
                  allow_new_levels = TRUE) %>%
  compare_levels(variable = .epred,
                 by = ats_baseline) %>%
  pivot_wider(names_from = c(ats_baseline, yearsFromStart),
              values_from = .epred,
              names_glue = "{ats_baseline} ats days: {yearsFromStart} years from start") %>%
  select(!c(.chain, .iteration)) %>%
  left_join(y = as_draws_df(log_lik_draws(fit_betabinomial_heroin_s)),
            by = ".draw") %>%
  as_draws_df() %>%
  bind_draws(log_prior_draws(fit_betabinomial_heroin_s))

# error message
# Error: 'iteration_ids' of bound objects do not match.

Can anyone tell me how to fix this? I assume that the log_prior_draws() function needs some tinkering to get iteration ids. But I am not sure how to do that. Any other pointers welcome. Thank you in advance.

Change the code to be

d %>%
  data_grid(yearsFromStart = seq(0, 1, by = 0.25),
            ats_baseline = c(0, 7, 14, 21, 28),
            encID,
            set) %>%
  add_epred_draws(object = fit_betabinomial_heroin_s,
                  re_formula = NA,
                  allow_new_levels = TRUE) %>%
  compare_levels(variable = .epred,
                 by = ats_baseline) %>%
  pivot_wider(names_from = c(ats_baseline, yearsFromStart),
              values_from = .epred,
              names_glue = "{ats_baseline} ats days: {yearsFromStart} years from start") %>%
  select(!c(.chain, .iteration)) %>%
  left_join(y = as_draws_df(log_lik_draws(fit_betabinomial_heroin_s)),
            by = ".draw") %>%
  left_join(y = as_draws_df(log_prior_draws(fit_betabinomial_heroin_s)),
            by = ".draw") %>%
  as_draws_df()

And when calling powerscale_ functions, remember to select appropriate columns as shown in the case study (otherwise you get an error, as some columns are factors).
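For instance, something like the following sketch (here draws stands for the combined draws_df built by the pipeline above; the exact powerscale_sensitivity() arguments follow the case study and may need adjusting, since priorsense looks for log-prior and log-likelihood draws under the default names "lprior" and "log_lik"):

```r
library(posterior)
library(priorsense)

# keep only the numeric columns powerscaling needs: one quantity of
# interest plus the log-likelihood and log-prior draws
draws_sub <- subset_draws(draws,
                          variable = c("7 - 0 ats days: 0 years from start",
                                       "log_lik", "lprior"))

powerscale_sensitivity(draws_sub,
                       variable = "7 - 0 ats days: 0 years from start")
```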

Thank you so much @avehtari. The code now runs but, as is now a familiar occurrence, a new problem emerges: when I run the code on even a single outcome, I get the error Error: cannot allocate vector of size 29.5.

The code I ran on the data and model I provided in the post is below

  d %>%
    data_grid(yearsFromStart = seq(0,1,by=0.25),
              ats_baseline=c(0,7,14,21,28),
              encID,
              set) %>%
    add_epred_draws(object = fit_betabinomial_heroin_s, 
                    re_formula = NA, 
                    allow_new_levels = TRUE) %>%
    compare_levels(variable = .epred, 
                   by = ats_baseline) %>%
    pivot_wider(names_from = c(ats_baseline, yearsFromStart),
                values_from = .epred,
                names_glue = "{ats_baseline} ats days: {yearsFromStart} years from start") %>%
    select(!c(.chain, .iteration)) %>%
    left_join(y = as_draws_df(log_lik_draws(fit_betabinomial_heroin_s)), 
              by=".draw") %>%
    left_join(y = as_draws_df(log_prior_draws(fit_betabinomial_heroin_s)), 
              by=".draw") %>%
    as_draws_df() %>%
    subset_draws(variable = c("7 - 0 ats days: 0 years from start",
                              "log_lik",
                              "lprior"))

I’ve tried setting

library(future)
# raise the global maximum size of exported objects to 30000 MiB (~30 GB)
options(future.globals.maxSize = 30000 * 1024^2)  # 31457280000 bytes

at the top, but I gather 30 GB is an upper limit?

Not sure what the issue is. Any advice appreciated

What if you don’t use future? library(future) indicates you are trying to run something in parallel, and if each parallel worker gets a copy of something big, you may run out of memory; without parallelization it may work fine.
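If a parallel plan has been set up via future, a minimal way to fall back to sequential execution before re-running is (standard future API):

```r
library(future)
plan(sequential)  # run in the current R session, so workers no longer
                  # each receive a copy of the large draws object
```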

Interesting. Will try it. Thank you