Running time for hierarchical model

I am involved in the project, so I’ll try to provide some details.

As Tiril described, we have 4 outcomes for each participant, each measured 3 times. The data are in a data frame my_data with the columns:

  • outcometype
  • time
  • outcome (sum score values)
  • exposure
  • adjustment (variables we want to correct for)
  • max (the maximum possible sum score for an outcome)

Because the outcomes are sum scores from rating scales, we use brms' custom family functionality to implement a beta-binomial family.
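
For context, the custom family and the Stan functions are defined essentially as in the brms custom-family vignette; the following is a minimal sketch of those definitions (the exact code may differ slightly depending on the brms version):

beta_binomial2 = custom_family(
  "beta_binomial2", dpars = c("mu", "phi"),
  links = c("logit", "log"), lb = c(NA, 0),
  type = "int", vars = "trials[n]"
)

stan_funs = "
  real beta_binomial2_lpmf(int y, real mu, real phi, int T) {
    return beta_binomial_lpmf(y | T, mu * phi, (1 - mu) * phi);
  }
  int beta_binomial2_rng(real mu, real phi, int T) {
    return beta_binomial_rng(T, mu * phi, (1 - mu) * phi);
  }
"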

In the simple analysis, the brms model is like this:

my_prior = c(set_prior("normal(0,2)", class = "b"),
             set_prior("normal(0,3)", class = "Intercept"),
             set_prior("normal(0,3)", class = "Intercept", dpar = "phi"))

my_data_o1t1 = my_data[my_data$outcometype == 1 & my_data$time == 1,]

my_stan_var = stanvar(as.integer(my_data_o1t1$max),
                      name = "trials")
my_stan_ctrl = list(adapt_delta = 0.99,
                    max_treedepth = 15)

sm = brm(outcome ~ exposure + adjustment,
         prior = my_prior,
         data = my_data_o1t1,
         family = beta_binomial2, # defined as in the brms vignette
         stan_funs = stan_funs,   # defined as in the brms vignette
         stanvars = my_stan_var,
         control = my_stan_ctrl)

This model allows us to estimate the effect of the exposure on the outcomes in 12 separate analyses (outcometype1_time1, outcometype1_time2, …, outcometype4_time3), where only the outcome data change and the exposure and adjustment variables remain the same.
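
As an illustration, these 12 separate fits can be run with a loop roughly like the following (a hypothetical sketch based on the code above, not our actual script):

fits = list()
for (ot in 1:4) {
  for (tp in 1:3) {
    d = my_data[my_data$outcometype == ot & my_data$time == tp, ]
    sv = stanvar(as.integer(d$max), name = "trials")
    fits[[paste0("o", ot, "_t", tp)]] = brm(outcome ~ exposure + adjustment,
                                            prior = my_prior,
                                            data = d,
                                            family = beta_binomial2,
                                            stan_funs = stan_funs,
                                            stanvars = sv,
                                            control = my_stan_ctrl)
  }
}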

However, we would rather analyse all exposure-outcome associations together, because the outcomes are correlated. A simple way to do this is a hierarchical model in which the association between exposure and outcome varies across outcome types and time points. The following model should do this:

my_stan_var = stanvar(as.integer(my_data$max),
                      name = "trials")

hm = brm(bf(outcome ~ exposure + adjustment + (exposure | outcometype:time),
            phi ~ (1 | outcometype:time)),
         prior = my_prior,
         data = my_data,
         family = beta_binomial2,
         stan_funs = stan_funs,
         stanvars = my_stan_var,
         control = my_stan_ctrl)

With the complete data set (N > 20000), this model is so slow that we are not getting anywhere. It converges on a subset of the data, but with the full data sampling slows down much more than expected. That is, if the model works with 10% of the data, I would expect the full model to be around 10 times slower, not 50 times slower (does this expectation make sense?).

So we have 2 main questions at the moment:

  • Does the hierarchical model look OK?
  • Are there reasons to expect the relationship between amount of data and speed of sampling to be non-linear?

Thanks in advance!