Running time for hierarchical model

Please also provide the following information in addition to your question:

  • Operating System: Windows 10
  • brms Version: 2.2.0
    I am using the brm function to fit my model and I have four different outcomes (count data with different max scores for each outcome), measured at three different time points. I first fit the model separately for all outcomes and time points, i.e. ran 12 different models, and there was no issues with this. However, when I am trying to run the model with all outcomes simultaneously as a hierarchical model and including the random effect parameter (exposure | outcome:time), it takes extremely long to fit, even when just running the crude model without confounders (for example, for the crude model, in the chain progress, I get the following message: “1000 transitions using 10 leapfrog steps per transition would take 8650 seconds”). I do have a very large dataset (n=66000 (*12 for the hierarchical model)), can this be a sole reason for why the model takes so long to compute? If not, do you have any suggestions on what I can try to resolve this? Below is an example on how I have specified the model. Many thanks in advance, Tiril

model1 = brm(outcome~ exposure + (exposure | outcometype:time),
phi ~ (1|outcometype:time)),
prior = c(set_prior(“normal(0,2)”, class = “b”),
set_prior(“normal(0,3)”, class = “Intercept”),
set_prior(“normal(0,3)”, class = “Intercept”, dpar = “phi”)),
family = beta_binomial2,
stan_funs = stan_funs,
stanvars = stanvar(as.integer(dr$max), name = “trials”),
control = list(adapt_delta = 0.99,
max_treedepth = 15),
chains = 4,
iter = 500)

Can you please show me the full code you are using? Otherwise it is very hard to see what may go wrong.

I am involved in the project, so I’ll try to provide some details.

As Tiril described, we have for each participant 4 outcomes, each measured 3 times. The data are in a data frame my_data with the colums:

  • outcometype
  • time
  • outcome (sum score values)
  • exposure
  • adjustment (variables we want to correct for)
  • max (the maximum possible sum score for an outcome)

Because the outcomes are sum scores from rating scales, we use brms’ custom family function to implement a beta binomial family.

In the simple analysis, the brms model is like this:

my_prior = c(set_prior(“normal(0,2)”, class = “b”),
             set_prior(“normal(0,3)”, class = “Intercept”)
             set_prior(“normal(0,3)”, class = “Intercept”, dpar = “phi”))

my_data_o1t1 = my_data[my_data$outcometype == 1 & my_data$time == 1,]

my_stan_var = stanvar(as.integer(my_data_o1t1$max),
                      name = “trials”)
my_stan_ctrl = list(adapt_delta = 0.99,
                    max_treedepth = 15)

sm = brm(outcome ~ exposure + adjustment,
         prior =  my_prior,
         data = my_data_o1t1 ,
         family = beta_binomial2, # defined as in the brms vignette
         stan_funs = stan_funs,   # defined as in the brms vignette
         stanvars = my_stan_var,
         control = my_stan_ctrl )

This model allows us estimating the effect of the exposure on the outcomes in 12 separate analyses (outcometype1_time1, outcometype1_time2, …, outcometype4_time3), where we only change the outcomes, and the exposure and adjustment variables remain the same.

However, we would rather analyse all exposure-outcome associations together, because the outcomes are correlated. A simple way to do this is to implement a hierarchical model, in which the association of exposure and outcome varies across outcome-types and time. This model should do this:

my_stan_var = stanvar(as.integer(my_data$max),
                      name = “trials”)

hm = brm(bf(outcome ~ exposure + adjustment + (exposure | outcometype:time),
            phi ~ (1|outcometype:time)),
         prior =  my_prior,
         data = my_data,
         family = beta_binomial2,
         stan_funs = stan_funs,
         stanvars = my_stan_var,
         control = my_stan_ctrl )

This model is so slow for the complete data set (N > 20000), that we are not getting anywhere. This model converges with a subset of the data, but once we use the full data, sampling slows down much more than expected. That is, if the model works with 10% of the data, i would expect the full model to be around 10 times slower, but not 50 times slower (does this make sense?).

So we have 2 main questions at the moment:

  • Does the hierarchical model look OK.
  • Are there reasons to expect the relationship between amount of data and speed of sampling to be non-linear?

Thanks in advance!

It looks ok to me, but I think the problem is that you are no longer having reasonable priors on phi. Essentially the default prior is likely too wide. Unfortunately, setting priors on phi is hard.

Thus, I would suggest trying a (logit-)normal-binomial model rather than a beta-binomial model in the hope it solves convergence problems. In a univariate case, you would specify

my_data$obs <- 1:nrow(my_data)
outcome ~ exposure + adjustment + (1|obs)

Translated to the multivariate one, you could try out

my_data$group <- paste0(my_data$outcometype, "_", my_data$time)
hm <- brm(
  exposure + adjustment + (exposure | group) + (1 | gr(obs, by = group)),
  family = binomial(),

Thanks for your prompt response Paul.

It’s worth trying the binomial model. My not entirely clean way to check the prior for phi was to estimate the over-dispersion of the outcome variable and to check that most of the mass of the prior for phi is around the overdispersion of the outcome variable. Still, we should try narrower priors for phi.
The reason I am not sure that phi is the problem, is that we have no problems fitting phi in the simple model or with the hierarchical model when using only a subset of the data.

I am not sure understand what you are doing with the new obs variable here

but maybe this comes from the ambiguity of my last post.
What I don’t understand here is that if one does so for only one outcome-type and one time-point, then we don’t need the + (1|obs) part, because every participant contributes only one data point.

Related, because I am not familiar with the gr function beyond just reading the brms documentation, I only assume that the part (1 | gr(obs, by = group)) of your proposed hierarchical produces individual-wise intercepts to account for repeated measurement. Is this correct?

You have two ways to introduce overdispersion. One is directly via an overdispersion parameter so that binomial becomes beta-binomial or poisson becomes negative-binomial. The other option is to model a residual variance term, which is what (1|obs) does.

So either go for beta-binomial with phi or for binomial with the (1|obs) term.

One question remains, what is your response variable exactly? I assume it is not only 0 or 1 (bernoulli) but multipe counts, right? If it was just 0 1, neither phi nor a residual variance would be reasonable to model.

1 Like

OK, I seen now how + (1|obs) introduced Overdispersion, though I am unsure if it captures the same generative model as the original model we tried. One obvious difference I see is, that the model you propose has many more parameters (as many parameters as observations, compared to only one overdispersion parameter).

Anyhow, regarding testing the implementation of model priors: If I want to do a prior predictive check for brms model: should I simply save the Stan code, comment out the parts where the log posterior is updated, run thr model, andbgenerate predictions with the posterior_predict function? Or is there a simpler way to do this?

Yes, it is multiple counts.

It is not the same generative model, but fulfiles a similar purpose. It has formally more parameters but when you look at the effective number of parameter, the difference is much smaller. The advantage of this is that the simpler model (no overdispersion) is presented by 0 (i.e. standard deviation of 0 of the obs term), while in the beta-binomial model, no overdispersions is represented by infinity (i.e. phi parameter of infinity). For the former, it is far easier to set reasonable priors on.

To perform prior predictive checks, just run your model with the option sample_prior = "only".

I used prior predictive checks to examine the effects of priors and think that the priors for the over-dispersion parameter in the original model we used were not optimal.

The key point is that brms uses an exponential transform to keep the over-dispersion parameter above zero, so that one quickly gets very large over-dispersion parameters in a hierarchical model, because “fixed” and random effects are combined in a multiplicative manner.

Given what we know about our particular outcome variables, over-dispersion values larger than 40 or 50 are unlikely. Priors like

my_priors = c(set_prior("normal(0,.5)",class = "sd", dpar = "phi"),
              set_prior("normal(2,.5)",class = "Intercept", dpar = "phi"))

model this insight, and we find that we can also fit a hierarchical beta-binomial model with random slopes and random over-dispersion intercepts with such priors.

These priors might at first sight appear narrow, but given that the fixed and random effects are effectively combined in a multiplicative manner, they are not really narrow.

This figure shows prior (top row) and posterior (bottom row) predictive distributions.

Red lines are oberved data, grey lines / areas are model generated data.

1 Like

That looks very nice! Thanks for posting your prior predictive analysis!

I really like the graphs any chance you could share the code for the priors and posterior distributions.

I am happy to share the R code, but I fear it is hard to understand given that I did not comment it.

Anyhow, here are the main steps

  1. Specify your brms model
  2. Generate two brms fit objects: One for prior predictions (prior_model = brm(your_model_specifications, sample_prior = "only") and one for posterior predictions (post_model = brm(your_model_specifications, sample_prior = "no")
  3. generate prior_predictions = posterior_predict(prior_model) and posterior predictions post_predictions = posterior_predict(post_model)
  4. use the function plot_spaghetti2 below to make the plots on the left in my figure above.
  5. extract parameters prior_pars = as.matrix(prior_model) and post_pars = as.matrix(post_model). Then use the function my_dens_plot below to get a density plot as shown on the right side of the figure above. (Note that today I would rather use a histogram instead of a density plot here.)

If desired, use par(mfrow = c(...)) or layout to combine different plots in a figure

plot_spaghetti2 = function(pp,y,trials,nplot = 250) {
  # pp is a matrix generated from brms' posterior predict
  # y is the observed data
  # trials is the number of trials for the beta-binomial model
  # and is only used to generate breaks for the histogram
  # adapt as needed if you have a different outcome variable
  breaks = (0:(trials+1))-.5
  hs = apply(pp,1, function(x) hist(x, breaks = breaks, plot = F)$count)
  maxy = max(apply(hs,1,quantile,p = .975))
  hs = hs[,!apply(hs > maxy,2,any)]
       type = "n",
       xlim = range(breaks),
       ylim = c(0,maxy),
       xlab = "", ylab = "",
       bty = "n")
  clr = adjustcolor("black",alpha = .05)
  for (k in 1:nplot) 
          col = clr)
  segments(breaks,y0 = 0, y1 = maxy, col = "white", lwd = 2)
  h = hist(y, breaks = breaks, plot = F)$count
        col = "darkred",
        lwd = 2)
my_dens_plot = function(samples,x)
# samples are the samples for one parameter 
# x is a value the samples shall be compared with
# for example if samples are the expected 
# proportion of successful trials  in a beta binomial 
# model then x could be the proportion in the observed data y

d = density(samples)
d$y = c(0,d$y,0)
    d$x = c(min(d$x),d$x,max(d$x))
         main = paste0(ifelse(p == 1, paste0(g,": "),""),
         xlab ="",
         col = "gray",
         yaxt = "n",
         ylab = "",
         bty = "n")
    polygon(d,col = "gray", border = "gray")
    abline(v = yp[p], col = "darkred", lwd = 2)