Different dispersion hyperparameters for each unique level of a factor

Please forgive my ignorance of the proper hierarchical regression nomenclature and conventions. I am sure that what I want is quite easily achieved, but I don’t know the right terms under which to search. I want to fit a model of the following form:

y_i \mid n_i \sim \mathrm{Bin}(n_i, p_i) \\ \mathrm{logit}(p_i) = \alpha_{r_i} + \beta_{r_i,t_i} + \delta_i \\ \beta_{r,t} \sim \mathrm{N}(0, \sigma_r^2) \\ \delta_i \sim \mathrm{N}(0, \tau^2)

where i indexes individual units, r is a binary factor, and t is a 4-level categorical factor. My data include observations of (y_i, n_i) for all 2x4 = 8 possible combinations of the levels of r and t.

I have attempted to specify this model in brms using the formula:

y | trials(n) ~ r + (1 | gr(t, by = r)) + (1 | i)

but this results in an error message to the effect that “some levels of ‘t’ correspond to multiple levels of ‘r’.” Can someone explain to me the right way of specifying this model in the brms/lmr4 syntax?

Here is some example data, if this helps at all:

data <- data.frame(
  y = as.integer(c(37, 2, 18, 21, 49, 12, 9, 17, 45, 21, 29, 26, 24, 20, 38, 5)),
  n = as.integer(c(185, 173, 165, 194, 150, 194, 130, 141, 198, 188, 120, 111, 132, 160, 110, 165)),
  r = factor(c(0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1)),
  t = factor(c(0, 0, 1, 1, 2, 2, 3, 3, 0, 0, 1, 1, 2, 2, 3, 3)),
  i = factor(0:15)
)

No prior for \alpha? It would presumably need to allow a non-zero intercept or you would have to add a global intercept.

Also, this model is going to be super problematic to fit because it’s three additive effects with nothing to identify them other than the priors. We discuss why this is a problem in the problematic posteriors chapter of the User’s Guide.

I’m afraid I don’t know brms, so can’t suggest how to code it there. It’s very easy to write this kind of model in Stan but much harder to fit given the non-identifiability of the likelihood.

My intention was for \alpha_r to capture the mean difference (main effect) between the two levels of r (i.e., averaging over the levels of t), while constraining the conditional prior of \beta_{r,t}\mid r to have zero mean. (The motivation being to allow some partial pooling of the \beta_{r,t} parameters toward 0.) The batch effect \delta_i is meant to capture the remaining variance within each (r, t) group (whereas \sigma_r is supposed to capture the variance between levels of t, for each level of r). I am unfamiliar with the details of how this kind of thing is usually done (and notated). Do you have a different suggestion for a better way to structure the model?

I somehow missed this follow-up before, so sorry for not responding sooner.

I’m afraid not—you often just have to deal with these non-identifiable effects. It helps to enforce a sum-to-zero constraint on each of them and then add a global intercept, or to pin one of the values of each. Then it’s not just the prior doing the identification.

I would add a weakly informative prior for all of the parameters based on the scale of results you expect.

1 Like