Hierarchical priors in brms

Please also provide the following information in addition to your question:

  • Operating System: Win10
  • brms Version: 2.9.0

I realized recently, when doing an analysis using the nonlinear functionality of brms, that you can overparameterize your model and let the HMC sampler do the numerical integration. Suppose I have a parameter psi = f(gamma1, gamma2) where f is known but gamma1 and gamma2 are not uniquely identifiable. Suppose also that I have a rule for selecting priors for gamma1 and gamma2 but not for psi, and that the prior on psi implied by our choice of priors for gamma1 and gamma2 has no closed form. I should then be able to use HMC to perform the relevant integration, treating gamma1 and gamma2 as latent variables.
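Even without a closed form, the implied prior on psi can be inspected by simulation. Here is a minimal sketch using the log-ratio f and the gamma(1, 1) priors from the toy example below:

# Simulate the prior on psi = log(gamma1/gamma2) implied by
# independent gamma(1, 1) priors on gamma1 and gamma2.
gamma1 <- rgamma(1e5, 1, 1)
gamma2 <- rgamma(1e5, 1, 1)
psi <- log(gamma1 / gamma2)
hist(psi, breaks = 100)  # visualize the implied prior on psi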

My concern is that, because brm() is a wrapper, it may not appropriately constrain the values of gamma1 and gamma2 to prevent computational infinities. I also don’t know enough about the details of the HMC algorithm to know whether this will cause issues, since I understand Stan uses an adaptive version of the algorithm. Here’s a trivial example where everything appears to work fine:

# Simulate a simple linear relationship with small noise.
x <- rnorm(100)
y <- x + rnorm(100, 0, 0.1)
df <- data.frame(y, x)

library(brms)
prior1 <- prior(normal(0, 2.5), nlpar = "b1") +
  prior(gamma(1, 1), nlpar = "gamma1") +
  prior(gamma(1, 1), nlpar = "gamma2")

fit1 <- brm(bf(y ~ log(gamma1 / gamma2) + x * b1, gamma1 + gamma2 + b1 ~ 1, nl = TRUE),
            data = df, prior = prior1,
            inits = "random", chains = 1, iter = 1000, warmup = 500, thin = 1,
            cores = 1)

Any thoughts? Should it be OK to do this? Any sense of how it affects computing time?

OK, I feel kind of stupid for not following through properly on my toy example. There were a lot of warnings, and I needed to do the following to make it perform better:

fit1 <- brm(bf(y ~ log(gamma1 / gamma2) + x * b1, gamma1 + gamma2 + b1 ~ 1, nl = TRUE),
            data = df, prior = prior1,
            inits = "random", chains = 1, iter = 1000, warmup = 500, thin = 1,
            cores = 1,
            control = list(adapt_delta = 0.999999, max_treedepth = 15))

This is not ideal, and I suppose it answers my question: I had to slow the HMC sampler way down to get rid of most of the warnings. I’m still interested in input on what people think about this beyond what I’ve found.

I think you need to make sure that gamma1 and gamma2 always evaluate to positive values. Otherwise, the log transform of gamma1/gamma2 will basically kill your model. In your example, add lb = 0 to prior(gamma(1, 1), nlpar = "gamma1"), that is, prior(gamma(1, 1), nlpar = "gamma1", lb = 0).
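For reference, the full corrected prior block would then look like this (a sketch; only gamma1 is mentioned above, but the same bound presumably applies to gamma2):

# lb = 0 constrains the nlpar coefficients to be positive, so that
# log(gamma1/gamma2) is always well defined during sampling.
prior1 <- prior(normal(0, 2.5), nlpar = "b1") +
  prior(gamma(1, 1), nlpar = "gamma1", lb = 0) +
  prior(gamma(1, 1), nlpar = "gamma2", lb = 0)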

Thanks. I honestly thought that by specifying a gamma prior, brms would automatically know the support was positive. I did this, set the control options back to their defaults, and everything worked great in the toy example I provided. So, to get back to my original question: is it appropriate to use the HMC algorithm to do numerical integration in an overparameterized model?

I wouldn’t call it numerical integration here. Your way of thinking may come from a frequentist standpoint, where we marginalize (integrate) out latent variables. Is that what you mean?

In a fully Bayesian framework, we simply treat latent variables as parameters and estimate them along with all the other parameters.

No, my way of thinking isn’t coming from a frequentist perspective in this case. Maybe I’m not saying it the right way.

The classic example (sorry I didn’t think of it sooner) is the treatment-means model: \mu_0 + \mu_t for t = 1, 2, \dots, T. We cannot uniquely identify \mu_0, \mu_1, \dots, \mu_T because the model is overparameterized, so we usually impose a restriction such as \sum_{t=1}^T \mu_t = 0. In a Bayesian framework, we don’t have to do this up front. We can overparameterize the model and then compute the estimated quantities \mu_t - \bar{\mu} and \mu_0 + \bar{\mu}, where \bar{\mu} = \frac{1}{T}\sum_{t=1}^T \mu_t, after the fact. Those quantities mix very quickly, whereas \mu_0, \dots, \mu_T themselves might mix very slowly or even diverge.
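A minimal sketch of that post-hoc computation, assuming a hypothetical fitted overparameterized model fit_means with draws for parameters named mu0, mu1, ..., mu5:

# Compute the identified quantities from the unidentified draws.
draws <- as.matrix(fit_means)          # hypothetical fit object
mu_t <- draws[, paste0("mu", 1:5)]     # treatment offsets mu_1, ..., mu_T
mu_bar <- rowMeans(mu_t)               # mean(mu_{1:T}) within each draw
effects <- mu_t - mu_bar               # mu_t - mean(mu_{1:T}): mixes well
intercept <- draws[, "mu0"] + mu_bar   # mu_0 + mean(mu_{1:T}): mixes well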

With the nonlinear modeling option in brm, you can create scenarios where you need such estimated quantities to identify parameters in an overparameterized model, whereas a standard regression model would already be coded to prevent this from happening. So what I want to know is how much of a problem it is for brm if I overparameterize the model and then compute the estimated quantities after the fact.

I see, thanks for your explanation. My only concern would be that sampling will be very slow for poorly identified models, but this is where priors come into play. If the sampling algorithm works well, then computing the quantities post hoc should be no problem.
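For the toy example above, that post-hoc step might look like this (a sketch; parameter names follow brms’s b_<nlpar>_Intercept convention for nonlinear intercepts):

# Recover the identified quantity psi = log(gamma1/gamma2) by
# transforming the posterior draws of the unidentified parameters.
post <- posterior_samples(fit1, pars = c("b_gamma1_Intercept", "b_gamma2_Intercept"))
psi <- log(post$b_gamma1_Intercept / post$b_gamma2_Intercept)
quantile(psi, probs = c(0.025, 0.5, 0.975))  # posterior summary of psi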