R2D2M2 prior for non-GLMMs

Hi,

I’ve been intrigued by the R2D2M2 paper and am interested in applying it for some ecological models. I’m currently working a dynamic occupancy model with sites nested in regions, and each site has random region-level and random site-level contributions to the parameters of the dynamic occupancy model, namely the colonisation, emigration, and detection rates.

I’m sure the answer is it depends but would I be able to specify the R2D2M2 prior for this model on all of the region- and site-level random effects, or would it be more complicated for a model like this?

Thanks,

Matt

Update: I think I was able to apply the R2D2 formula for region-level parameters of this hidden Markov model parameterised with an overall occupancy probability (psi), colonisation and emigration rates (gamma and epsilon), and call rates (mu, for the detection model). The region-level contributions are drawn from a multivariate normal. Below, in the transformed parameters, I first show the normal non-centered parameterisation of multivariate random effects. Then I show what I believe is the correct formula for R2D2.

parameters {
  real<lower=0, upper=1> psi_a;
  real<lower=0> gamma_a, epsilon_a, mu_a;
  vector<lower=0>[4] theta_t;
  cholesky_factor_corr[4] theta_O_L;
  matrix[4, n_region] theta_z;
  real<lower=0, upper=1> r2d2_r2;
  simplex[4 * n_region] r2d2_phi;
}

transformed parameters {
real r2d2_tau2 = r2d2_r2 / (1 - r2d2_r2);
if (!r2d2) {
    theta = (diag_pre_multiply(theta_t, theta_O_L) * theta_z)';
    } else {
      theta = (diag_pre_multiply(theta_t, theta_O_L),
               / theta_z
               .* sqrt(r2d2_tau2 * to_matrix(r2d2_phi, 4, n_region)))'; 
    }
    psi_r = inv_logit(logit(psi_a) + theta[:, 1]);
    gamma_r = exp(log(gamma_a) + theta[:, 2]);
    epsilon_r = exp(log(epsilon_a) + theta[:, 3]);
    mu_r = exp(log(mu_a) + theta[:, 4]);
  }
}

I suppose my question is, does that make any sense when there’s no residual variance? Looking at the formula it seems like you’re somehow scaling the total variance by the parameter-level contributions to that variance so maybe you don’t need the residual? Anyway, I think I’m starting to understand it better but could probably help with some guidance.

Thanks!

For Gaussian family, R2D2(M2) scales with sigma. In case of non-Gaussian exponential family, the regularized horseshoe paper Sparsity information and regularization in the horseshoe and other shrinkage priors recommends scaling with pseudovariance, and the same approach would make sense for R2D2(M2), too. It is possible that for R2-type priors there are other ways, and we’re investigating this, but at this point the pseudovariance approach is what I would use.

OK, thank you! Just checking, instead of dividing by the SDs of the covariates in the typical R2D2(M2) approach, if the “effects” are non-centered random effects, would the estimated SDs of the random effects take the place of the SDs of the cvoariates?

Oops, sorry, I read your question wrong. I was replying to prior for non-Gaussian model and just for the change due to different data model. Maybe R2D2M2 authors @javier.aguilar and @paul.buerkner can comment applicability for your hierarchical model.

Hello Matt

I am glad that you have found our work interesting. We are currently trying to develop an extension of this prior to other families and models.

Looking at the formula it seems like you’re somehow scaling the total variance by the parameter-level contributions to that variance so maybe you don’t need the residual?

The total variance is the sum of the individual variances that correspond to the overall and varying coefficients. The paper proposes to scale the variance of each coefficient by the residual variance so that we can factor it and have a one to one relationship between R2 and the total variance.

I am not familiar with the specifics of your model but if it allows a decomposition of the linear predictor in the same way we present in the paper then I would not see an issue on applying the prior.

The R2D2M2 prior is available within brms so you could also try to run your model with it. See here.

Just checking, instead of dividing by the SDs of the covariates in the typical R2D2(M2) approach, if the “effects” are non-centered random effects, would the estimated SDs of the random effects take the place of the SDs of the cvoariates?

If I understand your question correctly, you are wondering if it is necessary to scale by the standard deviation of your covariates as we suggest in the paper? I don’t believe you could scale by the SDs of the random effects since the R2D2(M2) prior places a prior on them and is trying to perform inference on them. In the worst case you could just ignore the scaling. This could change the performance of the prior though.

If you would like to have a further discussion via a video call or email, feel free to send me a message. I would be interested in making it work.

2 Likes

Hi Javier,

Thanks for getting back to me.

I think I had a few fundamental misunderstandings about this prior and paper. I think because of the M2 part I thought it referred to hierarchical random effects, which I’m not sure about now. So I thought instead of by dividing by the SDs of observed covariates, you could divide by the SDs of the estimated random effects. I will definitely keep R2D2 in mind for models with many coefficients, but I have moved on to a different prior for this project.

Thanks!