Trouble Implementing R2-D2 Prior in Rstan

Hi, I am trying to fit the marginal R2-D2 prior for multiple linear regression without random effects in Rstan. The prior is specified in equation 10 on page 10 of this article:

The prior is:

\beta_j \mid \sigma^2, \lambda_j \sim \mathrm{DE}\!\left(\sigma\,(\lambda_j/2)^{1/2}\right), \qquad \lambda_j \mid \xi \sim \mathrm{Ga}(a_{\pi}, \xi), \qquad \xi \sim \mathrm{Ga}(b, 1)

In addition, there is an inverse-gamma prior on the variance: \sigma^2 \sim \mathrm{invGamma}(0.001, 0.001). Moreover, we set b = 0.5 and a_{\pi} = 1/(p^{b/2} n^{b/2} \log n), where p is the number of predictors (around 500) and n = 60 is the number of data points.
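As a quick sanity check on that hyperparameter, plugging the stated values into the formula (a hypothetical Python snippet, not part of the model code) shows how small a_pi is, which is one reason such priors can be hard to sample:

```python
import math

# Values stated above: b = 0.5, p = 500 predictors, n = 60 observations.
b = 0.5
p = 500
n = 60

# a_pi = 1 / (p^(b/2) * n^(b/2) * log(n)); with b = 0.5 the exponents are 0.25.
a_pi = 1.0 / (p ** (b / 2) * n ** (b / 2) * math.log(n))
print(round(a_pi, 4))  # roughly 0.019
```

A gamma shape this far below 1 concentrates enormous mass near zero, which is exactly the heavy shrinkage the prior is designed for.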

This is my attempt so far at implementing the prior with Stan:

data {
  int<lower=1> n;     // number of observations
  int<lower=1> p;     // number of covariates
  matrix[n, p] X;     // n-by-p design matrix
  vector[n] y;        // n-dimensional response vector
  int n_nonzero;      // number of nonzero betas (currently unused)
}

parameters {
  vector[p] beta;
  vector<lower=0>[p] lambda;
  real<lower=0> ksi;
  real<lower=0> sigma2;  // sigma^2: the inverse-gamma prior is on the variance
}

transformed parameters {
  vector[n] theta = X * beta;
}

model {
  ksi ~ gamma(0.5, 1);
  // a_pi = 1 / (p^(b/2) * n^(b/2) * log(n)); with b = 0.5 the exponents
  // are 0.25, not 0.025 as in my first attempt
  lambda ~ gamma(1 / (p^0.25 * n^0.25 * log(n)), ksi);
  sigma2 ~ inv_gamma(0.001, 0.001);
  // DE scale is sigma * sqrt(lambda_j / 2); vectorized over all j
  // (my original loop started at j = 2, leaving beta[1] with a flat prior)
  beta ~ double_exponential(0, sqrt(sigma2) * sqrt(lambda / 2));
  // sigma2 is a variance, so the normal sd is its square root
  y ~ normal(theta, sqrt(sigma2));
}
There are two major issues I’m having with this code.

Number one: the mixing is very bad. I receive many warnings that the chains are not mixing properly: large R-hat, bulk and tail effective sample sizes (ESS) too low, etc.

Number two: the code takes about 5 minutes to run with one chain, which is a lot longer than I was hoping for. It would be very nice if it were possible to make it run faster.

Appreciate any help.


Maybe the code example in the appendix of [2208.07132] Intuitive Joint Priors for Bayesian Linear Multilevel Models: The R2-D2-M2 prior is better? The paper is about an extension of the R2-D2 prior, but the code should be usable for R2-D2 as well. I'm tagging @paul.buerkner as he could provide further insights.

You may also want to take a look at the brms implementation of the R2D2 prior for inspiration. See, for example, `make_stancode(y ~ x, data = data.frame(y = rnorm(10), x = rnorm(10)), prior = prior(R2D2()))` in R.


Thanks for the advice. I tried the brms implementation (without changing any of the code) and I get a lot of divergent transitions, around 1000 each time when running one chain with 10,000 iterations. Do you have any suggestions for resolving the issue? I have already tried increasing adapt_delta, but it did not help.

The data I used was the Cereal dataset, which was used in the "Real data examples" section of the original R2-D2 article. Comparing against the authors' published code for the marginal R2-D2 at GitHub - yandorazhang/R2D2: Bayesian linear regression with R2-D2 shrinkage prior, I found the mean squared prediction error to be about twice as large with the brms code on this example.
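For reference, by mean squared prediction error I mean the usual average of squared residuals on held-out data; a minimal generic sketch (not the authors' code):

```python
def mspe(y_true, y_pred):
    """Mean squared prediction error: average of squared residuals."""
    return sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / len(y_true)

# Toy example: residuals of 1 and 2 give (1 + 4) / 2 = 2.5.
print(mspe([1.0, 2.0], [2.0, 4.0]))  # 2.5
```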


With regard to divergent transitions, 1000 indeed seems like a lot even for such shrinkage priors. This suggests there may be something weird in your data, too. What if you try smaller subsets of variables first?

I cannot comment on the different implementations right now because I didn’t have time to compare the code.