Pseudo-variance using intercepts in shrinkage priors

Hi everyone,

I am once again confused about implementing the pseudo-variance to scale shrinkage priors. When using R2D2 priors, the proportion of explained variance R2 is converted to the explained variance tau2 = R2 / (1 - R2) which, in Gaussian models, is multiplied by the residual variance to get the scales of the coefficients and/or random effects, e.g.:

data {
  int N, P;  // number observations and predictors
  matrix[N, P] X;  // predictors
  array[N] real y;  // observations
}
parameters {
  real alpha;  // intercept
  real<lower=0> sigma;  // residual SD
  real<lower=0, upper=1> R2;  // proportion explained variance
  simplex[P] phi;  // variance partitions
  vector[P] z;  // z-scores
}
transformed parameters {
  vector[P] scales = sqrt(R2 / (1 - R2) * square(sigma)),
            beta = scales .* z;
}
model {
  alpha ~ std_normal();
  sigma ~ exponential(1);
  z ~ std_normal();
  y ~ normal(alpha + X * beta, sigma);
}

Piironen and Vehtari (2017) suggest using the pseudo-variance in non-Gaussian models, which they define in Table 1. For Poisson models where y \sim \mathrm{Poisson} (\mu), the pseudo-variance is \mu^{-1}. In practice we don’t know \mu so they suggest using the sample mean.
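As I understand Table 1, the pseudo-variance comes from mapping the observation variance onto the linear-predictor scale via a delta-method argument (my reading, not their exact wording):

\tilde{\sigma}^2 = \frac{\mathrm{Var}(y)}{(\mathrm{d}\mu / \mathrm{d}\eta)^2}

For Poisson with a log link, \mu = e^\eta gives \mathrm{d}\mu/\mathrm{d}\eta = \mu and \mathrm{Var}(y) = \mu, so \tilde{\sigma}^2 = \mu / \mu^2 = \mu^{-1}; for Bernoulli with a logit link, \mathrm{d}\mu/\mathrm{d}\eta = \mathrm{Var}(y) = \mu (1 - \mu), so \tilde{\sigma}^2 = \mu^{-1} (1 - \mu)^{-1}.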

Alternatively, I thought we could use the intercept to get the pseudo-variance. The above model as a Poisson model could then be:

data {
  int N, P;  // number observations and predictors
  matrix[N, P] X;  // predictors
  array[N] int<lower=0> y;  // observations
}
parameters {
  real alpha;  // intercept
  real<lower=0, upper=1> R2;  // proportion explained variance
  simplex[P] phi;  // variance partitions
  vector[P] z;  // z-scores
}
transformed parameters {
  vector[P] scales = sqrt(R2 / (1 - R2) * exp(-alpha)),
            beta = scales .* z;
}
model {
  alpha ~ std_normal();
  z ~ std_normal();
  y ~ poisson_log(alpha + X * beta);
}

I am mostly looking for feedback on this approach. When the baseline rate \exp(\alpha) gets really small, the pseudo-variance \exp(-\alpha) gets huge, and I'm not sure if that's desired. The same thing happens for Bernoulli models, where the pseudo-variance is \mu^{-1} (1 - \mu)^{-1}. Alternatively, Yanchenko et al. (2025) suggest a different approach, where they use the sample means together with the Generalised Beta Prime distribution in a way that I don't really understand. I've been using R2D2 priors for a while now, but this continues to be a stumbling block, so I'd love to sort it out.
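To put some of my own toy numbers on that blow-up: with \alpha = -3, the pseudo-variance is e^{-\alpha} = e^{3} \approx 20, and with \alpha = -6 it is e^{6} \approx 403, so the coefficient scales grow like e^{-\alpha/2} as the baseline rate shrinks.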

thanks!

Matt


Hi, at the moment we don't have a better answer than what Piironen and Vehtari (2017) and Yanchenko et al. (2025) provide. We are looking into this, but due to summer vacations, it may take some time before we have a better answer.


Hi @mhollanders, as you may remember from other threads, I'm also trying to work with R2D2 and ordinal models. Looking at your models above, I note you declared a phi variable but didn't do anything with it! Based on comments from @avehtari and @davkoh in the other thread, I have developed some Bernoulli regression code that uses a pseudo-variance based on the data, and it seems to work (there's a reason I say "seems", which I'll explain later).

The model:


data {
    int n;              // number of observations
    int nX;             // number of X variables

    matrix[n, nX] X;    // X variable data
    array[n] int y;     // binary outcome variable

    // data for the psR2D2 prior
    real<lower=0> psR2D2_mean;         // mean of the R2 prior
    real<lower=0> psR2D2_prec;         // precision of the R2 prior
    vector<lower=0>[nX] psR2D2_cons_D2;    // concentration vector of the D2 prior

}

transformed data {
    //Center X
    matrix[n, nX] Xc;  // centered version of X without an intercept
    vector[nX] means_X;  // column means of X before centering
    vector[nX] sds_X;  // column SDs of X
    for (i in 1:nX) {
        means_X[i] = mean(X[, i]);
        sds_X[i] = sd(X[, i]);
        Xc[, i] = X[, i] - means_X[i];
    }
    
    // Calculate pseudovariance from data
    real pseudovar = (1/mean(y)) * (1/(1 - mean(y)));
}

parameters {
    real Intcpt;         // regression intercept
    vector[nX] beta;     // X variable beta parameters

    // parameters of the psR2D2 prior
    real<lower=0, upper=1> psR2D2_R2;    // pseudoR2 value
    simplex[nX] psR2D2_phi;              // pseudoR2 partition proportions
}

transformed parameters {
    vector<lower=0>[nX] sd_b;  // SDs of the beta coefficients

    real psR2D2_tau2;  // global psR2D2 scale parameter

    // compute psR2D2 scale parameters
    psR2D2_tau2 = pseudovar * psR2D2_R2 / (1 - psR2D2_R2);
    // compute scales
    vector<lower=0>[nX] scales = sqrt(psR2D2_phi * psR2D2_tau2);
    // calculate beta SDs from the scales and the X column SDs
    sd_b = scales ./ sds_X;
}

model {

    // Likelihood
    target += bernoulli_logit_glm_lpmf(y | Xc, Intcpt, beta);

    // priors including constants
    target += normal_lpdf(beta | 0, sd_b);    // prior on betas
    target += student_t_lpdf(Intcpt | 3, 0, 1); //prior on intercept
    // prior on phi partition variable
    target += dirichlet_lpdf(psR2D2_phi | psR2D2_cons_D2);  
    // prior on pseudo R2
    target += beta_lpdf(psR2D2_R2 | psR2D2_mean * psR2D2_prec, (1 - psR2D2_mean) * psR2D2_prec);
}

Simulate some data and fit this model:

library(rstan)
library(posterior)

n <- 200
nX <- 5
Intcpt <- -0.1
beta <- c(-0.388, -0.185, -0.223,  0.477, -0.200)
sigma <- 1 # to add small amount of noise


# Simulate random X variable(s)
X <- sapply(1:nX, function(i) rnorm(n = n))
colnames(X) <- paste0("x", 1:nX)

# Combine to generate a y/ystar variable
ystar <- Intcpt + (X %*% beta) + rnorm(n, 0, sigma)

# convert ystar to probability
p <- 1/(1 + exp(-ystar))

# generate binomial outcome var
y <- rbinom(n = n, size = 1 , prob = p)
table(y)

#### Make a stan data list
data1 <- list(n = n,
              nX = nX,
              X = X,
              y = y)

GPfit <- stan(file='logreg_R2D2_A.stan',
              data = c(data1,
                       list(psR2D2_mean = 0.6, psR2D2_prec = 5, psR2D2_cons_D2 = rep(1, nX))),
              chains=4, seed=49485, iter = 2000, cores=4)

This fits without divergences or treedepth warnings and gives reasonable-looking pairs plots.

And it does an OK job recovering the true parameters (with the exception of beta[2] for some reason; some quick checks show recovery doesn't seem to improve with increased n, but it does improve with larger effect sizes):

draws <- as_draws_df(GPfit)
true <- c(Intcpt, beta)
bayesplot::mcmc_recover_hist(draws[, 1:6], true)

Now… why did I say "seems to work" above? According to Paul Allison (https://support.sas.com/resources/papers/proceedings14/1485-2014.pdf), R2 for logistic regression has a maximum value.

So I didn't consciously set out to make a Cox and Snell R2, but, since he states that "the usual R2 for linear regression depends on the likelihoods for the models with and without predictors by precisely this formula", I think we have a Cox and Snell model anyway. Since he says it has a maximum value of 0.75 when p = 0.5, it seems problematic that my pairs plot above shows values up to 1.0. Indeed, I have seen circumstances where divergences concentrate at high R2 values with this type of model (though I don't understand this well enough to make a reprex). I have tried limiting the maximum value of R2 as per Allison's formula using a four-parameter beta prior, but got mixed results that I don't understand. (Perhaps this is related to @eyanchenko's Generalised Beta Prime distribution that you mentioned, but I also struggle to understand that work.)

This is as far as my explorations have gotten, but I hope it is useful to someone. I'd be very curious to hear any questions/comments/thoughts/critiques/corrections!


Hey,

Not placing an explicit prior on phi just gives a uniform prior over the simplex, due to the simplex constraint.
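To make that explicit, the implicit flat prior is equivalent to writing the symmetric Dirichlet out (a sketch, using the P from my model above):

model {
  phi ~ dirichlet(rep_vector(1.0, P));  // uniform over the simplex; same as no sampling statement
}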

I think using sample means for the pseudo-variance is problematic for more complicated models, such as the zero-inflated models that are common in ecology. FWIW, when I've simulated complex occupancy models with random effects, fixed effects, etc., with R2D2 priors using the pseudo-variance for occupancy (zero-inflation) and detection rates (Poisson), I've never had any issues recovering parameters, as determined by SBC. However, I'm mostly interested in knowing whether using the intercept for the pseudo-variance is a principled approach.

Hi Matt. I didn't mean not placing a prior on phi (although this may also be something worth discussing). I mean you didn't use it in any of the calculations: it's just a free parameter in the model, not related to the data or other parameters. To make it part of the R2D2 model you should multiply it by tau2, as in the scale definitions of the R2D2M2 paper:

So perhaps in your code you should do something like:

transformed parameters {
    vector[P] scales = sqrt( phi  * (R2 / (1 - R2) * square(sigma)));      // sqrt( phi * tau2 )
    vector[P] beta = scales .* z;
}

I haven't thought much about the models you are working with. About the intercept: it seems to me that it may work in terms of parameter recovery, but it is not truly a pseudo-variance, and so your model is not truly a pseudo-R2 prior model. Perhaps you could call it a pseudo-R2-like model? But I think that may also be true for my model above, given the R2 values > 0.75 issue.


Pseudo-variance is always an approximation. Sometimes approximations are good enough. So the question is whether it is principled to use something that is good enough or revealed by diagnostics to be not good enough.

I think the general idea of using decomposition priors has strong justification. Then we have different degrees of approximations on how to implement the decomposition, and that clearly needs more research for non-normal observation models. We’re working on it, but it takes time.
