Hi everyone,
I am once again confused about how to implement the pseudo-variance for scaling shrinkage priors. With R2D2 priors, the proportion of explained variance R^2 is converted to the signal-to-noise ratio \tau^2 = R^2 / (1 - R^2),
which, in Gaussian models, is multiplied by the residual variance to get the scales of the coefficients and/or random effects, e.g.:
data {
int N, P; // number of observations and predictors
matrix[N, P] X; // predictors
array[N] real y; // observations
}
parameters {
real alpha; // intercept
real<lower=0> sigma; // residual SD
real<lower=0, upper=1> R2; // proportion explained variance
simplex[P] phi; // variance partitions
vector[P] z; // z-scores
}
transformed parameters {
vector[P] scales = sqrt(R2 / (1 - R2) * square(sigma) * phi); // per-coefficient prior SDs
vector[P] beta = scales .* z;
}
model {
alpha ~ std_normal();
sigma ~ exponential(1);
z ~ std_normal();
y ~ normal(alpha + X * beta, sigma);
}
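If I have the algebra right, the reason the residual variance is the right multiplier is that, assuming standardized predictors, the implied prior variance of the linear predictor is \mathrm{Var}(x^\top \beta) = \sum_j \phi_j \, \sigma^2 \, \frac{R^2}{1 - R^2} = \sigma^2 \frac{R^2}{1 - R^2}, so \frac{\mathrm{Var}(x^\top \beta)}{\mathrm{Var}(x^\top \beta) + \sigma^2} = R^2, as intended.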
Piironen and Vehtari (2017) suggest using the pseudo-variance in non-Gaussian models, which they define in Table 1. For Poisson models where y \sim \mathrm{Poisson}(\mu), the pseudo-variance is \mu^{-1}. In practice we don’t know \mu, so they suggest plugging in the sample mean.
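Concretely, I read that as computing a fixed pseudo-variance from the counts once, in the transformed data block, and using it where square(sigma) sits in the Gaussian model above. A sketch of my reading (pseudo_var is my name, not theirs):
transformed data {
// Poisson pseudo-variance from their Table 1, with the sample mean plugged in for mu
real pseudo_var = inv(mean(to_vector(y)));
}
transformed parameters {
vector[P] scales = sqrt(R2 / (1 - R2) * pseudo_var * phi);
vector[P] beta = scales .* z;
}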
Alternatively, I thought we could use the intercept to get the pseudo-variance: with a log link the baseline rate is \mu = \exp(\alpha), so the pseudo-variance is \exp(-\alpha). The above model as a Poisson model could then be:
data {
int N, P; // number of observations and predictors
matrix[N, P] X; // predictors
array[N] int<lower=0> y; // observations
}
parameters {
real alpha; // intercept
real<lower=0, upper=1> R2; // proportion explained variance
simplex[P] phi; // variance partitions
vector[P] z; // z-scores
}
transformed parameters {
vector[P] scales = sqrt(R2 / (1 - R2) * exp(-alpha) * phi); // exp(-alpha) = 1 / mu, the pseudo-variance
vector[P] beta = scales .* z;
}
model {
alpha ~ std_normal();
z ~ std_normal();
y ~ poisson_log(alpha + X * beta);
}
I am mostly looking for feedback on this approach. When the intercept \alpha is very negative, i.e. the baseline rate \exp(\alpha) is really small, the pseudo-variance \exp(-\alpha) gets huge, and I’m not sure that’s desired. The same thing happens for Bernoulli models, where the pseudo-variance is \mu^{-1} (1 - \mu)^{-1} (I sketch the intercept-based Bernoulli version at the bottom of this post).

Alternatively, Yanchenko et al. (2025) suggest a different approach, where they use the sample means to do something with the Generalised Beta Prime distribution that I don’t really understand. I’ve been using R2D2 priors for a while now, but this continues to be a stumbling block, so I’d love to sort it out.
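For reference, here is how I’d write the Bernoulli version under the same intercept-based idea, reusing the data and parameters structure above (my own sketch, assuming a logit link so the baseline probability is mu0 = inv_logit(alpha); mu0 is my name):
transformed parameters {
real mu0 = inv_logit(alpha); // baseline probability implied by the intercept
// Bernoulli pseudo-variance: 1 / (mu * (1 - mu)), evaluated at mu0
vector[P] scales = sqrt(R2 / (1 - R2) * inv(mu0 * (1 - mu0)) * phi);
vector[P] beta = scales .* z;
}
model {
alpha ~ std_normal();
z ~ std_normal();
y ~ bernoulli_logit(alpha + X * beta);
}
Here the scales blow up in the same way as \alpha moves away from zero, since \mu_0 (1 - \mu_0) shrinks.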
thanks!
Matt