Thank you for the explanation :)
No worries about the derivation part. I tried to understand what brms is doing here:
```r
prior2 <- c(
  set_prior("R2D2(mean_R2 = 0.333, prec_R2 = 3, autoscale = FALSE, cons_D2 = 1)", class = "b"),
  set_prior("normal(-0.5, 1.2)", class = "Intercept"),
  set_prior("gamma(2, 1)", class = "phi")
)

model2_prior <- brm(
  formula = form,
  data = df,
  prior = prior2,
  sample_prior = "only",
  file = "model2_prior",
  family = Beta()
)

stancode(model2_prior)
```
This was the output:
```stan
// generated with brms 2.23.0
functions {
  /* compute scale parameters of the R2D2 prior
   * Args:
   *   phi: local weight parameters
   *   tau2: global scale parameter
   * Returns:
   *   scale parameter vector of the R2D2 prior
   */
  vector scales_R2D2(vector phi, real tau2) {
    return sqrt(phi * tau2);
  }
}
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  int<lower=1> Kc;  // number of population-level effects after centering
  int<lower=1> Kscales;  // number of local scale parameters
  // data for the R2D2 prior
  real<lower=0> R2D2_mean_R2;  // mean of the R2 prior
  real<lower=0> R2D2_prec_R2;  // precision of the R2 prior
  // concentration vector of the D2 prior
  vector<lower=0>[Kscales] R2D2_cons_D2;
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  matrix[N, Kc] Xc;  // centered version of X without an intercept
  vector[Kc] means_X;  // column means of X before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
}
parameters {
  vector[Kc] zb;  // unscaled coefficients
  real Intercept;  // temporary intercept for centered predictors
  // parameters of the R2D2 prior
  real<lower=0,upper=1> R2D2_R2;
  simplex[Kscales] R2D2_phi;
  real<lower=0> phi;  // precision parameter
}
transformed parameters {
  vector[Kc] b;  // scaled coefficients
  vector<lower=0>[Kc] sdb;  // SDs of the coefficients
  real R2D2_tau2;  // global R2D2 scale parameter
  vector<lower=0>[Kscales] scales;  // local R2D2 scale parameters
  // prior contributions to the log posterior
  real lprior = 0;
  // compute R2D2 scale parameters
  R2D2_tau2 = R2D2_R2 / (1 - R2D2_R2);
  scales = scales_R2D2(R2D2_phi, R2D2_tau2);
  sdb = scales[(1):(Kc)];
  b = zb .* sdb;  // scale coefficients
  lprior += normal_lpdf(Intercept | -0.5, 1.2);
  lprior += beta_lpdf(R2D2_R2 | R2D2_mean_R2 * R2D2_prec_R2, (1 - R2D2_mean_R2) * R2D2_prec_R2);
  lprior += gamma_lpdf(phi | 2, 1);
}
model {
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N] mu = rep_vector(0.0, N);
    mu += Intercept + Xc * b;
    mu = inv_logit(mu);
    target += beta_lpdf(Y | mu .* phi, (1 - mu) .* phi);
  }
  // priors including constants
  target += lprior;
  target += std_normal_lpdf(zb);
  target += dirichlet_lpdf(R2D2_phi | R2D2_cons_D2);
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept - dot_product(means_X, b);
}
```
It seems to me that the generated code treats the equation on the logit scale as just a regular (Gaussian) linear model with sigma = 1: the line `R2D2_tau2 = R2D2_R2 / (1 - R2D2_R2);` has no residual-variance factor. This is only a superficial reading of the code on my part, but brms appears to be doing something different from what I expected.
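To make that reading concrete, here is a small numerical sketch of the prior-scale computation as I understand it from the generated code (in Python rather than R, just so I could check it; all variable names are my own, not from brms):

```python
# Sketch of what I think the generated Stan code computes, translated to NumPy.
import numpy as np

rng = np.random.default_rng(1)

mean_R2, prec_R2 = 0.333, 3.0
K = 4            # hypothetical number of coefficients (Kscales = Kc here)
n_draws = 100_000

# R2D2_R2 ~ Beta(mean * prec, (1 - mean) * prec), as in the beta_lpdf line
R2 = rng.beta(mean_R2 * prec_R2, (1 - mean_R2) * prec_R2, size=n_draws)

# R2D2_tau2 = R2 / (1 - R2): no sigma^2 factor appears here, which is why
# it looks as if the logit-scale predictor is treated as having sigma = 1
tau2 = R2 / (1 - R2)

# R2D2_phi ~ Dirichlet(1, ..., 1) splits the total variance across coefficients
phi = rng.dirichlet(np.ones(K), size=n_draws)

# scales_R2D2(phi, tau2) = sqrt(phi * tau2): per-coefficient prior SDs, so each
# b_j is effectively N(0, phi_j * R2 / (1 - R2)) given phi and R2
sdb = np.sqrt(phi * tau2[:, None])
```

If this is right, then with `autoscale = FALSE` the per-coefficient SDs depend only on R2 and phi, with an implicit unit residual variance on the logit scale.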
I thought that the Taylor approximation and the other methods described by Yanchenko are needed precisely because, in beta regression and other GLMs, the variance is not independent of the mean and the link is nonlinear. So I am confused about how these two ways of constructing an R2-based prior fit together. Clearly I am not understanding something fundamental, and I may also have misread the Stan code. I apologize if this is too basic, and for going in a different direction than my initial question; any insight would be greatly appreciated.
EDIT by Aki: added code block ticks for Stan code