I have a question regarding the following linear mixed effects model:
y_{ij} = (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})t_{ij} + \varepsilon_{ij}.
I can fit the model just find using the brms package and compute the Bayes factor for, say, the hypothesis that b_{1i} = 0 vs. b_{1i} > 0. For example, using the PQRI data from the ICH guideline for testing poolability of batches, I would have
# set priors
priors <- c(
prior(cauchy(0, 3), class = sd), # cauchy dist. centered at 0 = half-cauchy dist.
prior(normal(0, 5), class = b, coef = Month),
prior(cauchy(0, 3), class = sigma),
prior(normal(100, 20), class = Intercept)
)
fit_brm <- brms::brm(
Assay ~ Month + (1 + Month || Batch), # assuming no correlation
data = d,
prior = priors,
sample_prior = "yes", # for BF computation
iter = 1e4,
cores = future::availableCores(),
thin = 2
)
I know about the function stancode(fit_brm)
that gives the corresponding Stan code used for the model fit as well below:
data {
int<lower=1> N; // total number of observations
vector[N] Y; // response variable
int<lower=1> K; // number of population-level effects
matrix[N, K] X; // population-level design matrix
// data for group-level effects of ID 1
int<lower=1> N_1; // number of grouping levels
int<lower=1> M_1; // number of coefficients per level
int<lower=1> J_1[N]; // grouping indicator per observation
// group-level predictor values
vector[N] Z_1_1;
vector[N] Z_1_2;
int prior_only; // should the likelihood be ignored?
}
transformed data {
int Kc = K - 1;
matrix[N, Kc] Xc; // centered version of X without an intercept
vector[Kc] means_X; // column means of X before centering
for (i in 2:K) {
means_X[i - 1] = mean(X[, i]);
Xc[, i - 1] = X[, i] - means_X[i - 1];
}
}
parameters {
vector[Kc] b; // population-level effects
real Intercept; // temporary intercept for centered predictors
real<lower=0> sigma; // dispersion parameter
vector<lower=0>[M_1] sd_1; // group-level standard deviations
vector[N_1] z_1[M_1]; // standardized group-level effects
}
transformed parameters {
vector[N_1] r_1_1; // actual group-level effects
vector[N_1] r_1_2; // actual group-level effects
real lprior = 0; // prior contributions to the log posterior
r_1_1 = (sd_1[1] * (z_1[1]));
r_1_2 = (sd_1[2] * (z_1[2]));
lprior += normal_lpdf(b[1] | 0, 5);
lprior += normal_lpdf(Intercept | 100, 20);
lprior += cauchy_lpdf(sigma | 0, 3)
- 1 * cauchy_lccdf(0 | 0, 3);
lprior += cauchy_lpdf(sd_1 | 0, 3)
- 2 * cauchy_lccdf(0 | 0, 3);
}
model {
// likelihood including constants
if (!prior_only) {
// initialize linear predictor term
vector[N] mu = rep_vector(0.0, N);
mu += Intercept;
for (n in 1:N) {
// add more terms to the linear predictor
mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_1_2[J_1[n]] * Z_1_2[n];
}
target += normal_id_glm_lpdf(Y | Xc, mu, b, sigma);
}
// priors including constants
target += lprior;
target += std_normal_lpdf(z_1[1]);
target += std_normal_lpdf(z_1[2]);
}
generated quantities {
// actual population-level intercept
real b_Intercept = Intercept - dot_product(means_X, b);
// additionally sample draws from priors
real prior_b__1 = normal_rng(0,5);
real prior_Intercept = normal_rng(100,20);
real prior_sigma = cauchy_rng(0,3);
real prior_sd_1 = cauchy_rng(0,3);
// use rejection sampling for truncated priors
while (prior_sigma < 0) {
prior_sigma = cauchy_rng(0,3);
}
while (prior_sd_1 < 0) {
prior_sd_1 = cauchy_rng(0,3);
}
}
-
What I am interested in specifically is the computation of the Bayes factor. I understand using the hypothesis function will give the Evid.Ratio for a specific hypothesis of interest. How would I specify the arguments in the
hypothesis()
function to test for b_{1i} = 0 vs. b_{1i} > 0 in this case? Or should I use thebf()
specification? -
If I want to have some posterior predictive values from this fit, how would I modify this code/Stan code?
-
Another goal I have is to generate many data sets with some having b_{1i} = 0 and some having b_{1i} > 0 which mimic the PQRI data to assess the distribution of the Bayes factor. This is very inefficient to do if I have to call upon the
brm()
andhypothesis()
function many times as it takes a few minutes to do a single fit to the actual PQRI data. Is there a way to code this in Stan manually to get at this Bayes factor/Evid.Ratio computation or the loglikelihoods and then usebridge_sampler
? I am not familiar enough withbrms
and Stan to pick out just the part of the Stan code generated above that I need.
Any help or advice is appreciated!
Randy
PQRI_example.csv (786 Bytes)
Edit: included the csv data file