In my move toward Bayesian models, I began with the fantastic brms
package. I would like to better understand and control what I’m doing, so I have started trying to work in “pure” Stan. To learn, I have been generating Stan code with brms and then tweaking it from there.
With a recent hierarchical Bernoulli model, I’ve hit some roadblocks, and I’ve outlined a couple of questions in this thread.
Before trying to approach what I think of as combining parameters with indexes, it seems I need a better understanding of how the Stan model is coded.
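(By “combining parameters with indexes” I mean terms like this fragment from the likelihood loop in the model below, where a group-level effect is looked up via each observation’s group index:

mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_2_1[J_2[n]] * Z_2_1[n];

I’ll come back to that once I understand the basics.)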
The model is:
data {
  int<lower=1> N;  // number of observations
  int Y[N];  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  // data for group-level effects of ID 1
  int<lower=1> N_1;  // number of grouping levels
  int<lower=1> M_1;  // number of coefficients per level
  int<lower=1> J_1[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_1_1;
  // data for group-level effects of ID 2
  int<lower=1> N_2;  // number of grouping levels
  int<lower=1> M_2;  // number of coefficients per level
  int<lower=1> J_2[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_2_1;
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  int Kc = K - 1;
  matrix[N, Kc] Xc;  // centered version of X without an intercept
  vector[Kc] means_X;  // column means of X before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
}
parameters {
  vector[Kc] b;  // population-level effects
  real Intercept;  // temporary intercept for centered predictors
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  vector[N_1] z_1[M_1];  // standardized group-level effects
  vector<lower=0>[M_2] sd_2;  // group-level standard deviations
  vector[N_2] z_2[M_2];  // standardized group-level effects
}
transformed parameters {
  vector[N_1] r_1_1;  // actual group-level effects
  vector[N_2] r_2_1;  // actual group-level effects
  r_1_1 = (sd_1[1] * (z_1[1]));
  r_2_1 = (sd_2[1] * (z_2[1]));
}
model {
  // initialize linear predictor term
  vector[N] mu = Intercept + Xc * b;
  for (n in 1:N) {
    // add more terms to the linear predictor
    mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_2_1[J_2[n]] * Z_2_1[n];
  }
  // priors including all constants
  target += normal_lpdf(b | 0, 1);
  target += normal_lpdf(Intercept | 0, 1);
  target += cauchy_lpdf(sd_1[1] | 0, 1)
    - 1 * cauchy_lccdf(0 | 0, 1);
  target += std_normal_lpdf(z_1[1]);
  target += cauchy_lpdf(sd_2[1] | 0, 1)
    - 1 * cauchy_lccdf(0 | 0, 1);
  target += std_normal_lpdf(z_2[1]);
  // likelihood including all constants
  if (!prior_only) {
    target += bernoulli_logit_lpmf(Y | mu);
  }
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept - dot_product(means_X, b);
  // additionally draw samples from priors
  real prior_b = normal_rng(0, 1);
  real prior_Intercept = normal_rng(0, 1);
  real prior_sd_1_1 = cauchy_rng(0, 1);
  real prior_sd_2_1 = cauchy_rng(0, 1);
  // use rejection sampling for truncated priors
  while (prior_sd_1_1 < 0) {
    prior_sd_1_1 = cauchy_rng(0, 1);
  }
  while (prior_sd_2_1 < 0) {
    prior_sd_2_1 = cauchy_rng(0, 1);
  }
}
I’m confused by the parameters block, and I imagine it’s a basic concept I’m lacking. sd_1, sd_2, z_1, and z_2 are declared there, but I don’t see where they are calculated. Is the sd_ prefix shorthand for calculating a standard deviation? Is the z_ prefix shorthand for scaling and centering?
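To make my confusion concrete, here is a minimal sketch of my current guess (the names sd_g, z_g, and r_g are mine, not brms’s): I suspect Stan samples sd and z directly as parameters, and r is merely derived from them, so nothing is ever “calculated” from the data in the way I was expecting.

parameters {
  real<lower=0> sd_g;  // hypothetical group sd: sampled by Stan, not computed from data
  vector[5] z_g;       // hypothetical standardized effects: also sampled directly
}
transformed parameters {
  vector[5] r_g = sd_g * z_g;  // "actual" effects: purely derived from the draws above
}
model {
  target += std_normal_lpdf(z_g);  // this prior is what makes z_g "standardized"
  target += cauchy_lpdf(sd_g | 0, 1) - cauchy_lccdf(0 | 0, 1);  // half-Cauchy on sd_g
}

Is that reading right, or is something else going on?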