 Specification of multivariate hierarchical priors

Hello!

I’ve been using Stan for several years, but this is my first forum post. Thanks for the awesome tool.

I’m working with a fairly standard hierarchical linear regression problem. I have J groups, and am modeling each of those groups with its own intercept and variance. In addition, I am modeling the effect of group selection probability (in a standard PPS sample design) on each group mean. At the individual level there is the usual vector of P predictors.

To achieve this I came up with the following model, using a non-centered parameterization as suggested in this forum, the Stan User’s manual, and in @betanalpha’s case studies:

``````data {

int <lower = 0> N;             // number of observations
int <lower=0> J;                // number of groups
int <lower=0> R;               // number of group predictors
matrix[J, R] X_group;       // group predictor matrix
int <lower=0> P;               // number of predictors
matrix[N, P] X_pop;          // individual predictor matrix
int <lower=0> jj[N];           // group index j for observation n
real y[N];                          // individual measurement
}

parameters {

real mu;                                       // global mean
real<lower=0> alpha_sigma;      // shape prior for sigma
real<lower=0> beta_sigma;        // scale prior for sigma

real alpha_tilde[J];                      // group alpha multiplier
real<lower=0> sigma[J];             // group population variance
vector[R] kappa;                         // group coefficients (in this case only group selection probability)
real<lower=0> tau;                     // between-group variance
vector[P] beta;                            // individual predictor coefficients
}

transformed parameters {
real alpha[J];                               // group means
real mu_n[N];                              // individual means
real sigma_n[N];                         // individual variance

for (j in 1:J) {
alpha[j] = mu + tau * alpha_tilde[j] + X_group[j] * kappa;
}

for (n in 1:N) {
mu_n[n] = alpha[jj[n]] + X_pop[n] * beta;
sigma_n[n] = sigma[jj[n]];
}
}

model {

target += normal_lpdf(mu | 0, 0.2);

target += normal_lpdf(alpha_sigma | 0, 5);
target += normal_lpdf(beta_sigma | 0, 1);
target += inv_gamma_lpdf(sigma | alpha_sigma, beta_sigma);

target += normal_lpdf(tau | 0, 1);
target += normal_lpdf(alpha_tilde | 0, 1);

target += normal_lpdf(kappa | 0, 0.1);
target += normal_lpdf(beta | 0, 1);

target += normal_lpdf(y | mu_n, sigma_n);

}

generated quantities {
real y_new[N];
vector[N] log_lik;
for (n in 1:N) {
y_new[n] = normal_rng(mu_n[n], sigma_n[n]);
log_lik[n] = normal_lpdf(y[n] | mu_n[n], sigma_n[n]);
}
}
``````

My basic problem is that the inference for kappa (the effect of selection probability on group mean) seems to track closely whatever I provide as the prior. This makes sense to me as I don’t actually expect the group selection probability to have a great deal of impact, but would like to include it considering it was part of the sample design (possibly this is a naive attitude - I am open to correction). In order to deal with this so far I have just made the prior on kappa fairly small (and could make it smaller still).

While I know that’s not cheating, exactly, I think that a better idea is to attempt to regularize the posterior for kappa by using a multivariate prior to model the correlation between the alpha_tilde (group-level multipliers) and kappa, but am having a difficult time figuring out how to do so with the non-centered parameterization.

Is it sufficient to add a 1 column vector to X_group and add something like the following:

``````transformed parameters {
real alpha[J];
real mu_n[N];
real sigma_n[N];
vector gamma[J];

for (j in 1:J) {
gamma[j, 1] =  tau * alpha_tilde[j];
gamma[j, 2] = kappa;
alpha[j] = X_group[j] * gamma;
}

for (n in 1:N) {
mu_n[n] = mu + alpha[jj[n]] + X_pop[n] * beta;
sigma_n[n] = sigma[jj[n]];
}
}

model {
}

``````

In reflection, though, it seems like this would essentially attempt to model the between-group variance twice, so maybe I should just abandon the decomposition of the group means if going to a multivariate parameterization? Or perhaps there is a better way to work this out? I am sure that I am just not fully understanding the section on multivariate reparameterization in the User’s Manual properly or how to apply it to my model, but possibly also the idea to regularize through a multivariate model on group effects is misguided entirely in this case.

Thanks for your help, and any additional feedback regarding possible model improvements are also appreciated.

So I think I figured out the model that I actually want here:

``````
data {
int <lower = 0> N;
real y[N];
int <lower=0> J;
int <lower=0> R;
matrix[J, R] U;
int <lower=0> jj[N];
}

parameters {

// -------------- //
// global effects //
// -------------- //

real mu;
real<lower=0> alpha_sigma;
real<lower=0> beta_sigma;

vector[R] mu_gamma;
vector<lower = 0>[R] tau_gamma;
cholesky_factor_corr[R] L_corr;

// ------------- //
// group effects //
// ------------- //

vector[R] gamma;
vector[J] alpha;
real<lower=0> sigma_alpha;
real<lower=0> sigma[J];

// ------------------ //
// population effects //
// ------------------ //

}

transformed parameters {
real mu_n[N];
real sigma_n[N];
vector[J] u_gamma;

u_gamma = U * gamma;

for (n in 1:N) {
mu_n[n] = mu + alpha[jj[n]];
sigma_n[n] = sigma[jj[n]];
}
}

model {

// ------------- //
// global priors //
// ------------- //

target += normal_lpdf(mu | 0, 0.2);
target += normal_lpdf(alpha_sigma | 0, 5);
target += normal_lpdf(beta_sigma | 0, 1);

target += normal_lpdf(mu_gamma | 0, 1);
target += exponential_lpdf(tau_gamma | 5);
target += lkj_corr_cholesky_lpdf(L_corr | 2);

target += multi_normal_cholesky_lpdf(gamma | mu_gamma, diag_pre_multiply(tau_gamma, L_corr));
// ------------ //
// group priors //
// ------------ //

target += inv_gamma_lpdf(sigma | alpha_sigma, beta_sigma);
target += normal_lpdf(alpha | u_gamma, sigma_alpha);

// ----------------- //
// population priors //
// ----------------- //

// ---------- //
// likelihood //
// ---------- //

target += normal_lpdf(y | mu_n, sigma_n);

}

generated quantities {
real y_new[N];
vector[N] log_lik;
for (n in 1:N) {
y_new[n] = normal_rng(mu_n[n], sigma_n[n]);
log_lik[n] = normal_lpdf(y[n] | mu_n[n], sigma_n[n]);
}
}
``````