Specification of multivariate hierarchical priors

Hello!

I’ve been using Stan for several years, but this is my first forum post. Thanks for the awesome tool.

I’m working with a fairly standard hierarchical linear regression problem. I have J groups, and am modeling each of those groups with its own intercept and variance. In addition, I am modeling the effect of group selection probability (in a standard PPS sample design) on each group mean. At the individual level there is the usual vector of P predictors.

To achieve this I came up with the following model, using a non-centered parameterization as suggested in this forum, the Stan User’s manual, and in @betanalpha’s case studies:

data {

  int <lower = 0> N;             // number of observations
  int <lower=0> J;                // number of groups
  int <lower=0> R;               // number of group predictors
  matrix[J, R] X_group;       // group predictor matrix
  int <lower=0> P;               // number of predictors
  matrix[N, P] X_pop;          // individual predictor matrix
  int <lower=0> jj[N];           // group index j for observation n
  real y[N];                          // individual measurement
}

parameters {

  real mu;                                       // global mean
  real<lower=0> alpha_sigma;      // shape prior for sigma
  real<lower=0> beta_sigma;        // scale prior for sigma

  real alpha_tilde[J];                      // group alpha multiplier
  real<lower=0> sigma[J];             // group population variance
  vector[R] kappa;                         // group coefficients (in this case only group selection probability)
  real<lower=0> tau;                     // between-group variance
  vector[P] beta;                            // individual predictor coefficients
}

transformed parameters {
  real alpha[J];                               // group means
  real mu_n[N];                              // individual means
  real sigma_n[N];                         // individual variance

  for (j in 1:J) {
    alpha[j] = mu + tau * alpha_tilde[j] + X_group[j] * kappa;
  }

  for (n in 1:N) {
    mu_n[n] = alpha[jj[n]] + X_pop[n] * beta;
    sigma_n[n] = sigma[jj[n]];
  }
}

model {

  target += normal_lpdf(mu | 0, 0.2);

  target += normal_lpdf(alpha_sigma | 0, 5);
  target += normal_lpdf(beta_sigma | 0, 1);
  target += inv_gamma_lpdf(sigma | alpha_sigma, beta_sigma);

  target += normal_lpdf(tau | 0, 1);
  target += normal_lpdf(alpha_tilde | 0, 1);

  target += normal_lpdf(kappa | 0, 0.1);
  target += normal_lpdf(beta | 0, 1);

  target += normal_lpdf(y | mu_n, sigma_n);

}

generated quantities {
  real y_new[N];
  vector[N] log_lik;
  for (n in 1:N) {
    y_new[n] = normal_rng(mu_n[n], sigma_n[n]);
    log_lik[n] = normal_lpdf(y[n] | mu_n[n], sigma_n[n]);
  }
}

My basic problem is that the inference for kappa (the effect of selection probability on group mean) seems to track closely whatever I provide as the prior. This makes sense to me as I don’t actually expect the group selection probability to have a great deal of impact, but would like to include it considering it was part of the sample design (possibly this is a naive attitude - I am open to correction). In order to deal with this so far I have just made the prior on kappa fairly small (and could make it smaller still).

While I know that’s not cheating, exactly, I think that a better idea is to attempt to regularize the posterior for kappa by using a multivariate prior to model the correlation between the alpha_tilde (group-level multipliers) and kappa, but am having a difficult time figuring out how to do so with the non-centered parameterization.

Is it sufficient to add a 1 column vector to X_group and add something like the following:

transformed parameters {
  real alpha[J];
  real mu_n[N];
  real sigma_n[N];
  vector[2] gamma[J];

  for (j in 1:J) {
    gamma[j, 1] =  tau * alpha_tilde[j];
    gamma[j, 2] = kappa;
    alpha[j] = X_group[j] * gamma;
  }

  for (n in 1:N) {
    mu_n[n] = mu + alpha[jj[n]] + X_pop[n] * beta;
    sigma_n[n] = sigma[jj[n]];
  }
}

model {
  gamma ~ multi_normal(mu_gamma, quad_form_diag(Omega, tau_gamma));
}

In reflection, though, it seems like this would essentially attempt to model the between-group variance twice, so maybe I should just abandon the decomposition of the group means if going to a multivariate parameterization? Or perhaps there is a better way to work this out? I am sure that I am just not fully understanding the section on multivariate reparameterization in the User’s Manual properly or how to apply it to my model, but possibly also the idea to regularize through a multivariate model on group effects is misguided entirely in this case.

Thanks for your help, and any additional feedback regarding possible model improvements are also appreciated.

So I think I figured out the model that I actually want here:


data {
  int <lower = 0> N;
  real y[N];
  int <lower=0> J;
  int <lower=0> R;
  matrix[J, R] U;
  int <lower=0> jj[N];
}

parameters {

  // -------------- //
  // global effects //
  // -------------- //

  real mu;
  real<lower=0> alpha_sigma;
  real<lower=0> beta_sigma;
  
  vector[R] mu_gamma;
  vector<lower = 0>[R] tau_gamma;
  cholesky_factor_corr[R] L_corr;
  
  // ------------- //
  // group effects //
  // ------------- //

  vector[R] gamma;
  vector[J] alpha;
  real<lower=0> sigma_alpha;
  real<lower=0> sigma[J];
  
  // ------------------ //
  // population effects //
  // ------------------ //

}

transformed parameters {
  real mu_n[N];
  real sigma_n[N];
  vector[J] u_gamma;

  u_gamma = U * gamma;

  for (n in 1:N) {
    mu_n[n] = mu + alpha[jj[n]];
    sigma_n[n] = sigma[jj[n]];
  }
}

model {

  // ------------- //
  // global priors //
  // ------------- //

  target += normal_lpdf(mu | 0, 0.2);
  target += normal_lpdf(alpha_sigma | 0, 5);
  target += normal_lpdf(beta_sigma | 0, 1);
  
  target += normal_lpdf(mu_gamma | 0, 1);
  target += exponential_lpdf(tau_gamma | 5);
  target += lkj_corr_cholesky_lpdf(L_corr | 2);
  
  target += multi_normal_cholesky_lpdf(gamma | mu_gamma, diag_pre_multiply(tau_gamma, L_corr));
  // ------------ //
  // group priors //
  // ------------ //
  
  target += inv_gamma_lpdf(sigma | alpha_sigma, beta_sigma);
  target += normal_lpdf(alpha | u_gamma, sigma_alpha);
  
  // ----------------- //
  // population priors //
  // ----------------- //

  // ---------- //
  // likelihood //
  // ---------- //
  
  target += normal_lpdf(y | mu_n, sigma_n);
  
}

generated quantities {
  real y_new[N];
  vector[N] log_lik;
  for (n in 1:N) {
    y_new[n] = normal_rng(mu_n[n], sigma_n[n]);
    log_lik[n] = normal_lpdf(y[n] | mu_n[n], sigma_n[n]);
  }
}