Multivariate priors for hierarchical models - Group-level predictors for prior scale

Section 1.13 in the Stan User’s Guide states (https://mc-stan.org/docs/stan-users-guide/multivariate-hierarchical-priors.html):

“As for the prior means, if there is information about the scale of variation of coefficients across groups, it should be incorporated into the prior for tau. For large numbers of exchangeable coefficients, the components of tau itself (perhaps excluding the intercept) may themselves be given a hierarchical prior.”

How would the Stan code in this example look if extended to include group-level predictors for the prior scale as well? Suppose we use the same transposed predictor matrix matrix[L, J] u that is also used for the prior mean.

The easiest thing to do is model the log scales hierarchically in exactly the same way. Then just transform to the scales using exp(). This itself is a form of seemingly unrelated regression, so you could also model covariance among the log scales.

So that’d be something like a vector \log(\tau) of scales, with a prior

\log \tau \sim \textrm{multiNormalCholesky}(\ldots\textrm{some regression}\ldots,\ L_{\Sigma^\tau}),
\tau = \exp(\log \tau).

And it’s elephants all the way down in that you now need to specify a prior on the covariance matrix \Sigma^\tau (it may just be diagonal, in which case you’d just use \textrm{normal} rather than \textrm{multiNormalCholesky}).
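A minimal Stan fragment of that suggestion might look like the following. This is only a hypothetical sketch: every name here (mu_log_tau, z_log_tau, sd_log_tau, L_Omega_tau) is a placeholder, and mu_log_tau stands in for whatever "...some regression..." on group-level predictors you choose.

```stan
parameters {
  vector[K] mu_log_tau;                 // stand-in for "...some regression..."
  vector[K] z_log_tau;                  // standardized residuals for log scales
  vector<lower=0>[K] sd_log_tau;        // marginal sds of the log scales
  cholesky_factor_corr[K] L_Omega_tau;  // correlation among the log scales
}
transformed parameters {
  // non-centered multivariate normal prior on the log scales
  vector[K] log_tau
    = mu_log_tau + diag_pre_multiply(sd_log_tau, L_Omega_tau) * z_log_tau;
  vector<lower=0>[K] tau = exp(log_tau);  // transform back to positive scales
}
```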


Thanks, that’s helpful. However, using the transposed predictor matrix matrix[L, J] u to model “the scale of variation of coefficients across groups” means that the resulting \tau is not a vector but a matrix (or an array of vectors), as it varies over j, right? How does that work with diag_pre_multiply?

You can sometimes run into problems where you need a container to be in two different shapes for different operations and the only thing to do is to manually transpose, etc. But here, we have

(\textrm{diag}(u) \cdot v)^{\top} = v^{\top} \cdot \textrm{diag}(u)^\top = v^\top \cdot \textrm{diag}(u),

so we can just use diag_post_multiply instead.
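Concretely, for a vector u of size K and a matrix V of size K x J, the two results below are transposes of each other (an illustrative fragment, not from the model above):

```stan
matrix[K, J] a = diag_pre_multiply(u, V);    // diag(u) * V
matrix[J, K] b = diag_post_multiply(V', u);  // V' * diag(u), i.e. b == a'
```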

Sorry if I’m missing something fundamental here. I still count on my fingers for matrices.

So, this runs, but with plenty of divergences, and almost all transitions hit the maximum treedepth:

data {
  int<lower=0> N;              // num individuals
  int<lower=1> K;              // num ind predictors
  int<lower=1> J;              // num groups
  int<lower=1> L;              // num group predictors
  array[N] int<lower=1, upper=J> jj;  // group for individual
  matrix[N, K] x;              // individual predictors
  matrix[L, J] u;              // group predictors
  vector[N] y;                 // outcomes
}

parameters {
  matrix[K, J] z;
  cholesky_factor_corr[K] L_Omega;
  cholesky_factor_corr[K] L_Omega_tau;
  vector[K] log_tau;                         // prior scale logged
  matrix[K, L] gamma;                        // group coeffs
  matrix[K, L] zeta;                         // group coeffs - scale
  real<lower=0> sigma;                       // prediction error scale
  vector<lower=0, upper=pi() / 2>[K] L_log_tau_unif;
}

transformed parameters {
  vector<lower=0>[K] L_log_tau = 2.5 * tan(L_log_tau_unif);
  vector<lower=0>[K] tau = exp(log_tau);
  matrix[K, J] beta = gamma * u + diag_pre_multiply(tau, L_Omega) * z;
}

model {
  vector[N] mu;
  array[J] vector[K] mu_tau;
  matrix[K, K] L_Sigma_log_tau;
  for(n in 1:N) {
    mu[n] = x[n, ] * beta[, jj[n]];
  }
  to_vector(z) ~ std_normal();
  L_Omega ~ lkj_corr_cholesky(2);
  to_vector(gamma) ~ normal(0, 1);
  for (j in 1:J) {
    mu_tau[j] = zeta * u[, j];  // column j of u holds group j's predictors
  }
  L_Sigma_log_tau = diag_pre_multiply(L_log_tau, L_Omega_tau);
  to_vector(zeta) ~ normal(0, 1);
  L_Omega_tau ~ lkj_corr_cholesky(2);
  log_tau ~ multi_normal_cholesky(mu_tau, L_Sigma_log_tau);
  y ~ normal(mu, sigma);
}

I figured that the non-centered parameterization pasted below might help get rid of the divergences etc. However, the line:

vector[K] log_tau = mu_tau + diag_pre_multiply(L_log_tau, L_Omega_tau) * z_2;

produces an “Ill-typed arguments supplied to infix operator +” error, which makes sense, as mu_tau is an array[J] vector[K], not a vector.

data {
  int<lower=0> N;                     // num individuals
  int<lower=1> K;                     // num ind predictors
  int<lower=1> J;                     // num groups
  int<lower=1> L;                     // num group predictors
  array[N] int<lower=1, upper=J> jj;  // group for individual
  matrix[N, K] x;                     // individual predictors
  matrix[L, J] u;                     // group predictors
  array[J] vector[L] s;               // group predictors, one vector per group
  vector[N] y;                        // outcomes
}

parameters {
  matrix[K, J] z;
  vector[K] z_2;
  cholesky_factor_corr[K] L_Omega;
  cholesky_factor_corr[K] L_Omega_tau;
  matrix[K, L] gamma;                        // group coeffs - location
  matrix[K, L] zeta;                         // group coeffs - scale
  real<lower=0> sigma;                       // prediction error scale
  vector<lower=0, upper=pi() / 2>[K] L_log_tau_unif;
}

transformed parameters {
  vector<lower=0>[K] L_log_tau = 2.5 * tan(L_log_tau_unif);
  array[J] vector[K] mu_tau;
  for (j in 1:J) {
    mu_tau[j] = zeta * s[j];
  }
  vector[K] log_tau = mu_tau + diag_pre_multiply(L_log_tau, L_Omega_tau) * z_2;
  matrix[K, J] beta = gamma * u + diag_pre_multiply(exp(log_tau), L_Omega) * z;
}

model {
  vector[N] mu;
  matrix[K, K] L_Sigma_log_tau;
  for(n in 1:N) {
    mu[n] = x[n, ] * beta[, jj[n]];
  }
  to_vector(z) ~ std_normal();
  z_2 ~ std_normal();
  L_Omega ~ lkj_corr_cholesky(2);
  L_Omega_tau ~ lkj_corr_cholesky(2);
  to_vector(gamma) ~ normal(0, 0.5);
  to_vector(zeta) ~ normal(0, 0.5);
  y ~ normal(mu, sigma);
}

I’d appreciate any ideas on how to implement this and get rid of the divergences and treedepth warnings.
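One possible way to make the shapes consistent, sketched and untested on the radon data: since the scales now vary by group, declare log_tau as a matrix[K, J] (one column of log scales per group), widen z_2 to matrix[K, J] so each group gets its own residual, and build beta column by column.

```stan
transformed parameters {
  // column j holds group j's log scales; assumes z_2 is matrix[K, J]
  matrix[K, J] log_tau
    = zeta * u + diag_pre_multiply(L_log_tau, L_Omega_tau) * z_2;
  matrix[K, J] beta;
  for (j in 1:J) {
    // per-group scaling: diag(exp(log_tau[, j])) * L_Omega * z[, j]
    beta[, j] = gamma * u[, j] + exp(log_tau[, j]) .* (L_Omega * z[, j]);
  }
}
```

Note that zeta * u here plays the same role as the per-group mu_tau[j] = zeta * s[j] loop above, just computed for all groups at once.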
radon_dat.RData (4.0 KB)
Data attached, also available here https://github.com/stan-dev/example-models/blob/master/ARM/Ch.17/radon_multi_varying_coef_17.2.data.R