Prior for regression coefficients: multivariate normal vs multiple, independent univariate normal

I’m building a hierarchical regression model with both varying intercepts and varying slopes.

The Stan documentation has an excellent section on “Multivariate prior for hierarchical model”. However, instead of using a multivariate normal prior, I want to use multiple independent univariate normal priors so that my Stan file is simpler. Specifically:

transformed parameters {
  beta0 = mu_beta0 + sigma_beta0 * beta0_raw;
  beta1 = mu_beta1 + sigma_beta1 * beta1_raw;
}
model {
  beta0_raw ~ normal(0, 1);
  beta1_raw ~ normal(0, 1);
  y ~ normal(beta0 + beta1 * x, sigma);
}

Note how beta0 and beta1 have independent, univariate normal priors (in the form of a non-centered parameterization).

The latter is much easier to code, especially when we also want to use a non-centered parameterization.

Obviously the potential problem here is that I’m assuming the intercepts and the slopes are independent. How bad is that assumption? How can I diagnose whether this independence assumption in the prior is causing me problems?
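For comparison, the correlated alternative from the Stan docs section could look roughly like the sketch below. This is not code from my model; the parameter names mirror the ones above, with the intercept and slope stacked into a 2-row matrix and an LKJ prior on their correlation:

```stan
parameters {
  matrix[2, J] beta_raw;           // non-centered: one column per group
  vector[2] mu_beta;               // population means [mu_beta0, mu_beta1]
  vector<lower=0>[2] sigma_beta;   // population scales
  cholesky_factor_corr[2] L_beta;  // correlation between intercept and slope
}
transformed parameters {
  // row 1 = intercepts, row 2 = slopes; each column j is one group's pair
  matrix[2, J] beta = rep_matrix(mu_beta, J)
                      + diag_pre_multiply(sigma_beta, L_beta) * beta_raw;
}
model {
  to_vector(beta_raw) ~ std_normal();
  L_beta ~ lkj_corr_cholesky(2);
}
```

Setting L_beta to the identity (equivalently, dropping it) recovers the independent-priors version above, which is what makes the two models directly comparable.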

Full model below, in case the problem I’m having is due to something else in the model other than the independent prior. It’s a hierarchical regression model with measurement error in both x and y.

// Hierarchical regression with measurement error in x and y
data {
  int<lower=0> N; // Number of observations
  int<lower=0> J; // Number of groups (region-vertical)
  vector[N] x;
  vector<lower=0>[N] se_x;
  vector[N] y;
  vector<lower=0>[N] se_y;
  int<lower=1,upper=J> group[N]; // Group membership of each observation
}
parameters {
  // regression parameters
  vector[J] beta0_raw; // non-centered parameterization
  vector[J] beta1_raw; // non-centered parameterization
  real<lower=0> sigma;
  // hyper-prior for the regression coefficients
  real mu_beta0;
  real mu_beta1;
  real<lower=0> sigma_beta0;
  real<lower=0> sigma_beta1;
  // latent values
  vector[N] x_lat_raw; // non-centered parameterization
  real mu_x_lat;
  real<lower=0> sigma_x_lat;
  vector[N] y_lat;
}
transformed parameters {
  // non-centered parameterization for regression coefficient
  vector[J] beta0 = mu_beta0 + sigma_beta0 * beta0_raw;
  vector[J] beta1 = mu_beta1 + sigma_beta1 * beta1_raw;
  // non-centered parameterization for x_lat
  vector[N] x_lat = exp(mu_x_lat + sigma_x_lat * x_lat_raw);
  // vectorizing the mu_yhat calculation, which is a linear combination of x
  //   (note how the corresponding beta is picked based on the group membership)
  vector[N] mu_y_lat;
  for (i in 1:N) {
    mu_y_lat[i] = beta0[group[i]] + beta1[group[i]] * x_lat[i];
  }
}
model {
  // Hierarchical prior on x_lat
  x_lat_raw ~ normal(0., 1.);
  mu_x_lat ~ normal(0., 5.);
  sigma_x_lat ~ normal(0., 5.); // practically a half normal given the lower=0 constraint
  // Regression
  x ~ normal(x_lat, se_x);
  y ~ normal(y_lat, se_y);
  y_lat ~ normal(mu_y_lat, sigma);
  // Hierarchical Prior on regression coefficients
  beta0_raw ~ normal(0, 1);
  beta1_raw ~ normal(0, 1);
  mu_beta0 ~ normal(0, 1);
  mu_beta1 ~ normal(0, 1);
  sigma_beta0 ~ normal(0, 5);
  sigma_beta1 ~ normal(0, 5);
  sigma ~ normal(0., 5.);
}

Hi Anh,

There are two quick ways that I would go about testing this. First, in the model that includes the correlated coefficients, how strong are the correlations? If the credible interval for the correlation does not contain zero, then the parameters are likely correlated. This could also be checked by comparing the two models with the loo statistic: if the model fits worse when the parameters are assumed independent, then the correlation should not be removed.
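If the correlated model is parameterized with a Cholesky factor (say, a `cholesky_factor_corr[2] L_beta`, as is common for a non-centered multivariate prior), the correlation itself can be recovered in generated quantities so its credible interval can be inspected directly. A minimal sketch, assuming that `L_beta` parameter exists:

```stan
generated quantities {
  // off-diagonal of L_beta * L_beta' is the intercept-slope correlation
  real rho = multiply_lower_tri_self_transpose(L_beta)[1, 2];
}
```

The posterior draws of `rho` then give the credible interval to check against zero.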


Hey @andrjohns, thank you for the suggestions! Those are reasonable diagnostics, and I think they will work. However, they require coding up the correlated model, at which point I might as well just use it. (There doesn’t seem to be any downside to using the correlated model, except perhaps slower sampling because there are more parameters.)

My situation is that I’m using the independent model and seeing some problems (e.g. divergent transitions, a multimodal posterior). However, I don’t know whether I can reasonably attribute those problems to the independence assumption. What kind of diagnostics can we use to figure this out?

Oh, gotcha. The first thing to try if you’re getting divergences with hierarchical models is a non-centered parameterisation for these parameters; there’s a great section in the Stan manual which covers it in more detail.

I should have clarified that I’m already using a non-centered parameterization (I just edited the post). Perhaps the way I’m using the technique is wrong?

Nope, that syntax looks fine. If you can post the full model then I can help more; otherwise there’s not much else I can do.


Thanks so much for offering your help! I included the full model in the original post. Perhaps it’s indeed something else that’s causing problems, not the independent prior as I suspected.