Modeling correlation between parameters

I’m new to stan and have a question that is probably very simple but I haven’t been able to work out on my own.

I have data with repeated observations of units across multiple measures and want to measure unit-specific means for each of those measures. Some of these variables are correlated and data on each comes at irregular intervals. When I get new information from one variable I want to use it to update the entire matrix of unit-specific means.

This first set of code simulates two correlated variables that are similar to what I’m working with.

library(MASS)
library(rstan)

a <- -.1
b <- .05
sigma_mu_1 <- .2
sigma_1 <- 1.5
sigma_mu_2 <- .1
sigma_2 <- .75
rho = .5

##Make correlation matrix
mu <- c(a,b)
sigmas <- c(sigma_mu_1, sigma_mu_2)
Rho <- matrix(c(1, rho, rho, 1), nrow = 2)
Sigma <- diag(sigmas) %*% Rho %*% diag(sigmas)

n_units <- 10

set.seed(1000)

##sample unit effects
mu_matrix <- mvrnorm(n_units, mu, Sigma)

mu_1 <- mu_matrix[,1]
mu_2 <- mu_matrix[,2]

n_obs <- 100

##Generate data
team_id <- rep(1:n_units, each = n_obs)
mean_1 <- mu_1[team_id]
mean_2 <- mu_2[team_id]
y_1 <- rnorm(n_units * n_obs, mean_1, sigma_1)
y_2 <- rnorm(n_units * n_obs, mean_2, sigma_2)

dat <- list(
  N_1 = n_obs * n_units,
  N_2 = n_obs * n_units,
  I = n_units,
  ii = team_id,
  jj = team_id,
  y_1 = y_1,
  y_2 = y_2
)

This stan code estimates the effects separately and everything samples fine but all of my attempts to build in correlation between mu_1 and mu_2 have gone poorly. What is the appropriate way to model this correlation?

data{
  int<lower = 0> N_1;
  int<lower = 0> N_2; 
  int<lower = 1> I;
  int<lower = 0, upper = I> ii[N_1];
  int<lower = 0, upper = I> jj[N_2]; 
  real y_1[N_1];
  real y_2[N_2];
}
parameters{
  vector[I] mu_1_raw;
  vector[I] mu_2_raw;
  real alpha_1; 
  real alpha_2; 
  real<lower = 0> sigma_1;
  real<lower = 0> sigma_2;
  real<lower = 0> sigma_mu_1;
  real<lower = 0> sigma_mu_2;
}
transformed parameters{
  vector[I] mu_1 = mu_1_raw * sigma_mu_1;
  vector[I] mu_2 = mu_2_raw * sigma_mu_2;
}
model{
  alpha_1 ~ normal(0, .1);
  alpha_2 ~ normal(0, .1);
  sigma_1 ~ normal(0,2);
  sigma_2 ~ normal(0,2);
  sigma_mu_1 ~ std_normal();
  sigma_mu_2 ~ std_normal();
  mu_1_raw ~ std_normal();
  mu_2_raw ~ std_normal();

  y_1 ~ normal(alpha_1 + mu_1[ii], sigma_1);
  y_2 ~ normal(alpha_2 + mu_2[jj], sigma_2);
}

You want to use multi_normal (or multi_normal_cholesky), which in turn necessitates a bit of reshaping of your parameters. You have your model set up with non-centered parameterization, but it’s easier to follow if I first show the centered version, with variables named a bit more informatively (if less succinctly):

parameters{

  vector[2] mean_of_unit_means; 
  vector<lower=0>[2] sd_of_unit_means ; 
  corr_matrix[2] cor_mat ; 

  matrix[I,2] unit_means ;

  vector<lower=0>[2] measurement_noise ;

}
transformed parameters{
  //combine the sds & correlation matrix to a covariance matrix
  matrix[2,2] cov_mat = quad_form_diag( cor_mat, sd_of_unit_means) ;
}
model{
  mean_of_unit_means ~ normal(0,.1) ;
  measurement_noise  ~ normal(0,2);
  sd_of_unit_means ~ std_normal();
  cor_mat ~ lkj_corr(3) ; // note: modify this to match your prior expectations for the correlation

  for(i in 1:I){
    unit_means[i,] ~ multi_normal(
        mean_of_unit_means
        , cov_mat
    ) ;
  }

  y_1 ~ normal(unit_means[ii,1], measurement_noise[1]);
  y_2 ~ normal(unit_means[jj,2], measurement_noise[2]);
}

Now it turns out to be possible to avoid the loop in the model block by simply changing the shape of the unit_means parameter, in which case it also makes sense to compute the covariance matrix in-place so it’s not saved:

parameters{

  vector[2] mean_of_unit_means; 
  vector<lower=0>[2] sd_of_unit_means ; 
  corr_matrix[2] cor_mat ; 

  vector[2] unit_means[I] ;

  vector<lower=0>[2] measurement_noise ;

}
model{
  mean_of_unit_means ~ normal(0,.1) ;
  measurement_noise  ~ normal(0,2);
  sd_of_unit_means ~ std_normal();
  cor_mat ~ lkj_corr(3) ; // note: modify this to match your prior expectations for the correlation

  unit_means ~ multi_normal(
      mean_of_unit_means
      , quad_form_diag( cor_mat, sd_of_unit_means) 
  ) ;

  y_1 ~ normal(unit_means[ii][1], measurement_noise[1]);
  y_2 ~ normal(unit_means[jj][2], measurement_noise[2]);
}

Now, for the non-centered version, the manual suggests that using the Cholesky factor representation of the correlation matrix for better sampling performance. I think it should be possible to similarly use the Cholesky factor representation in the centered model above, so if you get divergences with the non-centered but not the centered version and you find the centered version slow, try adding the Cholesky stuff below to the model above. Report back if you try this, as I’d be curious to hear how it goes.

Anyway, here’s the non-centered with Cholesky representation:

parameters{

  row_vector[2] mean_of_unit_means;
  vector<lower=0>[2] sd_of_unit_means ;
  cholesky_factor_corr[2] chol_cor_mat ;

  matrix[I,2] unit_means_z ;

  vector<lower=0>[2] measurement_noise ;

}
transformed parameters{
  matrix[I,2] unit_means = ( //using some idiosyncratic whitespace here for (imo) better clarity
    rep_matrix(mean_of_unit_means,I)
    + (
        unit_means_z
        *diag_post_multiply(chol_cor_mat,sd_of_unit_means)
      )
  );
}
model{
  mean_of_unit_means ~ normal(0,.1) ;
  measurement_noise  ~ normal(0,2);
  sd_of_unit_means ~ std_normal();
  chol_cor_mat ~ lkj_corr_cholesky(3) ; // note: modify this to match your prior expectations for the correlation

  to_vector(unit_means_z) ~ std_normal() ;

  y_1 ~ normal(unit_means[ii,1], measurement_noise[1]);
  y_2 ~ normal(unit_means[jj,2], measurement_noise[2]);
}

Thank you for your help! I will let you know how things go.