Multivariate analysis



Hi everyone,
I’m new to working with Stan and I’m trying to fit a multivariate model. I’m following an online article, “Bayesian Varying Effects Models in R and Stan” by Will Hipson.

The data was simulated this way:

library(tidyverse) # loads dplyr, purrr and ggplot2, all used below
library(mvtnorm)   # for rmvnorm()

theme_set(theme_bw(base_size = 14)) # setting ggplot2 theme

set.seed(12407) # for reproducibility

# unstandardized means and sds
real_mean_y <- 4.25
real_mean_x <- 3.25
real_sd_y <- 0.45
real_sd_x <- 0.54

N <- 30 # number of people
n_days <- 7 # number of days
total_obs <- N * n_days

sigma <- 1 # population sd
beta <- c(0, 0.15) # average intercept and slope
sigma_p <- c(1, 1) # intercept and slope sds
rho <- -0.36 # covariance between intercepts and slopes

cov_mat <- matrix(c(sigma_p[1]^2, sigma_p[1] * sigma_p[2] * rho, sigma_p[1] * sigma_p[2] * rho, sigma_p[2]^2), nrow = 2)
beta_p <- rmvnorm(N, mean = beta, sigma = cov_mat) # participant intercepts and slopes

x <- matrix(c(rep(1, N * n_days), rnorm(N * n_days, 0, 1)), ncol = 2) # model matrix
pid <- rep(1:N, each = n_days) # participant id

sim_dat <- map_dfr(.x = 1:(N * n_days), ~data.frame(
  mu = x[.x, 1] * beta_p[pid[.x], 1] + x[.x, 2] * beta_p[pid[.x], 2],
  pid = pid[.x],
  x = x[.x, 2]
))

sim_dat$y <- rnorm(total_obs, sim_dat$mu, sigma) # creating observed y from mu and sigma

dat <- sim_dat %>%
  select(-mu) %>% # removing mu
  mutate(y = real_mean_y + (y * real_sd_y), # unstandardize
         x = real_mean_x + (x * real_sd_x)) # unstandardize

stan_dat3 <- list(
  N_obs = nrow(dat),
  N_pts = max(as.numeric(dat$pid)),
  K = 2, # intercept + slope
  pid = as.numeric(dat$pid),
  x = matrix(c(rep(1, nrow(dat)), (dat$x - mean(dat$x)) / sd(dat$x)), ncol = 2), # z-score for x
  y = (dat$y - mean(dat$y)) / sd(dat$y) # z-score for y
)

and here is the Stan model, saved as mod_c.stan:

data {
  int N_obs; // number of observations
  int N_pts; // number of participants
  int K; // number of predictors + intercept
  int pid[N_obs]; // participant id vector
  matrix[N_obs, K] x; // matrix of predictors
  real y[N_obs]; // y vector
}

parameters {
  matrix[K, N_pts] z_p; // matrix of intercepts and slope
  vector<lower=0>[K] sigma_p; // sd for intercept and slope
  vector[K] beta; // intercept and slope hyper-priors
  cholesky_factor_corr[K] L_p; // Cholesky correlation matrix
  real<lower=0> sigma; // population sigma
}

transformed parameters {
  matrix[K, N_pts] z; // non-centered version of beta_p
  z = diag_pre_multiply(sigma_p, L_p) * z_p; 
}

model {
  vector[N_obs] mu;
  
  // priors
  beta ~ normal(0, 1);
  sigma ~ exponential(1);
  sigma_p ~ exponential(1);
  L_p ~ lkj_corr_cholesky(2);
  to_vector(z_p) ~ normal(0, 1);
  
  // likelihood
  for(i in 1:N_obs) {
    mu[i] = beta[1] + z[1, pid[i]] + (beta[2] + z[2, pid[i]]) * x[i, 2];
  }
  y ~ normal(mu, sigma);
}

generated quantities {
  matrix[2, 2] Omega;
  Omega = multiply_lower_tri_self_transpose(L_p);
}

I used this to run the model:

stan_fit3 <- stan(file = "mod_c.stan",
                  data = stan_dat3,
                  chains = 4, cores = 4)

stan_fit3

and I got essentially the same results as in the article.
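To check the recovered random-effect correlation specifically, something like this works with rstan (Omega comes from the generated quantities block above):

print(stan_fit3, pars = c("beta", "sigma", "sigma_p", "Omega"))

# posterior mean of the 2 x 2 random-effect correlation matrix
Omega_hat <- matrix(colMeans(as.matrix(stan_fit3, pars = "Omega")), nrow = 2)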

Now here is my application: I divided the dataset into two parts of 105 observations each (the split was based on a plot of the data). I fitted the model above to the first part and then to the second part, with pid = rep(1:15, each = 7), i.e. 15 participants per part since I split the 30 participants in two. In each fit, K = 2, N_obs = nrow(dat) = 105, and N_pts = max(as.numeric(dat$pid)) = 15.
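Concretely, the split and the data list for the first part look roughly like this (a sketch; since the rows of dat are ordered by pid, the first 105 rows are participants 1 to 15):

dat1 <- dat[1:105, ]   # participants 1-15
dat2 <- dat[106:210, ] # participants 16-30

stan_dat_part1 <- list(
  N_obs = nrow(dat1),
  N_pts = 15,
  K = 2,
  pid = rep(1:15, each = 7),
  x = matrix(c(rep(1, nrow(dat1)), (dat1$x - mean(dat1$x)) / sd(dat1$x)), ncol = 2),
  y = (dat1$y - mean(dat1$y)) / sd(dat1$y)
)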
For the first part, I got my random-effect correlation matrix to be
\begin{pmatrix} 1 & -0.40\\ -0.40 & 1 \end{pmatrix}
and for the second part, I got
\begin{pmatrix} 1 & -0.36\\ -0.36 & 1 \end{pmatrix}
Now, using the results from the two datasets, I simulated a new dataset. I joined the two random-effect correlation matrices by assuming correlations of 0.6 and -0.6 between the corresponding effects across datasets, which gives this 4 x 4 matrix:
\begin{pmatrix} 1 & 0.6 & -0.40 & -0.6\\ 0.6 & 1 & -0.6 & -0.36\\ -0.4 & -0.6 & 1 & 0.6\\ -0.6 & -0.36 & 0.6 & 1 \end{pmatrix}
In this case my random effects follow \mathrm{MVNormal}\big((\beta_0[1], \beta_0[2], \beta_1[1], \beta_1[2]), \Sigma\big), where \beta_0[1] and \beta_1[1] are the intercept and slope from data 1, \beta_0[2] and \beta_1[2] are the intercept and slope from data 2, and \Sigma is the random-effect covariance matrix obtained by applying the correlation matrix above to the random-effect SDs.
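A minimal sketch of how such random effects can be drawn in R (only the correlation matrix comes from the description above; the means and SDs here are placeholder values for illustration):

library(mvtnorm)

N <- 30 # number of participants

# order of effects: (beta_0[1], beta_0[2], beta_1[1], beta_1[2])
R <- matrix(c( 1.00,  0.60, -0.40, -0.60,
               0.60,  1.00, -0.60, -0.36,
              -0.40, -0.60,  1.00,  0.60,
              -0.60, -0.36,  0.60,  1.00), nrow = 4)

sds   <- c(1, 1, 1, 1)                   # random-effect SDs (placeholders)
Sigma <- diag(sds) %*% R %*% diag(sds)   # correlation -> covariance
mu_re <- c(0, 0, 0.15, 0.15)             # mean intercepts and slopes (placeholders)

beta_p <- rmvnorm(N, mean = mu_re, sigma = Sigma) # one row of four effects per participant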

Here is my Stan code using the simulated dataset:

data {
  int N_obs; // number of observations
  int N_pts; // number of participants
  int<lower=1> J; // number of datasets
  int<lower=1, upper=J> dam[N_obs]; // dataset id (1 or 2) for each observation
  int K; // number of predictors + intercept
  int pid[N_obs]; // participant id vector
  matrix[N_obs, K] x; // matrix of predictors
  real y[N_obs]; // y vector
}

parameters {
  matrix[K, N_pts] z_p; // matrix of intercepts and slope
  vector<lower=0>[K] sigma_p; // sd for intercept and slope
  vector[K] mu1; // intercept and slope hyper-priors
  cholesky_factor_corr[K] L_p; // Cholesky correlation matrix
  vector[2] sigma; // population sigma
}

transformed parameters {
  matrix[K, N_pts] z; // non-centered version of beta_p
  z = diag_pre_multiply(sigma_p, L_p) * z_p; 
}

model {
  vector[2] beta1;
  vector[2] beta2;
  vector[N_obs] mu;
  
  // priors
  mu1 ~ normal(0, 1);
  sigma ~ exponential(1);
  sigma_p ~ exponential(1);
  L_p ~ lkj_corr_cholesky(2);
  to_vector(z_p) ~ normal(0, 1);
  
  // likelihood
  for(i in 1:N_obs) {
    beta1[1] = mu1[1] + z[1, pid[i]];
    beta1[2] = mu1[2] + z[2, pid[i]];
    beta2[1] = mu1[3] + z[3, pid[i]];
    beta2[2] = mu2[4] + z[4, pid[i]];
    mu[i] = beta1[dam[i]] + (beta2[dam[i]]) * x[i, 2];
  }
  y[i] ~ normal(mu, sigma[dam[i]]);
}

generated quantities {
  matrix[4,4] Omega;
  Omega = multiply_lower_tri_self_transpose(L_p);
}

Here Dam is a factor of the form (1, 1, 1, …, 2, 2, 2), where “1” marks observations from data 1 and “2” marks observations from data 2, dam = as.integer(Dam), and J = 2. Also pid = rep(1:30, each = 7) (N = 30), K = 4, N_obs = nrow(dat) = 210, and N_pts = max(as.numeric(dat$pid)) = 30.
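The data list for this combined model looks roughly like this (a reconstructed sketch: dat here is the combined 210-row dataset with a Dam column, and x is padded to four columns because the model declares matrix[N_obs, K] with K = 4 even though only column 2 is used in the likelihood):

stan_dat4 <- list(
  N_obs = nrow(dat),                 # 210
  N_pts = max(as.numeric(dat$pid)),  # 30
  J = 2,
  dam = as.integer(dat$Dam),         # 1 = data 1, 2 = data 2
  K = 4,
  pid = as.numeric(dat$pid),
  x = cbind(1, (dat$x - mean(dat$x)) / sd(dat$x), 0, 0), # padded to K = 4 columns
  y = (dat$y - mean(dat$y)) / sd(dat$y)
)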

Using my simulated dataset, I expected the model to recover the 0.6 and -0.6 correlations, but I got 0.002 and -0.04 instead. Is there anything I’m not doing right? Please help.

Thank you


Hi,
sorry for not getting to you earlier. It appears that the second model is only ever using a single datapoint:

y[i] ~ normal(mu, sigma[dam[i]]);

should probably have been inside the for loop. It is actually surprising that the model compiles at all (in fact, if it compiles, that might be a bug in the Stan compiler).

Since you are fitting the model with almost no data, the correlations would be expected to stay almost at their prior location, which is close to 0.
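For intuition: in the 2 x 2 case, LKJ(eta) implies (r + 1) / 2 ~ Beta(eta, eta) for the single correlation r, so with eta = 2 the prior density is peaked at 0. A quick sketch in R:

# implied LKJ(2) prior density for a single correlation in a 2 x 2 matrix
curve(dbeta((x + 1) / 2, 2, 2) / 2, from = -1, to = 1,
      xlab = "correlation", ylab = "prior density")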

Does that make sense? And does the problem disappear after fixing the likelihood?

Thank you so much for replying. I corrected the code so that y[i] ~ normal(...) is inside the loop, and that makes sense to me now. I noticed that after correcting the error, my correlations came out lower than expected. For example, where I expected a correlation of 0.9, I got 0.7.

Why are you expecting it to be 0.9? Is it because you simulated the data with a correlation of 0.9? It is also usually better to look at wider posterior intervals than just at point estimates. Note that the LKJ prior actually puts quite a lot of prior probability on low correlations, so maybe this is an expected amount of shrinkage. The best way to find out is to test with data simulated exactly from the prior: instead of working with a fixed correlation matrix, draw the matrix from the LKJ(2) distribution and see whether your posterior intervals cover the true values well.
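For example, you can draw correlation matrices from the LKJ(2) prior in R; one option (assuming the rethinking package is available, which provides rlkjcorr(n, K, eta)) is:

library(rethinking) # for rlkjcorr()

set.seed(1)
R_draws <- rlkjcorr(10, K = 4, eta = 2) # ten 4 x 4 correlation matrices from LKJ(2)
R_true  <- R_draws[1, , ]               # use one draw as the "true" matrix for simulation

You can then simulate a dataset from R_true, refit your model, and check whether the posterior intervals for Omega cover the entries of R_true.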

Does that make sense?


Yes it does. Thank you so much for your reply. I’m so grateful.