Correlated coefficients example - what's going wrong?

As part of a learning exercise, I’ve been trying to fit a Stan model on synthetic data in R that uses correlated coefficients. My idea was to prove I could do it with synthetic data before applying it to empirical data.

I am unfortunately running into a host of divergent transitions whilst also failing to capture the correlation structure in the synthetic data.

Here is the R code I use to generate the data:

library(MASS)
library(rstan)
options(mc.cores = parallel::detectCores())

#Simulate Data
n <- 500
b0 <- 1
x1 <- rnorm(n, 0.1, 0.1)
x2 <- rnorm(n, 0.05, 0.05)
CovMat <- matrix(c(0.6, -0.4, -0.4, 0.6), 2, 2)  # implies correlation -0.4/0.6 = -0.67
betas <- mvrnorm(n, mu = c(1,1), CovMat)
b1 <- betas[,1]
b2 <- betas[,2]
linpred <- b0 + b1 * x1 + b2 * x2
y <- rpois(n, exp(linpred))

#Prep Stan Data
standata <- list(N = n, Y = y, X1 = x1, X2 = x2, K = ncol(betas))

#Run Stan Model
StanFile <- "MultivarParam.stan"
poismvmod <- stan(file = StanFile, data = standata, iter = 700)

And here is the Stan model:

data {
  int<lower=1> N; //data points
  int<lower=0> Y[N]; //dependent variable
  real X1[N]; //predictor 1
  real X2[N]; //predictor 2
  int<lower=2> K; //number of corr coeffs
}
parameters {
  real beta0; //intercept
  vector[K] beta; //vector of correlated coeffs
  vector[K] beta_mu; //vector of means for correlated coeffs
  corr_matrix[K] Omega; //Correlation matrix for correlated coeffs
  vector<lower=0>[K] sigma; // vector of scales (std devs) for correlated coeffs
}
model {
  //Priors on coeffs
  sigma ~ normal(0,2);
  beta_mu ~ normal(0,1);
  Omega ~ lkj_corr(2);
  beta ~ multi_normal(beta_mu, quad_form_diag(Omega, sigma));
  //Prior on intercept
  beta0 ~ normal(0, 2);
  //likelihood
  for (i in 1:N) {
      target += poisson_log_lpmf(Y[i] | beta0 + beta[1] * X1[i] + beta[2] * X2[i]);
  }
}

I could be missing something obvious, or more likely it’s just my fundamental misunderstanding of how correlation structures are applied in regression. Any guidance would be much appreciated.

If you are going to do a model like this, you should usually use a non-centered parameterization, where the coefficients become transformed parameters:

data {
  int<lower=1> N; //data points
  int<lower=0> Y[N]; //dependent variable
  real X1[N]; //predictor 1
  real X2[N]; //predictor 2
  int<lower=2> K; //number of corr coeffs
}
parameters {
  real beta0; //intercept
  vector[K] beta_raw; // primitive vector of correlated coeffs
  vector[K] beta_mu; //vector of means for correlated coeffs
  cholesky_factor_corr[K] L; // Cholesky factor of correlation matrix for correlated coeffs
  vector<lower=0>[K] sigma; // vector of scales (std devs) for correlated coeffs
}
transformed parameters {
  vector[K] beta = beta_mu + sigma .* (L * beta_raw);
}
model {
  //Priors on coeffs
  sigma ~ normal(0,2);
  beta_mu ~ normal(0,1);
  L ~ lkj_corr_cholesky(2);
  beta_raw ~ normal(0, 1); // implies:
  // beta ~ multi_normal(beta_mu, quad_form_diag(L * L', sigma));
  //Prior on intercept
  beta0 ~ normal(0, 2);
  //likelihood
  target += poisson_log_lpmf(Y | beta0 + beta[1] * X1 + beta[2] * X2);
}

where I have also eliminated the loop in the log-likelihood.


Thanks Ben, I’m still struggling a bit, mainly because I’m not getting near the parameters used to generate the data.

When I do:

L <- extract(poismvmod, pars = "L")$L
cor_b <- apply(L, 1, function(x) tcrossprod(x)[1, 2])  # implied correlation (L L')[1, 2] per draw
print(signif(quantile(cor_b, probs = c(0.025, 0.5, 0.975)), 2))

I get values around 0, when I’m looking for about -0.66 based on the data.

What am I missing?

It seems to me that the data-generation code and the Stan model do not match: the data generation draws different beta coefficients for each observation, while in the Stan model there is only one parameter beta[1] and one parameter beta[2] common to all observations.
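
To make that concrete, here is a minimal sketch of data generation that would match the model, assuming the intent is a single coefficient vector shared by all observations:

library(MASS)
set.seed(123)
n <- 500
CovMat <- matrix(c(0.6, -0.4, -0.4, 0.6), 2, 2)
beta <- mvrnorm(1, mu = c(1, 1), CovMat)  # ONE draw of (b1, b2), shared by every observation
x1 <- rnorm(n, 0.1, 0.1)
x2 <- rnorm(n, 0.05, 0.05)
y <- rpois(n, exp(1 + beta[1] * x1 + beta[2] * x2))

Note, though, that with a single realized pair of coefficients the data carry very little information about the correlation parameter.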


Thanks Juho, this reveals my misunderstanding of how correlation between coefficients works. How do I simulate data with correlated coefficients when I only have one value of each coefficient?

Unfortunately I’m not sure what you mean by correlated coefficients. Or, in what sense are the coefficients correlated if there is only one value of each? Possibly prior correlation, but then one likely cannot expect to recover the data-generating correlation parameter from what is, in a sense, a single data point.


As an example, say you want to evaluate how the size of a house and its number of bedrooms affect its price. Intuitively, since larger houses tend to have more bedrooms, you’d expect that if we gave more weight to the effect of house size, we would have to give less weight to the number of bedrooms for the price to remain the same.

As a result, I’d imagine the samples for the coefficient estimates to be negatively correlated, and I was wondering how to capture this belief in my model. As I think you allude to in your answer, I can do this through prior correlation, but I’m struggling to make a reproducible example of this concept to really test my understanding.

If by samples for the coefficient estimates you mean the MCMC samples from the posterior distribution (i.e. the Stan results), this should not need any modeling of correlations. If you set up a model like price ~ normal(mu + beta[1]*size + beta[2]*bedrooms, noise), put independent priors on the beta coefficients, and generate data where size correlates positively with bedrooms and price correlates with both, you should get a posterior distribution where the betas are negatively correlated.
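
You can check this without Stan. As a rough sketch, the sampling correlation of the least-squares estimates mirrors the posterior correlation you would see under flat priors:

set.seed(1)
n <- 200
size <- rnorm(n)
bedrooms <- 0.8 * size + rnorm(n, sd = 0.6)          # size and bedrooms positively correlated
price <- 1 + 0.5 * size + 0.5 * bedrooms + rnorm(n)  # price depends on both
fit <- lm(price ~ size + bedrooms)
cov2cor(vcov(fit))["size", "bedrooms"]               # strongly negative, around -0.8

A pairs plot of the beta draws from the corresponding Stan model shows the same negative correlation.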


Thanks Juho, that makes sense. Would you mind therefore explaining where correlation matrices are useful, perhaps with an example? Or what kind of synthetic data could I produce to test a model with a correlation matrix on?
I’m just struggling to think of situations where I should consider using them.


I was browsing the forum and trying to recreate the example model by @bgoodri, and Stan has apparently changed over the past three years (2018 to 2021) with respect to scalar and vector multiplication, because rstan version 2.21 did not like beta0 + beta[1] * X1 + beta[2] * X2.

I vectorized (and “matrix-ified”) the model, removing the separate b0 coefficient in the process by absorbing the intercept into the design matrix as a column of ones. Here is a working version of the model (save it as demo.stan to use with my example R code):

data {
  int<lower=1> N; //data points
  int<lower=0> Y[N]; //dependent variable
  int<lower=2> K; //number of corr coeffs
  matrix[N, K] X; // predictor matrix
}
parameters {
  vector[K] beta_raw; // primitive vector of correlated coeffs
  vector[K] beta_mu; //vector of means for correlated coeffs
  cholesky_factor_corr[K] L; // Cholesky factor of correlation matrix for correlated coeffs
  vector<lower=0>[K] sigma; // vector of scales (std devs) for correlated coeffs
}
transformed parameters {
  vector[K] beta = beta_mu + sigma .* (L * beta_raw);
}
model {
  //Priors on coeffs
  sigma ~ normal(0,2);
  beta_mu ~ normal(0,1);
  L ~ lkj_corr_cholesky(2);
  beta_raw ~ normal(0, 1); // implies:
    // beta ~ multi_normal(beta_mu, quad_form_diag(L * L', sigma));
  //likelihood
  target += poisson_log_lpmf(Y | X * beta);
}

And R code to simulate data that works:

library(MASS)
library(rstan)
options(mc.cores = parallel::detectCores())

#Simulate Data
set.seed(123)
n <- 500
b0 <- 1
X <- cbind(x0 = rep(b0, n),
           x1 = rnorm(n, 0.1, 0.1),
           x2 = rnorm(n, 0.05, 0.05))

CovMat <- matrix(c(0.6, -0.4, -0.4, 0.6), 2, 2)
betas <- cbind(rep(1, n), mvrnorm(n, mu = c(1,1), CovMat))
cor(betas[, 2:3])  # check the simulated correlation (near -0.67); column 1 is constant

linpred <- X * betas  # elementwise product: each observation gets its own betas

y <- rpois(n = n, exp(rowSums(linpred)))  # rowSums completes the per-row dot product

#Prep Stan Data
standata <- list(N = n, Y = y, X = X, K = ncol(betas))

demo_stan <- stan_model("demo.stan")

demo_fit <- rstan::sampling(demo_stan, standata)
demo_fit
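
To inspect the correlation this fit implies, the same tcrossprod trick as above works; note that with K = 3 the x1/x2 pair is now entry [2, 3]:

L_draws <- extract(demo_fit, pars = "L")$L
cor_x1x2 <- apply(L_draws, 1, function(x) tcrossprod(x)[2, 3])
print(signif(quantile(cor_x1x2, probs = c(0.025, 0.5, 0.975)), 2))

For the reason Juho pointed out above, though, don’t expect this to recover the -0.67 used in the simulation.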