 # Correlated coefficients example - what's going wrong?

As part of a learning exercise, I’ve being trying to fit a stan model on synthetic data in R that uses correlated coefficients. My idea was to prove I could do it with synthetic data before applying it to empirical data.

I am unfortunately running into a host of divergent transitions whilst also failing to capture the correlation structure in the synthetic data.

Here is the R code I use to generate the data:

``````library(MASS)
library(rstan)
options(mc.cores = parallel::detectCores())

#Simulate Data
n <- 500
b0 <- 1
x1 <- rnorm(n, 0.1, 0.1)
x2 <- rnorm(n, 0.05, 0.05)
CovMat <- matrix(c(0.6, -0.4, -0.4 , 0.6),2,2)
betas <- mvrnorm(n, mu = c(1,1), CovMat)
b1 <- betas[,1]
b2 <- betas[,2]
linpred <- b0 + b1 * x1 + b2 * x2
y <- sapply(exp(linpred), function(x) rpois(1, x))

#Prep Stan Data
standata <- list(N = n, Y = y, X1 = x1, X2 = x2, K = ncol(betas))

#Run Stan Model
StanFile <- "MultivarParam.stan"
poismvmod <- stan(file = StanFile, data = standata, iter = 700)
``````

And here is the Stan model:

``````data {
int<lower=1> N; //data points
int<lower=0> Y[N]; //dependent variable
real X1[N]; //predictor 1
real X2[N]; //predictor 2
int<lower=2> K; //number of corr coeffs
}
parameters {
real beta0; //intercept
vector[K] beta; //vector of correlated coeffs
vector[K] beta_mu; //vector of means for correlated coeffs
corr_matrix[K] Omega; //Correlation matrix for correlated coeffs
vector<lower=0>[K] sigma; // vector of variance for correlated coeffs
}
model {
//Priors on coeffs
sigma ~ normal(0,2);
beta_mu ~ normal(0,1);
Omega ~ lkj_corr(2);
//Prior on intercept
beta0 ~ normal(0, 2);
//likelihood
for (i in 1:N) {
target += poisson_log_lpmf(Y[i] | beta0 + beta * X1[i] + beta * X2[i]);
}
}
``````

I could be missing something obvious or more likely it’s just my fundamental misunderstanding of how correlation structures are applied in regression. Any guidance would be much appreciated.

If you are going to do a model like this, you should usually utilize a non-centered parameterization where the coefficients become transformed parameters

``````data {
int<lower=1> N; //data points
int<lower=0> Y[N]; //dependent variable
real X1[N]; //predictor 1
real X2[N]; //predictor 2
int<lower=2> K; //number of corr coeffs
}
parameters {
real beta0; //intercept
vector[K] beta_raw; // primitive vector of correlated coeffs
vector[K] beta_mu; //vector of means for correlated coeffs
cholesky_factor_corr[K] L; // Cholesky factor of correlation matrix for correlated coeffs
vector<lower=0>[K] sigma; // vector of variance for correlated coeffs
}
transformed parameters {
vector[K] beta = beta_mu + sigma .* (L * beta_raw);
}
model {
//Priors on coeffs
sigma ~ normal(0,2);
beta_mu ~ normal(0,1);
L ~ lkj_corr_cholesky(2);
beta_raw ~ normal(0, 1); // implies:
// beta ~ multi_normal(beta_mu, quad_form_diag(L * L', sigma));
//Prior on intercept
beta0 ~ normal(0, 2);
//likelihood
target += poisson_log_lpmf(Y | beta0 + beta * X1 + beta * X2);
}
``````

where I have also eliminated the loop in the log-likelihood.

2 Likes

Thanks Ben, I’m still struggling a bit, mainly because I’m not getting near the parameters used to generate the data with.

When I do:

``````L <- extract(poismvmod, pars = "L")\$L
cor_b <- apply(L, 1, function(x) tcrossprod(x)[1, 2])
print(signif(quantile(cor_b , probs = c(0.025, 0.5, 0.975)), 2))
``````

I get values around 0, when I’m looking for about -0.66 based on the data.

What am I missing?

Seems to me that the data generation code and the Stan model do not match - the data generation generates different beta coefficients for each observation while in the Stan model there is only one parameter beta and one parameter beta common to all observations.

2 Likes

Thanks Juho, this reveals my misunderstanding of how correlation between coefficients work. How do I simulate data with correlated coefficients when I only have one value of each coefficient?

Unfortunately I’m not sure what you mean by correlated coefficients. Or, in what sense are the coefficients correlated if there is only one value of each? Possibly prior correlation, but then one likely cannot expect to recover the data-generating correlation parameter from in-a-sense one datapoint.

2 Likes

As an example, say you want to evaluate how the size of a house and its number of bedrooms affect the price of a house. Intuitively, since larger house tend to have more bedrooms, you’d expect that if we gave more weight to the effect of house size, then we will have to give less weight to the bedrooms for the house price to remain the same.

As a result I’d imagine the samples for the coefficient estimates to be negatively correlated and I was wondering how to capture this belief in my model. As I think you allude to in your answer, I can do this through prior correlation but I’m struggling to make a reproducible example of this concept to really test my understanding.

If by samples for the coefficient estimates you mean the MCMC samples from the posterior distribution (ie the Stan results), this should not need any modeling of correlations. If you set up a model like `price ~ normal(mu + beta*size + beta*bedrooms,noise)` , put independent priors over the `beta` coefficients, generate data where size correlates positively with bedrooms and price with both, you should get a posterior distribution where the `beta`s are negatively correlated.

1 Like

Thanks Juho, that makes sense. Would you mind therefore explaining where correlation matrices are useful, perhaps with an example? Or what kind of synthetic data I could produce to then test a model with a correlation matrix on?
I’m just struggling to think about where I should consider using them.

1 Like

I was browsing the forum and trying to recreate the example model by @bgoodri and Stan has apparently changed over the past three years (2018 to 2021) with respect to scalar and vector multiplication because `rstan` version 2.21 did not like `beta0 + beta * X1 + beta * X2`.

I vectorized (and “matrixied”) the model, but removed the `b0` coefficient in the process. Here is a working version of the model (save in `demo.stan` to use with my example R code):

``````data {
int<lower=1> N; //data points
int<lower=0> Y[N]; //dependent variable
int<lower=2> K; //number of corr coeffs
matrix[N, K] X; // predictor matrix

}
parameters {
vector[K] beta_raw; // primitive vector of correlated coeffs
vector[K] beta_mu; //vector of means for correlated coeffs
cholesky_factor_corr[K] L; // Cholesky factor of correlation matrix for correlated coeffs
vector<lower=0>[K] sigma; // vector of variance for correlated coeffs
}
transformed parameters {
vector[K] beta = beta_mu + sigma .* (L * beta_raw);
}
model {
//Priors on coeffs
sigma ~ normal(0,2);
beta_mu ~ normal(0,1);
L ~ lkj_corr_cholesky(2);
beta_raw ~ normal(0, 1); // implies:
// beta ~ multi_normal(beta_mu, quad_form_diag(L * L', sigma));
//likelihood
target += poisson_log_lpmf(Y | X * beta);
}
``````

And R code to simulate data that works:

``````library(MASS)
library(rstan)
options(mc.cores = parallel::detectCores())

#Simulate Data
set.seed(123)
n <- 500
b0 <- 1
X <- cbind(x0 = rep(b0, n),
x1 = rnorm(n, 0.1, 0.1),
x2 = rnorm(n, 0.05, 0.05))

CovMat <- matrix(c(0.6, -0.4, -0.4 , 0.6),2,2)
betas <- cbind(rep(1, n), mvrnorm(n, mu = c(1,1), CovMat))
cor(betas)

linpred <- X * betas

y <- rpois(n = n, exp(rowSums(linpred)))

#Prep Stan Data
standata <- list(N = n, Y = y, X = X, K = ncol(betas))

demo_stan <- stan_model("demo.stan")

demo_fit <- rstan::sampling(demo_stan, standata)
demo_fit
``````