As part of a learning exercise, I’ve being trying to fit a stan model on synthetic data in R that uses correlated coefficients. My idea was to prove I could do it with synthetic data before applying it to empirical data.
I am unfortunately running into a host of divergent transitions whilst also failing to capture the correlation structure in the synthetic data.
Here is the R code I use to generate the data:
library(MASS)
library(rstan)
options(mc.cores = parallel::detectCores())
#Simulate Data
n <- 500
b0 <- 1
x1 <- rnorm(n, 0.1, 0.1)
x2 <- rnorm(n, 0.05, 0.05)
CovMat <- matrix(c(0.6, -0.4, -0.4 , 0.6),2,2)
betas <- mvrnorm(n, mu = c(1,1), CovMat)
b1 <- betas[,1]
b2 <- betas[,2]
linpred <- b0 + b1 * x1 + b2 * x2
y <- sapply(exp(linpred), function(x) rpois(1, x))
#Prep Stan Data
standata <- list(N = n, Y = y, X1 = x1, X2 = x2, K = ncol(betas))
#Run Stan Model
StanFile <- "MultivarParam.stan"
poismvmod <- stan(file = StanFile, data = standata, iter = 700)
And here is the Stan model:
data {
int<lower=1> N; //data points
int<lower=0> Y[N]; //dependent variable
real X1[N]; //predictor 1
real X2[N]; //predictor 2
int<lower=2> K; //number of corr coeffs
}
parameters {
real beta0; //intercept
vector[K] beta; //vector of correlated coeffs
vector[K] beta_mu; //vector of means for correlated coeffs
corr_matrix[K] Omega; //Correlation matrix for correlated coeffs
vector<lower=0>[K] sigma; // vector of variance for correlated coeffs
}
model {
//Priors on coeffs
sigma ~ normal(0,2);
beta_mu ~ normal(0,1);
Omega ~ lkj_corr(2);
beta ~ multi_normal(beta_mu, quad_form_diag(Omega, sigma));
//Prior on intercept
beta0 ~ normal(0, 2);
//likelihood
for (i in 1:N) {
target += poisson_log_lpmf(Y[i] | beta0 + beta[1] * X1[i] + beta[2] * X2[i]);
}
}
I could be missing something obvious or more likely it’s just my fundamental misunderstanding of how correlation structures are applied in regression. Any guidance would be much appreciated.