I’ve noticed a perplexing trend with the Rhat values in my simulations that I’d really like to decipher. For a very small trial with a binomial outcome, the Rhat for the last specified treatment effect is almost always very close to 1, while the other two treatment effects show convergence issues.
For example, I can simulate a dataset where all responses are 0 (5 observations per cluster, 5 clusters per group, and 4 groups), and when modelling the treatment effects (beta), the Rhat will still be 1 for beta_trt[3] but high for beta_trt[1] and beta_trt[2].
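For concreteness, here is a minimal R sketch of such a dataset (the column names `site`, `trt`, and `resp` are taken from the code below; the attached CSV may order its columns differently):

```r
# 4 treatment groups, 5 sites (clusters) per group, 5 observations per site:
# 100 rows in total, with every response set to 0.
example_data <- data.frame(
  site = rep(1:20, each = 5),   # 20 distinct sites, ids 1..20
  trt  = rep(1:4, each = 25),   # sites 1-5 in group 1, 6-10 in group 2, ...
  resp = 0                      # all responses zero
)
```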
For context: it’s definitely not advisable to run a model on this little data, but I’m trying to figure out how small a clustered trial can get before the model ‘breaks’, and how it breaks.
I’ve tried this with and without a QR decomposition with the same result.
If anyone can think of why this would occur I’d really appreciate the insight.
An example of the output I’m getting:
Here is the code I’ve used to run the Stan model (via R), along with a generated dataset. If I haven’t uploaded the data in a way that works, let me know and I’ll try another method.
example_data.csv (873 Bytes)
```r
# (assumes example_data has been read in, e.g. with read.csv("example_data.csv"),
#  and `mod` is the compiled CmdStanR model shown below)
resp <- as.vector(example_data[, 4])
N_obs <- dim(example_data)[1]
N_site <- length(unique(example_data$site))
N_trt_groups <- length(unique(example_data$trt))

data <- list(N_obs = N_obs, N_site = N_site, N_trt_groups = N_trt_groups,
             site = example_data$site, trt = as.numeric(example_data$trt),
             resp = resp)

fit <- mod$sample(
  data = data,
  init = 0,
  iter_warmup = 250,
  iter_sampling = 250,
  chains = 4,
  parallel_chains = 1,
  adapt_delta = 0.99,
  refresh = 0,
  max_treedepth = 12
)
```
```stan
data {
  int<lower=1> N_obs;                                  // number of observations
  int<lower=1> N_site;                                 // number of sites
  int<lower=1> N_trt_groups;                           // number of treatment groups
  array[N_obs] int<lower=1, upper=N_site> site;        // site ids
  array[N_obs] int<lower=1, upper=N_trt_groups> trt;   // trt group
  array[N_obs] int<lower=0, upper=1> resp;             // outcome
}
transformed data {
  matrix[N_obs, N_trt_groups - 1] X_matrix = rep_matrix(0, N_obs, N_trt_groups - 1);
  matrix[N_obs, N_site] Z_matrix = rep_matrix(0, N_obs, N_site);
  for (i in 1:N_obs) {
    // dummy-code treatment, with group 1 as the reference level
    for (j in 2:N_trt_groups) {
      if (j == trt[i]) X_matrix[i, j - 1] = 1;
    }
    // one-hot site indicators
    for (k in 1:N_site) {
      if (k == site[i]) Z_matrix[i, k] = 1;
    }
  }
}
parameters {
  real b0;
  vector[N_trt_groups - 1] beta_trt;        // beta coefficients for each treatment group
  vector[N_site] alpha_site_raw;            // raw (non-centred) random effects for site
  cholesky_factor_corr[N_site] lkj_corr;    // random-effect correlation
}
transformed parameters {
  // implies: site intercepts are multivariate normal with mean b0 and SD 0.2
  vector[N_site] b0_site = b0 + 0.2 * lkj_corr * alpha_site_raw;
}
model {
  b0 ~ normal(0, 2);
  beta_trt[N_trt_groups - 1] ~ normal(0, 2);
  lkj_corr ~ lkj_corr_cholesky(5);
  alpha_site_raw ~ std_normal();
  resp ~ bernoulli_logit(X_matrix * beta_trt + Z_matrix * b0_site);
}
generated quantities {
  vector[N_trt_groups - 1] pred_prob_trt = inv_logit(mean(b0_site) + beta_trt);
}
```