I have run into a mysterious phenomenon. I am using the development version of cmdstanr (0.4.0.9000) and version 2.27.0 of CmdStan. I am running this software on macOS Big Sur on an M1 machine, with R 4.1 and a development version of RStudio.
This is the issue: the first time I compile and run the Stan model, everything works fine. If I edit the Stan code even slightly, say modifying a single prior by increasing a standard deviation by 1 unit, and then recompile and run, I get an error indicating "no chains finished successfully." The oddest thing is that when I return the Stan code to its original state, I continue to get the same warning/failure message. However, if I reboot the MacBook Pro, I am again able to estimate whichever model I compile first, whether the modified Stan model or the original.
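To make the sequence concrete, here is a condensed sketch of the cycle (the full data-generation and sampling code is further down; nothing here goes beyond what is described above):

mod <- cmdstan_model("Programs/two_factor.stan")                      # first compile: works
fit <- mod$sample(data = dt_to_list(dd), seed = 27261, chains = 1L)   # sampling succeeds

# ... edit two_factor.stan (e.g., increase one prior sd by 1), then:
mod <- cmdstan_model("Programs/two_factor.stan")                      # recompile picks up the edit
fit <- mod$sample(data = dt_to_list(dd), seed = 27261, chains = 1L)   # "no chains finished successfully"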
Unless you think it would be helpful, I will not provide a detailed explanation of the data and model, but I am providing all the R and Stan code for the simplest model. (There may be questions about why this particular model, but that is not relevant here; I am trying to understand why I need to reboot my laptop in order to successfully fit a new model.)
Here is the Stan model:
data {
  int<lower=0> N;
  int<lower=1, upper=2> a[N];   // intervention 1
  int<lower=1, upper=2> b[N];   // intervention 2
  int<lower=1, upper=4> ab[N];  // interaction 1 & 2
  vector[N] y;
}

parameters {
  vector[1] t_a_raw;
  vector[1] t_b_raw;
  vector[3] t_ab_raw;
  real<lower=0> sigma;
}

transformed parameters {
  // constrain parameters to sum to 0
  vector[2] t_a = append_row(t_a_raw, -t_a_raw);
  vector[2] t_b = append_row(t_b_raw, -t_b_raw);
  vector[4] t_ab = append_row(t_ab_raw, -sum(t_ab_raw));

  // yhat
  vector[N] yhat;
  for (i in 1:N) {
    yhat[i] = t_a[a[i]] + t_b[b[i]] + t_ab[ab[i]];
  }
}

model {
  sigma ~ cauchy(0, 1);
  t_a ~ normal(0, 3);
  t_b ~ normal(0, 3);
  t_ab ~ normal(0, 3);

  y ~ normal(yhat, sigma);
}
And here is the R code which generates the data and calls Stan:
library(simstudy)
library(cmdstanr)
library(caret)
library(ggpubr)

b_1 <- c(-3, 3)
b_2 <- c(-1, 1)
b_12 <- matrix(c(-.5/3, -.5/3, -.5/3, .5), nrow = 2)

d1 <- defData(varname = "a", formula = ".5;.5", variance = "1;2", dist = "categorical")
d1 <- defData(d1, varname = "b", formula = ".5;.5", variance = "1;2", dist = "categorical")
d1_a <- defDataAdd(varname = "y", formula = "mu", variance = 4, dist = "normal")

set.seed(107)

dd <- genData(1000, d1)
dd[, mu := b_1[a] + b_2[b] + b_12[a, b], keyby = id]
dd <- addColumns(d1_a, dd)

dsum <- dd[, mean(y), keyby = .(a, b)]

p1 <- ggplot(data = dd, aes(x = factor(a), y = y)) +
  geom_jitter(height = 0, width = .1) +
  facet_grid(. ~ b) +
  theme(panel.grid = element_blank()) +
  ylim(-12, 12)

p2 <- ggplot(data = dsum, aes(x = factor(a), y = V1)) +
  geom_point(aes(color = factor(b)), size = 2) +
  geom_line(aes(group = factor(b), color = factor(b))) +
  theme(panel.grid = element_blank()) +
  ylim(-5, 5)

ggarrange(p1, p2)

summary(lm(y ~ factor(a) * factor(b), data = dd))

# Bayesian analysis

dt_to_list <- function(dx) {
  dx[, a_f := factor(a)]
  dx[, b_f := factor(b)]

  dv <- dummyVars(~ a_f:b_f, data = dx, n = c(2, 2))
  dp <- predict(dv, dx)

  N <- nrow(dx)                   ## number of observations
  a <- dx[, a]                    ## intervention 1 level
  b <- dx[, b]                    ## intervention 2 level
  ab <- as.vector(dp %*% c(1:4))  ## interaction cell (1 to 4)
  y <- dx[, y]                    ## outcome

  list(N = N, a = a, b = b, ab = ab, y = y)
}

mod <- cmdstan_model("Programs/two_factor.stan")

fit <- mod$sample(
  data = dt_to_list(dd),
  seed = 27261,
  refresh = 0,
  chains = 1L,
  parallel_chains = 1L,
  iter_warmup = 500,
  iter_sampling = 2500
)

fit$summary(variables = c("t_a", "t_b", "t_ab"))
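In case it is useful for diagnosing this, the captured console output and the path of the executable that was run can be inspected with standard cmdstanr accessors; this is only a sketch of where to look, and I am not reproducing the output here:

fit$output(1)    # console output captured from chain 1, including any error message
mod$exe_file()   # path to the compiled executable that mod$sample() ran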
Any insights would be much appreciated.