There are two bugs in your R code:
xobs <- runif(n, -3, 3) + rnorm(n ,sigma)
yobs <- slope * xobs + intercept + rnorm(n ,sigma)
The first problem is that the rnorm here uses sigma as the location (mean), not the scale (standard deviation), because rnorm's second positional argument is the mean. The second problem is that yobs should be based on the true x, not the noisy xobs.
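To see the first bug concretely: rnorm's signature is rnorm(n, mean = 0, sd = 1), so a single extra positional argument sets the mean, not the standard deviation. A quick check:

```r
set.seed(123)
# rnorm(n, 0.5) draws with mean 0.5 and sd 1, not mean 0 and sd 0.5
draws <- rnorm(1e5, 0.5)
round(mean(draws), 1)  # close to 0.5, the (unintended) mean
round(sd(draws), 1)    # close to 1, the default sd
```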
Going forward, you probably don't want the same error scale for the measurement error (x_obs) and the observations (y_obs).
Here’s an R program to simulate the data:
N <- 1000
sigma <- 0.5   # shared scale for measurement and observation noise
alpha <- 1.3   # intercept
beta <- -0.4   # slope
x <- runif(N, -3, 3)                        # true covariate
y_obs <- rnorm(N, alpha + beta * x, sigma)  # observed outcome
x_obs <- rnorm(N, x, sigma)                 # noisy measurement of x
library('cmdstanr')
model <- cmdstan_model('foo.stan')
fit <- model$sample(data = list(N = N, x_obs = x_obs, y_obs = y_obs), parallel_chains = 4)
print(fit$summary(variables = c("alpha", "beta", "sigma")))
and here’s the matching Stan model:
data {
  int<lower=0> N;
  array[N] real x_obs;
  array[N] real y_obs;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
  vector[N] x;  // latent true covariate
}
model {
  alpha ~ normal(0, 5);
  beta ~ normal(0, 5);
  sigma ~ normal(0, 5);                     // half-normal, given the lower bound
  x_obs ~ normal(x, sigma);                 // measurement model
  y_obs ~ normal(alpha + beta * x, sigma);  // outcome model
}
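Two optional refinements, shown as a sketch rather than something fit here: the latent x gets only an implicit flat prior in the model above, and the measurement and observation noise share one sigma. A variant with an explicit (weakly informative) prior on x and separate scales sigma_x and sigma_y could look like this:

```stan
data {
  int<lower=0> N;
  array[N] real x_obs;
  array[N] real y_obs;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma_x;  // measurement-error scale
  real<lower=0> sigma_y;  // observation-noise scale
  vector[N] x;            // latent true covariate
}
model {
  alpha ~ normal(0, 5);
  beta ~ normal(0, 5);
  sigma_x ~ normal(0, 5);
  sigma_y ~ normal(0, 5);
  x ~ normal(0, 3);  // rough stand-in for the uniform(-3, 3) used in simulation
  x_obs ~ normal(x, sigma_x);
  y_obs ~ normal(alpha + beta * x, sigma_y);
}
```

The R script's data list is unchanged; only the variable names passed to fit$summary() would need updating.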
and here’s the fit that the simulation script prints:
All 4 chains finished successfully.
Mean chain execution time: 3.1 seconds.
Total execution time: 3.2 seconds.
# A tibble: 3 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 alpha 1.32 1.32 0.0173 0.0168 1.29 1.34 1.00 4515. 3011.
2 beta -0.370 -0.370 0.00912 0.00911 -0.384 -0.355 1.00 4293. 2735.
3 sigma 0.498 0.498 0.0114 0.0114 0.480 0.518 1.00 1812. 2185.