Given the adage of starting small, I have a very simple model which I’m failing to fit correctly.
I think I’m clearly misunderstanding something fundamental, and I’m not really sure what I should be searching for to better understand it.

// step_test
data {
int<lower=0> N;
real y[N];
real predicted[N];
}
parameters {
real<lower=0> sigma;
real delta[N];
}
model {
sigma ~ normal(0, 1);
delta ~ normal(0, 10);
for(n in 1:N){
y[n] ~ normal(predicted[n] + delta[n], sigma);
}
}

Why is sigma not basically zero?

Why is there so much variation in the posterior for delta?

Is it that there is not enough freedom for the sampler to explore the parameter space given that the only probable value for sigma is 0? I’ve tried setting init=0.

Is there any way of fitting something like this?

I later want to add parameters which will generate the prediction and some uncertainty to y, but I am still interested in the difference between y and prediction.

Thanks

I asked something similar a while back and marked it solved, this was a mistake.

R code to test:

N = 10
y = rep(1:N)
predicted = y + rnorm(N, 0, 10)

This stan program is not syntactically correct. You declare a parameter star which is never used while delta is probably a parameter that is never declared.

Well typically I get divergent transitions even with adapt_delta = 0.95.
I also get low bayesian fraction of missing information warnings.

Perhaps more importantly, as this is the first step to building a more complex (and useful) model this version is not correctly estimating the known values for delta particularly well.

Given that delta is free to be whatever is needed to satisfy y - (predicted + delta) = 0.
I don’t understand why there there is so much variation in sigma (mean ~ 0.9).

Is it just a case of making the prior on sigma tighter?

Thanks for your reply, but are you sure sigma should be 10?

Delta is a vector and should be the difference between each y[n] and predicted[n] no?

y + rnorm(N, 0, 10)

is functionally the same as ten random numbers, it was just an quick way of constructing it. I don’t think the standard deviation used here will make any difference, and If it does my model is wrong for what I want to do.

Yes. You’re right, my bad. I think the problem with your approach is that you have 10 + 1 free parameters with only 10 pieces of data. Without strong priors the model is exactly identified which leads to all kind of troubles for stan. I think of it as a posterior with zero weight on all but one value: normal - predicted for the delta parameter, 0 for sigma. There is no real estimation or exploration to be done.

You could do something similar with a multilevel model

parameters {
real<lower=0> sigma;
real<lower=0> sigma_delta;
real delta[N];
}
model {
sigma ~ normal(0, 1);
sigma_delta ~ normal(0, 10);
delta ~ normal(0, sigma_delta);
y ~ normal(predicted + delta, sigma)
}

Yes. I think so. Ironically, if the problem is too simple (delta = y - predicted), Stan has problems estimating the parameters. Stan/Bayesian inference is all about estimating uncertainty. Too simple, overidentified models do not have uncertainty. Adding some noise (~ measurement error?) would probably help.