Fitting y ~ N(x + z, sigma)?

Hi

Following the adage of starting small, I have a very simple model which I’m failing to fit correctly.
I think I’m misunderstanding something fundamental, and I’m not really sure what I should be searching for to understand it better.

// step_test
data {
  int<lower=0> N;
  real y[N];          // observations
  real predicted[N];  // externally supplied predictions
}

parameters {
  real<lower=0> sigma;
  real delta[N];      // per-observation difference between y and predicted
}

model {
  sigma ~ normal(0, 1);
  delta ~ normal(0, 10);

  for (n in 1:N) {
    y[n] ~ normal(predicted[n] + delta[n], sigma);
  }
}

Why is sigma not basically zero?

Why is there so much variation in the posterior for delta?

Is it that there is not enough freedom for the sampler to explore the parameter space given that the only probable value for sigma is 0? I’ve tried setting init=0.

Is there any way of fitting something like this?

I later want to add parameters that will generate the prediction and contribute some uncertainty to y, but I am still interested in the difference between y and the prediction.

Thanks

I asked something similar a while back and marked it as solved; that was a mistake.

R code to test:

N = 10
y = 1:N
predicted = y + rnorm(N, 0, 10)

This Stan program is not syntactically correct. You declare a parameter star which is never used, while delta is used but never declared.

Sorry, yes, I renamed star to delta when pasting the example, but that is not the issue.

There doesn’t seem to be anything wrong with this model. What do you think should happen differently and why?

Well, typically I get divergent transitions even with adapt_delta = 0.95.
I also get low Bayesian fraction of missing information (BFMI) warnings.

Perhaps more importantly, since this is the first step towards building a more complex (and useful) model, this version is not estimating the known values of delta particularly well.

Given that delta is free to be whatever is needed to satisfy y - (predicted + delta) = 0, I don’t understand why there is so much variation in sigma (posterior mean ~ 0.9).

Is it just a case of making the prior on sigma tighter?

N = 10
y = 1:N
predicted = y + rnorm(N, 0, 10)

So I would expect delta=0 and sigma=10. But with N=10 there is a lot of sampling variation so it is probably better to look at the 50% intervals.

You also have a very narrow normal(0, 1) prior on sigma. With N=10 this prior is going to pull sigma closer to 0 than the likelihood would.

Hi stijn

Thanks for your reply, but are you sure sigma should be 10?

Delta is a vector and should be the difference between each y[n] and predicted[n], no?

y + rnorm(N, 0, 10)

is functionally the same as ten random numbers; it was just a quick way of constructing it. I don’t think the standard deviation used here will make any difference, and if it does, my model is wrong for what I want to do.

Yes. You’re right, my bad. I think the problem with your approach is that you have 10 + 1 free parameters with only 10 pieces of data. Without strong priors the model is exactly identified, which leads to all kinds of trouble for Stan. I think of it as a posterior with zero weight on all but one value: y - predicted for the delta parameters and 0 for sigma. There is no real estimation or exploration to be done.
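
To spell out that intuition (my own sketch, in my notation, not something stated above): along the exact-fit ridge delta[n] = y[n] - predicted[n], every likelihood factor collapses to its normalizing constant,

$$
\prod_{n=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(y_n - \mathrm{predicted}_n - \delta_n)^2}{2\sigma^2}\right) \Bigg|_{\delta_n = y_n - \mathrm{predicted}_n} = (2\pi)^{-N/2}\,\sigma^{-N} \longrightarrow \infty \quad \text{as } \sigma \to 0,
$$

so the likelihood alone would pile all the mass up against sigma = 0. The half-normal(0, 1) prior on sigma and the normal(0, 10) prior on delta are the only things holding the sampler away from that ridge, which is the funnel-like geometry behind the divergences and BFMI warnings.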

You could do something similar with a multilevel model

parameters {
  real<lower=0> sigma;
  real<lower=0> sigma_delta;   // scale of the per-observation offsets
  real delta[N];
}

model {
  sigma ~ normal(0, 1);
  sigma_delta ~ normal(0, 10);
  delta ~ normal(0, sigma_delta);
  for (n in 1:N) {
    y[n] ~ normal(predicted[n] + delta[n], sigma);
  }
}

Thanks, that makes sense.

So when testing the early stages of a model, is it typical to make sure your test case is not exactly identified?

Is there a standard way of doing this, or do you just add some fuzz to your example data?

Yes, I think so. Ironically, if the problem is too simple (delta = y - predicted), Stan has problems estimating the parameters. Stan/Bayesian inference is all about estimating uncertainty, and too-simple, exactly identified models have no uncertainty to estimate. Adding some noise (measurement error, say) would probably help.

Yes, generate parameters from the priors, then data from the parameters. That should then be calibrated by definition.
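
A minimal sketch of that simulation step, kept entirely in Stan (the file name step_test_sim and the name y_sim are mine, and the fabs(normal_rng(...)) draws are just one way to match the <lower=0> constraints in the fitting model):

// step_test_sim: draw parameters from the priors, then fake data from the parameters
data {
  int<lower=0> N;
  real predicted[N];
}

generated quantities {
  real sigma = fabs(normal_rng(0, 1));         // half-normal(0, 1) draw
  real sigma_delta = fabs(normal_rng(0, 10));  // half-normal(0, 10) draw
  real delta[N];
  real y_sim[N];
  for (n in 1:N) {
    delta[n] = normal_rng(0, sigma_delta);
    y_sim[n] = normal_rng(predicted[n] + delta[n], sigma);
  }
}

Run it with the fixed-parameter sampler, feed each simulated y_sim back into the fitting model, and check that the fit recovers the drawn sigma, sigma_delta, and delta.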

You’re going to want a non-centered parameterization of delta, too, I would suspect.
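
For what it’s worth, a sketch of what I think that would look like here, keeping the array syntax from the earlier posts (delta_raw is just my name for the standardized offsets):

parameters {
  real<lower=0> sigma;
  real<lower=0> sigma_delta;
  real delta_raw[N];                         // standardized offsets
}

transformed parameters {
  real delta[N];
  for (n in 1:N) {
    delta[n] = sigma_delta * delta_raw[n];   // implies delta ~ normal(0, sigma_delta)
  }
}

model {
  sigma ~ normal(0, 1);
  sigma_delta ~ normal(0, 10);
  delta_raw ~ normal(0, 1);
  for (n in 1:N) {
    y[n] ~ normal(predicted[n] + delta[n], sigma);
  }
}

This samples sigma_delta and the standardized offsets independently, which tends to avoid the funnel when the data only weakly constrain the offsets.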