Calculate intercept by centralising residuals instead of estimating intercept?

I want to fit a model where I do not care about the intercept and have no idea about its potential value, so it is difficult to specify a reasonable prior for it. I wonder whether it is best (a) to use a rather uninformative prior for the intercept, e.g., a normal distribution with mean zero and a very large variance, (b) not to specify a prior for the intercept at all, or (c) to calculate the intercept in each iteration so that the residuals have an average value of zero.

In case (c), the Stan code could look like this:

data {
  int<lower=0> N;
  vector[N] x1;
  vector[N] x2;
  vector[N] y;
}

parameters {
  real b1;
  real b2;
  real<lower=0> sigma;
}

transformed parameters {
  real b0;
  vector[N] yFit;
  vector[N] res;

  yFit = b1 * x1 + b2 * x2;
  // intercept chosen so that the residuals average to zero
  b0 = mean( y - yFit );
  res = y - ( b0 + yFit );
}

model {
  // priors
  b1 ~ normal( 0, 1 );
  b2 ~ normal( 0, 1 );
  sigma ~ cauchy( 0, 1 );

  res ~ normal( 0, sigma);
}

Note: this is a very much simplified version of my actual model, which has many more parameters and is very non-linear.

I am not completely sure what approach (c) entails, but I would be very careful with such constructs. Notably, the approach is non-generative: there is no way to build a simulator that generates data corresponding to the model. In any case, I think (c) implies that there is no uncertainty about the intercept, which would probably result in somewhat smaller posterior uncertainty about the other parameters, which is suspicious. Even if, in the end, you don't care about the intercept, you don't want your model to ignore possible variability in the average values and thus be overconfident.
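For comparison, a generative version of the simplified model above would treat the intercept as a free parameter, so its uncertainty propagates to the other parameters. A minimal sketch (the prior scale of 10 is an assumption, not a recommendation for your actual data):

```stan
data {
  int<lower=0> N;
  vector[N] x1;
  vector[N] x2;
  vector[N] y;
}

parameters {
  real b0;              // intercept as a free parameter
  real b1;
  real b2;
  real<lower=0> sigma;
}

model {
  // wide, fat-tailed prior on the intercept (scale is illustrative)
  b0 ~ student_t( 3, 0, 10 );
  b1 ~ normal( 0, 1 );
  b2 ~ normal( 0, 1 );
  sigma ~ cauchy( 0, 1 );

  y ~ normal( b0 + b1 * x1 + b2 * x2, sigma );
}
```

Here the posterior for b0 carries genuine uncertainty, and you can simply ignore it when summarizing results.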

Unless your dataset provides very little information about the model, the difference between (a) and (b) should be very small. For a mostly non-informative prior, something like student_t(3, 0, <large-ish number>) is somewhat preferable to a wide normal because of its fatter tails.
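For concreteness, in Stan option (b) simply means omitting the sampling statement for b0, which leaves an implicit improper uniform prior on it; options (a) and the fat-tailed variant are each a single line in the model block. A sketch (the scales 100 and 10 are illustrative assumptions):

```stan
model {
  // option (a): wide but proper normal prior
  // b0 ~ normal( 0, 100 );

  // fat-tailed alternative, often preferable to a wide normal:
  b0 ~ student_t( 3, 0, 10 );

  // option (b): no statement at all, i.e. an implicit improper
  // uniform prior on b0 (proper only if the posterior is)
}
```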

I’ll also note that it usually isn’t that hard to get a weakly informative prior just from very basic information about the data. E.g., if your outcome is voltage, then unless you are doing some crazy astrophysics you can safely assume that the average voltage is smaller than that of a lightning bolt, which is easy to encode in a prior.
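For instance, if the average is surely far below lightning-bolt magnitudes (on the order of 1e8 V), that bound can be encoded as a soft constraint (the numbers are purely illustrative):

```stan
// illustrative: |average voltage| almost surely well below ~1e8 V
b0 ~ normal( 0, 1e7 );
```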

Best of luck with your model

Thanks a lot, @martinmodrak, for your feedback, detailed explanations, and suggestions! As the intercept does not have a real-world meaning or unit of measurement, it seems impossible to find a weakly informative prior. Hence, we’ll follow your suggestion and use a Student-t distribution as the prior.
