I want to fit a model where I do not care about the intercept and have no idea about its potential value, so it is difficult to specify a reasonable prior for it. I wonder whether it is best (a) to use a rather uninformative prior for the intercept, e.g., a normal distribution with mean zero and a very large variance, (b) not to specify a prior for the intercept at all, or (c) to calculate the intercept in each iteration so that the residuals have a mean of zero.
In case (c), the Stan code could look like this:
data {
  int<lower=0> N;
  vector[N] x1;
  vector[N] x2;
  vector[N] y;
}
parameters {
  real b1;
  real b2;
  real<lower=0> sigma;
}
transformed parameters {
  real b0;
  vector[N] yFit;
  vector[N] res;
  yFit = b1 * x1 + b2 * x2;
  b0 = mean( y - yFit );      // intercept set so that the residuals average to zero
  res = y - ( b0 + yFit );
}
model {
  // priors
  b1 ~ normal( 0, 1 );
  b2 ~ normal( 0, 1 );
  sigma ~ cauchy( 0, 1 );
  // likelihood on the mean-centered residuals
  res ~ normal( 0, sigma );
}
Note: this is a greatly simplified version of my actual model, which has many more parameters and is highly non-linear.
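For comparison, option (a) would keep the intercept as an explicit parameter with a wide prior. A minimal sketch of the same simplified model (the normal( 0, 100 ) scale is just a placeholder, not a recommendation):

data {
  int<lower=0> N;
  vector[N] x1;
  vector[N] x2;
  vector[N] y;
}
parameters {
  real b0;                 // explicit intercept
  real b1;
  real b2;
  real<lower=0> sigma;
}
model {
  // priors
  b0 ~ normal( 0, 100 );   // option (a): wide prior; for option (b), omit this line
                           // (Stan then uses an implicit flat prior on b0)
  b1 ~ normal( 0, 1 );
  b2 ~ normal( 0, 1 );
  sigma ~ cauchy( 0, 1 );
  // likelihood
  y ~ normal( b0 + b1 * x1 + b2 * x2, sigma );
}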
I am not completely sure what approach (c) entails, but I would be very careful with such constructs. Notably, the approach is non-generative (there is no way to build a simulator that generates data corresponding to the model). In any case, I think (c) implies that there is no uncertainty about the intercept, which would probably result in somewhat smaller posterior uncertainty for the other parameters, and that seems suspicious. Even if you don’t care about the intercept in the end, you don’t want your model to ignore possible variability in the average values and thus be overconfident.
Unless your dataset provides very little information about the model, the difference between (a) and (b) should be very small. For a mostly non-informative prior, something like student_t(3, 0, <large-ish number>) is somewhat preferable to a wide normal because of its fatter tails.
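With an explicit intercept parameter b0, that prior would be a single statement in the model block (the scale of 100 is just an arbitrary “large-ish” placeholder to be adapted to your problem):

  b0 ~ student_t( 3, 0, 100 );   // heavy-tailed, weakly informative; scale is a placeholder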
I’ll also note that it usually isn’t that hard to get a weakly informative prior from very basic information about the data. E.g., if your outcome is a voltage, then unless you are doing some crazy astrophysics you can safely assume that the average voltage is smaller than that of a lightning bolt, which is easy to encode in a prior.
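For example, for a voltage outcome something along these lines would already rule out absurd values (the scale is made up and depends on your units and application):

  b0 ~ normal( 0, 1e8 );   // volts; a lightning bolt is roughly on the order of 1e8 V, so this is extremely generous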
Best of luck with your model
Thanks a lot, @martinmodrak, for your feedback, detailed explanations, and suggestions! As the intercept does not have a real-world meaning or unit of measurement, it seems impossible to find a weakly informative prior for it. Hence, we’ll follow your suggestion and use a Student-t distribution as the prior.