How to understand the difference between results with flat and informative priors?

Hi there,

I am new to Bayesian modeling and am trying to use it in my own research. I fit one model with flat priors to the data (fit1), and a second model with weak priors and no data as a prior predictive check (fit1_priorpc). I expected the estimated coefficients in fit1_priorpc to be consistent with the priors I set for that model.

However, as you can see below, the estimated intercept for this model is negative even though I set its prior to 'student_t(3,3,1)'. Its Est.Error is quite large (18.61); I guess that means the estimate is not reliable, right? Similarly, I expected sigma to stay within 'student_t(3,0,1)', but the estimated sigma is 1.12, which is larger than 1.

Moreover, none of the regression coefficients in fit1_priorpc were significant, but two of them were significant in fit1. I understand that domain knowledge is important in specifying priors, but why do the results differ so much between fit1 and fit1_priorpc even though I only set weak priors? Please give me some suggestions on how to understand this. Thank you very much.

> fit1=brm(f1,data=dat603,iter=10000,cores=12,save_pars = save_pars(all = TRUE))
> fit1
 Family: gaussian 
  Links: mu = identity 
Formula: M ~ o + j + r 
   Data: dat603 (Number of observations: 603) 
  Draws: 4 chains, each with iter = 10000; warmup = 5000; thin = 1;
         total post-warmup draws = 20000

Regression Coefficients:
                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept            1.54      0.02     1.51     1.58 1.00    21742    15268
o                    0.35      0.02     0.32     0.37 1.00    11777    12028
j                   -0.02      0.01    -0.04     0.01 1.00    13498    12657
r                    0.12      0.02     0.09     0.16 1.00    10547    12412

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.13      0.00     0.13     0.14 1.00    17037    13279

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

> p=get_prior(f1,dat603)
> p[p$class=='Intercept',]$prior='student_t(3,3,1)'
> p[p$class=='b',]$prior='student_t(3,0.3,2)'
> p[p$class=='sigma',]$prior='student_t(3,0,1)'
> p
              prior     class coef group resp dpar nlpar lb ub tag  source
 student_t(3,0.3,2)         b                                      default
 student_t(3,0.3,2)         b    j                                 default
 student_t(3,0.3,2)         b    r                                 default
 student_t(3,0.3,2)         b    o                                 default
   student_t(3,3,1) Intercept                                      default
   student_t(3,0,1)     sigma                             0        default

> fit1_priorpc=brm(f1,data=dat603,prior=p,sample_prior = 'only',
                 iter=10000,cores=12,save_pars = save_pars(all = TRUE))

> fit1_priorpc
 Family: gaussian 
  Links: mu = identity 
Formula: M ~ o + j + r 
   Data: dat603 (Number of observations: 603) 
  Draws: 4 chains, each with iter = 10000; warmup = 5000; thin = 1;
         total post-warmup draws = 20000

Regression Coefficients:
                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept           -0.05     18.61   -36.06    35.96 1.00    14572     9165
o                     0.33      3.35    -6.04     6.91 1.00    16935     8058
j                     0.29      3.39    -6.05     6.61 1.00    18127     8980
r                    0.33      3.17    -5.93     6.77 1.00    18507     9066

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     1.12      1.45     0.03     4.32 1.00    14835     7492

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

I’m going to answer your questions in reverse, because the answer to your second question is much more fundamental, and the answer to your first is more technical.

The difference between fit1 and fit1_priorpc occurs because in fit1 the estimates are informed by the data, whereas in fit1_priorpc they are informed only by the priors. The whole point of Bayesian inference is to use data to update our beliefs, and this updating is the difference you’re seeing.
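
On the sigma question, the same logic applies: a student_t(3,0,1) prior with a lower bound of 0 is a half-t distribution, and its heavy right tail puts its mean above 1 (roughly 1.10), so a prior-only estimate of 1.12 is what you would expect. A minimal base-R sketch that checks this numerically; the helper name `half_t_density` is my own and not part of brms:

```r
# Density of a Student-t(df, 0, scale) truncated to x >= 0 (a "half-t").
# Folding the symmetric density at zero doubles it on the positive side.
half_t_density <- function(x, df = 3, scale = 1) {
  2 * dt(x / scale, df) / scale
}

# Prior mean of sigma, by numerical integration of x * density over [0, Inf).
half_t_mean <- integrate(
  function(x) x * half_t_density(x),
  lower = 0, upper = Inf
)$value

# Closed form for df = 3, scale = 1, used here as a cross-check.
closed_form <- 2 * sqrt(3) / pi

# Prior probability that sigma exceeds 1.
p_above_one <- 2 * pt(1, df = 3, lower.tail = FALSE)

print(c(numeric = half_t_mean, closed_form = closed_form, p_above_one = p_above_one))
```

So the scale of 1 in the prior is not an upper limit on sigma; a sizeable share of the prior mass sits above it.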

As for the negative intercept: by default, brms centers all predictors internally before fitting the model. The prior that you supply for the intercept is therefore the prior for the value of the linear predictor when all predictors are at their means, not when all predictors are zero. But the intercept reported in the output is the intercept when all predictors are zero, so the two differ. If you want to turn this off and put a prior directly on the intercept when all predictors are zero, use ~ 0 + Intercept + … in your model formula.
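
The centering also explains the large Est.Error on the intercept. Assuming the reported intercept is recovered as b_0 = Intercept_centered - sum(mean(x_k) * b_k), and that the priors are independent, its implied prior mean and sd follow from simple arithmetic. Here is a rough base-R sketch; the predictor means are hypothetical values chosen only for illustration, since the real means of o, j and r in dat603 are not shown in the thread:

```r
# Hypothetical predictor means (assumption: NOT taken from dat603).
x_means <- c(o = 3.4, j = 3.4, r = 3.4)

# Standard deviation of a location-scale Student-t prior (finite only for df > 2).
t_sd <- function(df, scale) scale * sqrt(df / (df - 2))

# Priors from the question:
#   centered intercept ~ student_t(3, 3, 1)
#   each slope b_k     ~ student_t(3, 0.3, 2)
intercept_loc <- 3
slope_loc     <- 0.3

# Implied prior on the intercept at x = 0:
#   b_0 = Intercept_centered - sum(x_means * b)
implied_mean <- intercept_loc - sum(x_means * slope_loc)
implied_sd   <- sqrt(t_sd(3, 1)^2 + sum(x_means^2) * t_sd(3, 2)^2)

print(c(mean = implied_mean, sd = implied_sd))
```

With predictor means of this size, the wide slope priors get multiplied by the means and dominate the result, so the intercept at zero ends up with a location far from 3 and a very large spread. That spread describes the prior you implied, not an unreliable estimate.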

The centering issue is very important for brms. To learn more, read through the Parameterization of the population-level intercept and set_prior sections of the brms reference manual.
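
For reference, the ~ 0 + Intercept + … syntax mentioned earlier in the thread could look roughly like this for the model in the question. This is a sketch, not a tested script: it assumes the asker's dat603 data frame is available, and the object names `f1_nc`, `p_nc` and `fit1_priorpc_nc` are made up for illustration. With this syntax the intercept is treated as an ordinary coefficient, so its prior is set with class = "b" and coef = "Intercept".

```r
library(brms)

# Intercept as a regular coefficient: no internal centering is applied to it.
f1_nc <- M ~ 0 + Intercept + o + j + r

p_nc <- c(
  # Prior on the intercept at o = j = r = 0.
  prior(student_t(3, 3, 1), class = "b", coef = "Intercept"),
  # Prior on the remaining slopes.
  prior(student_t(3, 0.3, 2), class = "b"),
  prior(student_t(3, 0, 1), class = "sigma")
)

# Prior-only fit, as in the original prior predictive check.
fit1_priorpc_nc <- brm(
  f1_nc, data = dat603, prior = p_nc,
  sample_prior = "only", iter = 10000, cores = 4
)
```

The prior-only summary for Intercept should then be centered near 3, since the prior now applies to the quantity that is reported.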

Alternatively, wrap the formula in bf(…, center = FALSE).
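
A minimal sketch of that option, again assuming the asker's dat603 is available (`f1_bf` is a made-up name). Rather than guessing which prior classes apply once centering is off, it is safer to inspect them with get_prior() before assigning anything:

```r
library(brms)

# Same model as in the question, with internal centering switched off.
f1_bf <- bf(M ~ o + j + r, center = FALSE)

# List the parameters that can receive priors under this parameterization,
# then set the intercept prior on whichever row get_prior() reports for it.
get_prior(f1_bf, data = dat603)
```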

Thanks for pointing that out. I didn’t realize how much the centering choice affects brms models. I’ll go through the sections you mentioned in the reference manual, especially the parts on intercept parameterization and setting priors. Appreciate the guidance — this should help me understand what’s going on in my model.