Wide credible intervals for fixed effects in brms GP model

Pavel_Stichauer · September 22, 2025, 1:32pm

Hi,

I am fitting a logistic regression in brms to predict auxiliary selection (binary H/E) from morphosyntactic predictors, verbs, and geography.

My baseline model is:

AUX_bin ~ Class + TAM + Cell + (1 | Verb_Construction)

This converges fine.

When I add a Gaussian Process term:

AUX_bin ~ Class + TAM + Cell + (1 | Verb_Construction) + gp(Latitude, Longitude)

the model converges cleanly and the GP hyperparameters make sense (short lengthscale, reasonable sdgp). However, the posterior intervals for fixed effects (Class, Cell, TAM) become extremely wide, often spanning almost the full probability range (0–1). In contrast, the random effects (Verb_Construction) have well-behaved posteriors.

My questions:

Why does adding a GP term inflate the uncertainty on fixed effects so much?
Is this expected behaviour (the GP soaking up variance that would otherwise be attributed to fixed effects)?
Are there recommended strategies (priors, re-parameterisation, model structure) to stabilise the fixed-effect estimates when including a GP?

A subset of the dataset is available here:
Štichauer, Pavel & Ripamonti, Fabio (2025). MIXPAR Database: Version 1.0 (September 2025). LINDAT/CLARIAH-CZ digital library. http://hdl.handle.net/11234/1-5982

Thanks a lot for any advice!

(Here is the MRE: library(brms)

Load dataset (publicly available)

mixpar_bin ← read.csv(“mixpar_for_R_final.csv”)

Keep only H/E auxiliaries

mixpar_bin ← subset(mixpar_bin, AUX %in% c(“H”, “E”))
mixpar_bin$AUX_bin ← ifelse(mixpar_bin$AUX == “H”, 1, 0)

Factors

mixpar_bin$Class ← factor(mixpar_bin$Class)
mixpar_bin$TAM ← factor(mixpar_bin$TAM)
mixpar_bin$Cell ← factor(mixpar_bin$Cell)
mixpar_bin$Verb_Construction ← factor(mixpar_bin$Verb_Construction)

Fit GP model

fit_gp ← brm(
AUX_bin ~ Class + TAM + Cell + (1 | Verb_Construction) +
gp(Latitude, Longitude),
data = mixpar_bin,
family = bernoulli(),
chains = 4, iter = 4000, warmup = 2000, cores = 4,
control = list(adapt_delta = 0.95)
)
summary(fit_gp)

kholsinger · September 22, 2025, 6:16pm

This is way outside of my wheelhouse. (I am a plant evolutionary geneticist.) But I ran your code without the GP on a Mac with OS X 26, and it didn’t look to me as if even the model without the GP is behaving very well. Is this similar to what you get?

 Family: bernoulli 
  Links: mu = logit 
Formula: AUX_bin ~ Class + TAM + Cell + (1 | Verb_Construction) 
   Data: mixpar_bin (Number of observations: 2985) 
  Draws: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
         total post-warmup draws = 8000

Multilevel Hyperparameters:
~Verb_Construction (Number of levels: 61) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     2.49      0.53     1.75     3.91 1.28       11       32

Regression Coefficients:
                                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept                           -475.53    288.20 -1274.54   -40.54 2.03        5       11
ClassReflexiveMantipassive           477.32    288.05    42.24  1276.05 2.03        5       11
ClassReflexiveMmedium                472.46    288.42    34.84  1271.51 2.02        5       11
ClassReflexiveMretroherent           475.27    288.18    40.73  1273.14 2.03        5       11
ClassReflexiveMtransitiveMdirect     477.09    288.12    42.51  1275.58 2.03        5       11
ClassReflexiveMtransitiveMindirect   476.84    288.26    41.68  1277.67 2.03        5       11
ClassReflexiveMunergativeMindirect   477.14    288.07    42.05  1275.95 2.03        5       11
ClassTransitive                      474.50    288.17    39.63  1273.77 2.03        5       11
ClassUnaccusative                    477.26    288.00    43.04  1275.30 2.03        5       11
ClassUnergative                      474.82    288.12    39.78  1273.49 2.03        5       11
TAMPLPF.SBJV                           0.61      0.20     0.25     1.01 1.02      100      202
TAMPRF                                -0.51      0.12    -0.75    -0.25 1.03       86      150
TAMPRF.FUT                          -338.78    278.41 -1044.66   -15.21 1.07       62       62
TAMPST.COND                           -2.89      1.17    -5.39    -0.99 1.06       63      199
Cell1SG                               -0.35      0.14    -0.61    -0.04 1.10       48       88
Cell2PL                               -0.05      0.15    -0.33     0.28 1.10       38       36
Cell2SG                               -0.40      0.15    -0.69    -0.10 1.07       66      128
Cell3PL                                1.34      0.15     1.06     1.66 1.10       34       43
Cell3SG                                0.87      0.14     0.60     1.14 1.08       59      138

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Warning message:
Parts of the model have not converged (some Rhats are > 1.05). Be careful when analysing the results! We recommend running more iterations and/or setting stronger priors.

I get Rhats = 2.03 for the Class covariates, and both the Bulk and Taill ESS are pretty small.

Pavel_Stichauer · September 23, 2025, 4:03am

Thank you so much… actually, my results are a bit different and well-bahevaed, here are the two models, side by side…

(attachments)

js592 · September 23, 2025, 5:26pm

I don’t have time for a long answer, so hopefully some others will chime in. Some quick thoughts:

A model that assumes (conditionally) i.i.d. data when there is residual spatial correlation will generally have too narrow of credible intervals, so some degree of greater uncertainty is expected
If the model already has a lot of parameters relative to the data adding in a flexible GP will chew up some of that limited degrees of freedom and also lead to greater uncertainty
The statement “the posterior intervals for fixed effects (Class, Cell, TAM) become extremely wide, often spanning almost the full probability range (0–1).” is slightly confusing. If these are coded as binary indicators the effects should be on an unconstrained scale.
When you’re fitting the model with categorical inputs, did you make sure to set one level as the reference to ensure identifiability? I don’t use brms, so maybe it does this by default.
If all the above doesn’t help things, consider looking into the literature on “spatial confounding”. E.g., tandfonline.com/doi/abs/10.1198/tast.2010.10052 and associated works.

kholsinger · September 23, 2025, 5:58pm

Hmm. I don’t know why we’re getting such different results. I clearly did something wrong. As @js592 suggests, I suspect that one or more of your fixed effects is confounded with geographical location.

Pavel_Stichauer · September 25, 2025, 11:39am

js592:

don’t have time for a long answer, so hopefully some others will chime in. Some quick thoughts:

A model that assumes (conditionally) i.i.d. data when there is residual spatial correlation will generally have too narrow of credible intervals, so some degree of greater uncertainty is expected

If the model already has a lot of parameters relative to the data adding in a flexible GP will chew up some of that limited degrees of freedom and also lead to greater uncertainty

The statement “the posterior intervals for fixed effects (Class, Cell, TAM) become extremely wide, often spanning almost the full probability range (0–1).” is slightly confusing. If these are coded as binary indicators the effects should be on an unconstrained scale.

When you’re fitting the model with categorical inputs, did you make sure to set one level as the reference to ensure identifiability? I don’t use brms, so maybe it does this by default.

If all the above doesn’t help things, consider looking into the literature on “spatial confounding”. E.g., tandfonline.com/doi/abs/10.1198/tast.2010.10052 and associated works.

Many thanks for pointing me to the Hodges & Reich paper — it was exactly the tweak I needed. I implemented the restricted spatial regression (RSR) approach by adding residualized spatial basis terms, and it worked very well. The results are much more stable than with the Gaussian process:

The fixed effects (Class, Cell, TAM) now have credible intervals comparable to the baseline model, instead of the inflated uncertainty we saw with the GP.
The spatial terms (e.g., spat3, spat4) came out significant, capturing residual spatial structure without “soaking up” the signal of the fixed effects.
Overall fit is solid, and the model seems to balance geographic variation and linguistic predictors in a way that’s both interpretable and statistically robust.

To give you a snapshot:

TAMPRF = -0.65 (95% CI: -0.90, -0.41)

TAMPRF.FUT = -2.50 (95% CI: -4.36, -0.89)
Cell3PL = 1.57 (95% CI: 1.29, 1.86)
Cell3SG = 1.11 (95% CI: 0.84, 1.39)

So the RSR approach really seems the way to go – it solves the confounding problem while keeping the main linguistic effects interpretable.

Thanks again for the tip, it was extremely helpful!

Topic		Replies	Views
Computing approximate Gaussian processes Modeling brms	6	209	June 24, 2026
Brms: one shared gp for temporally correlated errors in each of 250+ timeseries brms gaussian-process	18	2359	August 3, 2018
GP interpolation with brms brms fitting-issues , specification , brms	3	554	July 28, 2022
Evaluating a brms model with a Gaussian Process brms techniques , ecology , gaussian-process , visualisation	5	1634	June 14, 2023
Fitting a GP model with subject-specific GP terms in brms brms gaussian-process	4	1661	May 21, 2021

Wide credible intervals for fixed effects in brms GP model

Load dataset (publicly available)

Keep only H/E auxiliaries

Factors

Fit GP model

Related topics