Wide credible intervals for fixed effects in brms GP model

Hi,

I am fitting a logistic regression in brms to predict auxiliary selection (binary H/E) from morphosyntactic predictors, verbs, and geography.

My baseline model is:

AUX_bin ~ Class + TAM + Cell + (1 | Verb_Construction)

This converges fine.

When I add a Gaussian Process term:

AUX_bin ~ Class + TAM + Cell + (1 | Verb_Construction) + gp(Latitude, Longitude)

the model converges cleanly and the GP hyperparameters make sense (short lengthscale, reasonable sdgp). However, the posterior intervals for fixed effects (Class, Cell, TAM) become extremely wide, often spanning almost the full probability range (0–1). In contrast, the random effects (Verb_Construction) have well-behaved posteriors.

My questions:

  • Why does adding a GP term inflate the uncertainty on fixed effects so much?

  • Is this expected behaviour (the GP soaking up variance that would otherwise be attributed to fixed effects)?

  • Are there recommended strategies (priors, re-parameterisation, model structure) to stabilise the fixed-effect estimates when including a GP?

A subset of the dataset is available here:
Ć tichauer, Pavel & Ripamonti, Fabio (2025). MIXPAR Database: Version 1.0 (September 2025). LINDAT/CLARIAH-CZ digital library. http://hdl.handle.net/11234/1-5982

Thanks a lot for any advice!

(Here is the MRE: library(brms)

Load dataset (publicly available)

mixpar_bin ← read.csv(“mixpar_for_R_final.csv”)

Keep only H/E auxiliaries

mixpar_bin ← subset(mixpar_bin, AUX %in% c(“H”, “E”))
mixpar_bin$AUX_bin ← ifelse(mixpar_bin$AUX == “H”, 1, 0)

Factors

mixpar_bin$Class ← factor(mixpar_bin$Class)
mixpar_bin$TAM ← factor(mixpar_bin$TAM)
mixpar_bin$Cell ← factor(mixpar_bin$Cell)
mixpar_bin$Verb_Construction ← factor(mixpar_bin$Verb_Construction)

Fit GP model

fit_gp ← brm(
AUX_bin ~ Class + TAM + Cell + (1 | Verb_Construction) +
gp(Latitude, Longitude),
data = mixpar_bin,
family = bernoulli(),
chains = 4, iter = 4000, warmup = 2000, cores = 4,
control = list(adapt_delta = 0.95)
)
summary(fit_gp)

This is way outside of my wheelhouse. (I am a plant evolutionary geneticist.) But I ran your code without the GP on a Mac with OS X 26, and it didn’t look to me as if even the model without the GP is behaving very well. Is this similar to what you get?

 Family: bernoulli 
  Links: mu = logit 
Formula: AUX_bin ~ Class + TAM + Cell + (1 | Verb_Construction) 
   Data: mixpar_bin (Number of observations: 2985) 
  Draws: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
         total post-warmup draws = 8000

Multilevel Hyperparameters:
~Verb_Construction (Number of levels: 61) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     2.49      0.53     1.75     3.91 1.28       11       32

Regression Coefficients:
                                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept                           -475.53    288.20 -1274.54   -40.54 2.03        5       11
ClassReflexiveMantipassive           477.32    288.05    42.24  1276.05 2.03        5       11
ClassReflexiveMmedium                472.46    288.42    34.84  1271.51 2.02        5       11
ClassReflexiveMretroherent           475.27    288.18    40.73  1273.14 2.03        5       11
ClassReflexiveMtransitiveMdirect     477.09    288.12    42.51  1275.58 2.03        5       11
ClassReflexiveMtransitiveMindirect   476.84    288.26    41.68  1277.67 2.03        5       11
ClassReflexiveMunergativeMindirect   477.14    288.07    42.05  1275.95 2.03        5       11
ClassTransitive                      474.50    288.17    39.63  1273.77 2.03        5       11
ClassUnaccusative                    477.26    288.00    43.04  1275.30 2.03        5       11
ClassUnergative                      474.82    288.12    39.78  1273.49 2.03        5       11
TAMPLPF.SBJV                           0.61      0.20     0.25     1.01 1.02      100      202
TAMPRF                                -0.51      0.12    -0.75    -0.25 1.03       86      150
TAMPRF.FUT                          -338.78    278.41 -1044.66   -15.21 1.07       62       62
TAMPST.COND                           -2.89      1.17    -5.39    -0.99 1.06       63      199
Cell1SG                               -0.35      0.14    -0.61    -0.04 1.10       48       88
Cell2PL                               -0.05      0.15    -0.33     0.28 1.10       38       36
Cell2SG                               -0.40      0.15    -0.69    -0.10 1.07       66      128
Cell3PL                                1.34      0.15     1.06     1.66 1.10       34       43
Cell3SG                                0.87      0.14     0.60     1.14 1.08       59      138

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Warning message:
Parts of the model have not converged (some Rhats are > 1.05). Be careful when analysing the results! We recommend running more iterations and/or setting stronger priors. 

I get Rhats = 2.03 for the Class covariates, and both the Bulk and Taill ESS are pretty small.

Thank you so much
 actually, my results are a bit different and well-bahevaed, here are the two models, side by side


(attachments)


I don’t have time for a long answer, so hopefully some others will chime in. Some quick thoughts:

  • A model that assumes (conditionally) i.i.d. data when there is residual spatial correlation will generally have too narrow of credible intervals, so some degree of greater uncertainty is expected
  • If the model already has a lot of parameters relative to the data adding in a flexible GP will chew up some of that limited degrees of freedom and also lead to greater uncertainty
  • The statement “the posterior intervals for fixed effects (Class, Cell, TAM) become extremely wide, often spanning almost the full probability range (0–1).” is slightly confusing. If these are coded as binary indicators the effects should be on an unconstrained scale.
  • When you’re fitting the model with categorical inputs, did you make sure to set one level as the reference to ensure identifiability? I don’t use brms, so maybe it does this by default.
  • If all the above doesn’t help things, consider looking into the literature on “spatial confounding”. E.g., tandfonline.com/doi/abs/10.1198/tast.2010.10052 and associated works.

Hmm. I don’t know why we’re getting such different results. I clearly did something wrong. As @js592 suggests, I suspect that one or more of your fixed effects is confounded with geographical location.

Many thanks for pointing me to the Hodges & Reich paper — it was exactly the tweak I needed. I implemented the restricted spatial regression (RSR) approach by adding residualized spatial basis terms, and it worked very well. The results are much more stable than with the Gaussian process:

  • The fixed effects (Class, Cell, TAM) now have credible intervals comparable to the baseline model, instead of the inflated uncertainty we saw with the GP.

  • The spatial terms (e.g., spat3, spat4) came out significant, capturing residual spatial structure without “soaking up” the signal of the fixed effects.

  • Overall fit is solid, and the model seems to balance geographic variation and linguistic predictors in a way that’s both interpretable and statistically robust.

To give you a snapshot:

TAMPRF = -0.65 (95% CI: -0.90, -0.41)

  • TAMPRF.FUT = -2.50 (95% CI: -4.36, -0.89)

  • Cell3PL = 1.57 (95% CI: 1.29, 1.86)

  • Cell3SG = 1.11 (95% CI: 0.84, 1.39)

So the RSR approach really seems the way to go – it solves the confounding problem while keeping the main linguistic effects interpretable.

Thanks again for the tip, it was extremely helpful!