Hi,
I am fitting a logistic regression in brms to predict auxiliary selection (binary H/E) from morphosyntactic predictors, verbs, and geography.
My baseline model is:
AUX_bin ~ Class + TAM + Cell + (1 | Verb_Construction)
This converges fine.
When I add a Gaussian Process term:
AUX_bin ~ Class + TAM + Cell + (1 | Verb_Construction) + gp(Latitude, Longitude)
the model converges cleanly and the GP hyperparameters make sense (short lengthscale, reasonable sdgp). However, the posterior intervals for fixed effects (Class, Cell, TAM) become extremely wide, often spanning almost the full probability range (0–1). In contrast, the random effects (Verb_Construction) have well-behaved posteriors.
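To make "spanning almost the full probability range" concrete, here is a minimal base R sketch of how an interval on the logit scale (the scale on which a bernoulli model reports its coefficients) maps to probabilities. The interval bounds are made-up illustrative numbers, not output from the models above.

```r
# Illustrative only: hypothetical 95% interval bounds on the logit scale,
# NOT taken from the fitted models.
narrow <- c(lower = -0.5, upper = 0.5)
wide   <- c(lower = -5,   upper = 5)

# plogis() is the inverse logit: it maps log-odds to probabilities.
narrow_prob <- plogis(narrow)
wide_prob   <- plogis(wide)

print(round(narrow_prob, 3))
print(round(wide_prob, 3))
```

Under these assumed bounds, the narrow interval covers roughly 0.38 to 0.62 in probability, while the wide one covers roughly 0.007 to 0.993, i.e. almost the whole range.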
My questions:
- Why does adding a GP term inflate the uncertainty on fixed effects so much?
- Is this expected behaviour (the GP soaking up variance that would otherwise be attributed to fixed effects)?
- Are there recommended strategies (priors, re-parameterisation, model structure) to stabilise the fixed-effect estimates when including a GP?
A subset of the dataset is available here:
Štichauer, Pavel & Ripamonti, Fabio (2025). MIXPAR Database: Version 1.0 (September 2025). LINDAT/CLARIAH-CZ digital library. http://hdl.handle.net/11234/1-5982
Thanks a lot for any advice!
Here is the MRE:
library(brms)

# Load dataset (publicly available)
mixpar_bin <- read.csv("mixpar_for_R_final.csv")

# Keep only H/E auxiliaries and code H as 1, E as 0
mixpar_bin <- subset(mixpar_bin, AUX %in% c("H", "E"))
mixpar_bin$AUX_bin <- ifelse(mixpar_bin$AUX == "H", 1, 0)

# Factors
mixpar_bin$Class <- factor(mixpar_bin$Class)
mixpar_bin$TAM <- factor(mixpar_bin$TAM)
mixpar_bin$Cell <- factor(mixpar_bin$Cell)
mixpar_bin$Verb_Construction <- factor(mixpar_bin$Verb_Construction)

# Fit GP model
fit_gp <- brm(
  AUX_bin ~ Class + TAM + Cell + (1 | Verb_Construction) +
    gp(Latitude, Longitude),
  data = mixpar_bin,
  family = bernoulli(),
  chains = 4, iter = 4000, warmup = 2000, cores = 4,
  control = list(adapt_delta = 0.95)
)
summary(fit_gp)
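
One way the widening of the intervals could be quantified is sketched below, assuming a coefficient summary laid out like the matrix returned by brms's fixef(), with "Q2.5" and "Q97.5" columns. The helper interval_width() and the demo matrix (including its coefficient names and numbers) are hypothetical and only illustrate the calculation; they are not results from the models above.

```r
# Width of the 95% interval for each coefficient, given a matrix with
# "Q2.5" and "Q97.5" columns (the layout returned by brms::fixef()).
interval_width <- function(fe) fe[, "Q97.5"] - fe[, "Q2.5"]

# Hypothetical stand-in for fixef() output; names and numbers are invented.
fe_demo <- matrix(
  c( 0.2, 0.4, -0.6, 1.0,
    -1.1, 2.5, -6.0, 3.8),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("ClassB", "TAMpast"),
    c("Estimate", "Est.Error", "Q2.5", "Q97.5")
  )
)

widths <- interval_width(fe_demo)
print(widths)
```

With real fits, the same helper could be applied to both models, for example interval_width(fixef(fit_gp)) / interval_width(fixef(fit_base)), where fit_base is an assumed name for the baseline model fitted without the gp() term.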

