I was wondering whether it is feasible and mathematically meaningful to run (multilevel) Bayesian models with “pre-fixed” estimates for some smooth terms. So, in pseudo-code, something like this:
## simple model
estimate <- bf(y ~ s(x, z, k = 4))
## complex model
brm(
  bf(x ~ z) +
  bf(y ~ x + z + estimate[s(x, z, k = 4)])  ## pseudo-code: plug in the fixed smooth from the simple model
)
The point is that I trust the smooth estimates of the simple model more than the outcomes of the complex model, because I’ve verified them with an mgcv::gam model that fits the data very well and explains a lot of the variation, whereas the smooth estimates from the complex model don’t make much sense.
So I am wondering whether one could fix that term and estimate the contribution of the remaining variables - does that make sense at all?
The smooth models from mgcv are the same as in brms (presuming the same specification), so I suppose it might be possible to use the point estimates from the mgcv model as constant priors. Of course it should be relatively simple to make that simple GAM in brms anyway to verify that you’re getting equivalent results.
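For example, a minimal sketch, assuming a data frame dat with columns y, x and z (the names are just placeholders):

library(mgcv)
library(brms)
## same smooth specification in both packages
fit_gam <- gam(y ~ s(x, z, k = 4), data = dat, method = "REML")
fit_brm <- brm(y ~ s(x, z, k = 4), data = dat)
## compare the estimated smooth surfaces
plot(fit_gam, scheme = 2)
conditional_smooths(fit_brm)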
That being said, I think you’d only want to do this as a strategy to better understand the model. With the additive component being estimated from the same data but then treated as certain, the estimates of the other features will be too precise and possibly biased, and so are probably not trustworthy. Maybe another approach would be to start with the simplest model including your additive component that produces sensible results, and build upwards gradually from there.
Edit: I misread the post, I see that you already made a version of your simple model in brms. In that case perhaps the posterior means could be used as constant priors.
Yup, I did both: an mgcv gam and a brms model, and they give the same results. So how would I create those constant priors? Would this be some sort of matrix?
Sorry, but I don’t fully understand why the estimates would become less trustworthy. I would have guessed that, if anything, they would become more conservative, because a lot of the variation is already accounted for / fixed by the priors?
You should be able to specify the priors for the smooth terms just like any other priors. If you look at prior_summary() on the fitted brms object you will see which priors can be specified, and I think you can use constant() as the prior to fix a value. You can see how the design matrices generated in preprocessing enter the model if you look at stancode().
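Roughly along these lines, as a sketch; the coefficient names and the fixed values below are placeholders, and the real ones would come from prior_summary() and the posterior summary of the simple model:

library(brms)
## simple model, fitted first
fit_simple <- brm(y ~ s(x, z, k = 4), data = dat)
prior_summary(fit_simple)  ## smooth-related priors appear under classes "b" and "sds"
## fix the smooth-related parameters that brms exposes as priors
## (values and coef names are made up for illustration)
fixed_smooth <- c(
  prior(constant(0.8), class = "b", coef = "sxz_1"),
  prior(constant(1.3), class = "sds", coef = "s(x,z,k=4)")
)
## more complex model with those priors held constant
fit_complex <- brm(
  y ~ x + z + s(x, z, k = 4),
  data = dat,
  prior = fixed_smooth
)
stancode(fit_complex)  ## shows how the smooth enters the model internally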
Presumably the additive component and the additional features you want to add are all being estimated from the same data, right? In that case, if you fit a simple model including the smooth terms and then fix those smooth terms in a more complex model on the same data, the estimates from the more complex model are conditional on the now-fixed smooth terms (this is analogous to step-wise feature selection). In effect you’ve used the data to update the prior and selected an extremely strong prior, and your second analysis doesn’t know that information from the data has already been used.
I think ideally you would want all the features, including the smooth terms, to be estimated at the same time so that the estimates are controlled for each other. So this might be a useful approach for getting the specification of the smooths right, but I’m not sure it is useful for estimation.
Many thanks for your advice - in the end I did what you suggested and slowly built the model toward a structure that makes sense and that I can justify. I think the lesson here is that the changing estimates simply indicate more complex statistical interactions that unfold, or become visible, as I make the model more complex. I guess this doesn’t mean that the simple model becomes invalid, but that I have to discuss the different surface smooths I get: one for the simpler and one for the more complex representation of my system.
In that regard it would help to be able to discuss the “goodness of fit” or “variance explained” for these models, to be able to say which of the models performs better statistically. I guess to that end I could use the sigma of my dependent variables?
Finally, I am wondering whether I absolutely have to scale my variables: I have always used brms with scaled variables (mean = 0, sd = 0.5). However, when I use unscaled variables that are simply transformed to be on a similar range (log/sqrt), I get a smaller sigma - is this the wrong thing to do?
The goodness-of-fit tools I personally use for models like this are pp_check(), bayes_R2(), and loo(). Most of my interest is in graphical representations in general, as I think I learn most from seeing the predictions alongside the data. That might be because I mostly build fairly simple models…
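In brms code, roughly (fit_simple and fit_complex standing in for your two fitted models):

## graphical posterior predictive checks
pp_check(fit_simple, ndraws = 100)
pp_check(fit_complex, ndraws = 100)
## Bayesian R-squared
bayes_R2(fit_simple)
bayes_R2(fit_complex)
## approximate leave-one-out cross-validation comparison
loo_compare(loo(fit_simple), loo(fit_complex))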
I don’t do much scaling at all, but perhaps that is because I don’t work with a lot of variables on really different scales. There’s no requirement to do it, but it may improve interpretability and sampling efficiency for some models. In general I’d say that transformations should always be chosen for a substantive reason, and the interpretation of the parameters will need to take the transformations into account.
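If you do want the mean 0 / SD 0.5 scaling you mentioned, it is a one-liner per predictor (dat again being a placeholder data frame); just remember to interpret the coefficients on the transformed scale afterwards:

## centre and divide by two standard deviations, giving mean 0 and SD 0.5
dat$x_s <- (dat$x - mean(dat$x)) / (2 * sd(dat$x))
dat$z_s <- (dat$z - mean(dat$z)) / (2 * sd(dat$z))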