Hi @Guillermo_Uceda, sounds like a cool application! This question touches on several different parts of brms, which is both good news and bad news: good because it sounds like you are taking full advantage of the capabilities of brms to fit this model, and bad because the answer is going to be a little complicated and hard to digest.
Your approach certainly seems reasonable, but to really figure out whether this model is appropriate it would be good to look at some posterior predictive checks. Ultimately, the data are your best guide (albeit an imperfect one) to whether your model is adequate.
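One posterior predictive check that speaks directly to the hurdle part of the model compares the proportion of zeros in the data with the proportion in datasets simulated from the posterior. Below is a minimal base-R sketch. `ppc_zero_pvalue()` is a hypothetical helper, not part of brms. With a fitted brms model `fit`, the replicated data would come from `posterior_predict(fit)`, which returns a draws-by-observations matrix.

```r
# Posterior predictive p-value for the proportion of zeros (base R sketch).
# y:    observed response vector
# yrep: matrix of replicated datasets, one row per posterior draw
#       (e.g. yrep <- posterior_predict(fit) for a fitted brms model)
ppc_zero_pvalue <- function(y, yrep) {
  stopifnot(is.matrix(yrep), ncol(yrep) == length(y))
  obs_zero <- mean(y == 0)          # observed proportion of zeros
  rep_zero <- rowMeans(yrep == 0)   # proportion of zeros in each replicate
  # Fraction of replicates with at least as many zeros as the data
  mean(rep_zero >= obs_zero)
}

# Toy numbers, only to show the mechanics
y    <- c(0, 0, 1, 2)
yrep <- rbind(c(0, 0, 0, 1),
              c(1, 1, 1, 1),
              c(0, 1, 2, 3))
ppc_zero_pvalue(y, yrep)  # 1/3
```

A value near 0 or 1 would suggest that the model gets the share of zeros wrong. brms can also draw the graphical version of this check, with something like `pp_check(fit, type = "stat", stat = function(y) mean(y == 0))`.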
Distributional regression
Many response distributions in generalized linear models have two or more parameters. For example, a normal distribution has a mean and a standard deviation. Commonly, we regress the mean on covariates and assume that the other parameters (like the standard deviation) are constant across all observations. In brms it is pretty easy to relax this assumption, as described in the vignette Estimating Distributional Models with brms. In particular, I wonder whether it might make sense for you to regress the hurdle probability on covariates. In brms this distributional parameter is called `hu`, and you can model it in just the same way that `zi` is modeled in the zero-inflated example in that vignette.
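Here is a sketch of what this could look like, assuming a hurdle Poisson family purely for illustration. Swap in whichever hurdle family you are actually using, e.g. `hurdle_negbinomial()` or `hurdle_lognormal()`. The response name `visits`, the covariates, and the simulated data are hypothetical stand-ins for your own. The second formula inside `bf()` is what regresses `hu` on covariates. `get_prior()` then lists the extra parameters this creates, marked with `dpar = "hu"`.

```r
library(brms)

# Simulated stand-in data (hypothetical names and values)
set.seed(1)
d <- data.frame(
  flowers   = rpois(100, 20),
  elevation = runif(100, 2270, 3530)
)
d$visits <- rpois(100, 2) * rbinom(100, 1, 0.6)

# First formula: the non-zero part; second formula: the hurdle probability hu
f <- bf(
  visits ~ flowers + s(elevation),
  hu     ~ flowers + s(elevation)
)

# Parameters (and default priors) belonging to the hurdle part
pr <- get_prior(f, data = d, family = hurdle_poisson())
pr[pr$dpar == "hu", c("prior", "class", "coef", "dpar")]

# The model itself would then be fit with something like
# fit <- brm(f, data = d, family = hurdle_poisson())
```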
Priors on coefficients
It seems like you are tempted to place priors on coefficients based on the distribution of the covariate. That isn’t right, though. The prior is not about the probable values of the covariate; it is a prior on the associated effect size (i.e. the value of the associated coefficient). For example, the number of flowers can’t be negative, but do you also intend to be absolutely certain a priori that the visitation rate cannot possibly go down as the number of flowers increases? Likewise, the prior on the effect size shouldn’t be centered on the mean of the covariate. It should be centered either on your a priori belief about how much an increase in flowers should affect visitation (if you are using informative priors) or on zero (if you are using weakly informative priors for regularization only).
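In brms syntax, a prior on the effect of flowers might look like the following. This is a sketch. It assumes `flowers` is the covariate's name, and the numbers are placeholders whose sensible values depend on the link function and on the scale of `flowers`. Neither prior restricts the sign of the effect.

```r
library(brms)

# Weakly informative: centered at zero, so both positive and negative
# effects of flowers on visitation remain possible a priori
p_weak <- set_prior("normal(0, 1)", class = "b", coef = "flowers")

# Informative: centered on a prior guess about the effect size
# (0.1 per flower on the link scale is a made-up placeholder)
p_inf <- set_prior("normal(0.1, 0.05)", class = "b", coef = "flowers")

# If hu is also regressed on flowers, that coefficient gets its own prior
p_hu <- set_prior("normal(0, 1)", class = "b", coef = "flowers", dpar = "hu")
```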
Priors on the intercept
One thing to watch out for in brms is that, by default, the prior you specify for the intercept is the prior on the intercept when all covariates are held at their means, not when all covariates are held at zero. However, for certain special classes of effects things can be more complicated, and I’m not 100% sure that the default is guaranteed to behave this way when the model contains a spline. To be safe, I would suggest using the `0 + Intercept` syntax described here, which lets you put priors on the true intercept (i.e. with covariates held at zero): https://cran.r-project.org/web/packages/brms/brms.pdf (see page 44). If you want the intercept to refer to covariates held at their means, you can always center the covariates manually before passing them to the model and still use the `0 + Intercept` syntax. Also, I think this issue might be related to the error you see (or could produce another error once you fix that one), because the prior specification `class = "b", coef = "Intercept"` should only work when the special `Intercept` keyword is used in the model formula.
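A minimal sketch of the `0 + Intercept` approach combined with manual centering follows. The variable names, the simulated data, the hurdle Poisson family, and the prior values are all placeholders. The point is only that, once `Intercept` appears in the formula, it behaves like an ordinary `b` coefficient, so `coef = "Intercept"` becomes valid and its prior refers to covariates held at zero (here, an average number of flowers, because of the centering).

```r
library(brms)

# Simulated stand-in data (hypothetical)
set.seed(1)
d <- data.frame(flowers = rpois(100, 20))
d$visits <- rpois(100, 2) * rbinom(100, 1, 0.6)

# Manual centering: flowers_c = 0 now means "average number of flowers"
d$flowers_c <- d$flowers - mean(d$flowers)

# The intercept becomes a regular population-level (class "b") coefficient
f <- bf(visits ~ 0 + Intercept + flowers_c)

priors <- c(
  set_prior("normal(0, 2)", class = "b", coef = "Intercept"),
  set_prior("normal(0, 1)", class = "b", coef = "flowers_c")
)

# Confirm that "Intercept" is listed as a class "b" coefficient
pr <- get_prior(f, data = d, family = hurdle_poisson())
pr[pr$class == "b", c("prior", "class", "coef")]

# fit <- brm(f, data = d, family = hurdle_poisson(), prior = priors)
```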
Priors on splines
This part is conceptually hard, and there’s no way around it. The best resource on how to set these priors is the thread “Better priors (non-flat) for GAMs (brms)”, and particularly the post from Gavin Simpson there. However, to understand that post you’ll need some notion of splines as random effects, and for that a good resource is this wonderful post by @tjmahr, “Random effects and penalized splines are the same thing” (on his blog Higher Order Functions). Even with this post as a guide, though, I find Bayesian splines quite challenging conceptually.
Something that is definitely true is that `set_prior("uniform(2270,3530)", class = "b", coef = "s(elevation)")` is not achieving what you want. It’s not the right syntax for setting a prior on a smooth, and there is no reason that the range of elevations observed in the data should correspond at all to your prior on the effect size for elevation.
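To find the correct prior classes for a smooth, `get_prior()` is the most reliable guide. As far as I understand the brms parameterization, `s(elevation)` is split into two parts. One is an unpenalized linear part (class `b`, coefficient `selevation_1`). The other is a penalized part, treated like a random effect, whose standard deviation has class `sds` and controls how wiggly the smooth can be. The sketch below uses simulated data and an assumed hurdle Poisson family, and the prior values are placeholders rather than recommendations. Please check the names against `get_prior()` for your own model.

```r
library(brms)

# Simulated stand-in data (hypothetical), elevation in the observed range
set.seed(1)
d <- data.frame(elevation = runif(100, 2270, 3530))
d$visits <- rpois(100, 2) * rbinom(100, 1, 0.6)

f  <- bf(visits ~ s(elevation))
pr <- get_prior(f, data = d, family = hurdle_poisson())

# Priors that can be set for the smooth
pr[pr$class %in% c("b", "sds"), c("prior", "class", "coef")]

priors <- c(
  # Linear (unpenalized) part of the smooth
  set_prior("normal(0, 1)", class = "b", coef = "selevation_1"),
  # SD of the penalized part: smaller values pull the curve toward its linear part
  set_prior("exponential(1)", class = "sds", coef = "s(elevation)")
)

# fit <- brm(f, data = d, family = hurdle_poisson(), prior = priors)
```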