I have a logistic regression setting where I care most about predictive power. I’m unsure which functional form of one of my covariates will give the best out-of-sample predictive performance. I know a priori that this covariate matters, so I don’t want to use a horseshoe prior, because I would risk shrinking the variable out of the model entirely.

Inspired by the BYM2 identification strategy (see, for example, Riebler, A., Sørbye, S. H., Simpson, D., & Rue, H. (2016). An intuitive Bayesian spatial model for disease mapping that accounts for scaling), I thought an appropriate model might be a mixture of the covariate effects under different functional forms:

y_i \sim \mbox{Bernoulli} (\pi_i )

\mbox{logit}(\pi_i) = \alpha_i + \mbox{mix}_i

\mbox{mix}_i = \sqrt{\rho} \, \beta_1 x_i + \sqrt{1-\rho} \, \beta_2 f(x_i)

\rho \sim \mbox{Beta}(0.5,0.5)

\beta_1, \beta_2 \sim N(0,1)

where f is any nonlinear function. I might also want to extend this to a Dirichlet prior if there is more than one candidate f.
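To make the proposal concrete, here is a minimal prior-simulation sketch in Python/NumPy. It is only an illustration under several assumptions that are not part of the model above: a single scalar intercept with a N(0,1) prior in place of \alpha_i, f(x) = log(x) as the example nonlinear form, and each candidate form standardized so the weights are on a comparable scale. The helper names (`standardize`, `mix_effect`, `simulate_prior`) are hypothetical. The weights are drawn from a Dirichlet with all concentrations equal to 0.5; with two candidate forms this is the same as \rho \sim \mbox{Beta}(0.5,0.5) with weights (\rho, 1-\rho).

```python
import numpy as np


def standardize(v):
    """Center and scale a vector to mean 0 and standard deviation 1."""
    return (v - v.mean()) / v.std()


def mix_effect(weights, betas, basis):
    """Compute sum_k sqrt(w_k) * beta_k * f_k(x_i) for every observation i.

    basis has shape (n, K): one column per candidate functional form.
    """
    return basis @ (np.sqrt(weights) * betas)


def simulate_prior(x, funcs, concentration=0.5, seed=0):
    """Draw one data set from the prior of the mixture-of-forms model."""
    rng = np.random.default_rng(seed)
    # Assumption: standardize each form so the weights are comparable.
    basis = np.column_stack([standardize(f(x)) for f in funcs])
    n_forms = basis.shape[1]
    # Dirichlet weights; for two forms this matches rho ~ Beta(0.5, 0.5).
    weights = rng.dirichlet(np.full(n_forms, concentration))
    betas = rng.standard_normal(n_forms)   # beta_k ~ N(0, 1)
    alpha = rng.standard_normal()          # assumption: scalar intercept ~ N(0, 1)
    eta = alpha + mix_effect(weights, betas, basis)
    pi = 1.0 / (1.0 + np.exp(-eta))        # inverse logit
    y = rng.binomial(1, pi)                # y_i ~ Bernoulli(pi_i)
    return weights, betas, pi, y


# Example: linear form versus log form on a positive covariate.
x = np.random.default_rng(1).uniform(0.1, 5.0, size=200)
weights, betas, pi, y = simulate_prior(x, [lambda v: v, np.log])
```

Adding more entries to `funcs` gives the Dirichlet extension directly, since the weights always sum to one across the candidate forms.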

Is this approach valid? Do you have any other suggestions for how to go about a similar exercise?

Thanks in advance!