Variable selection with cmdstanr and projpred

fabio · March 11, 2024, 2:03pm

Hi dear Stan community!
I would like to perform Projection Predictive Feature Selection - more specifically the rankig of the predictors - and using the cmdstanr interface to handle R code, Stan code, compliations…
While it is possible to do directly with brms and rstanarm I do not know how to couple projpred with cmdstanr. Any help?
I tag here @jonah, @fweber144 and @avehtari because of their activity on the relevant packages for my work.

avehtari · March 11, 2024, 3:50pm

Since you are not happy with brms, I guess you have a model that is not easy to make with brms, which may mean that it’s not easy for projection predictive approach. Can you tell more about your model, so I can tell whether the method itself is suitable, before we discuss implementation details?

fabio · March 11, 2024, 4:25pm

Dear @avehtari, I am not confident with the grammar for the formula of brms and maybe my model could be easily traslated. However I feel confident in writing directly the stan code from scratch to add more and more complexity form the first prototypal model up to the final implementation with multiple hierarchical levels.

So, long story short:
my response variable y (daily average temperature over decades) is normally distributed and show strong periodicity (seasonality) along years. To catch this behavior I decided to model the latent mean \mu as a Fourier series of n terms:

\mu = A_0 + \sum_{n=1}^{N}(A_n \cos(\frac{2\pi n }{P}x)+B_n \sin(\frac{2\pi n }{P}x))

With n \rightarrow \infty this is a gaussian process with a periodic kernel, but GP here are too costly in terms of computation.
P in my case is fixed and the periodic terms can be precomputed and the problem is similar to a multiple linear regression with 2N+1 terms. First prototypes show that with just N=2 i have good results in approximating my data with no convergence issues, good EBFMI and R-hat very close to one.
But I would like not to be arbitrary in choosing N, so I would like to check N between overfitting (N too high) and a simplistic model (N too low). And so projpred seems to me the best way to go… What do you think?

avehtari · March 11, 2024, 5:25pm

If the goal is to choose just N, then projpred is likely to be more complicated than needed, and you could use, e.g. LOO-CV.

If you are using a prior on A_n and B_n which is derived from the limiting GP, then you can’t overfit when N increases, you just get closer to that GP. See e.g. Appendix B in Practical Hilbert space approximate Bayesian Gaussian processes for probabilistic programming | Statistics and Computing for the equations for periodic GP, and Stan implementation in Birthdays case study. The paper I mentioned, discuss also how to diagnose whether there are enough basis functions or too many.

fabio · March 11, 2024, 6:44pm

Thanks @avehtari . I will read the Practical Hilbert space again. I posted some code here some months ago about HIlbert Space approximation but this approach is definitely an inexhaustible mine of information!
For this problem so mine I will proceed with the stacking of the predictive distributions increasing n ad see how it’s going. Then I will use the HSGP with the periodic kernel! Thanks a lot!

Topic		Replies	Views
Using projpred on stanfit object RStan rstan , projpred	2	664	November 12, 2021
Cmdstanr + projpred Interfaces cmdstanr	3	100	December 4, 2024
Projpred: Projection Predictive Feature Selection now in CRAN Publicity	2	790	February 20, 2018
New projpred 2.1.1 in CRAN Publicity	1	533	April 28, 2022
Projpred with mixed models Modeling loo	5	720	July 2, 2019

Variable selection with cmdstanr and projpred

Related topics