I want a lowess replacement in Stan (or rstanarm or brms)

People sometimes ask how they can help with Stan.

If you can program in C++, it’s my impression that there’s lots to do.

Below is an example of a useful project that only requires programming in Stan (and some R, I guess).

We can use lowess to fit smoothed curves through data (it will do better than the sort of curve fitting favored by the U.S. government: https://statmodeling.stat.columbia.edu/2020/05/14/so-much-of-academia-is-about-connections-and-reputation-laundering/). But lowess has problems too. First, it’s not Bayesian, so it’s difficult to include prior information, get uncertainty estimates, etc. Second, it’s more of a procedure than a model, so it’s hard to use as a component in larger models that include, for example, measurement error or varying coefficients.
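To make the "procedure, not a model" point concrete, here is a minimal sketch of the core lowess step in plain Python: at each point, fit a weighted straight line to the nearest neighbors using tricube weights. It is a simplification and not the R implementation. The names `tricube` and `lowess_sketch` are made up for illustration, the robustness iterations that R's `lowess` runs by default are left out, and `frac` plays the role of R's `f` argument (default 2/3).

```python
def tricube(u):
    """Tricube kernel: (1 - |u|^3)^3 for |u| < 1, and 0 otherwise."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def lowess_sketch(x, y, frac=2.0 / 3.0):
    """Local linear smoother (hypothetical helper, no robustness iterations).

    frac is the single tuning parameter: the fraction of the data used
    in each local fit. Larger values give a smoother curve.
    """
    n = len(x)
    # Number of neighbors in each local fit (at least 2 when possible).
    k = min(n, max(2, int(round(frac * n))))
    fitted = []
    for x0 in x:
        # Bandwidth: distance from x0 to its k-th nearest data point.
        h = sorted(abs(xi - x0) for xi in x)[k - 1]
        if h > 0.0:
            w = [tricube((xi - x0) / h) for xi in x]
        else:
            # All k nearest points are tied at x0: weight only those.
            w = [1.0 if xi == x0 else 0.0 for xi in x]
        # Weighted least squares for a straight line, evaluated at x0.
        sw = sum(w)
        xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xbar) * (yi - ybar)
                  for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0.0 else 0.0
        fitted.append(ybar + slope * (x0 - xbar))
    return fitted


# Made-up data, only to show the call.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1.2, 2.9, 5.3, 6.8, 9.4, 10.7, 13.1, 15.2, 16.8, 19.1]
smooth = lowess_sketch(x, y, frac=0.5)
```

Nothing in this loop is a probability model: there is no likelihood and no prior, so there is no obvious place to attach uncertainty or plug the smoother into a larger model.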

This came up in a recent example with polling data where we wanted to fit a smooth curve through some survey estimates over time, and the smooth curve fit by the computer did all sorts of weird things.

For these reasons it would be good to have a “lowess equivalent” in Stan (maybe using splines or Gaussian processes, but not with the scaling problems of Gaussian processes as usually implemented; maybe building off existing options in rstanarm or brms) that would do the following three things:

  1. It would run out of the box using its default settings and produce a smooth curve (a posterior summary such as a pointwise median, along with many posterior draws of the curve) that could in one line be plotted along with the data.

  2. Like lowess, it would have one primary tuning parameter that could be set by the user. (Unlike lowess, this function in its default setting would average over or estimate the tuning parameter from the data.)

  3. The model can include multiple predictors. I’m not sure how general the function should be here. I guess a starting point would be whatever lowess can do in that regard.

Those have existed in both rstanarm and brms for years, except for (2), because you get the posterior distribution of the smoothness hyperparameters. Although they do (3), for the plots you refer to in (1), a choice has to be made about which values of the other predictors to hold fixed when graphing the smooth function.

I guess I need a vignette, then!

Also, we could allow (2) in this hypothetical function by allowing options for strong priors for the smoothness hyperparameters.

We have vignettes

http://mc-stan.org/rstanarm/articles/glmer.html#relationship-to-gamm4

You also wrote a paper about life expectancy that used them.

Really? I don’t remember that paper!

https://statmodeling.stat.columbia.edu/2017/03/30/aggregate-age-adjusted-trends-death-rates-non-hispanic-whites-minorities-u-s/

Oh yeah, that! We never got around to writing this up as a paper, unfortunately!