What's the most mind-blowing thing that you've seen done in Stan?

This is a fun post that I hope highlights (possibly put on the Stan blog) some unique and interesting Stan models. I’d say that many of the models in the SUG (Stan User’s Guide, for those new here) were pretty mind blowing, but are now commonplace. I’d like to know if anyone has recent - or stumbled upon - Stan models that either changed your perception of what is possible or were just down-right cool. And adding why it was so interesting to you and, if possible, a version of the Stan model (or a link).

I’ll start: the paper Bayesian Quantile Matching Estimation (which was mentioned in this forum at Bayesian Quantile Matching Estimation (using Order Statistics) - #4 by bertschi, which I completely missed at the time). Is really cool to me because I attempted a CDF regression in Stan that was just not working well. The idea was to match some quantile data to a known distribution. I approached this as a regression problem using a normal distribution and “minimizing the squared error” or, in Stan, setting \sigma to a small value. Not nice.

This approach is much more principled and utilizes order statistics to fit the data, which takes into account the difference in probability across the range of the CDF. It’s so much more elegant than kludging together a regression which treats each observation as the same. I’ve modified the Stan code from their repo for efficiency (there’s stan-math issue to add orderstatistics I opened where @Bob_Carpenter suggested the changes to orderstatistics_lpdf) :

// saved as os_gamma.stan
  real orderstatistics_lpdf(vector U, vector q, int N){
        int M = num_elements(q);
        // real lpdf = lgamma(N + 1) - lgamma(N * q[1]) - lgamma(N * (1 - q[M]) + 1); // normalizing constant
        real lpdf = (N * q[1] - 1) * log(U[1]);
        vector[M - 1] diff_q = q[2:M] - q[1:M-1];
        vector[M - 1] diff_U = U[2:M] - U[1:M-1];
        lpdf += N * (1 - q[M]) * log1m(U[M]);
        //lpdf -= sum(lgamma(N * diff_q));  // drop to encode lupdf
        lpdf += sum((N * diff_q - 1) .* log(diff_U));

        return lpdf;
    int N;
    int M;
    vector[M] q;
    vector[M] X;
    real<lower=0> alpha;
    real<lower=0> beta;
transformed parameters{
    vector[M] U;
    for (m in 1:M)
        U[m] = gamma_cdf(X[m], alpha, beta);
    alpha ~ gamma(1.0, 1.2);
    beta ~ gamma(2.1, 2.2);
    U ~ orderstatistics(q, N);
    X ~ gamma(alpha, beta);
generated quantities {
    real predictive_dist = gamma_rng(alpha, beta);
    real log_prob = orderstatistics_lpdf(U | q, N);
    for (m in 1:M)
        log_prob += gamma_lpdf(X[m] | alpha, beta);
gamma_mod <- cmdstan_model("./os_gamma.stan")

stan_dat <- list(N = 1000,
                 M = 3,
                 q = c(0.25, 0.5, 0.75),
                 X = c(0.1, 1.0, 1.4)

mod_out <- gamma_mod$sample(data = stan_dat,
                            parallel_chains = 4

With output of

All 4 chains finished successfully.
Mean chain execution time: 0.1 seconds.
Total execution time: 0.5 seconds.
> mod_out
        variable     mean   median   sd  mad       q5      q95 rhat ess_bulk ess_tail
 lp__            -1540.96 -1540.65 0.99 0.71 -1542.97 -1540.01 1.00     1540     2059
 alpha               0.52     0.52 0.03 0.03     0.48     0.57 1.00     1179     1513
 beta                0.43     0.43 0.04 0.04     0.36     0.49 1.00     1117     1406
 U[1]                0.21     0.21 0.01 0.01     0.19     0.24 1.00     1828     2167
 U[2]                0.63     0.63 0.01 0.01     0.61     0.65 1.00     3967     3064
 U[3]                0.71     0.71 0.01 0.01     0.69     0.73 1.00     2830     2875
 predictive_dist     1.26     0.59 1.75 0.79     0.01     4.77 1.00     3867     4056
 log_prob        -1536.94 -1536.65 0.98 0.73 -1538.96 -1535.99 1.00     1533     2151

Thanks for starting this!

If self promotion is allowed, I’d like to highlight my NY R talk on factorization machines in stan: GitHub - adamlauretig/ny_r_talk: Slides and Code for NY R Users Meetup on January 9, 2020, "A Common Model, Separated by Two Disciplines: Bayesian Factorization Machines with Stan and R" presented by Adam Lauretig (includes slides).


Yes! Can you provide a bit more detail on what a factorization machine is and why it’s cool?


So, a traditional regression model is

y = \alpha_n + \alpha_j + X \beta + \epsilon

for two groups, n and j; where \epsilon is unobserved.

A factorization machine is
y = \alpha + X \beta + \gamma_n \cdot \delta_j^\intercal + \epsilon ,

where gamma and delta are latent factors for your groups – it’s a way of modeling unobserved heterogeneity in your groups by interacting the latent factors. They’re really common in recommender systems, and they have other names in other fields (interactive fixed effects, generalized linear latent variable models, etc).

Basically, it lets you bring some more flexibility into your linear model.


@simon_mills recently showed me an analysis performed using brms via our package flocker involving a very large multi-species occupancy model with phylogenetic random effects on the intercepts and on some of the coefficients.

This is mind-blowing because:

  • I was surprised that it was remotely workable/efficient to fit multiple phylogenetically structured random effects within the same model… and it turned out to be very efficient.
  • Nobody constructs occupancy models like this because they are a bit hard to code and they are assumed to be hard to fit. But brms make it so ridiculously easy to build the appropriate linear predictors that it’s very cheap to give these models a try… and they turn out to work!

At the recent ecology StanConnect, @zhrandell showed an application of ODE models to understand how sea urchins control ecosystem-level processes with their “stomach fullness” modeled as an ODE. I don’t know that the model was revolutionary per se, but in context it was mind-blowing because:

  • it’s so outside the box of how ecologists have approached this sort of problem in the past
  • and the realism introduced by the ODE turned out to really matter
  • so it’s a great example of how the utility of Stan as a general-purpose PPL is delivering new outside-the-box modeling approaches into the hands of scientists.