Request for help: Looking for large models to test with Stan GPU support

Hi all,

The next release of Stan should include expanded OpenCL, and thus GPU, support. We are currently looking for more real-world models that we can use to test and evaluate the performance of the new backend.

We are looking for models that:

  • use one of the distributions listed below with fairly large inputs (vector or array size > 5000)
  • take a considerable amount of time to fit (at least an hour)
  • you can share the model and data for, even if only via e-mail and not on the forums.

List of supported lpdf/lpmf functions:

  • bernoulli_lpmf, bernoulli_logit_lpmf, bernoulli_logit_glm_lpmf
  • beta_lpdf, beta_proportion_lpdf
  • binomial_lpmf
  • categorical_logit_glm_lpmf
  • cauchy_lpdf
  • chi_square_lpdf
  • double_exponential_lpdf
  • exp_mod_normal_lpdf
  • exponential_lpdf
  • frechet_lpdf
  • gamma_lpdf
  • gumbel_lpdf
  • inv_chi_square_lpdf
  • inv_gamma_lpdf
  • logistic_lpdf
  • lognormal_lpdf
  • neg_binomial_lpmf, neg_binomial_2_lpmf, neg_binomial_2_log_lpmf, neg_binomial_2_log_glm_lpmf
  • normal_lpdf, normal_id_glm_lpdf
  • ordered_logistic_glm_lpmf
  • pareto_lpdf, pareto_type_2_lpdf
  • poisson_lpmf, poisson_log_lpmf, poisson_log_glm_lpmf
  • rayleigh_lpdf
  • scaled_inv_chi_square_lpdf
  • skew_normal_lpdf
  • student_t_lpdf
  • uniform_lpdf
  • weibull_lpdf

Thank you!

To give you a taste of what is to come:

A very simple model with the binomial distribution:

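data {
  int N;          // number of observations
  int y[N];       // number of successes
  int x[N];       // number of trials
  vector[N] w;    // covariate
}
parameters {
  vector[2] beta;
}
model {
  beta ~ normal(0,1);
  // the success probability must lie in [0, 1]
  y ~ binomial(x , beta[1] + beta[2] * w);
}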

is faster on a GPU for N > 10k for a single MCMC chain, and for large N, fitting on a GPU is up to 60 times faster (tested using an AMD Radeon VII and an i7 CPU). The speedup also increases with multiple chains.
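
If you want to try the model above at a given N yourself, here is a minimal sketch for simulating matching data in Python. It is only an illustration, not the script used for the timings above: the function name `simulate_binomial_data`, the values of `beta`, and the `max_trials` setting are all assumptions, chosen so that beta[1] + beta[2] * w stays inside [0, 1]. The resulting dictionary uses the same names as the data block and is serialized to JSON, which CmdStan-based interfaces can read as a data file.

```python
import json
import random


def simulate_binomial_data(n, beta=(0.3, 0.2), max_trials=20, seed=1):
    """Simulate N, y, x, w for the binomial model (hypothetical helper)."""
    rng = random.Random(seed)
    w = [rng.random() for _ in range(n)]                 # covariate in [0, 1)
    x = [rng.randint(1, max_trials) for _ in range(n)]   # number of trials
    y = []
    for trials, w_i in zip(x, w):
        p = beta[0] + beta[1] * w_i                      # success probability
        if not 0.0 <= p <= 1.0:
            raise ValueError("beta and w must give a probability in [0, 1]")
        # Binomial draw as a sum of independent Bernoulli trials.
        y.append(sum(rng.random() < p for _ in range(trials)))
    return {"N": n, "y": y, "x": x, "w": w}


# N = 20,000 is above the N > 10k threshold mentioned above.
data = simulate_binomial_data(20_000)
data_json = json.dumps(data)  # JSON text that could be saved as a data file
```

Increasing the first argument is the simplest way to see how the fitting time scales with N.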


Thanks Rok,

I'm happy to supply data and models for the issues I was working on in my other post. My models take days to run, and I use several tricks to improve performance.

I can supply via email if that interests you?

That would be great. Thank you!

Here’s code to simulate and fit hierarchical data. It uses my reduced-redundant-computation trick, but there’s still a final call to normal() at the end that has lots of input if you choose large values for any of the data-simulation parameters at the top of the R script. Increasing num_trials should have the most targeted impact on that final likelihood call; increasing the others will increase the input to the likelihood but will also increase the amount of computation that has to happen before the likelihood.

hwg_fast.r (8.5 KB) hwg_fast.stan (3.4 KB) helper_functions.r (5.0 KB)

If you find that one useful, I can also create a version that has a binomial outcome instead of normal.

I’m happy to supply data and a model by email. The data are somewhat sensitive, so they can’t be made public, but they can be shared to a limited extent. Model run time is 5.5 hours for 500 iterations.

Thanks! My email is rok.cesnovar at

Do you need an example that converges cleanly? I’m fighting with a model that has severe identification problems with one data set (but not with others). With that data set, the model can run for several days to get decent bulk and tail ESS, and even with adapt_delta at 0.99, I get a lot of divergences. Another data set using the same code runs reasonably well: it is a lot faster and has few or no divergences.


No, even those that do not are more than welcome.

Great! I’ll send a link to a GitHub repository for an in-progress R package using RStan. The repository includes two data files that work well. The misbehaving dataset has not yet been published. I’ll send you the GitHub link and the misbehaving dataset via a message through Discourse later today.

