Estimating popularity of Stan and related packages

#1

Sometimes it’s useful for Stan devs to be able to tell how popular Stan is. For a recent talk I collected the following information. It’s difficult to estimate the absolute number of users, but we can get some idea from the following relative comparisons based on downloads in the last 30 days (even though the statistics don’t cover all possible download mirrors). Each factor gives a package's downloads as a multiple of the listed package's downloads, e.g. PyStan has 0.06 times the downloads of matplotlib.

PyStan based on PePy download statistics (e.g. https://pepy.tech/project/pystan)

  • 0.06 x matplotlib
  • 0.15 x tensorflow
  • 10 x TensorFlow Probability (Google)
  • 12 x PyMC3
  • 165 x Pyro (Uber)

RStan based on CRAN download statistics (e.g. pp <- cranlogs::cran_downloads(packages = c("rstan"), from = "2019-04-05", to = "2019-05-04"); sum(pp$count))

  • 0.04 x ggplot2
  • 1.3 x tensorflow
  • 2 x RJAGS
  • 7 x R2WinBUGS
  • 51 x greta (uses TF as backend)

In addition

  • fbprophet is 0.7 x pystan

and

  • loo is 0.91 x rstan
  • prophet is 0.36 x rstan
  • rstanarm is 0.33 x rstan
  • brms is 0.29 x rstan
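
For anyone who wants to redo these comparisons later, here is a minimal sketch of the arithmetic behind the factors above. The function name `relative_popularity` and the download counts are made up for illustration; they are not the numbers behind the lists in this post.

```python
def relative_popularity(downloads, reference):
    """Express the reference package's downloads as a multiple of each other package.

    A result of 0.04 for "pkg_b" reads as: reference = 0.04 x pkg_b.
    """
    ref_count = downloads[reference]
    return {
        name: ref_count / count
        for name, count in downloads.items()
        if name != reference
    }


# Hypothetical 30-day download totals (NOT real data).
downloads = {"rstan": 1000, "pkg_a": 500, "pkg_b": 25000}

ratios = relative_popularity(downloads, "rstan")

# Print in the same "factor x package" style as the lists above, smallest first.
for name, factor in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{factor:g} x {name}")
```

The totals themselves would come from whatever source you trust (PePy, CRAN logs), summed over the same 30-day window for every package being compared.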
#2

This is probably fine for calculating ratios, but for some reason the cranlogs package has never implemented an option to filter by unique IP addresses (along with version or perhaps day). To do that, you have to follow the approach at


and filter yourself.
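
As a rough illustration of what "filter yourself" could look like, here is a sketch that counts each (day, IP id, version) combination only once per package. The column names (`date`, `package`, `version`, `ip_id`) and the sample rows are assumptions made for this example, not something taken from the linked approach.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows in the style of a raw download log.
# Column names and values are assumptions for illustration only.
SAMPLE_LOG = """date,package,version,ip_id
2019-04-05,rstan,2.18.2,1
2019-04-05,rstan,2.18.2,1
2019-04-05,rstan,2.18.2,2
2019-04-06,rstan,2.18.2,1
2019-04-06,loo,2.1.0,3
"""


def count_downloads(log_file):
    """Return (raw, unique) download counts per package.

    raw counts every row; unique counts each distinct
    (date, ip_id, version) combination once per package.
    """
    raw = defaultdict(int)
    seen = defaultdict(set)
    for row in csv.DictReader(log_file):
        raw[row["package"]] += 1
        seen[row["package"]].add((row["date"], row["ip_id"], row["version"]))
    unique = {package: len(keys) for package, keys in seen.items()}
    return dict(raw), unique


raw_counts, unique_counts = count_downloads(io.StringIO(SAMPLE_LOG))
print(raw_counts)
print(unique_counts)
```

Whether to deduplicate by day, by version, or by both is a judgment call; dropping a field from the tuple in `seen` changes the rule.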

#3

From February Stan this Month: Subscribe here if you want our monthly newsletter.

I (Breck) ran the query ‘mc-stan.org’ on the research search engine scholar.google.com. I looked at all the results for the first three years and noted novel scientific areas:

  • 2012-2013: 23 papers. Topic areas include: anthropology, neuroscience, psychology, medicine, statistics, actuarial science, algae research, astrophysics, survey research, animal testing, ecology, machine learning, fisheries, sports statistics.

  • 2013-2014: 90 papers. New topic areas include: chemistry, cancer research, computer science, psycholinguistics, oceanography, semiconductors, environmental science, addiction research, economics.

  • 2014-2015: 152 papers. New topic areas include: STD research, moral decision theory, forestry, aerospace, education research, evolution, public policy, wildfire prediction, pharmacology, nutrition, ergonomics, ecotoxicology, genetics, history, wine quality, political science.

  • 2015-2016: 202 papers.

  • 2016-2017: 369 papers.

  • 2017-2018: 566 papers.

I inspected all results up to 2015 and there were no obvious duplicates. I also checked that every paper counted toward a topic area actually references ‘mc-stan.org’ in a significant way, so I have some faith that the counts are representative of actual Stan use.

A reasonable undergraduate project would be to build a classifier for research articles so we can track topic areas. It would need some way to accommodate novel categories, or one could try to bootstrap the whole thing with LDA.

@Bob_Carpenter has suggested some alternative search terms:

Here’s how I’ve tried to search using only terms that should be unambiguous:

“mc-stan.org”: 1650
“stan development team”: 2220
“mc-stan.org” OR “stan development team”: 2620
“mc-stan.org” OR “stan development team” OR “Stan: A probabilistic programming language”: 2970
“rstanarm” OR “mc-stan.org” OR “stan development team” OR “Stan: A probabilistic programming language”: 3080
“pystan” OR “rstanarm” OR “mc-stan.org” OR “stan development team” OR “Stan: A probabilistic programming language”: 3120
