Estimating popularity of Stan and related packages

avehtari · May 9, 2019, 3:59pm

Sometimes it’s useful for Stan devs to be able to tell how popular Stan is. For a recent talk I collected the following information. It’s difficult to estimate the absolute number of users, but we can get some idea from the following relative comparisons, based on downloads in the last 30 days (even if they don’t cover all possible download mirrors).

PyStan downloads relative to other packages, based on PePy download statistics (e.g. https://pepy.tech/project/pystan):

0.06 x matplotlib
0.15 x tensorflow
10 x TensorFlow Probability (Google)
12 x PyMC3
165 x Pyro (Uber)

RStan downloads relative to other packages, based on CRAN download statistics (e.g. pp <- cranlogs::cran_downloads(package = c("rstan"), from = "2019-04-05", to = "2019-05-04"); sum(pp$count)):

0.04 x ggplot2
1.3 x tensorflow
2 x RJAGS
7 x R2WinBUGS
51 x greta (uses TF as backend)

In addition

fbprophet is 0.7 x pystan

and

loo is 0.91 x rstan
prophet is 0.36 x rstan
rstanarm is 0.33 x rstan
brms is 0.29 x rstan
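
The "N x package" figures above are ratios of 30-day download totals. A minimal sketch of that arithmetic, assuming the totals have already been fetched (from PePy or cranlogs) into a dictionary; the function name `times` and the numbers in `example_counts` are hypothetical placeholders, not real download counts:

```python
def times(counts, pkg, other):
    """Return how many times the downloads of `other` the package `pkg` has."""
    if counts[other] <= 0:
        raise ValueError(f"no downloads recorded for {other!r}")
    return counts[pkg] / counts[other]


# Placeholder totals for illustration only (NOT real download counts).
example_counts = {"rstan": 1000, "loo": 910, "rstanarm": 330}

for pkg in ("loo", "rstanarm"):
    print(f"{pkg} is {times(example_counts, pkg, 'rstan'):.2f} x rstan")
```

Because both totals cover the same 30-day window and the same mirrors, incomplete mirror coverage largely cancels out of the ratio, provided each package's downloads are spread across mirrors in roughly the same way.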

bgoodri · May 9, 2019, 6:24pm

This is probably fine for calculating ratios, but for some reason the cranlogs package has never implemented an option to filter by unique IP addresses (along with version or perhaps day). To do that, you have to follow the approach at
https://www.nicebread.de/finally-tracking-cran-packages-downloads/
and filter yourself.
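
A rough sketch of what "filter yourself" could look like, assuming the raw CRAN log files have been downloaded and parsed into rows with `date`, `package`, `version` and `ip_id` columns (these are the column names I would expect in the raw logs, where `ip_id` is an anonymized per-day identifier; check them against the actual files). The function name and the sample rows are made up for illustration:

```python
import csv
import io


def unique_downloads(rows, package, by_version=False):
    """Count downloads of `package`, counting each (date, ip_id) pair once.

    With by_version=True, the same IP downloading two different versions
    on the same day counts twice.
    """
    seen = set()
    for row in rows:
        if row["package"] != package:
            continue
        key = (row["date"], row["ip_id"])
        if by_version:
            key += (row["version"],)
        seen.add(key)
    return len(seen)


# Tiny made-up log extract for illustration (NOT real data).
sample = io.StringIO(
    "date,package,version,ip_id\n"
    "2019-05-01,rstan,2.18.2,1\n"
    "2019-05-01,rstan,2.18.2,1\n"
    "2019-05-01,rstan,2.17.4,1\n"
    "2019-05-01,rstan,2.18.2,2\n"
    "2019-05-02,rstan,2.18.2,1\n"
    "2019-05-01,loo,2.1.0,1\n"
)
rows = list(csv.DictReader(sample))

print(unique_downloads(rows, "rstan"))                   # 3 unique (date, ip_id) pairs
print(unique_downloads(rows, "rstan", by_version=True))  # 4 when version is part of the key
```

The key includes the date because, under the assumption above, an `ip_id` only identifies an address within a single day's log, so it cannot be used to deduplicate across days.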

breckbaldwin · May 9, 2019, 10:37pm

From the February issue of Stan this Month, our monthly newsletter:

I (Breck) ran the query ‘mc-stan.org’ on the research search engine scholar.google.com. I looked at all the results for the first three years and noted novel scientific areas:

2012-2013: 23 papers. Topic areas include: anthropology, neuroscience, psychology, medicine, statistics, actuarial science, algae research, astrophysics, survey research, animal testing, ecology, machine learning, fisheries, sports statistics.

2013-2014: 90 papers. New topic areas include: chemistry, cancer research, computer science, psycholinguistics, oceanography, semiconductors, environmental science, addiction research, economics.

2014-2015: 152 papers. New topic areas include: STD research, moral decision theory, forestry, aerospace, education research, evolution, public policy, wildfire prediction, pharmacology, nutrition, ergonomics, ecotoxicology, genetics, history, wine quality, political science.

2015-2016: 202 papers.

2016-2017: 369 papers.

2017-2018: 566 papers.

I inspected all results up to 2015 and found no obvious duplicates. I also checked that every paper listed under a topic area actually references ‘mc-stan.org’ in a significant way, so I have some faith that the counts are representative of actual Stan use.
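
To make the trend in these counts easier to read, here is a small sketch that computes the year-over-year growth factor from the paper counts listed above. The counts are copied from this post; the function name `growth_factors` is just an illustrative choice:

```python
# Paper counts per period, as listed above.
papers = {
    "2012-2013": 23,
    "2013-2014": 90,
    "2014-2015": 152,
    "2015-2016": 202,
    "2016-2017": 369,
    "2017-2018": 566,
}


def growth_factors(counts):
    """Return each period's count divided by the previous period's count."""
    labels = list(counts)
    return {
        labels[i]: counts[labels[i]] / counts[labels[i - 1]]
        for i in range(1, len(labels))
    }


for period, factor in growth_factors(papers).items():
    print(f"{period}: {factor:.2f} x previous period")

print("total papers:", sum(papers.values()))
```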

A reasonable undergraduate project would be to build a classifier for research articles so we can track topic areas. There would need to be some accommodation of novel categories, or one could try to bootstrap the whole thing with LDA.

@Bob_Carpenter has suggested some alternative search terms:

Here’s how I’ve tried to search using only terms that should be unambiguous:

“mc-stan.org”: 1650
“stan development team”: 2220
“mc-stan.org” OR “stan development team”: 2620
“mc-stan.org” OR “stan development team” OR “Stan: A probabilistic programming language”: 2970
“rstanarm” OR “mc-stan.org” OR “stan development team” OR “Stan: A probabilistic programming language”: 3080
“pystan” OR “rstanarm” OR “mc-stan.org” OR “stan development team” OR “Stan: A probabilistic programming language”: 3120
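
One way to read the list above is to ask how many extra results each added term contributes, and how much the first two terms overlap. A minimal sketch of that arithmetic, using the hit counts exactly as reported above (search-engine hit counts are usually approximate, so treat the differences as rough); the variable names are illustrative:

```python
# Hit counts for the two single-term queries, as reported above.
single = {"mc-stan.org": 1650, "stan development team": 2220}

# (term added to the OR query, hit count of the resulting query), as reported above.
steps = [
    ("stan development team", 2220),
    ("mc-stan.org", 2620),
    ("Stan: A probabilistic programming language", 2970),
    ("rstanarm", 3080),
    ("pystan", 3120),
]

# Marginal gain: results that each added term contributes beyond the previous query.
gains = []
previous = 0
for term, hits in steps:
    gains.append((term, hits - previous))
    previous = hits

for term, gain in gains:
    print(f"+{gain:5d}  {term}")

# Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|.
overlap = single["mc-stan.org"] + single["stan development team"] - 2620
print("results matching both of the first two terms:", overlap)
```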
