It’s sometimes useful for Stan developers to be able to tell how popular Stan is, so for a recent talk I collected the following information. The absolute number of users is difficult to estimate, but we can get some idea from the following relative comparisons based on downloads in the last 30 days (even if they don’t cover all possible download mirrors).
This is probably fine for calculating ratios, but for some reason the cranlogs package has never implemented an option to filter by unique IP addresses (along with version or perhaps day). To do that, you have to follow the approach at
2013-2014: 90 papers. New topic areas include: chemistry, cancer research, computer science, psycholinguistics, oceanography, semiconductors, environmental science, addiction research, economics.
2014-2015: 152 papers. New topic areas include: STD research, moral decision theory, forestry, aerospace, education research, evolution, public policy, wildfire prediction, pharmacology, nutrition, ergonomics, ecotoxicology, genetics, history, wine quality, political science.
2015-2016: 202 papers.
2016-2017: 369 papers.
2017-2018: 566 papers.
I inspected all results up to 2015 and found no obvious duplicates. I also checked that every paper listed under the topic areas actually references ‘mc-stan.org’ in a significant way, so I have some faith that the counts are representative of actual Stan use.
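To make the growth in those counts easier to read, here is a minimal sketch that turns the yearly paper counts listed above into year-over-year ratios and an average annual growth factor. The function name `growth_ratios` and the variable names are just illustrative choices, and the calculation only covers the years shown in the list.

```python
# Paper counts per year, copied from the list above.
papers = {
    "2013-2014": 90,
    "2014-2015": 152,
    "2015-2016": 202,
    "2016-2017": 369,
    "2017-2018": 566,
}


def growth_ratios(counts):
    """Return each year's count divided by the previous year's count."""
    years = list(counts)  # dicts preserve insertion order
    return {
        curr: counts[curr] / counts[prev]
        for prev, curr in zip(years, years[1:])
    }


ratios = growth_ratios(papers)
for year, ratio in ratios.items():
    print(f"{year}: {ratio:.2f}x the previous year")

# Average (geometric mean) annual growth factor across the listed years.
years = list(papers)
n_steps = len(years) - 1
cagr = (papers[years[-1]] / papers[years[0]]) ** (1 / n_steps)
print(f"average annual growth factor: {cagr:.2f}")
```

The geometric mean is used rather than the arithmetic mean of the ratios because growth compounds from one year to the next.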
A reasonable undergraduate project would be to build a classifier for research articles so we can track topic areas. It would need some way to accommodate novel categories, or one could try to bootstrap the whole thing with LDA.
Here’s how I’ve tried to search using only terms that should be unambiguous:
“stan development team”: 2220
“mc-stan.org” OR “stan development team”: 2620
“mc-stan.org” OR “stan development team” OR “Stan: A probabilistic programming language”: 2970
“rstanarm” OR “mc-stan.org” OR “stan development team” OR “Stan: A probabilistic programming language”: 3080
“pystan” OR “rstanarm” OR “mc-stan.org” OR “stan development team” OR “Stan: A probabilistic programming language”: 3120
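One way to read the list above is to ask how many extra hits each added term brings in. The sketch below does that subtraction on the counts as given. The name `marginal_hits` is made up for illustration, and the result depends on the order in which terms were added: a term's marginal count is only the hits not already matched by the terms before it. Search hit counts are typically approximate, so treat the differences as rough.

```python
# (term added to the OR query, cumulative hit count), copied from the list above.
queries = [
    ("stan development team", 2220),
    ("mc-stan.org", 2620),
    ("Stan: A probabilistic programming language", 2970),
    ("rstanarm", 3080),
    ("pystan", 3120),
]


def marginal_hits(steps):
    """Return (term, extra hits gained by adding that term) for each step."""
    result = []
    previous_total = 0
    for term, total in steps:
        result.append((term, total - previous_total))
        previous_total = total
    return result


for term, extra in marginal_hits(queries):
    share = extra / queries[-1][1]
    print(f"{term!r}: +{extra} hits ({share:.1%} of the final total)")
```

On these numbers, the first three terms account for most of the results, and adding “rstanarm” and “pystan” contributes comparatively little.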