How many Stan users are there?

s.maskell · December 5, 2019, 4:46pm

It seems we could look at a number of pre-existing data sources (eg discourse views and contributors, papers, StanCon attendance etc) to inform an inference of how many people use Stan (and/or use things that use Stan). We could also generate new data (eg via surveys etc). Do we know the answer and/or how best to work it out?

lauren · December 5, 2019, 4:55pm

I’ve thought a tonne about surveying the Stan community or potentially capture-recapture techniques, but generating new data in this way would be a lot of work and my current funding wouldn’t cover it (that’ll change next year though!). I’d be interested in collaborating if anyone is interested. :)

s.maskell · December 5, 2019, 6:08pm

As is hopefully obvious, I am keen to help. I can probably deploy people who can help make this happen, if we know what we need them to do.

avehtari · December 5, 2019, 6:20pm

How about asking in the survey who has registered on Discourse? From that we could extrapolate the size of the population that we could reach by survey.

s.maskell · December 5, 2019, 7:11pm

@avehtari: who would we survey and how would we get the survey to them? I feel I am being dim.

lauren · December 5, 2019, 9:33pm

From memory, the hope was to use a snowball survey (once you finish, you forward it on to other people in the population), which is generally what you use when you have a hard-to-define population.
One of the stats grads here, Jonathan Auerbach, and I talked about starting multiple snowballs (Andrew’s blog, Discourse, Twitter, StanCon mailing lists, etc.) and then tracking which snowball people were recruited by (and whether by more than one) as a way of measuring coverage.
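
To make the overlap idea concrete, here is a minimal sketch of how two snowballs could feed a capture-recapture estimate. It uses Chapman's bias-corrected form of the Lincoln-Petersen estimator, which is one standard choice and not necessarily the design discussed above. The function names and the respondent IDs are hypothetical. The estimator also assumes a closed population and independent "captures" with equal reach, which snowball recruitment is likely to violate, so treat the result as a rough check rather than an answer.

```python
from typing import Iterable


def chapman_estimate(n_a: int, n_b: int, n_both: int) -> float:
    """Chapman's bias-corrected Lincoln-Petersen population estimate.

    n_a    : number of respondents reached by snowball A
    n_b    : number of respondents reached by snowball B
    n_both : number of respondents reached by both snowballs
    """
    if min(n_a, n_b, n_both) < 0:
        raise ValueError("counts must be non-negative")
    if n_both > min(n_a, n_b):
        raise ValueError("overlap cannot exceed either sample size")
    return (n_a + 1) * (n_b + 1) / (n_both + 1) - 1


def estimate_from_snowballs(
    reached_by_a: Iterable[str], reached_by_b: Iterable[str]
) -> float:
    """Estimate population size from the respondent IDs of two snowballs."""
    a, b = set(reached_by_a), set(reached_by_b)
    return chapman_estimate(len(a), len(b), len(a & b))


if __name__ == "__main__":
    # Hypothetical respondent IDs, for illustration only.
    blog_snowball = {"u1", "u2", "u3", "u4", "u5", "u6"}
    discourse_snowball = {"u4", "u5", "u6", "u7", "u8"}
    print(estimate_from_snowballs(blog_snowball, discourse_snowball))
```

With more than two snowballs, the same overlap counts could instead go into a log-linear capture-recapture model, which can relax the independence assumption between sources.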

andrewgelman · December 6, 2019, 4:53am

This is a fascinating statistics question. Maybe I could post a blog on this and see if there are any thoughts.

jroon · December 6, 2019, 9:26am

@lauren I think you should name that design a snowball fight 😂

lauren · December 6, 2019, 12:12pm

That’s a fantastic name!!!

@andrewgelman you can, it’d be interesting to see what folks think.

Bob_Carpenter · December 7, 2019, 2:49am

What’s your definition of a Stan user? Is it someone who once downloaded Stan and ran a model? Or is it someone who uses Stan regularly, and if so, how regularly? What about people who use packages like brms or rstanarm or prophet that are built on top of Stan?

There are over 3K users registered on Discourse, but that doesn’t mean they’re regulars. Many of them only showed up once.

andrewgelman · December 7, 2019, 3:23am

Blog post on the topic scheduled for Monday.

s.maskell · December 7, 2019, 8:16am

For context, UK academic departments (eg the one I sit in) are assessed (as part of the “Research Excellence Framework”, REF) on the basis of some criteria: the next assessment is imminent, but the one after that will be in 2026. When aggregated over each entire University, the outcome of that assessment process modulates the amount of (Quality-related Research, QR) income that the University receives from the UK government. So, it’s important to inform the assessment process with pertinent information.

One of the three assessment criteria is “impact”, which relates to the uptake of academic research outside of the academic discipline it came from and is metricated in terms of “reach and significance” during the census period. Unfortunately, “reach and significance” is not defined quantitatively. However, the notion of the census period is defined as people using the research during a specified period of time (eg 2021-2026).

The specific motivation for my question is that I’d like to understand how we (locally to my department) could quantify the “reach and significance” of any enhancements to Stan that (we hope!) might come out of our work between now and 2026. It seems natural to start by finding out what we (as the Stan community) know about how big we are now.

So, in answer to your question, I think I’d ideally like to know how many people and/or organisations are making use of specific subsets of Stan’s code (including in any packages that use Stan) during a specified period. I’d also like to know where they are based geographically, whether they work in academia, industry or government, the range of applications they are working on, etc.

That’s clearly hinting at scope creep, but hopefully helps explain the specific reason I asked the question, which I see as an important step towards quantifying the impact of our work to help enhance Stan.

lauren · December 9, 2019, 1:05pm

I think this would be relevant outside of the REF as well - a useful addition to many grants/impact sections.

We were at the point where we were focussing on questions around this, plus an emphasis on barriers to entry. I can dig them up if there’s renewed interest. I believe the conclusion last time was that the time-cost of doing it wasn’t worth the expected benefit, but it could have changed! :)

andrewgelman · December 9, 2019, 8:40pm

I posted my question here: https://statmodeling.stat.columbia.edu/2019/12/09/how-many-stan-users-are-there/

avehtari · December 10, 2019, 10:56am

The paper “Using an Online Sample to Estimate the Size of an Offline Population” might have useful ideas on how to estimate the number of users who would be unlikely to be reached by a survey.

mitzimorris · December 12, 2019, 12:47pm

Stack Overflow has a “Stan” tag, and 254 users (if I’m interpreting their graphics correctly) have listed Stan as something they’re interested in. About once a week someone asks a Stan question.

mcol · December 12, 2019, 12:56pm

254 is the number of questions with that tag; the number of users will be less than that.

mitzimorris · December 12, 2019, 3:48pm

doh!
users do list their interests, but no way to scrape that out of SO.

lauren · December 12, 2019, 4:15pm

It’s an interesting question! We might want to think about targeting this question using a statistical/surveying method and then benchmarking our population estimates against other potential indicators as another coverage check (number of downloads of Stan in R packages relative to Python etc.).

One of the challenges for this is agreeing upon what the definition of a Stan user is.

lauren · December 12, 2019, 7:38pm

I think we’re going to have a hangouts meeting on this topic to see if we can make some sort of plan. I’ve emailed folks whose email I have, but for those interested we’re picking a time here, and if you put your full name in I can probably google your email for an invite. :) https://www.when2meet.com/?8493674-nr5Ai
