All,
We are in the process of developing a MOOC (Massive Open Online Course) for Intro to Stan. It may generate revenue for us or it may be free; we don’t know yet. In any case there will always be scholarships.
Given how much work a MOOC is, I want a target that specifies what the MOOC is teaching towards. I’d like that target to be a certification of basic Stan mastery, very likely a paid one. Again, there will always be scholarships.
Below is a draft Stan 1 certification specification that I threw together. It defines what it means to be a basic Stan programmer who can interface with statisticians and carry out Bayesian modeling. It is much more about working in Stan than about statistics.
This document is a publicly shared specification for what a professional Stan programmer should know and be able to do. This is meant to be the lowest level certification.
At this point we want overall items to consider.
Level 1 Stan Certification Objectives:
The mechanics of Stan programs and how they operate
Stan installation
Install CmdStan, python interface, R interface, relevant IDEs
Know of cloud based resources
Know how to handle the 3 most common install problems
Data ingest
Fluent in the Stan data{} block with input from R, Python, and JSON, both in the interfaces and in CmdStan. Reading data and saving outputs
Can debug top 5 common errors on data ingest.
Data manipulation
Demonstrate data munging skills in the transformed data{} block.
Can debug 3 most common errors on data transformation.
Parameter Mastery
Set up multiple parameters for estimation
Understand the role of <lower=, upper=> constraints
Work with vectorized parameters.
Know execution order and when it happens for all Stan blocks
User defined functions
Reimplement normal/exp/uniform dists as user defined functions
Fluency with various scales used in Stan (log, decimal)
Familiar with over/underflow.
Ability to correct over/underflow in common situations.
Convert to log scale
Change priors or upper/lower constraints
Running CmdStan and its invocation options. Knowledge of all algorithms and their appropriate use cases.
Optimize
Sample
method=sample algorithm=fixed_param num_samples=1
no warmup
Knowledge of how to use generated quantities {}
Use in prediction
Posterior checks
Debugging Stan
Print statements
Basic Bayesian Modeling
For all modeling tasks the test taker should show when appropriate:
Prior predictive checks
Motivate choice of prior
Motivate choice of likelihood
Demonstrate knowledge of runtime diagnostics
Demonstrate knowledge of posterior diagnostics
Demonstrate posterior checks
?? more bayesian workflow ??
Show predictive interpretation
Discrete data
Code coin flip/baseball ab test with
Total pooling
No pooling
Partial pooling
Code multivariate case
Code naive Bayes classifier
Code a regression model
Hierarchical models
Continuous data
Normal data set e.g. height
……
Knowledge of core distributions
Knowledge of how to use ODE solver
Visualising the fit
Communication
Bayesian modeling requires specialized techniques to communicate with non-practitioners. From David Spiegelhalter’s suggestions:
Motivating Priors
Conveying results to non-Bayesians
Common Stan Gotchas
Can’t send an array of length 1 to Stan from RStan
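Two of the mechanics objectives in the list above, reimplementing a distribution as a user-defined function and moving to the log scale to avoid underflow, can be sketched briefly. This is only an illustration under stated assumptions: it is written in Python rather than Stan so that it runs standalone, `normal_lpdf_by_hand` and `log_sum_exp` are illustrative names (Stan ships built-ins that play similar roles), and the data are made up.

```python
import math


def normal_lpdf_by_hand(y, mu, sigma):
    """Log density of Normal(mu, sigma) at y, written out term by term."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def log_sum_exp(log_values):
    """Stable log(sum(exp(v))) computed by factoring out the largest term."""
    m = max(log_values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in log_values))


# Made-up data: 100 observations, each 5 standard deviations from the mean.
ys = [5.0] * 100

# Natural scale: multiplying 100 tiny densities underflows to 0.0.
naive_likelihood = 1.0
for y in ys:
    naive_likelihood *= math.exp(normal_lpdf_by_hand(y, 0.0, 1.0))

# Log scale: summing log densities stays finite and usable.
log_likelihood = sum(normal_lpdf_by_hand(y, 0.0, 1.0) for y in ys)

# Adding two very small probabilities without ever leaving the log scale.
log_mix = log_sum_exp([math.log(0.5) - 1000.0, math.log(0.5) - 1000.0])

print(naive_likelihood)
print(log_likelihood)
print(log_mix)
```

The point of the sketch is the contrast between the two likelihood computations: the product collapses to zero, while the sum of log densities carries the same information without loss.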
While admittedly it would be useful, this essentially requires the user to be familiar with all these interfaces to obtain the Stan cert. I’m not sure that is realistic or really desirable, seeing as most people have a preferred stats scripting language. Why not just give a badge for the interface language if that’s important? Then people can gain multiple badges if they can use multiple scripting languages.
I agree with @lauren. Maybe have a base Stan certification and then branching sub-certifications for IDEs? Not sure how practical that would be, since it means teaching the same thing in multiple IDEs.
I was thinking that the cert covers how to get data into a Stan program, run, get the fit back and rudimentary visualization in both R and Python. Nothing fancy.
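As a rough idea of what "getting data in" can look like from the Python side, here is a minimal sketch that writes a JSON data file of the kind CmdStan reads. It uses only the standard library; the variable names `N` and `y`, the file name, and the Stan data block shown in the comment are hypothetical. Note that a single observation is kept as a length-1 list, so it is still written as an array, which is the scalar-versus-array distinction behind the length-1 gotcha in the list.

```python
import json
import os
import tempfile


def write_data_json(path, data):
    """Write a dict of Stan data variables to a JSON file."""
    with open(path, "w") as f:
        json.dump(data, f)


# Hypothetical Stan data block this file is meant to match:
#   data {
#     int<lower=0> N;
#     vector[N] y;
#   }
y = [1.7]  # a single observation, kept as a length-1 list on purpose
data = {"N": len(y), "y": y}

path = os.path.join(tempfile.mkdtemp(), "example.data.json")
write_data_json(path, data)

# Read the file back to confirm what was actually written.
with open(path) as f:
    round_trip = json.load(f)

print(round_trip)
```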
Well, this is timely as I post a job listing for which knowledge of Stan is a plus.
From a hiring perspective, it isn’t important to me that someone be fluent in all of the different interfaces. In practice, we are going to want them to use their favorite language, and we’ll be gauging that proficiency separately.
I do want them to know how to investigate warnings and errors and not just syntax issues. For example, I want some sort of demonstration that they understand when Jacobian warnings apply. I’d want them to have some basics of diagnosing difficult posteriors, with an arsenal beyond fiddling with adapt_delta and max_treedepth. Stan can run and give you answers when you really should not trust them. Even though this is not explicitly a statistics certification, a basic Stan programmer needs to be both appropriately concerned about warnings and reasonably empowered to implement the first pass of fix attempts.
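To make the Jacobian point concrete, here is a small numerical sketch, in Python rather than Stan so that it runs standalone. It assumes a variable x with an Exponential(1) density and works with y = log(x); all function names are illustrative. Plugging exp(y) into the density of x without the log-derivative term does not give a proper density for y, while adding log|dx/dy| does. In a Stan program the analogous fix is adding the log absolute derivative of the transform to the target.

```python
import math


def trapezoid(f, lo, hi, n):
    """Integrate f over [lo, hi] with n equal trapezoids."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h


def log_density_x(x):
    """Log density of an Exponential(1) variable at x > 0."""
    return -x


def log_density_y_no_jacobian(y):
    """Wrong: plugs x = exp(y) into the density of x and stops there."""
    return log_density_x(math.exp(y))


def log_density_y_with_jacobian(y):
    """Right: adds log|dx/dy| = log(exp(y)) = y for the transform x = exp(y)."""
    return log_density_x(math.exp(y)) + y


# A proper density integrates to 1; the interval [-30, 5] covers
# essentially all of the mass of y = log(x) in this example.
area_with = trapezoid(
    lambda y: math.exp(log_density_y_with_jacobian(y)), -30.0, 5.0, 35000
)
area_without = trapezoid(
    lambda y: math.exp(log_density_y_no_jacobian(y)), -30.0, 5.0, 35000
)

print(area_with)
print(area_without)
```

The first area comes out at about 1 and the second does not, which is the kind of silent error the warning is there to flag.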
Overall, this is a good list of core competencies. I can see this certification being useful for hiring managers in my industry.
hiring practices are bad enough. I don’t think we need to encourage companies to add barriers to entry by requiring candidates to get certification in order to add a revenue stream for the Stan project.
as Andrew has observed, setting up rules encourages people to game the system.
the strongest recommendation for Stan competency is experience using Stan to investigate a real-world problem in the form of a report, white paper, case study, or publication.
@mitzimorris, for me this is not a barrier to entry so much as a window into something hiring managers currently have little visibility into.
There is also utility in having Stan leadership define what it thinks the core competencies of a (non-statistician) Stan programmer would be, and then to prepare a scalable, open-access educational program to teach those competencies. I have mixed feelings about the payment portion, but it takes time and skill to develop a good MOOC, and it is not my place to tell people to value their time and skill at zero. Furthermore, this is not only useful for hiring: MOOCs can be useful for onboarding new hires and for on-the-job training of existing employees. In that case the certification cost is borne by the employer, which seems appropriate.
I think it would mean someone who uses statistics to some extent in their work, but whose expertise is really in other fields, e.g., materials, pharma, psychology, political science, etc.
(FWIW, in my particular case, my background is more in engineering, and I have used Stan to fit material model parameters.)
This is a great thread, folks. The purpose of the certification is to help focus teaching, raise some funds for Stan and, as it is turning out, help refine our understanding of our user base.
On the money side, I want there to always be a scholarship/free route to getting the certification as well as for any MOOC materials we develop.
Thanks for getting started on this and taking feedback. In terms of teaching, a MOOC could certainly cover all the topics listed, but that would be way too much to cover in a three day intensive course and seems more like a semester (depends on background of course). So would the MOOC be the only real path to obtaining the certification (other than acquiring all of that knowledge independently)?
Also, I think things like familiarity with the ODE solver should probably be add-ons rather than part of the core certification. That really only comes up commonly in particular fields. And I agree with others that if we do this we should allow certifications for particular interfaces.
As I understand it, in other tech fields (Microsoft etc) the certificates are awarded after exams. The recommendation is that people don’t take the exam until the completion of a course plus some recommended number of years of experience.
I worry that some of the things listed here would be hard to teach without some degree of experience working with Stan.
Edited to add: but I think people take these exams without having done the courses so maybe that’s not what Breck means?
This is a good question. As I was drafting the certification it became quite clear that this is more than a 3-day class. For others not familiar, Jonah does a wonderful 3-day class in various formats and has taught hundreds of people, if not close to a thousand.
We can certainly scale it back, e.g., ODE solver use as you mention. Please suggest where other reductions make sense. I have reductions so far:
Drop the whole idea of a certification
Drop interface language proficiency
Drop ODE solver skills
I have additions:
Hit common error reports and appropriate diagnostics.
Hit posterior diagnostics harder and how to fix problem ones.
How to handle Jacobian warnings.
Acknowledge that many/most users are not statisticians and factor that in.
Demonstrate appropriate paranoia about whether fits worked or not.
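As one example of what a posterior diagnostic checks, here is a minimal sketch of the classic potential scale reduction factor (R-hat) for a single parameter. It is an illustration only: the function name and the simulated "chains" are made up, and the R-hat that Stan reports is a more refined variant (it splits chains, and recent versions also rank-normalize), so the numbers will not match Stan's output exactly.

```python
import math
import random
import statistics


def classic_rhat(chains):
    """Classic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    if len(chains) < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average of the within-chain sample variances.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B: n times the sample variance of the chain means.
    between = n * statistics.variance(chain_means)
    var_plus = (n - 1) / n * within + between / n
    return math.sqrt(var_plus / within)


rng = random.Random(0)

# Four made-up chains that all draw from the same distribution.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]

# Same chains, but the first one is stuck in a different region.
stuck = [[x + 3.0 for x in mixed[0]]] + mixed[1:]

rhat_mixed = classic_rhat(mixed)
rhat_stuck = classic_rhat(stuck)

print(rhat_mixed)
print(rhat_stuck)
```

Chains that agree give a value close to 1, while the chain that sits elsewhere pushes the value well above it, which is the signal that the fit should not be trusted yet.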
Please keep it coming, this may well help us foster a better user environment.
Good points. My “vision” is that the cert be agnostic about how you get the skills to pass the exam. And yes, there is meant to be an exam at some point but more important it serves as a target for the MOOC and teaching. I think it is worth writing down what core Stan competency should be at a basic level.
The cert is motivated by my realizing that ‘experts’ in Stan don’t know about key features. I won’t reveal who, but it was interesting. Just looking at the requirements may be useful to round out one’s knowledge of Stan programming.
I’m agnostic about the cert program. Considering the resources involved, how effective would this route be for evangelizing Stan compared with other options?
From a hiring perspective, my experience is that asking the right questions matters more than looking for a cert on a resume, but searching for keywords is exactly what those hiring agents would be doing. So who knows?:/
To be clear, I meant that this list has exactly what I would expect it to contain, from the perspective of a person reading resumes (although, like @jonah mentioned, my company wouldn’t use the ODE part at all). This is also what I would want in order to get a Stan-naive new hire up to speed quickly. So the topic list looks good to me for both hiring and training.
I don’t think the initial Stan cert has to match to the level of more established certifications. My hunch is that a MOOC requiring less commitment on the part of the student (and, from an employer perspective, fewer hours of employee effort diverted to the training) would speed adoption more than a comprehensive course that tried to turn people into experts. That’s just a hunch, though.