Stan Certification

All,
We are in the process of developing a MOOC (Massive Open Online Course) for Intro to Stan. It may become a revenue source for us or it may be free; we don’t know yet. In any case there will always be scholarships.

Given that a MOOC is a lot of work, I want a target that specifies what the MOOC is teaching towards. I’d like it to be a certification of basic Stan mastery, and that is very likely to be a paid certification. Again, there will always be scholarships.

Below is a draft Stan 1 certification specification that I threw together. It defines what it means to be a basic Stan programmer who can interface with statisticians and carry out Bayesian modeling. It is much more about working in Stan than about statistics.

Feedback appreciated. There is a Google doc with the draft at https://docs.google.com/document/d/1RmeoieDUcq0YFIUW_PNDO_5aixXQgfwG0ZLj3Paj64Y/edit?usp=sharing. Edit or comment as you see fit. The draft is pasted below to save you a click; you can comment in this thread as well and I’ll try to integrate.


This document is a publicly shared specification for what a professional Stan programmer should know and be able to do. This is meant to be the lowest level certification.

At this point we are looking for high-level items to consider.

Level 1 Stan Certification Objectives:

The mechanics of Stan programs and how they operate

  • Stan installation
    • Install CmdStan, python interface, R interface, relevant IDEs
    • Know of cloud based resources
    • Know how to handle the 3 most common install problems
  • Data ingest
    • Fluent in the Stan data{} block with input from R, Python, and JSON, in the interfaces and CmdStan, including reading data and saving outputs
    • Can debug the top 5 common errors on data ingest.
  • Data manipulation
    • Demonstrate data munging skills in the transformed data{} block.
    • Can debug the 3 most common errors on data transformation.
  • Parameter Mastery
    • Set up multiple parameters for estimation
    • Understand role of <upper=,lower=>
    • Work with vectorized parameters.
  • Know execution order and when it happens for all Stan blocks
  • User defined functions
    • Reimplement normal/exp/uniform dists as user defined functions
  • Fluency with various scales used in Stan (log, decimal)
  • Familiar with over/underflow.
    • Ability to correct over/underflow in common situations.
    • Convert to log scale
    • Change priors or upper/lower constraints
  • Running CmdStan and its invocation options. Knowledge of all algorithms and their appropriate use cases.
    • Optimize
    • Sample
    • method=sample algorithm=fixed_param num_samples=1
    • no warmup
  • Knowledge of how to use generated quantities {}
    • Use in prediction
    • Posterior checks
  • Debugging Stan
    • Print statements
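
As one possible illustration of the user-defined-function and log-scale items above, here is a minimal sketch in Python rather than Stan, so that it runs without a Stan install. The function names borrow Stan's `_lpdf` naming convention but are hypothetical helpers written for this example, and the data are made up. It shows why densities are accumulated on the log scale: a product of many densities underflows, while the sum of log densities stays finite.

```python
import math


def normal_lpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at y."""
    return (-0.5 * math.log(2.0 * math.pi)
            - math.log(sigma)
            - 0.5 * ((y - mu) / sigma) ** 2)


def exponential_lpdf(y, rate):
    """Log density of Exponential(rate) at y; -inf outside the support."""
    if y < 0:
        return -math.inf
    return math.log(rate) - rate * y


def uniform_lpdf(y, lo, hi):
    """Log density of Uniform(lo, hi) at y; -inf outside [lo, hi]."""
    if y < lo or y > hi:
        return -math.inf
    return -math.log(hi - lo)


# Made-up data: 2000 identical observations, enough to trigger underflow.
ys = [1.0] * 2000

# Natural scale: multiplying 2000 densities below 1 underflows to 0.0.
product = 1.0
for y in ys:
    product *= math.exp(normal_lpdf(y, 0.0, 1.0))

# Log scale: summing log densities stays finite and usable.
log_sum = sum(normal_lpdf(y, 0.0, 1.0) for y in ys)

print("product of densities:", product)
print("sum of log densities:", log_sum)
```

In a Stan program the same idea would be written in the functions block and accumulated into the target log density; the sketch above only mirrors that arithmetic.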

Basic Bayesian Modeling

For all modeling tasks the test taker should show when appropriate:

  • Prior predictive checks

  • Motivate choice of prior

  • Motivate choice of likelihood

  • Demonstrate knowledge of runtime diagnostics

  • Demonstrate knowledge of posterior diagnostics

  • Demonstrate posterior checks

  • ?? more bayesian workflow ??

  • Show predictive interpretation

  • Discrete data

    • Code coin flip/baseball ab test with
      • Total pooling
      • No pooling
      • Partial pooling
    • Code multivariate case
    • Code naive Bayes classifier
    • Code a regression model
    • Hierarchical models
  • Continuous data

    • Normal data set e.g. height
    • ……
  • Knowledge of core distributions

  • Knowledge of how to use ODE solver

  • Visualising the fit
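
The three pooling options in the coin flip / baseball item can be sketched without Stan using the Beta-Binomial conjugate update. This is only a rough sketch under stated assumptions: the data are made up, and the "partial pooling" here uses a fixed Beta prior centered on the pooled rate with a hypothetical prior strength `kappa`, whereas a hierarchical Stan model would estimate that strength from the data.

```python
# Made-up data: (successes, trials) for three players.
data = [(4, 10), (30, 100), (55, 200)]

total_successes = sum(h for h, n in data)
total_trials = sum(n for h, n in data)

# Total pooling: one shared rate for everyone.
pooled = total_successes / total_trials

# No pooling: each player gets an independent raw rate.
no_pool = [h / n for h, n in data]

# Partial pooling stand-in: Beta(alpha, beta) prior centered on the pooled
# rate. kappa is an assumed prior "sample size", not something estimated.
kappa = 50.0
alpha = pooled * kappa
beta = (1.0 - pooled) * kappa

# Posterior mean of Beta(alpha + h, beta + n - h) for each player.
partial = [(h + alpha) / (n + alpha + beta) for h, n in data]

for (h, n), raw, shrunk in zip(data, no_pool, partial):
    print(f"{h}/{n}: no pooling {raw:.3f}, partial {shrunk:.3f}, total {pooled:.3f}")
```

Each partially pooled estimate is a weighted average of the raw rate and the pooled rate, so players with fewer trials are pulled more strongly toward the group.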

Communication

Bayesian modeling requires specialized techniques to communicate with non-practitioners. From David Spiegelhalter’s suggestions:

Motivating Priors

Conveying results to non-Bayesians

Common Stan Gotchas

Can’t send an array of length 1 to Stan from RStan
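
The same scalar-versus-array distinction matters when writing JSON input for CmdStan: an array of length 1 has to stay a JSON array rather than collapse to a bare number. A minimal Python sketch, assuming a hypothetical Stan program that declares `int N;` and `vector[N] y;` in its data block (the helper name `to_stan_json` is made up for this example):

```python
import json


def to_stan_json(y):
    """Serialize observations for a program declaring `int N; vector[N] y;`.

    The values are always written as a list, even when there is only one,
    so the reader sees an array of length 1 and not a scalar.
    """
    y_list = [float(v) for v in y]
    return json.dumps({"N": len(y_list), "y": y_list})


single = to_stan_json([3.2])
print(single)
```

The point of the helper is only that the container type is fixed before serialization, instead of relying on whatever shape the data happens to have.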

7 Likes

While admittedly it would be useful, this essentially requires the user to be familiar with all these interfaces to obtain the Stan cert. I’m not sure that is realistic or really desirable, seeing as most people have a preferred stats scripting language. Why not just give a badge for the interface language if that’s important? Then people can gain multiple badges if they can use multiple scripting languages.

4 Likes

I agree with @lauren. Maybe have a base Stan certification and then branching sub-certifications for IDEs? Not sure how practical that would be, since it would mean teaching the same thing in multiple IDEs.

I was thinking that the cert covers how to get data into a Stan program,
run, get the fit back and rudimentary visualization in both R and
Python. Nothing fancy.

Breck

Well, this is timely as I post a job listing for which knowledge of Stan is a plus.

From a hiring perspective, it isn’t important to me that someone be fluent in all of the different interfaces. In practice, we are going to want them to use their favorite language, and we’ll be gauging that proficiency separately.

I do want them to know how to investigate warnings and errors and not just syntax issues. For example, I want some sort of demonstration that they understand when Jacobian warnings apply. I’d want them to have some basics of diagnosing difficult posteriors, with an arsenal beyond fiddling with adapt_delta and max_treedepth. Stan can run and give you answers when you really should not trust them. Even though this is not explicitly a statistics certification, a basic Stan programmer needs to be both appropriately concerned about warnings and reasonably empowered to implement the first pass of fix attempts.
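
To make the Jacobian point concrete, here is a minimal numerical sketch in Python rather than Stan. It assumes a positive parameter `sigma` with a standard normal prior placed on `log(sigma)`, which is the kind of non-linear transform on the left-hand side that the warning is about. All function names are hypothetical helpers for this illustration. Without the adjustment term the implied density over `sigma` does not integrate to 1; with it, it does.

```python
import math


def normal_lpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at y."""
    return (-0.5 * math.log(2.0 * math.pi)
            - math.log(sigma)
            - 0.5 * ((y - mu) / sigma) ** 2)


def log_density_sigma(sigma, adjust):
    """Log density over sigma implied by log(sigma) ~ normal(0, 1)."""
    lp = normal_lpdf(math.log(sigma), 0.0, 1.0)
    if adjust:
        # Log absolute Jacobian of the transform: log|d log(sigma)/d sigma|.
        lp += -math.log(sigma)
    return lp


def integrate(f, lo, hi, n):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


with_adjustment = integrate(
    lambda s: math.exp(log_density_sigma(s, True)), 0.0, 200.0, 200_000)
without_adjustment = integrate(
    lambda s: math.exp(log_density_sigma(s, False)), 0.0, 200.0, 200_000)

print("with Jacobian term:   ", with_adjustment)     # close to 1
print("without Jacobian term:", without_adjustment)  # close to exp(0.5)
```

The unadjusted version is not a proper density for `sigma`, which is why a fit can run and still target the wrong posterior.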

Overall, this is a good list of core competencies. I can see this certification being useful for hiring managers in my industry.

3 Likes

Hiring practices are bad enough. I don’t think we need to encourage companies to add barriers to entry by requiring candidates to get certification in order to add a revenue stream for the Stan project.

As Andrew has observed, setting up rules encourages people to game the system.

The strongest recommendation for Stan competency is experience using Stan to investigate a real-world problem in the form of a report, white paper, case study, or publication.

7 Likes

@mitzimorris, for me this is not a barrier to entry so much as a window into something hiring managers currently have little visibility into.

There is also utility in having Stan leadership define what it thinks the core competencies of a (non-statistician) Stan programmer would be, and then prepare a scalable, open-access educational program to teach those competencies. I have mixed feelings about the payment portion, but it takes time and skill to develop a good MOOC, and it is not my place to tell people to value their time and skill at zero. Furthermore, it is not only useful for hiring: MOOCs can be useful for onboarding new hires and for on-the-job training of existing employees. Then the certification cost is borne by the employer, which seems appropriate.

3 Likes

Stan is a domain-specific language for statistical modeling - what does “non-statistician Stan programmer” mean?

I agree with your earlier statement:

I think that an appropriate curriculum needs more input from people like you who are looking to hire.

2 Likes

I think it would mean someone who uses statistics to some extent in their work, but whose expertise is really in other fields, e.g., materials, pharma, psychology, political science, etc.

(FWIW, in my particular case, my background is more in engineering, and I have used Stan to fit material model parameters.)

A bunch of bioinformaticians, economists and even doctors that use Stan to fit models but have little to no statistical training.

This is a great thread, folks. The purpose of the certification is to help focus teaching, raise some funds for Stan and, as it turns out, help refine our understanding of our user base.

On the money side, I want there to always be a scholarship/free route to getting the certification as well as for any MOOC materials we develop.

Breck

Thanks for getting started on this and taking feedback. In terms of teaching, a MOOC could certainly cover all the topics listed, but that would be way too much to cover in a three-day intensive course and seems more like a semester (depending on background, of course). So would the MOOC be the only real path to obtaining the certification (other than acquiring all of that knowledge independently)?

Also, I think things like familiarity with the ODE solver should probably be add-ons rather than part of the core certification. That really only comes up commonly in particular fields. And I agree with others that if we do this we should allow certifications for particular interfaces.

As I understand it, in other tech fields (Microsoft etc) the certificates are awarded after exams. The recommendation is that people don’t take the exam until the completion of a course plus some recommended number of years of experience.

I worry that some of the things listed here would be hard to teach without some degree of experience working with Stan.

Edited to add: but I think people take these exams without having done the courses so maybe that’s not what Breck means?

Interesting, thanks.

This is a good question. As I was drafting the certification it became quite clear that this is more than a 3-day class. For others not familiar, Jonah does a wonderful 3-day class in various formats and has taught hundreds of people, possibly close to a thousand.

We can certainly scale it back, e.g., ODE solver use as you mention. Please suggest where other reductions make sense. I have reductions so far:

  1. Drop the whole idea of a certification
  2. Drop interface language proficiency
  3. Drop ODE solver skills

I have additions:

  1. Hit common error reports and appropriate diagnostics.
  2. Hit posterior diagnostics harder and how to fix problem ones.
  3. How to handle Jacobian warnings.
  4. Acknowledge that many/most users are not statisticians and factor that in.
  5. Demonstrate appropriate paranoia about whether fits worked or not.

Please keep it coming, this may well help us foster a better user environment.

Breck

Good points. My “vision” is that the cert be agnostic about how you get the skills to pass the exam. And yes, there is meant to be an exam at some point, but more importantly it serves as a target for the MOOC and teaching. I think it is worth writing down what core Stan competency should be at a basic level.

The cert is motivated by my realizing that ‘experts’ in Stan don’t know about key features. I won’t reveal who, but it was interesting. Just looking at the requirements may be useful to round out one’s knowledge of Stan programming.

Thanks

Breck

I’m agnostic about the cert program. Considering the resources involved, how effective would this route be for evangelizing Stan compared with other options?

From a hiring perspective, my experience is that asking the right questions matters more than looking for a cert on a resume, but searching for keywords is exactly what hiring agents would be doing. So who knows? :/

1 Like

To be clear, I meant that this list has exactly what I would expect it to contain, from the perspective of a person reading resumes (although, like @jonah mentioned, my company wouldn’t use the ODE part at all). This is also what I would want in order to get a Stan-naive new hire up to speed quickly. So the topic list looks good to me for both hiring and training.

I don’t think the initial Stan cert has to match the level of more established certifications. My hunch is that a MOOC requiring less commitment on the part of the student (and, from an employer perspective, fewer hours of employee effort diverted to the training) would speed adoption more than a comprehensive course that tried to turn people into experts. That’s just a hunch, though.

3 Likes