Tech leads Roadmap meeting update draft

Stan 3 Roadmap Update - Interface Edition

The weekend before last (June 29-30), 8 Stan tech leads (Sean Talts, Jonah Gabry, Michael Betancourt, Ben Goodrich, Brian Parbhu, Allen Riddell, Mitzi Morris, Bob Carpenter) met and discussed a variety of items on the Stan roadmap, mostly having to do with everything from the Stan language up through the interfaces and into the doc. We explicitly didn’t cover anything to do with the Math library this time around as Daniel Lee couldn’t make it, and we also avoided talking about non-software topics. We also decided that the purview of our group within Stan would not include any topics or software that does not work across all Stan models - this leaves discussion of some popular pieces of software like the loo package and methodology for other venues.

The following is a report of that meeting in the form of a project list. We have many more projects than we have capacity to complete them, so a lot of exciting stuff (almost everything!) is up for grabs and all of it could use your help. If you’d like to self-nominate to lead one of the following efforts, please let us know and I’ll keep this document up to date.

Interface Package Architecture

This recommendation applies to all interfaces that link against our C++ code directly. The ideal interface architecture seems to split out the current functionality into 4 packages: analyze, fit, *stan, and visualize. We decided our group had no purview over the visualization packages as there aren’t many visualizations that aren’t model-specific.


The analyze package contains just pre-compiled binaries for the functions exposed in Stan services (functions like the effective sample size calculation, producing a fit summary, etc). It will not have runtime requirements on any Stan C++ header files (so no Math library, Stan headers, etc), nor will it require a C++ compiler toolchain at runtime. This will let folks doing similar work use our calculations directly, in a lightweight fashion, without creating their own implementation. These functions are basically all at the parameter level, with a signature of something like std::vector<double *> -> double.


StanFit is responsible for performing inference: given data and a properly installed C++ toolchain, it creates a StanFit object to hold the results of fitting a Stan model, such as posterior samples. This package will require all of the Stan headers to be packaged into it as it will compile custom Stan models on the fly. It will be responsible for compiling models, instantiating them with data, and running an inference algorithm on the instantiated model.


The *stan package ties the previous two packages together into a unified, easier-to-use interface similar to RStan and PyStan today, though without any visualization functionality. We’d like to import the functionality from both Analyze and Fit and provide the analyze functions as methods on a fit object that can apply the analyze function automatically across all of the parameters in the fit.
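As a rough illustration of that last point (nothing here was specified at the meeting), a fit object could wrap a parameter-level analyze function and map it over its parameters. The `Fit` class, its `draws` layout, and the `apply` method are all hypothetical names chosen for the sketch:

```python
from statistics import fmean
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in for an analyze-package function: it takes the draws
# for a single parameter and returns a single number.
AnalyzeFn = Callable[[Sequence[float]], float]


class Fit:
    """Toy fit object; `draws` maps a parameter name to its posterior draws."""

    def __init__(self, draws: Dict[str, List[float]]):
        self.draws = draws

    def apply(self, fn: AnalyzeFn, params: Sequence[str] = ()) -> Dict[str, float]:
        """Apply one analyze function to the named parameters, or to all of them."""
        names = list(params) if params else list(self.draws)
        return {name: fn(self.draws[name]) for name in names}

    def mean(self, params: Sequence[str] = ()) -> Dict[str, float]:
        # Convenience method wrapping a parameter-level analyze function.
        return self.apply(fmean, params)


fit = Fit({"mu": [0.9, 1.1, 1.0], "sigma": [2.0, 2.5, 1.5]})
print(fit.mean())            # every parameter
print(fit.mean(["sigma"]))   # a chosen subset
```

The analyze function itself stays unaware of the fit object, which is what would let the analyze package ship on its own.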

Cross-interface Standardization

We’d like to unify the naming conventions and structure of typical *Stan interfaces so that we can share doc, pedagogy, and research easily across interface languages. We’d also like to remove a lot of the confusing knobs and twiddly bits that most users do not need and create a more streamlined experience that lends itself to more easily doing the right thing.

Unifying names

We’d like to make the following name changes, first in the Stan services and CmdStan and then propagating to all of the interfaces. This will necessitate a major version number bump.

  • num_warmup -> num_warmup_iters
  • num_samples -> num_main_iters
  • iters -> removed as a concept; replaced by num_warmup_iters and num_main_iters
  • init_r -> init_range
  • refresh -> update_interval
  • adapt_delta -> stepsize_adapt_goal
  • lp__ -> target_lpd
  • lp_accum -> target_lpd_accum (generated code mostly)
  • For diagnostic files and unconstrained parameter name outputs, we’d like to prepend “unconstrained_” to all parameter names to drive home the difference.
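For interface code that has to accept old argument names during a transition, the renames above could be applied mechanically. This is only a sketch: `translate_args` is a hypothetical helper, and only the user-facing argument names from the list are included.

```python
# Old -> new argument names, copied from the list above.
RENAMES = {
    "num_warmup": "num_warmup_iters",
    "num_samples": "num_main_iters",
    "init_r": "init_range",
    "refresh": "update_interval",
    "adapt_delta": "stepsize_adapt_goal",
}


def translate_args(old_args: dict) -> dict:
    """Return a copy of `old_args` using the proposed names.

    `iters` has no direct replacement, so this helper refuses to guess how to
    split it between warmup and main iterations.
    """
    if "iters" in old_args:
        raise ValueError(
            "iters is removed; pass num_warmup_iters and num_main_iters instead"
        )
    return {RENAMES.get(name, name): value for name, value in old_args.items()}


print(translate_args({"num_warmup": 500, "num_samples": 1000, "adapt_delta": 0.9}))
```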

Removing some parameters from all interfaces

We’ll get rid of thin, jitter, kappa, gamma, and t0. Folks can go into the source code of CmdStan fairly easily if they need to edit these.

Functions and methods

This list will serve to specify the minimum required functionality to be a full Stan interface. All names should line up with those in Stan services, which should be aligned to those in this document first.

  • model = compile_model(stan_src)
  • model.add_data / model$add_data
  • model.run_hmc / model$run_hmc
  • model.continue_hmc / model$continue_hmc
  • model.maximize / model$maximize
  • model.generate_quantities / model$generate_quantities
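One way an interface could check itself against this list is a small conformance helper. The method names come from the list above; the helper, the toy class, and every signature are assumptions made for the sketch, and compile_model is left out because it is a free function rather than a model method.

```python
REQUIRED_MODEL_METHODS = (
    "add_data",
    "run_hmc",
    "continue_hmc",
    "maximize",
    "generate_quantities",
)


def missing_methods(model_cls: type) -> list:
    """List the required method names that `model_cls` does not define."""
    return [
        name
        for name in REQUIRED_MODEL_METHODS
        if not callable(getattr(model_cls, name, None))
    ]


class PartialModel:
    """Toy class standing in for an interface that is only partly done."""

    def add_data(self, data): ...
    def run_hmc(self): ...


print(missing_methods(PartialModel))
# ['continue_hmc', 'maximize', 'generate_quantities']
```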

Returning results

We want to unify the way results are returned from fit.extract() across interfaces - waiting to hear back from @bgoodri on some details here.

New Lightweight Interfaces


CmdStan has been playing a dual role as a command-line interface for users as well as for interfaces that don’t want to dynamically link against a compiled Stan model (for good reason!). We formally recognize that CmdStan’s interface will focus on command-line use by humans. This just means that all the defaults will be centered around human interaction - e.g. text-based formats for input and output will be the defaults even if we add support for binary formats as an option.

CmdStanPy and CmdStanR

These interfaces will be architected to call CmdStan as a separate process for all Stan functionality and will provide a simple reference implementation for other languages to follow when creating a new interface. They should be fairly minimal pass-throughs directly to the underlying CmdStan that are very easy to update when a new version of CmdStan/Stan/Math is released.
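A minimal sketch of the pass-through idea, assuming a model that has already been compiled to an executable by CmdStan. The helper names are hypothetical, the paths are placeholders, and the arguments use CmdStan's current names (num_warmup, num_samples) rather than the renames proposed above:

```python
import subprocess
from typing import List


def build_sample_command(
    exe: str, data_file: str, output_file: str,
    num_warmup: int = 1000, num_samples: int = 1000,
) -> List[str]:
    """Build the argument list for one sampling run of a compiled model."""
    return [
        exe,
        "sample",
        f"num_warmup={num_warmup}",
        f"num_samples={num_samples}",
        "data", f"file={data_file}",
        "output", f"file={output_file}",
    ]


def run_sample(exe: str, data_file: str, output_file: str) -> str:
    """Run the model executable in a child process and return the CSV path."""
    cmd = build_sample_command(exe, data_file, output_file)
    # check=True raises CalledProcessError if the executable exits non-zero.
    subprocess.run(cmd, check=True, capture_output=True, text=True)
    return output_file


# Building the command does not execute anything.
print(build_sample_command("./bernoulli", "data.json", "out.csv"))
```

Because the interface only assembles arguments and reads back files, a new CmdStan release should mostly need changes in one place.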

Interlude: What’s nice about not doing an in-memory interface?

  • Quicker releases - updating to a new version of Core Stan from within your LangStan version should be doable without even upgrading your version of LangStan - you could run a function to download or switch to a new version in-place.
  • Robustness - crashing Stan wouldn’t bring down your e.g. R session.
  • C++ compiler agnostic - you wouldn’t need to download the specific version of a C++ toolchain that matched the one your language was built with. Many users already have installed C++ toolchains that don’t match that version and getting those systems to pick up the correct version has historically been a big pain point. We’d also be able to bump the C++ compiler version requirements on the Math library arbitrarily as needed by Math devs.
  • Easier and stricter interface - over the years, some of the language interfaces have started to depend on source code within the Stan project rather than using only the exposed APIs, which causes a ton of coordination overhead between interface devs and C++ devs as well as wasting a tremendous amount of interface dev time when some change goes in uncoordinated. Switching to a more black-box CmdStan or ServerStan based interface would prevent all of that.
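The robustness point can be seen with any child process. In this sketch a short Python child stands in for a failing Stan run (an assumption made only so the example is self-contained); the parent keeps running and can report the failure:

```python
import subprocess
import sys

# A child Python process stands in for a failing Stan run; it exits with a
# non-zero status.
failing_child = [sys.executable, "-c", "import sys; sys.exit(3)"]

result = subprocess.run(failing_child)

# The parent session is still alive and can report the failure.
if result.returncode != 0:
    print(f"child failed with exit status {result.returncode}; session still running")
```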

ServerStan (Needs a tech lead! Could be YOU!)

For more robust programmatic consumption, we’ll create a separate ServerStan that is conceptually similar to Allen’s HTTPStan (underlying PyStan). The benefit to ServerStan over a CmdStan-based interface is essentially just that you can have access to fast log_prob evaluations (as well as a few other, somewhat more niche features of a similar flavor).

The key differences between Allen’s HTTPStan and ServerStan:

  • ServerStan will be compiled into a statically linked binary, one per Mac, Linux, and Windows, that should run on any version. Eventually, we’ll link against LLVM and Clang so that users won’t even need a C++ toolchain installed.
  • ServerStan might not use HTTP - we’ll look for something that works easily on R, Python, Julia, and a variety of languages but includes support for binary-encoded data messages, since processing text formats can take a ton of time. grpc seems to be a reasonable candidate with a long track record of good performance and bindings for many languages.
  • We can have the new compiler automatically generate .proto files for a given model to make really efficient interfaces.

Stan C++ Services

We’ll add diagnostic services to the C++ and simplify the code to get rid of things that aren’t needed to support interfaces (e.g. the ~16 flavors of HMC). For the new service API calls, we’ll provide functions on std::vector<double *> with compute_, check_, and format_*_error variants that all take the same inputs and return either the computed value, a boolean indicating if it passed the check, or a formatted string error message if the check didn’t pass, respectively. We’d also like to create example code showing how to call each of these services given a fit (we should be able to create a reasonable version of that when we create the tests for them). These functions will then be wrapped up and exposed as methods on the StanFit object in the downstream language-specific Stan interface - you’ll be able to call any one of them, or all of them, on a specific set of parameters or on all parameters at once. Here are the various quantities that will have this functionality:

  • effective_sample_size
  • rhat
  • treedepth
  • energy
  • quantile
  • mean
  • variance

For example, compute_rhat will return the rhat statistic, check_rhat will compute it and check that it is below a certain threshold that the Stan team has empirically found useful, and format_rhat_error will call check_rhat and, if there is an error, return a formatted string presentable to a user, something like “rhat of 1.2 was above safe threshold of 1.1.”
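To make the three variants concrete, here is a sketch in Python following the compute_, check_, and format_*_error naming. It uses the textbook (non-split) potential scale reduction on equal-length chains, which is simpler than what Stan computes; the 1.1 threshold is taken from the example message, and returning an empty string when the check passes is an assumption.

```python
from math import sqrt
from statistics import fmean, variance
from typing import Sequence

RHAT_THRESHOLD = 1.1  # taken from the example message above

Chains = Sequence[Sequence[float]]


def compute_rhat(chains: Chains) -> float:
    """Textbook (non-split) rhat; needs >= 2 chains of equal length >= 2."""
    n = len(chains[0])
    within = fmean([variance(chain) for chain in chains])
    between = n * variance([fmean(chain) for chain in chains])
    pooled = (n - 1) / n * within + between / n
    return sqrt(pooled / within)


def check_rhat(chains: Chains) -> bool:
    """True if rhat is below the threshold."""
    return compute_rhat(chains) < RHAT_THRESHOLD


def format_rhat_error(chains: Chains) -> str:
    """Return a user-facing message, or an empty string if the check passes."""
    if check_rhat(chains):
        return ""
    return (
        f"rhat of {compute_rhat(chains):.2f} was above safe threshold "
        f"of {RHAT_THRESHOLD}."
    )


mixed = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
stuck = [[1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0]]
print(check_rhat(mixed), check_rhat(stuck))
print(format_rhat_error(stuck))
```

All three functions take the same input, so an interface can expose whichever level of detail a caller wants without recomputing the inputs.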

Model Class Augmentations

The model class was written with the principles of encapsulation and modularity in mind and does not expose the information required for increasingly complex interfaces and use-cases built around Stan. Luckily we can remedy most of this fairly easily by adding additional functionality without breaking anything, and eventually we can tweak some of the existing signatures (to e.g. get rid of all of the in/out arguments to functions that force the language interfaces to manage their own memory, and instead just take const inputs and return outputs as values) whenever the interfaces are ready to consume that change.

Faster compile times

Bob is working on a new formulation using virtual classes in C++ that should drastically improve compile times, but this change may be out by the time this document comes out. The premise here is that we’ll take some slight vtable lookup penalties for model method calls in exchange for ~30 seconds of reduced compile time (down to 7s in tests). This will entail interfaces shipping with a compiled library containing all of the Stan services and algorithms and means all access to model-specific data or behavior will be through overridden virtual methods.

Template parameter defaults

We’ll provide sensible defaults (and adjust names to be consistent across all of the C++ code) for all template parameters:

  • T__=double
  • propto__=false
  • jacobian__=false

New methods for ease of use

It’s easier for the R and Python interfaces to call methods on a fit object rather than hooking up a templated functional to a model. See e.g.

  • hessian
  • hessian_grad_product

Parameter metadata

We’re still discussing a way to holistically address generic parameter filtering across a few methods (transform_inits, log_prob, and write_array), but for the moment we’ve decided to include two new methods that expose sized parameter types on both the constrained and unconstrained scale as a JSON string. In general, exposing metadata like this in an easily-consumable format should add a lot of power to what these interfaces can do, and it solves the first use case on the list - more use-cases forthcoming. These methods should be coming up soon in the new compiler:

  • string get_unconstrained_types();
  • string get_constrained_types();

An example response for the constrained type for cov_matrix[N] m[K, J]; given N=27, K=3, J=4:


Stan language features

We only talked at a high level here about a few features and the next steps on any language changes would be to submit a design doc for the specific change. We talked a lot about user-defined derivatives and are actively soliciting submissions for use-cases and potential syntax to survey before coming up with a proposal. An interesting ‘pro’ to this feature would be that we could automate autodiff-based testing of user-defined derivatives if they specify them in Stan.

We also talked about:

  • A proper ‘extern’ keyword that automates much of the work required to hook up a custom C++ function with gradients.
  • Named blocks or annotations for things like “save these variables to output.”
  • Blockless - there were some strong arguments for preserving most of the blocked structure, but no one seemed like they’d miss transformed parameters, and most people seemed to think that the data and transformed data block could use a name more directly indicating that those blocks are for input and pre-processing.
  • Getting rid of arrays of real numbers from the language entirely to avoid confusion with vectors.
  • Letting folks access the gradient of Stan or user-defined functions in the model block.

Miscellaneous ideas

  • Writing samples out in some format that makes it easy to run column-wise or other queries over the data as stored on disk. Something like sqlite might not even be crazy here.
  • With threading, we could now run multiple chains of warmup in parallel and pool information among the chains for faster adaptation.
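As a sketch of the first idea, here is what column-wise queries over draws could look like with sqlite via Python’s standard library. The table layout and parameter names are made up for illustration, and an in-memory database is used so the example is self-contained:

```python
import sqlite3

# Toy draws: (chain, iteration, mu, sigma). Parameter names are made up.
draws = [
    (1, 1, 0.5, 1.0),
    (1, 2, 1.5, 2.0),
    (2, 1, 1.0, 3.0),
    (2, 2, 2.0, 4.0),
]

conn = sqlite3.connect(":memory:")  # a file path would persist the draws to disk
conn.execute("CREATE TABLE draws (chain INTEGER, iter INTEGER, mu REAL, sigma REAL)")
conn.executemany("INSERT INTO draws VALUES (?, ?, ?, ?)", draws)

# Column-wise query: only the mu column is aggregated.
per_chain_mu = conn.execute(
    "SELECT chain, AVG(mu) FROM draws GROUP BY chain ORDER BY chain"
).fetchall()
print(per_chain_mu)  # [(1, 1.0), (2, 1.5)]
conn.close()
```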

New Compiler Integration

The new Stan compiler written in OCaml (“stanc3”) is ready for beta testing! We’re targeting replacing the old compiler completely for the 2.21.0 release on October 18th, 2019.

Future Roadmap Topics

  • Serialization of input data and output samples
  • General I/O framework
  • Updating Autodiff for parallelism
  • Integration of a C++ compiler

As always, please comment if I got things wrong or if you have ideas! And if any of these projects seem like places you might be able to help out please get in touch.


I’m not a tech lead. I was only there for the initial part of the discussion before anything like a decision was made.

Are these recommendations supposed to go for all the interface languages like Python, Julia, R, etc.?

Assuming the group in question is the TWG, does that mean (a) packages like bayesplot and shinystan are no longer part of Stan, or (b) that the TWG is not in charge of all of Stan software? I’d be OK with (a), but I don’t like solution (b).

I don’t get this. Aren’t there going to have to be R functions in the R analysis package? And we can use the math library, just not compile against it dynamically, right?

I think we’d eventually want multivariate analysis, but we can cross that bridge when we come to it.

What’s a StanFit object? You mean the heavyweight thing in R right now?

How does this account for optimization, variational inference, and autodiff checks, the other three things we can do with models?

I don’t think that’ll fly with CRAN. BH, StanHeaders, etc., are separate rather than packaged with RStan.

Can’t they just remain separate so that the analyze methods aren’t on a fit object, but on a lightweight representation of the chains from a fit object? And won’t we want the analysis package to be able to deal with all the parameters in a model at once? Otherwise, it’s not going to be very useful to people if they have to manually loop over everything all the time to call it.

I don’t think main means anything to anyone. And I don’t think it needs to be plural. How about num_warmup_iter and num_sampling_iter?

I don’t think update_interval will be clear on its own—it sounds like something controlling sampling. Was there a name kicked around that made it clear this is just text output? Maybe something even more explicit like console_update_interval? I like the interval part, though it could also be period.

I like all these (and think target should match the name in the Stan code). lp_accum should definitely be hidden from users.

I’m OK with this, though I believe how most people control RStan (including me) is simply upping the total iterations and letting everything else default.

I like this idea. But I think it’ll be easier to read with the suffix _free, which is shorter, and will still let people scan easily by variable. But then if we alphabetize, it’ll come out the same way either way.

I think that’s a good idea. I think we may just get rid of jitter inside the algorithms, too. Something like jitter would be necessary for static HMC to generically avoid bad orbital periods, but we don’t need it for NUTS.

Where folks here are C++ programmers.

As I’ve said umpteen times, I really dislike this pattern of building up an inconsistent object like model over time. But then I can just wrap all this up in an interface I like for my own work the way Andrew does for all the R stuff he doesn’t like.

I also don’t like generate_quantities as a verb, as it just looks like a typo to me. I don’t have a verb as an alternative and we can hardly say generate_generated_quantities, can we?

What’s it supposed to do? I would prefer if the sampling routines just returned whatever this does directly.

What does “formally recognize” mean in this context? I objected to this. How are we supposed to build the lightweight interfaces?

Command-line interface, I take it?

I want a lightweight interface that does what CmdStan does now without having to fiddle with ports, etc.

Most of this list is motivating a way to run Stan outside of the R and Python process. It doesn’t require a server to do any of these first four bullets. Using CmdStan out-of-process satisfies all of these goals.

What’s a service endpoint? How does it have a subclass? What’s “DiagnosticStat”? I’m getting a bit lost in the jargon.

I also don’t understand the compute, check and format distinction. Examples go a long way toward clarifying this kind of thing.

Running out of steam here. I’ll pick up next time with Model Class Augmentations.

How should interfaces handle parameter / posterior shapes? Should we follow common use cases in different languages or be more strict and follow Stan shape?

Currently there is no difference between the following posterior shapes in PyStan: real x / vector[1] x / real x[1] (all give x.shape == (draws,)).

Also, is Stanc3 going to output dtype, so users would not need to scrape stancode / cppcode?

It wasn’t - I avoided using that word because it’s a little ill-defined at the moment.

I’m going to take the other clarifying questions and ideas and work them into the draft rather than responding to them here. I think a lot of the very specific names &c are the result of a lot of discussion from 5 interface tech leads and the services lead, so they should be considered pretty unlikely to change at this point unless there are specific new and compelling proposals none of us thought of (which could totally be the case! if anyone has something please propose away with some argumentation as to why it makes more sense and we’ll definitely listen).

Ari, the extract() shapes are still being ironed out, but if you ctrl-F for “get_constrained_types” you can see how we’ll return the types of the parameters in both constrained and unconstrained space. Is that enough to get the dtype info you’re looking for or is there more you’d like?

Updated with further clarifications! @bgoodri, @ariddell, @bparbhu, @betanalpha do you all see any mistakes I made in transcribing and trying to fill in details we didn’t discuss? My attempt here was to get the ideas from the group down as much as possible and I’m happy to fix anything that didn’t translate or that I misunderstood.

It sounds like this draft reflects the views of 3 interface leads and Michael.

I don’t agree with many of the above points. I do, however, appreciate that this is being shared with the community. I feel that far more community input is necessary here before moving ahead.

Everything looks consistent with the consensuses (concensi?) we had reached by the end of Sunday. Regarding names I believe the exact suggestion was target_lpd for the replacement of lp.

If you have specific alternative proposals, this is the thread to air them in and I welcome you to do so!

To clarify another one of your points - in addition to Ben (RStan) and Allen (PyStan), Brian was there representing Stan.jl, Michael was there representing the Services and Algorithms (which defines the names of CmdStan), and in your absence we had Jonah attempting to represent the concerns of CmdStanPy, so we had quorum for most of the interface-related stuff.

So, Jonah is great but Mitzi is the one who wrote and maintains CmdStanPy. She has, on the forum and on GitHub, started a range of conversations about resolving problems with the current naming. In the conversations I’ve participated in, she also provided detailed background on her reasoning and sought feedback from devs. She clearly should have been part of this conversation.

Asking her to reproduce the information in this thread as though it were a new proposal doesn’t make sense either. She did the work, her work is on GitHub and easily accessible on this forum. You missed it and that’s going to happen sometimes. Having a quorum doesn’t change that (although it is surprising).

This is a good example of how relying on rules like quorum will go wrong because you can’t substitute “quorum” for the set of people who should be part of a decision.

My suggestion is you review the work Mitzi has done and give this conversation another try.

She attended the first meeting and declined to attend the later meetings.

Links are definitely okay! I didn’t mean new proposals needed to be drafted if some already existed. We do need to keep track of the cross-interface discussion in a single, cross-interface place; this discourse thread is probably the best venue we have.

Mitzi has written a lot of code for CmdStanPy, but we’d be remiss if we didn’t mention @maedoc who wrote the first version and graciously let us fork it while continuing to help us with it. Thank you both.

That unfortunately doesn’t change that she should have been part of that conversation. You can use quorum this way but it doesn’t lead anywhere productive.

Mitzi is the Technical Lead for CmdStanPy or am I missing something? I’ve worked with @maedoc and he’s great too but let’s get our first-order concerns straight here.

I’ll defer to @mitzimorris if she happens to have a complete spreadsheet of suggested names or something but even if she doesn’t the CmdStanPy issues and discourse threads are pretty limited.

I’m not saying this is some sort of offense so no reason to edit posts (edits get really confusing in the e-mail digests). All I’m saying is that a chunk of actual work by one of the technical leads got missed and asking her to re-play it here is going to be less effective than actually reviewing the work.

IIRC the quorum thing looked like a good idea to people because there were some folks who like to talk in circles but Mitzi was never one of them so I don’t think there’s a risk to being flexible about it.

Mitzi has always been invited to be a part of any and all of these conversations. We can’t force anyone to participate.

I think there’s a little bit of confusion - the roadmap is essentially issued by TWG Director decree as we knew that it would never be full consensus. The fact that it got consensus from all of the other interface developers, the services/algo lead, and the TWG Director was a better outcome than I think anyone expected.

Again, I think we’re all happy to listen to any additional proposals - if you wouldn’t mind linking to the ones you’re talking about in this thread (I took a look at the issue tracker and didn’t see anything that seemed related by the titles, but I could have missed something), I think that would really help us move forward.

+1. Thank you for highlighting this.

@seantalts, that wasn’t true. The first invite was not extended to @mitzimorris

These meetings have not been inclusive in their planning at all. Personally, I was not extended an invitation to either the meeting on Saturday or the Sunday meeting (which was not announced at all). There was a poll asking some members of the TWG and some other developers (that were hand-chosen by @seantalts) for scheduling, but rather than seeking dates that could accommodate all, a sooner date was chosen. That would have been fine if it was a TWG meeting including all members of the TWG with an agenda and the authority to make decisions, but clearly, in earlier posts, this was not the intention.

I can’t express in words how frustrated and disappointed I am at this. As a leader of this open-source project, you really shouldn’t pick a handful of people to make decisions for the community. It’s fine to have a meeting to discuss and get on the same page, then make those suggestions to the rest of the community based on that information, but to have meetings where members of the TWG are not invited, are not accommodated, and then make decisions without the input of the community is not ok.

Sorry to burst your bubble, but there was no quorum since this meeting was not a TWG meeting. It was just a meeting where certain developers were invited to spend an expenses paid trip to discuss Stan. (The TWG did not have a say as to who was invited, whether it was a good use of SGB money, or what the budget was. There was no consultation with the TWG as a whole.)

The more I think about it, the more offended I am that you think you have the power to create closed groups within the community to make decisions without community input.

Let’s talk about making this better. Can you go on the record and state that not only was this not a TWG meeting, but the discussions there are meant to be summarized, shared, and discussed with the whole developer community? Please ensure that no Stan developer will be penalized for having missed a meeting they were not invited to attend. Please enforce it.

Furthermore, as the TWG Director, please stop encouraging the behavior of having closed meetings where the open-source development community is excluded from participating.

Edit: I added a strikeout to the text that wasn’t correct above. (Sorry about the misinformation, @seantalts.) Sean invited Mitzi via the first email. Her removal from the thread wasn’t until later.

Here is a screenshot showing that you are wrong:

I sent out polls to check availability and asked those who wanted to participate to fill out the polls. I gave multiple reminders for this and then continued planning with everyone who had participated.

Please do the very simplest and easiest bit of homework of searching your email before you write such an incendiary polemic. And as a process matter, if you could please create a new thread either on discourse or within the SGB email list for any governance issues - this thread is about the roadmap draft, not about the process that created it. I’ll respond to any additional issues you have on that thread once you’ve done your homework.

Thanks and you’re right about the very first email. That did go to Mitzi and myself. Point taken: your first email went to everyone.

But you’re still mistaken that you’ve always kept people involved. If you recall, I was dropped from that email thread intentionally by you. And it had already excluded Mitzi at that point.

The key thing here is that you’re excluding people. If you disagree after rereading that email thread, please respond.

To be more specific, the original email with a poll went to a wide distribution. The first email with any details did not. Mitzi was not asked to participate in that discussion at all. And I also would not have been if @Bob_Carpenter didn’t explicitly add me back in.

Please see email that you sent on 5/31 and from Bob on 6/1.

Thanks for responding civilly. I think this is an important conversation and want to continue to make sure you know what happened, but I think it’s off-topic for this thread. Please continue here:

I made an edit to my post to clarify that I was incorrect. You indeed invited Mitzi to the poll.

I can respond on the other thread.

Great, thanks. Back to the topic!

@bgoodri, still hoping to hear back from you at some point about some further details on how we should spec out returning posterior samples from a fit object in language interfaces. Do you want to put the proposal in this thread and we can discuss a little before adding it to the roadmap? Thanks!