Dear Stan collaborators with programming language interests:
I had an experience in class the other day which gave me some insight into the whole issue of Stan being more “Pythonic” or “Rthonic.”
I’d always thought of “Pythonic”/“Rthonic” as just being a stylistic thing, with some users wanting a certain look and feel with their code. But now I realize it’s more than that.
What happened was that I had a homework problem where I gave them a simple statistical model and asked them to compute the posterior mode and curvature, and then construct and draw from the relevant normal approximation. When we were discussing the solutions, one of the students said that it was really easy: he just wrote the function in Python, then used Pytorch to compute gradients, then used some gradient-based optimizer.
Of course you can also use Stan to write an objective function, compute gradients, and do gradient-based optimization! But this wasn’t obvious to the students, because Stan appears to be a single bundled package.
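For a toy model, the mode-plus-curvature workflow the student described might look like the sketch below. This is not the actual assignment: the model, data, step size, and iteration count are all made up for illustration, and a hand-coded gradient stands in for what the student got from PyTorch’s autodiff.

```python
import math
import random

# Toy model: y_i ~ Normal(mu, 1), with prior mu ~ Normal(0, 10).
y = [1.2, 0.8, 1.5, 0.9, 1.1]

def grad_log_post(mu):
    # d/dmu of the log posterior: likelihood term plus prior term.
    return sum(yi - mu for yi in y) - mu / 100.0

# 1. Posterior mode by plain gradient ascent.
mu = 0.0
for _ in range(1000):
    mu += 0.01 * grad_log_post(mu)

# 2. Curvature: second derivative of the log posterior (constant here,
#    since the log posterior is quadratic in mu).
curvature = -(len(y) + 1.0 / 100.0)

# 3. Normal (Laplace) approximation: mean = mode, sd = sqrt(-1/curvature).
sd = math.sqrt(-1.0 / curvature)
draws = [random.gauss(mu, sd) for _ in range(4000)]
```

The same three steps work with any autodiff library in place of the hand-coded gradient, which is exactly the student’s point.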
So it struck me that what would really make Stan more Pythonic or Rthonic is modularity, for a Stan call to be less bundled.
In particular, what if we think like this:
A Stan program is instructions to compute a target and its gradient. Input data and parameters, output target and gradient vector.
Stan also comes with an optimizer: its inputs are data, initial values for the parameters, tuning parameters for the optimizer, and a number of chains; it outputs a solution for each chain along with some diagnostics.
Also NUTS etc.
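In code, that separation might look like this minimal sketch: the model is nothing but a function from data and parameters to (target, gradient), and the optimizer is a separate routine that consumes only that interface and knows nothing about the model’s internals. All names here are invented for illustration, not an actual Stan API.

```python
def log_prob_grad(data, params):
    # The "Stan program" role: input data and parameters,
    # output the target (log density) and its gradient.
    mu = params["mu"]
    target = -0.5 * sum((yi - mu) ** 2 for yi in data["y"])
    grad = {"mu": sum(yi - mu for yi in data["y"])}
    return target, grad

def optimize(model, data, inits, step=0.01, iters=500):
    # A generic gradient-ascent "optimizer": it only ever calls the
    # (target, gradient) interface, so it works with any model.
    params = dict(inits)
    for _ in range(iters):
        _, grad = model(data, params)
        for name in params:
            params[name] += step * grad[name]
    return params

data = {"y": [0.5, 1.5, 1.0]}
mode = optimize(log_prob_grad, data, inits={"mu": 0.0})
```

Swapping in NUTS instead of `optimize` changes only the second piece; the model function is untouched.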
I have a feeling that at this point, Bob will (correctly) reply that this is what Stan already is, so what’s my problem? And to this I’d reply that, somehow, this is not what Stan looks like or how we use it.
Here’s an example of how it could be useful to set things up:
I’m working in R or Python and I write a Stan program. I do this because it’s convenient, typically because the target function I’m working with has some statistical interpretation and Stan’s modeling language is a good way to express what I want to write. (This will be even more so in the future with ragged arrays and other features that are in the pipeline.)
Then I do something like foo <- stan_compile("foo.stan"), or the equivalent in Python.
I can now play around, for example:
a <- list(…)
b <- list(…)
foo(data=a, parameters=b) returns the target function and its gradient (and also does whatever foo.stan might do in its run, such as print statements).
hessian(foo, data=a, parameters=b) returns the second derivative matrix (which can be computed in Stan or else in R using numerical differentiation)
stan_optimize(foo, data=a, inits=…, tuning_parameters=…, num_chains=4)
stan_sample(foo, data=a, inits=…, tuning_parameters=…, num_chains=4)
summary(stan_sample(…)) would return basic inferences and convergence diagnostics etc.
All these would work in Python too, or for that matter from the command line.
The point is that, once we’ve done foo <- stan_compile("foo.stan"), or the Python equivalent, foo() just looks like an R function (or the Python equivalent).
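A toy mock shows how this could feel in practice. Everything here is hypothetical: `stan_compile` is faked in pure Python and just returns a hard-coded model function, and `hessian` takes finite differences of the returned gradient, in the spirit of the numerical-differentiation option mentioned above.

```python
def stan_compile(model_code):
    # Stand-in for the real compiler: returns a function mapping
    # (data, parameters) to the target and its gradient vector.
    def foo(data, parameters):
        mu, y = parameters["mu"], data["y"]
        target = -0.5 * sum((yi - mu) ** 2 for yi in y)
        grad = [sum(yi - mu for yi in y)]
        return target, grad
    return foo

def hessian(model, data, parameters, eps=1e-5):
    # Second-derivative matrix by central finite differences
    # applied to the gradient returned by the model function.
    names = sorted(parameters)
    H = []
    for name in names:
        up = dict(parameters); up[name] += eps
        dn = dict(parameters); dn[name] -= eps
        _, g_up = model(data, up)
        _, g_dn = model(data, dn)
        H.append([(a - b) / (2 * eps) for a, b in zip(g_up, g_dn)])
    return H

foo = stan_compile("foo.stan")  # the mock ignores the file contents
a = {"y": [1.0, 2.0, 3.0]}
b = {"mu": 2.0}
target, grad = foo(data=a, parameters=b)
H = hessian(foo, data=a, parameters=b)
```

Once compiled, `foo` really is just an ordinary function of the host language, which is the whole point.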
I’m wondering: would it make sense to build CmdStanR, CmdStanPy, etc., that way?
I’m sure I have some details wrong on this, but I think the basic idea here is super-important. I expect this is what Mitzi, Sean, Bob, etc., have been thinking about already, so it’s probably just a communication issue.
But I think it’s a very important communication issue so I’d like to figure this out. I think it will also be important for our workflow book, and also for people trying to see how Stan fits in the ecosystem with Pytorch and various autodiff libraries.