What doc would help developers of the Math library?


#21

Maybe. I didn’t know what a UML diagram was until now. Or maybe it’s something closer to what would suit a non-developer (statistician, modeler, physicist) who otherwise knows what is required for the whole thing to work.
For instance, I think of MCMC inference somewhat along these lines:

  1. compute model likelihood, gradient
  2. use MCMC algorithm to propose new states based on function above

That’s simple enough and intuitive to implement. Maybe you want/need autodiff for the gradient, then that goes in there between 1 and 2. Probably quite a bit more is needed for Stan to do everything it can, so you can break it down further, but maybe I don’t need to know that if all I want is to implement some different gradient computation, or just want to modify the sampler for whatever reason.
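The two steps above can be sketched in a few lines of plain C++. This is only an illustration under stated assumptions: the toy model is a standard normal, the gradient is written by hand rather than by autodiff, and the sampler is static HMC (propose, then accept or reject), not Stan's NUTS. The names `Eval`, `log_density_and_gradient`, and `hmc_step` are made up for this sketch and are not part of Stan.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Step 1: log density and gradient of a toy model (standard normal,
// up to an additive constant). Hypothetical stand-in for a real model.
struct Eval {
  double lp;
  std::vector<double> grad;
};

Eval log_density_and_gradient(const std::vector<double>& theta) {
  Eval e{0.0, std::vector<double>(theta.size())};
  for (std::size_t i = 0; i < theta.size(); ++i) {
    e.lp -= 0.5 * theta[i] * theta[i];
    e.grad[i] = -theta[i];
  }
  return e;
}

// Step 2: one static HMC transition that uses the function above.
std::vector<double> hmc_step(const std::vector<double>& theta, double eps,
                             int n_leapfrog, std::mt19937& rng) {
  assert(eps > 0.0 && n_leapfrog > 0);
  std::normal_distribution<double> normal(0.0, 1.0);
  std::uniform_real_distribution<double> unif(0.0, 1.0);

  std::vector<double> q = theta;
  std::vector<double> p(theta.size());
  for (double& pi : p) pi = normal(rng);  // resample momentum

  Eval e = log_density_and_gradient(q);
  double h0 = -e.lp;  // initial Hamiltonian: potential + kinetic
  for (double pi : p) h0 += 0.5 * pi * pi;

  for (int l = 0; l < n_leapfrog; ++l) {  // leapfrog integrator
    for (std::size_t i = 0; i < q.size(); ++i) p[i] += 0.5 * eps * e.grad[i];
    for (std::size_t i = 0; i < q.size(); ++i) q[i] += eps * p[i];
    e = log_density_and_gradient(q);
    for (std::size_t i = 0; i < q.size(); ++i) p[i] += 0.5 * eps * e.grad[i];
  }

  double h1 = -e.lp;  // Hamiltonian at the end of the trajectory
  for (double pi : p) h1 += 0.5 * pi * pi;

  // Metropolis correction: accept with probability min(1, exp(h0 - h1)).
  if (std::log(unif(rng)) < h0 - h1) return q;
  return theta;
}
```

Swapping in a different gradient computation only touches `log_density_and_gradient`, and changing the sampler only touches `hmc_step`, which is the separation described above.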

That would be helpful to people like me. I don’t know if I actually have the profile to develop, but that’s what I read into the category of “people considering it but having trouble”.


#22

Greta is doing this [plotting graphical models]: https://greta-dev.github.io/greta/get_started.html#plotting

Pretty cool.


#23

Thanks!

The best place to start is to send a note here to discourse or to one of the devs once you have an issue you think you want to work on or if you’re looking for something to work on. In addition to all this C++, we have tons of R and Python and other interface language work to do.

The closest thing to this is the JStatSoft paper.

There’s also the algorithms part of the manual that describes roughly how the MCMC and adaptation and variational inference and optimization work, as well as diagnostic mode. But it’s all pretty incomplete.

Most of our algorithms compute a log density or log likelihood depending on model and philosophical perspective. All our useful algorithms use gradients. Some also need Hessians and Riemannian HMC uses third-order derivatives (though in a compound form that is only quadratic to compute).
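To make "a log density plus its gradient" concrete, here is a minimal self-contained sketch. Stan Math gets its gradients from reverse-mode autodiff. The sketch below swaps in forward-mode dual numbers because they fit in a few lines, at the cost of one pass per input. The `Dual` type, `log_density`, and `gradient` are hypothetical names for illustration, not Stan Math's API.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical dual number: a value and its derivative w.r.t. one input.
struct Dual {
  double val;
  double d;
  Dual(double v = 0.0, double deriv = 0.0) : val(v), d(deriv) {}
};

Dual operator-(Dual a, Dual b) { return {a.val - b.val, a.d - b.d}; }
Dual operator*(Dual a, Dual b) {
  return {a.val * b.val, a.d * b.val + a.val * b.d};
}
Dual operator/(Dual a, Dual b) {
  return {a.val / b.val, (a.d * b.val - a.val * b.d) / (b.val * b.val)};
}
Dual log(Dual a) { return {std::log(a.val), a.d / a.val}; }

// Normal log density in y, dropping the constant term. Templated so the
// same code runs on double (value only) or Dual (value and derivative).
template <typename T>
T log_density(double y, const T& mu, const T& sigma) {
  using std::log;
  T z = (T(y) - mu) / sigma;
  return T(-0.5) * z * z - log(sigma);
}

// Gradient w.r.t. (mu, sigma): seed one input at a time with derivative 1.
std::vector<double> gradient(double y, double mu, double sigma) {
  assert(sigma > 0.0);
  double d_mu = log_density(y, Dual(mu, 1.0), Dual(sigma, 0.0)).d;
  double d_sigma = log_density(y, Dual(mu, 0.0), Dual(sigma, 1.0)).d;
  return {d_mu, d_sigma};
}
```

For y = 1, mu = 0, sigma = 2 the analytic gradient is (y - mu) / sigma^2 = 0.25 for mu and -1 / sigma + (y - mu)^2 / sigma^3 = -0.375 for sigma, which is what `gradient(1.0, 0.0, 2.0)` should reproduce up to rounding.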

For full Bayes, we run MCMC, but we also have an optimizer for max penalized likelihood and posterior mode Laplace approximations, and also variational inference for approximate Bayes.

Stan’s default MCMC is an evolved form of NUTS. We’ve replaced the slice sampling with multinomial sampling and compute acceptance rates and adaptation a bit differently. But the bottom line is that it’s not a Metropolis algorithm like static HMC that proposes then accepts or rejects. It’s more like it proposes a range of points including the starting point then selects one in a way that’s biased away from the starting point. That’s why it can be so efficient. That’s all covered in the NUTS paper, but fair warning, it’s dense. Michael Betancourt’s exhaustive HMC paper on arXiv is the best theory for all of this.
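The "select one point from a range of points" idea can be sketched on its own. This is a simplified illustration, assuming the trajectory has already been built and that each point is chosen with probability proportional to exp(-H), where H is its Hamiltonian. It leaves out the tree building, the U-turn criterion, and the bias away from the starting point. `selection_probabilities` and `select_state` are hypothetical names, not Stan functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Probability of picking each trajectory point, proportional to
// exp(-energy). Subtracting the minimum energy avoids underflow.
std::vector<double> selection_probabilities(
    const std::vector<double>& energies) {
  assert(!energies.empty());
  double min_h = *std::min_element(energies.begin(), energies.end());
  std::vector<double> probs(energies.size());
  double total = 0.0;
  for (std::size_t i = 0; i < energies.size(); ++i) {
    probs[i] = std::exp(-(energies[i] - min_h));
    total += probs[i];
  }
  for (double& p : probs) p /= total;
  return probs;
}

// Multinomial draw of one index from the whole trajectory. There is no
// accept/reject step: the starting point is just one of the candidates.
std::size_t select_state(const std::vector<double>& energies,
                         std::mt19937& rng) {
  std::vector<double> probs = selection_probabilities(energies);
  std::discrete_distribution<std::size_t> pick(probs.begin(), probs.end());
  return pick(rng);
}
```

With equal energies every point is equally likely, so a long trajectory makes it unlikely that the chain stays at its starting point.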

WinBUGS could display graphical models visually as graphs. It even had a graph editor for building models with a GUI. But it was the 1990s and the results weren’t very pretty. The ones from Greta work much better.

Stan’s not a graphical modeling language, though, so you can’t produce such graphs. We could produce parse trees, but that’s different. Even the expression graph can vary based on iteration (like Pyro, unlike Edward or PyMC3, etc.)


#24

I believe what @twistedmersenne was asking for was not a specific model’s graph structure, but a diagram showing how Stan programs work in general with something like a node for the likelihood, another for HMC, and more for the service layers etc. showing how the Stan C++ code is laid out and interconnected… is that right?


#25

Yes, @seantalts, that’s exactly right.
And, @Bob_Carpenter, I know (in addition to the math library arXiv paper) about the JStatSoft paper, but that seems more directed to the modeling language and user. To use your example, I understand the specifics you describe, but I don’t know how the Stan model I write gets converted to C++/Eigen objects, which function computes the likelihood and gradients, and how they are passed to the sampler.

I gather from the wiki, like some of the links posted above, that you can include different headers depending on what you need (e.g. stan/math/rev/mat.hpp), but that tells me nothing about the methods you mentioned, for instance.
I guess I’m looking for something like “your .stan model is read by this function, the model object is passed on as object X to the autodiff function, and object Y is passed on to the HMC/NUTS sampler, which computes likelihood+gradient using attributes from object Y and stores them (some, at least, like lp__) on the chain’s traces”.

Anyway, sorry if that doesn’t make sense in terms of development, I don’t want to waste any more of your time unnecessarily.


#26

I agree that the Stan Math arXiv paper is the best resource. As others state, I always get another breakthrough when I go back to it. It may however be a bit much for a beginner.

I was given the task of speeding up a model and I had no previous knowledge of working with Stan or the Stan Math library. I was lucky to come across a PR from @Stevo15025 and things went from there, with Steve doing most of the heavy lifting of putting our GPU code in proper places :) Without him we would have spent too much time finding where everything goes. I agree that this might not be a typical case as I had no experience with Stan Math before that, not even as just a user.

So my suggestion would be to have some sort of example files (with complete code that works) for someone who wants to add a new function that does not exist yet, or a newer/faster version of an existing one. Maybe add more complete and thorough examples for functions with more arguments and vector functions, etc.

Resources like https://github.com/stan-dev/math/wiki/Adding-a-new-function-with-known-gradients are great, but complete working examples you could tinker with would help a beginner out.
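As one example of something small to tinker with, here is a self-contained sketch of the kind of check that goes with a function whose gradient is known in closed form: compare the hand-written derivative against a central finite difference. It uses plain C++ only and none of Stan Math's machinery. The function `log1p_exp` and the helpers are defined here for illustration and are assumptions of this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Example function: log(1 + exp(x)), written to stay stable for large |x|.
double log1p_exp(double x) {
  return x > 0.0 ? x + std::log1p(std::exp(-x)) : std::log1p(std::exp(x));
}

// Its known derivative: the inverse logit, 1 / (1 + exp(-x)).
double log1p_exp_grad(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Central finite difference approximation of f'(x).
double finite_diff(const std::function<double(double)>& f, double x,
                   double h = 1e-5) {
  assert(h > 0.0);
  return (f(x + h) - f(x - h)) / (2.0 * h);
}

// True when the analytic gradient agrees with the numerical one at x.
bool gradient_matches(double x, double tol = 1e-6) {
  return std::fabs(finite_diff(log1p_exp, x) - log1p_exp_grad(x)) <= tol;
}
```

Calling `gradient_matches` at a handful of points (negative, zero, positive) is a cheap first test before wiring a new function and its gradient into the library.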


#27

I was confused because of the link posted to Greta rendering graphical models.

It gets parsed to an abstract syntax tree then converted to C++ code. See stan/lang/compiler.hpp for the top-level entry into the code. You can also inspect the generated C++ code and read the Wiki about the model concept to see what’s being generated.

It makes sense, it’s just a combinatorial nightmare. Sean and I are going to take one pass through this for the half-day course we’re going to do for StanCon Helsinki at the end of August. That will inspire us to create a bunch of this material.

If you want to trace how a system works, start at the top. For Stan, that’s all the service calls in src/stan/services in the stan-dev repo. Diagonal NUTS with adaptation is the default sampling service, L-BFGS the default optimization service, and mean-field ADVI the default variational inference service function. Each of these runs differently, but the model evaluates the same for each in terms of computing log densities and gradients. The I/O is also very complicated as it’s all set up in terms of callbacks to allow interfaces to handle things directly through memory. For example, when the model object is constructed, it gets a var_context object that contains definitions of named, structured variables. The actual implementation is defined in the interfaces. Same with all the callback writers, loggers, and interrupts you see on the service methods.
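The callback arrangement described above can be sketched as follows. Everything here is a hypothetical, heavily simplified stand-in: `VarContext`, `MeanModel`, `Callbacks`, and `optimize_service` are names made up for this sketch and are not Stan's classes or signatures. Plain gradient ascent stands in for L-BFGS. The point is only the shape: the interface supplies the data and the callbacks, and the service drives the model.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a var_context: named variables and values.
using VarContext = std::map<std::string, std::vector<double>>;

// Hypothetical model: built from data, exposes a log density and gradient
// for one parameter mu, with y[i] ~ normal(mu, 1) up to a constant.
class MeanModel {
 public:
  explicit MeanModel(const VarContext& data) : y_(data.at("y")) {}
  double log_prob(double mu) const {
    double lp = 0.0;
    for (double y : y_) lp -= 0.5 * (y - mu) * (y - mu);
    return lp;
  }
  double grad_log_prob(double mu) const {
    double g = 0.0;
    for (double y : y_) g += y - mu;
    return g;
  }

 private:
  std::vector<double> y_;
};

// Callbacks implemented by the interface (R, Python, command line, ...).
struct Callbacks {
  std::function<void(double)> write_draw;            // output writer
  std::function<void(const std::string&)> log_info;  // logger
  std::function<bool()> interrupted;                 // interrupt check
};

// Hypothetical service function: runs the algorithm and reports through
// callbacks only. Returns 0 on success, 1 if interrupted.
template <typename Model>
int optimize_service(const Model& model, double init, int max_iter,
                     double step, const Callbacks& cb) {
  assert(max_iter >= 0 && step > 0.0);
  double mu = init;
  for (int i = 0; i < max_iter; ++i) {
    if (cb.interrupted()) {
      cb.log_info("interrupted");
      return 1;
    }
    mu += step * model.grad_log_prob(mu);  // gradient ascent step
    cb.write_draw(mu);
  }
  cb.log_info("done");
  return 0;
}
```

An interface would fill a `VarContext` with `y`, construct `MeanModel` from it, and pass lambdas that append to its own in-memory storage, which is the reason the service never touches files or the console directly.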


#28

That course will also be recorded.