Coupling Stan with a Dynamic Vegetation Model

Hi Everyone,

I have a process-based vegetation model in C++ that I’d like to integrate with Stan. What I’d like to achieve is similar to the PEcAn Project (http://pecanproject.github.io/), in that I want to be able to do uncertainty and sensitivity analyses, parameter estimation, and state assimilation using external data sources (e.g., remote sensing). Here, uncertainties are related to (1) measurements; (2) parameters; (3) models generating parameters (e.g., allometric equations); (4) model processes; and, (5) predictions, or the joint uncertainty. How may I achieve this with Stan? While PEcAn is excellent, I am hoping for something simple and focused, implemented in C++ or Python.

For the vegetation model, only the model inputs and outputs are available currently. In the future, it would be ideal to integrate Stan directly into the C++ vegetation model program to propagate uncertainties for each individual process, for which the uncertainties may be known a priori or estimated from empirical data. How can this also be accomplished?

It seems that the simplest way of naively estimating prediction uncertainty at the moment is to randomly sample from parameter distributions and run multiple simulations in a Monte Carlo approach. Another approach would be to apply Gaussian processes to jointly estimate the vector of parameters in an optimization framework. Can this be done with Stan? I have implemented one such recent GP approach in C++, which I assume may be possible to link to Stan? Thankfully, C++11 has nice facilities for sampling from distributions.
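To illustrate the naive Monte Carlo idea, here is a minimal C++ sketch; the parameter distributions and the run_model surrogate are purely hypothetical stand-ins for the real vegetation model.

```cpp
#include <iostream>
#include <random>
#include <vector>

// Hypothetical stand-in for one run of the vegetation model.
double run_model(double growth_rate, double mortality) {
  return 100.0 * growth_rate / (1.0 + mortality);
}

int main() {
  std::mt19937 rng(2024);
  // Assumed parameter distributions, for illustration only.
  std::normal_distribution<double> growth(3.0, 0.5);
  std::lognormal_distribution<double> mort(-2.0, 0.25);

  // Sample parameters and rerun the simulation; the spread of
  // `outputs` is the naive estimate of prediction uncertainty.
  std::vector<double> outputs;
  for (int i = 0; i < 1000; ++i)
    outputs.push_back(run_model(growth(rng), mort(rng)));

  double mean = 0.0;
  for (double y : outputs) mean += y / outputs.size();
  std::cout << "mean prediction: " << mean << "\n";
}
```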

Last, I hear that Michael (Betancourt) was in the process of applying NUTS to Riemannian manifold HMC within Stan. Is this now available?

Thank you in advance to everyone and to Bob for his initial feedback,

Adam Erickson

You might want to have a look at the intro to the manual or the JSS paper to get a sense of what Stan does. I’m not sure if you want to build an application and use Stan as a component in it or if you want to take something custom you have and integrate it into Stan so that it’s available from the Stan language. We don’t do much of the former other than through RStan.

Stan gives you Bayesian posterior uncertainty with MCMC, and Hessian-based sensitivities for penalized MLE where it exists.

Not sure what you mean by inputs and outputs.

We don’t have a lot written on how to use Stan at the C++ level. Usually the approach is to take custom C++ that implements a function and its gradient and use those to implement a Stan function that can be used from the Stan language.

Not sure what you mean by link to Stan here. You can code GPs in Stan’s language or in C++ and fit them with Stan.

It’s available in the C++, but not as a service, so it has an even lower-level interface.

Not sure what you’re referring to here.

Hi Bob,

Thank you for the message. You are correct that I want to

build an application and use Stan as a component in it

I have looked over the papers (Stan and Stan Math) and have been going over the manual. I haven’t seen any examples of Stan used directly in C++, aside from the Stan Math paper, and I’ve read the recent threads requesting this functionality (i.e., a high-level C++ API). I understand that the Stan compiler is in fact a transpiler to C++. So, perhaps it makes more sense to reimplement only the functions I need from Stan? That way, I could base them on our data structures. I have read the suggestion to others to use CmdStan as a transpiler to generate a header file, which may be another option.

Not sure what you mean by inputs and outputs.

Model parameters and simulation results for a suite of metrics.

You can code GPs in Stan’s language or in C++ and fit them with Stan.

Is there an example of fitting a GP coded in C++ with Stan? I ported an algorithm similar to TPE from Python to C++.

It’s available in the C++, but not as a service, so it has an even lower-level interface.

Which file? Sorry, I did not see it.

Not sure what you’re referring to here.

It’s nothing fancy, but C++11 added many distributions and some random number generator facilities (link), while C++17 added std::sample.
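For example, a small self-contained sketch of those standard-library facilities (the seed, sizes, and values are arbitrary):

```cpp
#include <algorithm>
#include <iostream>
#include <iterator>
#include <random>
#include <vector>

int main() {
  std::mt19937 rng(42);  // seedable Mersenne Twister engine (C++11)
  std::normal_distribution<double> normal(0.0, 1.0);

  std::vector<double> draws(1000);
  for (auto& d : draws) d = normal(rng);  // standard-normal variates

  // C++17: subsample 10 draws without replacement.
  std::vector<double> subsample;
  std::sample(draws.begin(), draws.end(),
              std::back_inserter(subsample), 10, rng);

  std::cout << "first draw: " << draws[0]
            << ", subsample size: " << subsample.size() << "\n";
}
```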

Here’s the result of searching for “riemannian” in the stan repo.

I followed that link, but don’t see what it has to do with C++11. We use Boost random number generators which let us set seeds and also efficiently advance. We tie all of our algorithms into a single random number generator which is shared with the program RNGs (things like normal_rng built into the Stan language).

Yes, there’s an entire chapter in the manual on fitting GPs and lots of examples floating around the web and lots of discussion on these forums.
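If it helps to see the shape of the C++ route, here is a minimal sketch of the zero-mean squared-exponential GP log marginal likelihood using Eigen (which Stan already depends on). The names are just illustrative: alpha is the marginal standard deviation, rho the length-scale, sigma the noise scale; this is the quantity you’d code up and let Stan’s samplers explore.

```cpp
#include <Eigen/Dense>
#include <cmath>

// Log marginal likelihood of a zero-mean GP with a squared-exponential
// kernel plus observation noise, via a Cholesky factorization.
double gp_log_lik(const Eigen::VectorXd& x, const Eigen::VectorXd& y,
                  double alpha, double rho, double sigma) {
  const int n = x.size();
  Eigen::MatrixXd K(n, n);
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < n; ++j) {
      double d = x(i) - x(j);
      K(i, j) = alpha * alpha * std::exp(-0.5 * d * d / (rho * rho));
    }
  K.diagonal().array() += sigma * sigma;  // observation noise

  Eigen::LLT<Eigen::MatrixXd> llt(K);     // K = L L^T
  Eigen::MatrixXd L = llt.matrixL();
  Eigen::VectorXd Kinv_y = llt.solve(y);  // K^{-1} y
  double quad = y.dot(Kinv_y);            // y^T K^{-1} y
  double log_det = 2.0 * L.diagonal().array().log().sum();
  return -0.5 * (quad + log_det + n * std::log(2.0 * M_PI));
}
```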

Hi Bob,

Thanks again for your help. The tricky thing seems to be connecting Stan to a ‘black-box’ model that cannot be implemented explicitly in Stan. I have not seen any examples of this functionality.

I’ve been reading through the manual as you suggested, in particular Chapters 6 and 22 related to user-defined functions. What I’m wondering is, since Stan transpiles code to C++, can I write a user-defined function that calls system() to run a simulation at the CLI, waits for the process to finish, and reads the CSV file containing the results into Stan? The user-defined function would take Stan variables as simulation arguments.

Since this is trivial to do in C++, I would think it possible in Stan? I don’t know how Stan parses the .stan file arguments, but I assume it is done in cmdstan/src/cmdstan/arguments/argument_parser.hpp? Let me know if there is a particular dev I should talk to about this. Basically, if it would be possible to allow a system() call and a few lines of std::fstream, that should be enough; see the sketch below. The simulation results would be strongly typed, similar to other Stan variables.
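Here is the kind of thing I mean, as plain C++ (the executable name, flag, and CSV layout are made up):

```cpp
#include <cstdlib>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical: run the external simulator at the CLI, block until it
// finishes, then read the comma-separated results back in.
std::vector<double> run_simulation(const std::vector<double>& params) {
  std::ostringstream cmd;
  cmd << "./vegetation_model";          // made-up executable name
  for (double p : params) cmd << ' ' << p;
  cmd << " --out results.csv";          // made-up flag
  if (std::system(cmd.str().c_str()) != 0)
    throw std::runtime_error("simulation failed");

  std::vector<double> outputs;
  std::ifstream in("results.csv");
  std::string cell;
  while (std::getline(in, cell, ','))   // parse comma-separated values
    outputs.push_back(std::stod(cell));
  return outputs;
}
```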

Let’s call your external blackbox $y = f(x)$ for some (possibly large) inputs $x$ and outputs $y$. Can you compute all of the partial derivatives $\partial y_i / \partial x_j$?

If you cannot, then this discussion is already at an impasse, as Stan needs the derivatives of all functions in order to implement its scalable algorithms.

If you can, then the question is how to take external calculations of $y$ and $\partial y_i / \partial x_j$ and get them back into Stan. This is discussed in https://github.com/stan-dev/math/wiki/Adding-a-new-function-with-known-gradients. If the black box isn’t written in Fortran/C/C++ and you really do need a system call, then you’ll probably need a platform-specific solution.
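As a rough sketch of the pattern described on that wiki page, for a scalar output (blackbox_value and blackbox_gradient are stand-ins for your external code):

```cpp
#include <stan/math.hpp>
#include <vector>

// Stand-ins for the external blackbox: its value and its gradient.
double blackbox_value(const std::vector<double>& x);
std::vector<double> blackbox_gradient(const std::vector<double>& x);

// Wrap the blackbox so Stan's reverse-mode autodiff can use it: strip
// the values off the autodiff variables, call the external code, then
// attach the externally computed partials back onto the tape.
stan::math::var f(const std::vector<stan::math::var>& x) {
  std::vector<double> x_val(x.size());
  for (std::size_t i = 0; i < x.size(); ++i)
    x_val[i] = x[i].val();
  double y = blackbox_value(x_val);                     // y = f(x)
  std::vector<double> dy_dx = blackbox_gradient(x_val); // dy/dx_j
  return stan::math::precomputed_gradients(y, x, dy_dx);
}
```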

Hi Michael,

Thank you for your response. It should be possible to calculate the partial derivatives, at least for the simplified model version. I would assume C1 continuity. This particular blackbox should be differentiable and amenable to gradient-based MCMC methods such as HMC.

Another vegetation model of interest is stochastic, or probabilistic actually, in the frequentist sense. It is similar in structure to the classical forest fire model from statistical physics (link). I assume that it is differentiable over long time-scales (e.g., years), but Brownian and therefore lacking Lipschitz continuity over short time-scales (e.g., days). That would be an interesting analysis. If it is not suitable for HMC, perhaps DREAM (link) would be a reasonable alternative?

The main model is in C++, but definitely has a large $x$. Eventually, I am hoping to fully embed Stan in the model to propagate uncertainties arising from each process. Understanding the source of uncertainty is critical in model evaluation. Hence, I’m looking forward to a future high-level C++ API with few dependencies, but that is tangential to the current task.

Thanks again,

Adam

It’s not clear to me that DREAM actually generates a Markov chain with the right stationarity conditions. Ter Braak’s differential evolution, on which it’s based, does. As does the Goodman and Weare ensemble method built into the Python black-box system emcee.

We don’t require the data-generating process to be continuous, just the log density function over the parameters of the model. An example is a Gaussian process: it can model non-smooth functions using a set of parameters for which the log density has smooth derivatives.

Hi Bob,

Thank you very much for the suggestion and for the insight.

Adam

Update: it appears that DREAM should also work (link).

That doesn’t look like a proof that DREAM has the right stationary distribution. That paper has terrible advice like:

a common practice is to use uninformative priors within relatively wide parameter ranges such that the prior distribution has little influence on the estimation of the posterior distribution.

This is just plain wrong as Andrew’s shown in several papers and in his books. A wide pseudo-uniform prior like a gamma(1e-4, 1e-4) can pull estimates way out into the tails and also destroy the geometric ergodicity of the samplers (you need geometric ergodicity to converge fast enough for practical purposes in most cases).

Ensemble methods like this have problems in higher dimensions. I wrote a blog post trying to explain why. It’s usually pretty easy to diagnose the lack of mixing with multiple chains with diffuse initializations. The intuitions developed in lower dimensions just don’t hold, so interpolation and extrapolation are a terrible way to generate new draws in high dimensions, because you quickly leave the high-mass volume of the posterior. It’s probably better than simple Metropolis or badly adapted Metropolis, but that isn’t saying much, as Metropolis also fails in high dimensions, as does Gibbs.

Thank you, Bob, I was hoping you would critique the methods : ) The quoted sentence does offer bad advice as I understand it, as that is equivalent to performing MLE. I love Andrew’s latest book, it’s dense and technical yet eloquent. Thank you for the link to the blog post, you also write very well.

So, that would collectively rule out Gibbs, Metropolis (BAT), DREAM, and MCMC Hammer… What would you recommend then? There is also the BayesFlow/Edward implementation of HMC, which is now at v2 and ready for use (link), but still does not have NUTS. I guess that would take us back to adding functions with known derivatives to Stan Math.

One alternative that has been used with some success for high-dimensional problems is simulated annealing.

The point is that the priors are informative and pull the estimates out into the tails where there’s a lot of probability mass. So that’s not at all like the MLE. Check out the discussion of the gamma priors in BDA.

What’s the dimensionality of your problem?

That’s also going to require derivatives. And it’s usually done to deal with multimodality where it’s hard to initially get into the right region.

You pretty much need the derivatives to get good scaling with dimensionality for either optimization or sampling.

That would also require derivatives.

Thank you, Bob. I will look it up in BDA. I was thinking generally of the effect of removing priors in calculating posteriors, but what you said makes complete sense.

The dimensionality of the simulation inputs varies from model to model, but there are often around 100 parameters, including vectors of time-series inputs with error estimates. There are around 10 simulation outputs that give us the overall error, with one output of highest interest. The simulations can be either deterministic or stochastic/probabilistic using an RNG.

That usually requires something that scales well with dimension.