Lotka-Volterra population dynamics case study


I wrote a case study and would be happy to get feedback before finalizing it and putting it up on our web site.

Estimating Lotka-Volterra Predator-Prey Population Dynamics with Stan

The Lotka-Volterra equations define parametric differential equations for the fluctuation of predator and prey populations. To estimate the parameters of such a model, a model for measurement error and unexplained variation is layered on top of the deterministic dynamics. The model is coded in Stan and fit to data on Canadian lynxes and snowshoe hares based on numbers of pelts collected in the early 20th century by the Hudson Bay Company.

I had to post it here in .pdf form because the site won’t let me attach .html files.

The source is on GitHub in the stan-dev organization, example-models repo if you feel inclined to interact via pull request:

Ensuring positivity of ODE solution

I’ll check it out this weekend if you can wait that long.


This looks great. There are some latex errors with y_0 in the section
"Noise model for Lotka-Volterra dynamics".


No rush.

Maybe I need a different variable, because that’s supposed to be y0 everywhere, not y_0 as it’s a different variable than y, not the indexing of y.


Got it.

It does look like an error in the pdf. That might distract some people.
Perhaps there’s an alternative notation?

Can one use a text label? Like $\text{initial}$?


Great case study, I submitted a pull request with some proposed slight changes in the code.
Will try out the exercise.


This would have been great when I started my PhD, 6 or so years ago! I’ll have a read through and see if I have any comments.


@lionel68: Thanks! I’ll take a look.


@ariddell: I agree and will fix the text and the programs so they all match.


Hi, this is a great case study for people like me (fitting simulation models to data).

I notice you used the stan_model() and sampling() functions, maybe you explain the reason you do it like this instead of just using stan()?

I also see you vectorised the likelihood, so that is good (unlike the soil carbon model). Would this make a big speed difference? In fact I am interested in how you would improve MCMC speed (assuming the model itself was already optimised)

I would also be interested to see how you would do it if there was missing data points (most data sets are not complete like this one) - can you still vectorise the likelihood and is it still efficient to do so?

I am also interested in generated quantities (e.g. RMSE, autocorrelation of residuals), how would you calculate these in the Stan code? In my models I usually have a lot of generated quantities for diagnosis and prediction and I am worried these slow down the MCMC, but I prefer to keep them with the model in Stan rather than recreate the model in R to calculate them.

So these are points that might make the example even more useful for beginners.




Probably not as I would suspect speed to be dominated by the differential equation solver and derivatives.

I’ll try to add that. I keep forgetting to get this one finished and publish it.

Yes, you can still vectorize the likelihood with missing data points. It’s the solver that needs irregular series of times at which it needs solutions (you basically just fit to the times at which you have data then model those—the solutions automatically impute the missing data point expectations and further rngs through the error model would simulate the missing data points).

The generated quantities block evaluation happens after the MCMC step, so it only gets evaluated once with no derivatives. It’s probably more efficient to do in Stan than in R. We’re about to expose a way to rerun generated quantities blocks after the fact. It’s plumbed through the C++ infrastructure and just needs to get into the interfaces.

If you have true values, you could compute RMSE directly in Stan. Similarly with autocorrelation of residuals. But we don’t have good autocorrelation calculation functions in Stan yet, so that’d probably be easier on the outside.


Thanks for your reply, Bob! That was very insightful for me.

The models I have are usually forced by external factors such as rainfall, the values of which are presented as a daily total. For this reason the models are usually constructed as daily time step difference equations, the derivatives are not smooth (or even continuous), and we do not use an ODE solver. Nevertheless your Lotka-Volterra example is very useful.

Warm regards.


Great ! I would appreciate any pointers to case studies where probabilistic (stan-based or other) Lotka-Volterra models, or other classical dynamical models, are shown to provide added value (in whatever sense) over solutions that can be obtained with standard ODE solvers. I am looking for documented case studies in particular, although generic comments/experiences are welcome too.


Cool stuff Bob!

Is it common to model the noise as multiplicative on these ecology and biochemistry models where the species are positive? I hadn’t thought to do that. Is there a biological justification for that?

Also I’d imagine if your priors weren’t as informative you’d get parameter values that would make the species go negative at times. Have you encountered this?


Multiplicative errors just mean the error is proportional to the measurement—that is, we model the number of pelts collected as some fraction of the population with some variation based on the size of the population. If there were something like a hard limit on the number of pelts collected and measurements were all around that value, then additive error would probably make more sense. Basically, it’s a modeling decision and subject to the same kind of scrutiny as other decisions.

Ensuring population estimates are positive is a bit trickier. I’ll add a discussion to the case study.

The initial population values (z0) are constrained to be positive—the priors aren’t strong enough without the truncation to guarantee positivity themselves. Then the form of the diff eq ensures we only get positive solutions because the derivatives are proportional to the values.

real du_dt = (alpha - beta * v) * u;
real dv_dt = (-gamma + delta * u) * v;

This can indeed cause numerical problems if values are close to zero and the parameters are large. This is true in most models for extreme parameters, even simple regressions. So it’s always good to be careful.

There’s also the fact that we take the log of the population value, which will cause a hard rejection if were to come up.


Cool. I was just wondering because I started playing around with a biochemical model, and my first instinct was to try additive Gaussian error. I was just wondering if in your experience multiplicative errors tend to be more faithful to the data in these sorts of models, or if that was a common thing to do. That makes sense though that the error should depend on how many species there are.

Ah yes that makes sense. If the state is close to zero then it’s conceivable that the numerical integrator will take a discrete step that takes the state value below zero. I think maybe that’s less of a problem for Runge-Kutta methods than Multi-Step methods though.

So once you try to take the log of a negative value and it rejects, will the sample then just draw a new momentum and continue on sampling? I can imagine a scenario where there’s a hard boundary in parameter space that if you cross you’ll get trajectories that go negative, and should thus be rejected. In that case isn’t your log-likelihood technically discontinuous which is bad for the sampler?


How do you approach the inverse problem (inferring parameters from noisy measurements) with only an ODE solver? Classically, people would do things like solve a least-squares problem using an ODE solver and (typically non-gradient-based) optimizer, which is equivalent to a probabilistic model with independent normally-distributed noise. Then if you do classical sensitivity analysis, that’s like dropping a Laplace approximation (multivariate normal located at maximum likelihood estimate with covariance given by the inverse Hessian).

Where the Bayesian approach helps is in allowing you to include prior knowledge or jointly fit population models and individual-level models (fitting the priors along with the data, which provides population pooling). It also lets you combine multiple models very easily for different data sets (e.g., a mark-recapture model would be easy to integrate). And it lets you easily roll in things like measurement error models if you know things about how the data were collected.

Where Bayesian approaches shine is if you care about calibration. Bayesian models will give you posterior predictive distributions that provide probability intervals for quantities of interest. If the model’s well specified, this is free, but we usually have to measure it in practice because our models are all approximations.


Just pushed a final version to the web site:


This Tufte style is really fun. If you find problems, please let me know.

If you want to help with an extension, there are a lot of exercises. I sketched out how to deal with missing data in hints for an exercise, but didn’t go through with the whole thing.


And in case you wonder, a 4D L-V system could induce chaos.


I don’t understand much about chaos, but it sounds bad. Definitely going to make one of those exercises challenging!

I’ve never seen a 4D simplex viewed as a flattened tetrahedron, but it’s a cool idea.