Lotka-Volterra population dynamics case study


I wrote a case study and would be happy to get feedback before finalizing it and putting it up on our web site.

Estimating Lotka-Volterra Predator-Prey Population Dynamics with Stan

The Lotka-Volterra equations define parametric differential equations for the fluctuation of predator and prey populations. To estimate the parameters of such a model, a model for measurement error and unexplained variation is layered on top of the deterministic dynamics. The model is coded in Stan and fit to data on Canadian lynxes and snowshoe hares based on numbers of pelts collected in the early 20th century by the Hudson Bay Company.

I had to post it here in .pdf form because the site won’t let me attach .html files.

The source is on GitHub in the stan-dev organization, example-models repo if you feel inclined to interact via pull request:


I’ll check it out this weekend if you can wait that long.


This looks great. There are some latex errors with y_0 in the section
"Noise model for Lotka-Volterra dynamics".


No rush.

Maybe I need a different variable, because that’s supposed to be y0 everywhere, not y_0 as it’s a different variable than y, not the indexing of y.


Got it.

It does look like an error in the pdf. That might distract some people.
Perhaps there’s an alternative notation?

Can one use a text label? Like $\text{initial}$?


Great case study, I submitted a pull request with some proposed slight changes in the code.
Will try out the exercise.


This would have been great when I started my PhD, 6 or so years ago! I’ll have a read through and see if I have any comments.


@lionel68: Thanks! I’ll take a look.


@ariddell: I agree and will fix the text and the programs so they all match.


Hi, this is a great case study for people like me (fitting simulation models to data).

I notice you used the stan_model() and sampling() functions, maybe you explain the reason you do it like this instead of just using stan()?

I also see you vectorised the likelihood, so that is good (unlike the soil carbon model). Would this make a big speed difference? In fact I am interested in how you would improve MCMC speed (assuming the model itself was already optimised)

I would also be interested to see how you would do it if there was missing data points (most data sets are not complete like this one) - can you still vectorise the likelihood and is it still efficient to do so?

I am also interested in generated quantities (e.g. RMSE, autocorrelation of residuals), how would you calculate these in the Stan code? In my models I usually have a lot of generated quantities for diagnosis and prediction and I am worried these slow down the MCMC, but I prefer to keep them with the model in Stan rather than recreate the model in R to calculate them.

So these are points that might make the example even more useful for beginners.




Probably not as I would suspect speed to be dominated by the differential equation solver and derivatives.

I’ll try to add that. I keep forgetting to get this one finished and publish it.

Yes, you can still vectorize the likelihood with missing data points. It’s the solver that needs irregular series of times at which it needs solutions (you basically just fit to the times at which you have data then model those—the solutions automatically impute the missing data point expectations and further rngs through the error model would simulate the missing data points).

The generated quantities block evaluation happens after the MCMC step, so it only gets evaluated once with no derivatives. It’s probably more efficient to do in Stan than in R. We’re about to expose a way to rerun generated quantities blocks after the fact. It’s plumbed through the C++ infrastructure and just needs to get into the interfaces.

If you have true values, you could compute RMSE directly in Stan. Similarly with autocorrelation of residuals. But we don’t have good autocorrelation calculation functions in Stan yet, so that’d probably be easier on the outside.


Thanks for your reply, Bob! That was very insightful for me.

The models I have are usually forced by external factors such as rainfall, the values of which are presented as a daily total. For this reason the models are usually constructed as daily time step difference equations, the derivatives are not smooth (or even continuous), and we do not use an ODE solver. Nevertheless your Lotka-Volterra example is very useful.

Warm regards.