Coding a hierarchial ODE model with more than one state -- best practices

Hello all,

My goal is to fit an ODE model with 3 state variables to many days of hourly data. I have been working in R and using an L-BFGS-B algorithm with the “optim” function, but it occurred to me that a hierarchical MCMC approach would be better. The idea is that each day can have varying parameters (ideally smoothly varying) as shown with this nice example (in Hierarchical Models section at the end): Fitting differential equation models with Stan. Each day also has two time series of forcing inputs (light and temperature).

I found this paper that seems to be a good analogy of what I want to do (Supplement with Stan code here):

However, it appears to be set up quite differently than what is shown here in Section 4.3, for example: Differential Equations Based Models in Stan

I am entirely new to writing code in Stan. I’ve read the intro docs about the blocks and what goes in them, but my survey of examples indicates that there is more than one way to go about coding this. One of my main hangups is how to code it hierarchically. Do I use matrix format in the parameter block like matrix[T,N] or do I use for loops to loop through each day?

Would someone please advise me how to best code this? I’ve attached my initial effort below. Do I put priors in the data block or the model block…or both? I’ve seen both.

functions {
  real[] myODE(real t, real[] y, real[] theta, real[] x_r, int[] x_i) {
    real dydt[3]; // Number of state variables

    // Unpack parameters and states
    real VAR1 = y[1];
    real VAR2 = y[2];
    real VAR3 = y[3];
    real A = theta[1];
    real B = theta[2];
    real C = theta[3];

    real temp_forcing = x_i[1] == 1; // not actually sure how to implement forcing...
    real light_forcing = x_i[2] == 1;
    // Define other constants and variables
    // ... (continue defining other constants and variables)
    // Calculate derivatives
    dydt[1] = ((A - B + fVAR1 + LVAR1) / z); // Derivative of VAR1
    dydt[2] = ((-A + B + fVAR2 + LVAR2) / z); // Derivative of VAR2
    dydt[3] = 2 * C + LVAR3 / z; // Derivative of VAR3
    return dydt;

data {
  int<lower=1> T;         // Number of time points per day
  int<lower=1> N;         // Number of days (individuals/groups)
  real<lower=0> timestep; // length of each timestep in days
  int<lower=1> n;         // number of observations per day
  matrix[T, N] y_VAR1;      // Observed VAR1 (T rows for hours, N columns for days)
  matrix[T, N] y_VAR2;     // Observed VAR2 (T rows for hours, N columns for days)
  matrix[T, N] y_VAR3;     // Observed VAR2 (T rows for hours, N columns for days)
  vector[T] light;        // Time-varying light data
  vector[T] temp;        // Time-varying temp data
  real y_init[3];         // Initial values for states
  int<lower=0, upper=1> temp_forcing[N];  // Binary indicator for temperature forcing
  int<lower=0, upper=1> light_forcing[N]; // Binary indicator for light forcing
  // ... (other data variables)

parameters {
  real<lower=0> theta[1]; // parameter(s)
  real<lower=0> theta[2]; // parameter(s)
  real<lower=0> theta[3]; // parameter(s)
  \\ or something like matrix[T, N] theta[1] ??
  real<lower=0> sigma;    // Measurement error

model {
  real y_hat[N, 3]; // Predicted states
  real y_init_local[3] = y_init; // Initialize states with provided initial values
  // Loop through time points
  for (n in 1:N) {
    int x_i[2];
    x_i[1] = temp_forcing[n];
    x_i[2] = light_forcing[n];
    y_hat[n] = integrate_ode_rk45(myODE, y_init_local, 0, {theta}, t[n], x_i, x_i);
    // Likelihood
    y[n, 1] ~ normal(y_hat[n, 1], sigma); // VAR1
    y[n, 2] ~ normal(y_hat[n, 2], sigma); // VAR2
    y[n, 3] ~ normal(y_hat[n, 3], sigma); // VAR3
    // Update initial values for the next time step
    y_init_local = y_hat[n];

Welcome to the Stan community! I’m fairly new to ODEs in Stan too, so I’ll let others weigh in on current best practices. Two items I can offer help:

One reason for the major differences between the examples you’ve found is that ODE’s have been a rather active area of development in recent years. Your code and the 2022 paper you reference uses the deprecated integrate_ode_* forms of the solvers. The types allowed for the solver arguments have changed a bit, so you might compare the current solver documentation against the their deprecated counterparts to extend the lifespan of your code from the beginning.

I like to have the data block ingest parameters for the prior distributions that are used in a sampling statement in the data block. I don’t think the data block can handle a ~{distribution}(...) sampling statement. Alternatively, you can hard-code the prior parameters in the model block. There would be nothing in the data block that relates to the priors, but then doing a prior sensitivity analysis gets very cumbersome (edit/duplicate the .stan, recompile, etc.). The strategy I use would look something like this:

data {
    // Prior distribution parameters
    real mu_theta[3];
    real<lower=0> sd_theta[3];
    real<lower=0> sd_sigma;
model {
    // Prior distributions
    theta ~ normal(mu_theta, sd_theta);  // I think the vectorization will handle all 3 here.
                                         // If not, write a sampling statement for each (or more generally, a loop) 
    sigma ~ normal(0, sd_sigma);
1 Like


I think I’ve got some working code, and wanted to leave it here in case anyone wants to provide improvements or wants to use something like it in the future. I’d love to be able to vectorize it a bit more to improve performance, so any advice is greatly appreciated. I went with a trapezoid integraiton approach because there are forcing variables which are not smooth functions, which are not suitable for the ODE solver (ode_rk45). In R you can have “events”, but that doesn’t appear to be possible yet in Stan.

I’m still not sure I’ve best coded the hyperpriors and priors – advice is appreciated. Actually, one of the hyperpriors I want to be a lognormal distribution centered on values less than 1, but putting a negative “mu” in the lognormal(mu, sigma) causes errors…I guess there needs to be some sort of transformation at some point? Thoughts?

functions {
    // Function that calculates the derivatives of the ODE
    vector myODE(real t, vector y, vector theta, real force1, real force2, vector dvals) {
      // Unpack states
      real VAR1  = y[1];
      real VAR2 = y[2];
      real VAR3 = y[3];
      // Unpack parameters
      real A    = theta[1];
      real B    = theta[2];
      real C    = theta[3];
      real D    = theta[4];
      vector[3] dydt; // Number of state variables
      dydt[1] = A * VAR1 * force1;
      dydt[2] = VAR1 - B * VAR2 * force2;
      dydt[3] = C * VAR2;
      return dydt;
    // Function that performs the trapezoid rule
    vector trapezoid_rule(real dt, real n_steps, vector y0, vector theta, real force1, real force2) {
      vector[3] y = y0;
      for (i in 1:n_steps) {
        vector[3] k1 = myODE((i - 1) * dt, y, theta, force1, force2);
        vector[3] k2 = myODE(i * dt, y + dt * k1, theta, force1, force2);
        y += 0.5 * dt * (k1 + k2);
      return y;
    // Function that gets some non-state variables
    vector var_fun(real force2, real VAR3, real VAR2) {
      vector[2] result = force2 * VAR3 * VAR2;
      return result;

data {
  int<lower=1> n_days;  // Number of days
  int<lower=1> n_obs;  // Number of observations per day
  int<lower=1> n_steps;  // Number of steps for trapezoid rule
  real<lower=0> dt;  // Time step for observations
  array[n_obs, n_days] real VAR1_obs;  // Observed VAR1 data
  array[n_obs, n_days] real var_obs;  // Observed data for non-state variable
  array[n_obs, n_days] real force1;  // Observed forcing 1
  array[n_obs, n_days] real force2;  // Observed forcing 2
  array[n_days] vector[3] y0_init;  // Initial states for each day, known

parameters {
  // Parameters to estimate with their bounds
  vector<lower=0>[n_days] A;
  vector<lower=0>[n_days] B;
  vector<lower=0>[n_days] C;
  vector<lower=gpp_mean/2>[n_days] D; //D is at least half of C
  real<lower=0> sigma_y;  // Observation error standard deviation
  vector<lower=0>[9] mu_theta;

transformed parameters {
  matrix[n_obs, n_days] VAR1_hat; // we always have VAR1 observations to test against
  matrix[n_obs, n_days] var_hat; // and always var observations to test against
  vector[9] theta;
  for (d in 1:n_days) {
    theta[1] = A[d];
    theta[2] = B[d];
    theta[3] = C[d];
    theta[4] = D[d];
    for (i in 1:n_obs) {
      real t = (i - 1) * dt;
      vector[3] y_hat = trapezoid_rule(dt, n_steps, y0_init[d], theta, force1[i, d], force2[i, d]);
      VAR1_hat[i, d] = y_hat[1];
      // Compute var using the var_fun function for each day and time
      vector[2] vars_hat = var_fun(force2[i, d], y_hat[3], y_hat[2]);
      var_hat[i, d] = vars_hat[2];

model {
  // Hyperprior distributions of parameters across days
  mu_theta[1] ~ normal(0.05, 0.02); // A, covers a good range
  mu_theta[2] ~ normal(0.5, 0.3); // B, covers a good range
  mu_theta[3] ~ normal(1, 0.3); // C, covers a good range
  mu_theta[4] ~ normal(0.1, 0.05); // D, covers a good range
  A ~ normal(mu_theta[1], 1);
  B ~ normal(mu_theta[2], 1);
  C ~ normal(mu_theta[3], 1);
  D ~ normal(mu_theta[4], 1);
  // Priors for observation error
  sigma_y ~ cauchy(0, 1);
  // Likelihood for observed data
  for (d in 1:n_days) {
    for (i in 1:n_obs) {
      real t = (i - 1) * dt;
      vector[3] y_pred = trapezoid_rule(t, n_steps, y0_init[d], theta, force1[i, d], force2[i, d]);
      VAR1_obs[i, d] ~ normal(y_pred[1], sigma_y);
      // Compute CVAR1 using the carb function for each day and time
      vector[4] var_pred = var_fun(force2[i, d], y_pred[3], y_pred[2]);
      var_obs[i, d] ~ normal(var_pred[2], sigma_y);