How to fit a SIR model with four transmission rates at four time periods

Dear all,

I want to fit a SIR model with four transmission rates at four time periods, I tried to write the code, but it seems that the function integrate_ode_rk45 cannot deal with the function ‘sir’ because the type of ‘sir’ is real[,]. Could anyone help for correcting and improving the code. Thanks in advance. Pls see the code below.

functions {
  real[,] sir(real ts, int n_days, int[] t_thre, real[,] y, real[] theta, real[] x_r, int[] x_i) {

real S[n_days] = y[1];
real I[n_days] = y[2];
real R[n_days] = y[3];

real N = x_i[1];

real beta1 = theta[1];
real beta2 = theta[2];
real beta3 = theta[3];
real beta4 = theta[4];
real gamma = theta[5];

int t_thre1 = t_thre[1];
int t_thre2 = t_thre[2];
int t_thre3 = t_thre[3];
int t_thre4 = t_thre[4];

real dS_dt[n_days];
real dI_dt[n_days];
real dR_dt[n_days];

for(i in 1:(t_thre1-1)){
  dS_dt[i] = -beta1 * I[i] * S[i] / N;
  dI_dt[i] =  beta1 * I[i] * S[i] / N - gamma * I[i];
  dR_dt[i] =  gamma * I[i];
}
for(i in t_thre1:(t_thre2-1)){
  dS_dt[i] = -beta2 * I[i] * S[i] / N;
  dI_dt[i] =  beta2 * I[i] * S[i] / N - gamma * I[i];
  dR_dt[i] =  gamma * I[i];
}
for(i in t_thre2:(t_thre3-1)){
  dS_dt[i] = -beta3 * I[i] * S[i] / N;
  dI_dt[i] =  beta3 * I[i] * S[i] / N - gamma * I[i];
  dR_dt[i] =  gamma * I[i];
}
for(i in t_thre3:n_days){
  dS_dt[i] = -beta4 * I[i] * S[i] / N;
  dI_dt[i] =  beta4 * I[i] * S[i] / N - gamma * I[i];
  dR_dt[i] =  gamma * I[i];
}

return {dS_dt, dI_dt, dR_dt};
}
}

data {
int<lower=1> n_days;
real y0[3];
real t0;
real ts[n_days];
int N;
int cases[n_days];
}
transformed data {
real x_r[0];
int x_i[1] = { N };
}
parameters {
real<lower=0> gamma;
real<lower=0> beta1;
real<lower=0> beta2;
real<lower=0> beta3;
real<lower=0> beta4;
real<lower=0> phi_inv;
}
transformed parameters{
real y[n_days, 3];
real phi = 1. / phi_inv;
{
  real theta[5];
  theta[1] = beta1;
  theta[2] = beta2;
  theta[3] = beta3;
  theta[4] = beta4;
  theta[5] = gamma;
  
  y = integrate_ode_rk45(sir, y0, t0, ts, theta, x_r, x_i);
}
}
model {
//priors
beta1 ~ normal(2, 1); // how to determine the prior?
beta2 ~ normal(2, 1);
beta3 ~ normal(2, 1);
beta4 ~ normal(2, 1);
gamma ~ normal(0.4, 0.5);
phi_inv ~ exponential(5);

//sampling distribution
//col(matrix x, int n) - The n-th column of matrix x. Here the number of infected people 
cases ~ neg_binomial_2(col(to_matrix(y), 2), phi);
}
generated quantities {
real recovery_time = 1 / gamma;
real pred_cases[n_days];
real S[n_days];
pred_cases = neg_binomial_2_rng(col(to_matrix(y), 2), phi);
S = col(to_matrix(y), 1);
}

The return type of the ODE right hand side needs to be one dimensional.

So the system that is being integrated is:

y' = f

If your state is actually a concatenation of other states y = c(y1, y2, y3, y4), then you need to make y a really long 1d array, not a 2d array.

Is there an interaction between these four SIR models? If there is no interaction, then you’d want to solve them separately.

(Edit: I replaced ‘matrix’ with ‘2d array’ cause they’re different in Stan)

Thanks Ben! The four SIR models are for four consecutive time, with each transmission rate for one time period.

Oh, well if it’s four separate ODEs, solve them separately! There’s no advantage to trying to make them one big solve, and you can have as many ODE integrator calls as you want in your Stan program.

Also I think you have too many arguments to your ODE function. There’s a different interface that has more flexibility described here: https://mc-stan.org/users/documentation/case-studies/convert_odes.html, but integrate_ode_rk45 is strict that it needs the signature:

real[] ode(real time, real[] state, real[] theta,
         real[] x_r, int[] x_i)

Docs are here: https://mc-stan.org/docs/2_24/functions-reference/functions-old-ode-solver.html

Thanks Ben! The final states of SIR (i.e. S, I, R) of one period are the initial states of the following period. Do you mean that we can firstly estimate the states and parameters in one period and then use the result of the period to estimate the states and parameters for the next period?

When I hear estimate I’m thinking the posterior of the whole model. What I’m talking about is chaining together ODE calls – not breaking the model up into multiple little models.

Like this:

yT1 = ode_rk45(sho, y0, t0, ts1, theta);
yT2 = ode_rk45(ode_fun, yT1[size(ts1)], ts1[size(ts1)], ts2, theta);
yT3 = ode_rk45(ode_fun, yT2[size(ts2)], ts2[size(ts2)], ts3, theta);
...

In this case I’d call yT1 something like an intermediate calculation, but this should work for sure. (also I used the new ODE syntax there (example) that’s only in cmdstanr/cmdstanpy/cmdstan currently).

yT1[size(ts1)] means take the last state of the last solve as the initial state of this solve, and ts1[size(ts1)] says take the last time of the last solve as the initial time of this solve.

Is that the thing you want to do?

Yes, I think it is what I want. Thanks again!

I tried to write the code, but there is an error: Variable “sir” does not exist. Could you help take a look at it and give some suggestions?

stan code:

functions {

vector sir(real ts,
vector y,
real beta,
real gamma,
int N) {
vector[4] dydt;

dydt[1] = -beta * y[2] * y[1] / N;
dydt[2] = beta * y[2] * y[1] / N - gamma * y[2];
dydt[3] = gamma * y[2];

return dydt;
}
}

data {
int<lower=1> n_days;
int<lower=1> n_ts1;
int<lower=1> n_ts2;
int<lower=1> n_ts3;
real y0[3];
real t0;
real ts[n_days];
int N;
int cases[n_days];
real ts1[n_ts1]; // ts1: the time points of the first time period
// n_ts1: the number of time points of ts1
real ts2[n_ts2]; // ts2: the time points of the first time period
// n_ts2: the number of time points of ts2
real ts3[n_ts3]; // ts3: the time points of the first time period
// n_ts3: the number of time points of ts3
}

parameters {
real<lower=0> gamma1;
real<lower=0> beta1;
real<lower=0> gamma2;
real<lower=0> beta2;
real<lower=0> gamma3;
real<lower=0> beta3;
real<lower=0> phi_inv;
}
transformed parameters{
vector<lower=0>[3] yT1[n_ts1];
vector<lower=0>[3] yT2[n_ts2];
vector<lower=0>[3] yT3[n_ts3];
real phi = 1. / phi_inv;
yT1 = ode_rk45(sir, y0, t0, ts1, beta1, gamma1, N);
yT2 = ode_rk45(sir, yT1[n_ts1], ts1[n_ts1], ts2, beta2, gamma2, N);
yT3 = ode_rk45(sir, yT2[n_ts2], ts1[n_ts2], ts3, beta3, gamma3, N);
}
model {
//priors
beta1 ~ normal(2, 1);
gamma1 ~ normal(0.4, 0.5);
beta2 ~ normal(2, 1);
gamma2 ~ normal(0.4, 0.5);
beta3 ~ normal(2, 1);
gamma3 ~ normal(0.4, 0.5);
phi_inv ~ exponential(5);

//sampling distribution
//col(matrix x, int n) - The n-th column of matrix x. Here the number of infected people
cases ~ neg_binomial_2((col(to_matrix(yT1), 2),col(to_matrix(yT2), 2),col(to_matrix(yT3), 2)), phi);
}
generated quantities {
real R01 = beta1 / gamma1;
real R02 = beta2 / gamma2;
real R03 = beta3 / gamma3;
real recovery_time = 1 / gamma;
real pred_cases[n_days];
vector S[n_days];
pred_cases = neg_binomial_2_rng([col(to_matrix(yT1), 2),col(to_matrix(yT2), 2),col(to_matrix(yT3), 2)], phi);
S=[ col(to_matrix(yT1), 2),col(to_matrix(yT2), 2),col(to_matrix(yT3), 2) ]’;
}

Change:

real y0[3];

to

vector[3] y0;

That was a difficult error to debug I’ll make an issue and see if there’s a better compiler error we can spit out in the future.

Also in the likelihood term

cases ~ neg_binomial_2((col(to_matrix(yT1), 2),col(to_matrix(yT2), 2),col(to_matrix(yT3), 2)), phi);

And the generated quantities:

pred_cases = neg_binomial_2_rng([col(to_matrix(yT1), 2),col(to_matrix(yT2), 2),col(to_matrix(yT3), 2)], phi);

Do yT1[, 2] instead of col(to_matrix(yT1), 2) imo, easier to read. I think there’s a syntax problem with the likelihood term too, so probably comment these out until the ODE compiles then add them back in and work through those errors.

I’m assuming four transmission rates are intended to model different stages of a disease and/or prevention strategy, in this case does it make sense for them to have same prior? Can we incorporate some general trend such as ordered vector or switching conditioned on SIR solution?

Thanks Ben! I have corrected the code but there is still error. Do you have any suggestions?

SYNTAX ERROR, MESSAGE(S) FROM PARSER:
Variable “sir” does not exist.
error in ‘model7ab56bdb1f_sir_negbin_categorical_beta’ at line 50, column 18

48: vector<lower=0>[3] yT3[n_ts3];
49: real phi = 1. / phi_inv;
50: yT1 = ode_rk45(sir, y0,         t0,         ts1, beta1, gamma1, N);
                     ^
51: yT2 = ode_rk45(sir, yT1[n_ts1], ts1[n_ts1], ts2, beta2, gamma2, N);

Stan code:

functions {

vector sir(real ts,
vector y,
real beta,
real gamma,
int N) {
vector[3] dydt;

dydt[1] = -beta * y[2] * y[1] / N;
dydt[2] = beta * y[2] * y[1] / N - gamma * y[2];
dydt[3] = gamma * y[2];

return dydt;
}
}

data {
int<lower=1> n_days;
int<lower=1> n_ts1;
int<lower=1> n_ts2;
int<lower=1> n_ts3;
vector[3] y0;
real t0;
real ts[n_days];
int N;
int cases[n_days];
real ts1[n_ts1]; // ts1: the time points of the first time period
// n_ts1: the number of time points of ts1
real ts2[n_ts2]; // ts2: the time points of the first time period
// n_ts2: the number of time points of ts2
real ts3[n_ts3]; // ts3: the time points of the first time period
// n_ts3: the number of time points of ts3
}

parameters {
real<lower=0> gamma1;
real<lower=0> beta1;
real<lower=0> gamma2;
real<lower=0> beta2;
real<lower=0> gamma3;
real<lower=0> beta3;
real<lower=0> phi_inv;
}
transformed parameters{
vector<lower=0>[3] yT1[n_ts1];
vector<lower=0>[3] yT2[n_ts2];
vector<lower=0>[3] yT3[n_ts3];
real phi = 1. / phi_inv;
yT1 = ode_rk45(sir, y0, t0, ts1, beta1, gamma1, N);
yT2 = ode_rk45(sir, yT1[n_ts1], ts1[n_ts1], ts2, beta2, gamma2, N);
yT3 = ode_rk45(sir, yT2[n_ts2], ts1[n_ts2], ts3, beta3, gamma3, N);
}
model {
//priors
beta1 ~ normal(2, 1);
gamma1 ~ normal(0.4, 0.5);
beta2 ~ normal(2, 1);
gamma2 ~ normal(0.4, 0.5);
beta3 ~ normal(2, 1);
gamma3 ~ normal(0.4, 0.5);
phi_inv ~ exponential(5);

//sampling distribution
//col(matrix x, int n) - The n-th column of matrix x. Here the number of infected people
cases ~ neg_binomial_2((yT1[,2],yT2[,2],yT3[,2]), phi);
}
generated quantities {
real R01 = beta1 / gamma1;
real R02 = beta2 / gamma2;
real R03 = beta3 / gamma3;
real recovery_time = 1 / gamma;
real pred_cases[n_days];
vector S[n_days];
pred_cases = neg_binomial_2_rng([yT1[,2],yT2[,2],yT3[,2]], phi);
S=[ yT1[,2],yT2[,2],yT3[,2] ]’;
}

Thanks for your suggestions! I agree with you that having some constrains would be better.

That error comes from using rstan to build the model instead of cmdstanr (Function reference • cmdstanr). Current rstan doesn’t have the ode_rk45 function but cmdstanr does.

Once you have cmdstanr installed, just work through the errors one by one. I was able to get this model to compile by commenting out stuff that errored or modifying it a bit (though I don’t know if it’s what you want to do).

I think give up on the vectorized notation for now. I was trying a shorthand and it’s just too confusing. Write what you want in a for loop, and if there’s a vectorized version once you get it working you can go back to that.

Like:

for(n in 1:n_ts1) {
  cases[n] ~ neg_binomial_2(..., phi);
}

If you write it like that it will be easier to debug in this case I think.

And replace S=[ yT1[,2],yT2[,2],yT3[,2] ]’; with the non-vectorized thing.

real recovery_time = 1 / gamma;, gamma doesn’t exist

Thank your Ben for your kind reply! It helps a lot.