Estimating parameters of a SEIR model

Thanks a lot I must check this.

Start by specifying nchains=1 as instructed in the error message. This should give you more informative errors.

So my latest update is as follow:

I am following the link shared by @bbbales2. My data contains daily number of confirmed cases of covid-19 in the UK. My stan code is as follow:

 functions {
  
  real[] sir(real t,
            real[] y,
            real[] theta,
            real[] x_r,
            int[] x_i) {
      
      // I renamed my ODEs variables here: 
      real S = y[1];
      real E = y[2];
      real G = y[3];
      real P = y[4];
      real I = y[5];
      real A = y[6];
      real B = y[7];
      real C = y[8];
      real N = x_i[1];
      
      // Here are the list of my parameters
      real phi = theta[1];
      real pe = theta[2];
      real pie = theta[3];
      real pee = theta[4]; 
      real beta = theta[5];
      real mu = theta[6]; 
      real betam = theta[7]; 
      real tau = theta[8]; 
      real alpha = theta[9]; 
      real Ts = theta[10]; 
      real Ta = theta[11]; 
      real Tm = theta[12]; 
      real Ti = theta[13]; 

     // here are the ODEs. I notice that in the link they define them as real dS_dt rather than dydt[1] 
      real dS_dt = - mu * beta * S * A - beta * S * I - beta * C * S - betam * B * S;
      real dE_dt = mu * beta * S * A + (1 - phi) * beta * S * I + (1 - phi) * beta * C * S + (1 - phi) * betam * B * S - E / tau;
      real dG_dt = pe * phi * beta * S * I + pie * phi * beta * C * S + pee * phi * betam * B * S - G / tau;
      real dP_dt = (1 - pe) * phi * beta * S * I + (1 - pie) * phi * beta * C * S + (1 - pee) * phi * betam * B * S - P / tau;
      real dI_dt = alpha * E / tau - I / Ts;
      real dA_dt = (1 - alpha) * E / tau - A / Ta;
      real dB_dt = G / tau - B / Tm;
      real dC_dt = P / tau - C / Ti;  
      
      return {dS_dt, dE_dt, dG_dt, dP_dt, dI_dt, dA_dt, dB_dt, dC_dt};
    }
  
}

data {
  int<lower=1> n_days;
  real y0[8];
  real t0;
  real ts[n_days];
  int N;
  int cases[n_days];
  
}

transformed data {
  real x_r[0];
  int x_i[1] = { N };
}

parameters {
 real<lower = 0, upper = 1> phi;
 real<lower = 0, upper = 1> pe;
 real<lower = 0, upper = 1> pie;
 real<lower = 0, upper = 1> pee;
 real<lower = 0> beta;
 real<lower = 0> mu;
 real<lower = 0>betam;
 real<lower = 0> tau;
 real<lower = 0> alpha;
 real<lower = 0> Ts;
 real<lower = 0> Ta;
 real<lower = 0> Tm;
 real<lower = 0> Ti;
}

transformed parameters{
  real y_hat[n_days, 8];
  real<lower=0, upper=1> theta1[4];
  real<lower = 0> theta2[9]; 
  
  y_hat = integrate_ode_rk45(sir, y0, t0, ts, {phi, pe, pie, pee, beta, mu, betam, tau, alpha, Ts, Ta, Tm, Ti}, x_r, x_i);
  for(i in 1:n_days){
    for(j in 1:8){
       y_hat[i, j] = 1e-6 + (1-2e-6)*y_hat[i, j];
    }
  }
  
}

model {
  phi ~ normal(0, 1);  //phi
  pe ~ normal(0, 1); //pe
  pie ~ normal(0, 1); //pie
  pee ~ normal(0, 1); //pee
  
  beta ~ normal(0.5, 0.9); //beta
  mu ~ normal(0.2, 0.55); //mu
  betam ~ normal(0, 1);  //beta_m
  tau ~ normal(1, 20); //tau
  alpha ~ normal(0.1, 0.99); //alpha
  Ts ~ normal(3, 20); //Ts
  Ta ~ normal(1, 20); //Ta
  Tm ~ normal(1, 20); //Tm
  Ti ~ normal(0.5, 8); //Ti
  
  cases ~ binomial(col(to_matrix(y_hat, 5))); //y_hat[,5] are the fractions infected from the ODE solver
  
}

generated quantities {
  real R_0; 
  R_0 = beta * Ta * (alpha * Ts * (1 - phi) / Ta + mu * (1 - alpha));
  
}

But I can not run this again, as it says:

Error in close.connection(zz) : cannot close 'message' sink connection

when compiling the stan file. My R code is here along with other files. I wonder what am I doing wrong? any help is much appreciated. Could this be a problem at model block? I am defining the likelihood as binomial for infections which is the 5th ODE.

I also wonder why am I getting error as above only? while others seem to get pinpointed on where the error is happening?

mn.stan (3.0 KB) rcode1.R (982 Bytes) xyzz.csv (1.8 KB)

There seems to be a syntax error on this line:

cases ~ binomial(col(to_matrix(y_hat, 5)));

I changed the likelihood to a normal (just as a placeholder to get things to compile), and then I got these errors when I ran the model:

[1] "Error in sampler$call_sampler(args_list[[i]]) : "                                                
[2] "  Exception: Max number of iterations exceeded (1000000).  (in 'modelda7739d7107_mn' at line 87)"
error occurred during calling the sampler; sampling not done
here are whatever error messages were returned

Which indicate that the sampler wasn’t able to initialize itself cause the ODE solver failed. Are you getting these errors as well?

Thank you @bbbales2. So I guess I know where the error could happen. Let me go through it step by step. First unfortunately the only error I get is the
Error in close.connection(zz) : cannot close 'message' sink connection
and I assume this is because I have Catalina and R version 3.6.3 and when I try to update this to R version 4 I encounter many other issues, to begin on the updated version I can not get rstan running. This is why unfortunately I am using the older version.

Having said that if you look at the link you’ve shared the way they define the likelihood is as follow:

cases ~ neg_binomial_2(col(to_matrix(y), 2), phi);

this is negative binomial. Yet what’s important there is a parameter phi involved which I assume it takes the noise into account. I didn’t have that and I am not sure if I must have this in defining likelihood.

Secondly I think this is equivalent of my first stan code which got corrected by @maxbiostat, in there we have:

y ~ binomial(n_sample, y_hat[,5]);

again there is a parameter n_sample which was defined in data block.

In the first version of stan code corrected by @maxbiostat I was running:

test = stan("SI_fit.stan",
            data = stan_d,
            pars = params_monitor,
            chains = 1, iter = 10)

test = stan("SI_fit.stan",
            data = stan_d,
            pars = params_monitor,
            chains = 1, iter = 10)

 mod = stan(fit = test,
           data = stan_d,
           pars = params_monitor,
           chains = 3,
           warmup = 500,
           iter = 1000,
           control = list(adapt_delta = 0.99))

on R and I was getting the error:

no parameter params; sampling not done
Stan model 'SI_fit' does not contain samples.

So I guess the issue boils down to n_sample or adding a parameter for the noise. I am not sure how to do this right, any suggestions on this would be kind? Note that in above I defined n_sample as the UK population size and it did not work

Also note that there is a typo, following the link you’ve shared it should be:

cases ~ binomial(col(to_matrix(y_hat), 5));

Oof, yeah, I think this is worth a separate thread. I do not know the Catalina/R issues (Linux here).

Usually the question with binary data is is your data the outcomes of N trials? Or is it the count of something occurring at some rate?

That sorta leads the discussion of what likelihood we’ll pick. The first we’ll usually go bernoulli/binomial. The second looks like a poisson, but usually when you have a poisson you can do better by just fitting a negative binomial (because reality wasn’t actually exactly poisson or whatever).

Thank you @bbbales2 for your reply. So my data is a vector as follow:

0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 1, 0, 0, 5, 0, 1, 0, \
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 2, 5, 3, 13, 4, 11, 34, 30, \
48, 43, 67, 48, 61, 74, 0, 342, 342, 0, 403, 407, 676, 63, 1294, \
1035, 665, 967, 1427, 1452, 2129, 2885, 2546, 2433, 2619, 3009, 4324, \
4244, 4450, 3735, 5903, 3802, 3634, 5491, 4344, 8681, 5233, 5288, \
4342, 5252, 4603, 4617, 5599, 5525, 5850, 4676, 4301, 4451, 4583, \
5386, 4913, 4463, 4309, 3996, 4076, 6032, 6201, 4806, 4339, 3985, \
4406, 6111, 5614, 4649, 3896, 3923, 3877, 3403, 3242, 3446, 3560, 3450 

each entry is daily number of infected people to Covid-19. So there is no trial per day as each day has one number. This would be bell shaped which can be described by binomial or poission indeed. So I imagine one would not need n_sample is that so?

and to make it complete, this data is the number of infections as I mentioned and so the 5th ODE is the one we need to fit them into.

The bell shape here is on the time axis.

The likelihood is for describing each outcome. So you have a model for the number of underlying infections (I[t]) and you have you measurements (y[t]).

Your outcomes aren’t trials, so you might say that you expect your measurements to be Poisson distributed about I[t]:

y[t] ~ poisson(I[t]);

Usually this model doesn’t fit super well. Like, if you take your data, fit it, and then check the fit vs. the data like in the case study it won’t be great.

So then move to a negative binomial. In that case you also need to fit an overdispersion parameter. That parameter might seem a bit weird, so just generate data in R to get a handle on what it does:

Here’s the R docs and the stan docs.

I think rnbinom(n, size = phi, mu = mu) is the parameterization you want to convert from Stan parameters (mu and phi) to R parameters (mu and size). Might want to double check that though.

Have a look at the references in the case study and also the references Riou 2020 and Flaxman 2020. Look at how they do things. As I understand it there’s a lot going on in covid-19 data analysis that goes beyond picking a likelihood and an SEIR model – but you can probably get a better sense of what’s going on by looking at those papers and seeing how they do this.

1 Like

Thank you @bbbales2 for your guidance. My issue at the moment is that I can not get the stan run for basic binomial. Let me go through it part by part so maybe you or someone could help me:

First of all I have my function, there are 8 ODEs and the 5th one is the infection which I want to fit my data into. For this I have:

functions {
  
  real[] SI(real t,
            real[] y,
            real[] params,
            real[] x_r,
            int[] x_i) {
      
      real dydt[8];

      // Variables of the ODEs

      real S = y[1];
      real E = y[2];
      real G = y[3];
      real P = y[4];
      real I = y[5];
      real A = y[6];
      real B = y[7];
      real C = y[8];
      
      // The parameters of the ODEs
      real phi = params[1];
      real pe = params[2];
      real pie = params[3];
      real pee = params[4]; 
  
      real beta = params[5];
      real mu = params[6]; 
      real betam = params[7]; 
      real tau = params[8]; 
      real alpha = params[9]; 
      real Ts = params[10]; 
      real Ta = params[11]; 
      real Tm = params[12]; 
      real Ti = params[13]; 
      
      dydt[1] = - mu * beta * S * A - beta * S * I - beta * C * S - betam * B * S;
      dydt[2] = mu * beta * S * A + (1 - phi) * beta * S * I + (1 - phi) * beta * C * S + (1 - phi) * betam * B * S - E / tau;
      dydt[3] = pe * phi * beta * S * I + pie * phi * beta * C * S + pee * phi * betam * B * S - G / tau;
      dydt[4] = (1 - pe) * phi * beta * S * I + (1 - pie) * phi * beta * C * S + (1 - pee) * phi * betam * B * S - P / tau;
      dydt[5] = alpha * E / tau - I / Ts;
      dydt[6] = (1 - alpha) * E / tau - A / Ta;
      dydt[7] = G / tau - B / Tm;
      dydt[8] = P / tau - C / Ti;
      return dydt;
    } 

In this block I have defined 13 parameters of the ODEs with first 4 being probabilities. Then the data block is:

data {
  int<lower = 1> n_obs; // Number of days sampled
  int<lower = 1> n_params; // Number of model parameters
  int<lower = 1> n_difeq; // Number of differential equations in the system
  int y[n_obs]; // The binomially distributed data
  real t0; // Initial time point (zero)
  real ts[n_obs]; // Time points that were sampled
}

I have labelled them so one can understand what is what easily. My transformed data is the usual:

transformed data {
  real x_r[0];
  int x_i[0];
}

then I have my parameters:

parameters {
 real<lower = 0, upper = 1> phi;
 real<lower = 0, upper = 1> pe;
 real<lower = 0, upper = 1> pie;
 real<lower = 0, upper = 1> pee;
 real<lower = 0> beta;
 real<lower = 0> mu;
 real<lower = 0>betam;
 real<lower = 0> tau;
 real<lower = 0> alpha;
 real<lower = 0> Ts;
 real<lower = 0> Ta;
 real<lower = 0> Tm;
 real<lower = 0> Ti;
 real<lower = 0, upper = 1> S0; // Initial fraction of susceptible
}

In this block you can see that S0 is between 0 and 1, this is because I am assuming that total population is equal to 1. So at the end my data plot would show the “portion of people who are infected”. I then have my transformed data :

transformed parameters{
  real y_hat[n_obs, n_difeq]; // Output from the ODE solver
  real y0[n_difeq]; // Initial conditions
  real<lower=0, upper=1> params1[4];
  real<lower = 0> params2[9]; 

  y0[1] = 1 - 0.02 - 0.001;
  y0[2] = 0.02;
  y0[3] = 0;
  y0[4] = 0;
  y0[5] = 0.001;
  y0[6] = 0;
  y0[7] = 0;
  y0[8] = 0;
  y_hat = integrate_ode_rk45(SI, y0, t0, ts, {phi, pe, pie, pee, beta, mu, betam, tau, alpha, Ts, Ta, Tm, Ti}, x_r, x_i);
    }
  }
  
}

As you can see from the initial conditions they are all between 0 and 1 and susceptible individuals are about 0.98 of the population (which is 1). Another note is that I have real<lower=0, upper=1> params1[4]; for the probabilities and real<lower = 0> params2[9]; for the rest of the parameters. The integrator then keeps track of list of the parameters (this is a neat way suggested by @maxbiostat ).

My model finally is:

model {
  phi ~ normal(0, 1);  //phi
  pe ~ normal(0, 1); //pe
  pie ~ normal(0, 1); //pie
  pee ~ normal(0, 1); //pee
  
  beta ~ normal(0.5, 0.9); //beta
  mu ~ normal(0.2, 0.55); //mu
  betam ~ normal(0, 1);  //beta_m
  tau ~ normal(1, 20); //tau
  alpha ~ normal(0.1, 0.99); //alpha
  Ts ~ normal(3, 20); //Ts
  Ta ~ normal(1, 20); //Ta
  Tm ~ normal(1, 20); //Tm
  Ti ~ normal(0.5, 8); //Ti
  S0 ~ normal(0.5, 0.5); //constrained to be 0-1.
  
  y ~ binomial(y_hat[,5]); //y_hat[,5] are the fractions infected from the ODE solver
  
}

As you can see I defined y ~ binomial(y_hat[,5]) where y_hat[,5] are the infected solution of the ODEs. As you mentioned this might not be accurate in the fit, but I am just first trying to get it run, hence I am sharing my method with you.

Finally, on the stan side, I have the following generated quantity:

generated quantities {
  real R_0; 
  R_0 = beta * Ta * (alpha * Ts * (1 - phi) / Ta + mu * (1 - alpha));  

On the R side I have the list of the variables needed to run the stan:

stan_d = list(n_obs = length(datalist$cases),
              n_params = 13,
              n_difeq = 8,
              y = datalist$cases/66500000,
              t0 = 0,
              ts = sample_time)

where n_obs is the number of dates or in my case the length of the cases. n_params is the number of parameters of the ODEs, n_difeq is the number of ODEs, y is the number of infected normalised by population so they would be between 0 and 1. and finally ts is the following:

sample_time = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,
37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53,
54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87,
88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103,
104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116)

I then run the following and test would fail,

# Which parameters to monitor in the model:
params_monitor = c("y_hat", "y0", "params", "R_0")

# Test / debug the model:
test = stan("SI_fit.stan",
            data = stan_d,
            pars = params_monitor,
            chains = 1, iter = 10)

I am running this code on Rstudio and my full R code looks as follow (this includes the stan part):

# importing the libraries
library(tidyverse)
library(deSolve)
library(dplyr)
library(ggplot2)
library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
datalist <- read.csv(file = 'xyzz.csv')


# plotting the data
ggplot(data = datalist) + 
  geom_point(mapping = aes(x = date, y = cases)) + 
  labs(y = "Number of daily cases")



sample_time = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
                20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,
                37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53,
                54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
                71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87,
                88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103,
                104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116)




# The Stan model statement:
cat(
  '
functions {
  
  real[] SI(real t,
            real[] y,
            real[] params,
            real[] x_r,
            int[] x_i) {
      
      real dydt[8];

      // Variables of the ODEs

      real S = y[1];
      real E = y[2];
      real G = y[3];
      real P = y[4];
      real I = y[5];
      real A = y[6];
      real B = y[7];
      real C = y[8];
      
      // The parameters of the ODEs
      real phi = params[1];
      real pe = params[2];
      real pie = params[3];
      real pee = params[4]; 
  
      real beta = params[5];
      real mu = params[6]; 
      real betam = params[7]; 
      real tau = params[8]; 
      real alpha = params[9]; 
      real Ts = params[10]; 
      real Ta = params[11]; 
      real Tm = params[12]; 
      real Ti = params[13]; 
      
      dydt[1] = - mu * beta * S * A - beta * S * I - beta * C * S - betam * B * S;
      dydt[2] = mu * beta * S * A + (1 - phi) * beta * S * I + (1 - phi) * beta * C * S + (1 - phi) * betam * B * S - E / tau;
      dydt[3] = pe * phi * beta * S * I + pie * phi * beta * C * S + pee * phi * betam * B * S - G / tau;
      dydt[4] = (1 - pe) * phi * beta * S * I + (1 - pie) * phi * beta * C * S + (1 - pee) * phi * betam * B * S - P / tau;
      dydt[5] = alpha * E / tau - I / Ts;
      dydt[6] = (1 - alpha) * E / tau - A / Ta;
      dydt[7] = G / tau - B / Tm;
      dydt[8] = P / tau - C / Ti;
      
      return dydt;
    }
  
}

data {
  int<lower = 1> n_obs; // Number of days sampled
  int<lower = 1> n_params; // Number of model parameters
  int<lower = 1> n_difeq; // Number of differential equations in the system 
  int y[n_obs]; // The binomially distributed data
  real t0; // Initial time point (zero)
  real ts[n_obs]; // Time points that were sampled
  
}

transformed data {
  real x_r[0];
  int x_i[0];
}

parameters {
 real<lower = 0, upper = 1> phi;
 real<lower = 0, upper = 1> pe;
 real<lower = 0, upper = 1> pie;
 real<lower = 0, upper = 1> pee;
 real<lower = 0> beta;
 real<lower = 0> mu;
 real<lower = 0>betam;
 real<lower = 0> tau;
 real<lower = 0> alpha;
 real<lower = 0> Ts;
 real<lower = 0> Ta;
 real<lower = 0> Tm;
 real<lower = 0> Ti;
 real<lower = 0, upper = 1> S0; // Initial fraction of susceptible
}

transformed parameters{
  real y_hat[n_obs, n_difeq]; // Output from the ODE solver
  real y0[n_difeq]; // Initial conditions
  real<lower=0, upper=1> params1[4];
  real<lower = 0> params2[9]; 

  y0[1] = 1 - 0.02 - 0.001;
  y0[2] = 0.02;
  y0[3] = 0;
  y0[4] = 0;
  y0[5] = 0.001;
  y0[6] = 0;
  y0[7] = 0;
  y0[8] = 0;
  
  y_hat = integrate_ode_rk45(SI, y0, t0, ts, {phi, pe, pie, pee, beta, mu, betam, tau, alpha, Ts, Ta, Tm, Ti}, x_r, x_i);
  //for(i in 1:n_obs){
    //for(j in 1:n_difeq){
       //y_hat[i, j] = 1e-6 + (1-2e-6)*y_hat[i, j];
    //}
 // }
  
}

model {
  phi ~ normal(0, 1);  //phi
  pe ~ normal(0, 1); //pe
  pie ~ normal(0, 1); //pie
  pee ~ normal(0, 1); //pee
  
  beta ~ normal(0.5, 0.9); //beta
  mu ~ normal(0.2, 0.55); //mu
  betam ~ normal(0, 1);  //beta_m
  tau ~ normal(1, 20); //tau
  alpha ~ normal(0.1, 0.99); //alpha
  Ts ~ normal(3, 20); //Ts
  Ta ~ normal(1, 20); //Ta
  Tm ~ normal(1, 20); //Tm
  Ti ~ normal(0.5, 8); //Ti
  S0 ~ normal(0.5, 0.5); //constrained to be 0-1.
  
  y ~ binomial(y_hat[,5]); //y_hat[,5] are the fractions infected from the ODE solver
  
}

generated quantities {
  real R_0; 
  R_0 = beta * Ta * (alpha * Ts * (1 - phi) / Ta + mu * (1 - alpha));
  
}

', 
file = "SI_fit.stan", sep="", fill=T)

# FITTING

# For stan model we need the following variables:

stan_d = list(n_obs = length(datalist$cases),
              n_params = 13,
              n_difeq = 8,
              y = datalist$cases/66500000,
              t0 = 0,
              ts = sample_time)

# Which parameters to monitor in the model:
params_monitor = c("y_hat", "y0", "params", "R_0")

# Test / debug the model:
test = stan("SI_fit.stan",
            data = stan_d,
            pars = params_monitor,
            chains = 1, iter = 10)

# Fit and sample from the posterior
mod = stan(fit = test,
           data = stan_d,
           pars = params_monitor,
           chains = 3,
           warmup = 500,
           iter = 1000,
           control = list(adapt_delta = 0.99))

# Extract the posterior samples to a structured list:
posts <- extract(mod)

I am adding the data file that I am using as well in here:
xyzz.csv (1.8 KB)
this is used in datalist <- read.csv(file = 'xyzz.csv'). Through the discussion that we had and as much as I could perfectly look into other models I think my stan code looks ok but I could have missed something with my newbie eyes or I am not sure if I am making any mistake on the R side where I define the stan_d = list() which are the required parameters for stan. It’s also noteworthy that I am using this reference for my code and indeed I can run the code perfectly fine from the link but I can not run mine unfortunately. I’ll be grateful if I eventually can understand where is my mistake happening.

1 Like

With the model you wrote, poisson is probably the simplest likelihood, so:

y ~ poisson(y_hat[,5]);

Would be the thing to try.

If you really want to use binomial, you’re missing an argument, and the units of y_hat[, 5] (number of infections), don’t make sense.

A binomial takes two parameters. n number of trials and p probability of successes, and it describes the distribution of the number of successes in those n trials.

1 Like

Thank you @bbbales2. So, for the time being, I changed the likelihood to Poisson and when I run it I get the following error:

Error in new_CppObject_xp(fields$.module, fields$.pointer, ...) : 
  Exception: int variable contained non-int values; processing stage=data initialization; variable name=y; base type=int  (in 'model70544a99286_SI_fit' at line 62)
 [origin: unknown original type]
failed to create the sampler; sampling not done
Error in new_CppObject_xp(fields$.module, fields$.pointer, ...) : 
  Exception: int variable contained non-int values; processing stage=data initialization; variable name=y; base type=int  (in 'model70544a99286_SI_fit' at line 62)
 [origin: unknown original type]
failed to create the sampler; sampling not done
Stan model 'SI_fit' does not contain samples.

I checked this and I came across this post, I wonder if this error happens due to the way I defined t0 and ts in data block:

  real t0; // Initial time point (zero)
  real ts[n_obs]; // Time points that were sampled

or the way I defined sample_time namely:

sample_time = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,
37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53,
54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87,
88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103,
104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116)

which I used for running the stan code in:

stan_d = list(n_obs = length(datalist$cases),
              n_params = 13,
              n_difeq = 8,
              y = datalist$cases/66500000,
              t0 = 0,
              ts = sample_time)

I would appreciate your guidance.

So the problem is with y. It is declared as int in the model, but apparently some of the input values are not ints.

1 Like

Indeed so in data block I have rightly so:

int y[n_obs]; 

and in

stan_d = list(n_obs = length(datalist$cases),
              n_params = 13,
              n_difeq = 8,
              y = datalist$cases/66500000,
              t0 = 0,
              ts = sample_time)

I have checked my y using:

typeof(datalist$cases/66500000)

for which I get double which is double precision. Indeed division by 66.5 million makes the numbers very small and close to 0 so when I do as.integer(datalist$cases/66500000) I have list of 0s. Is there a way to overcome this on stan side?

The problem is with the model formulation.

The Poisson and binomial are defined on integers. If you want to use non-integers as your data, you’ll need to use a different likelihood.

So there is a way to overcome this and that is in the level of ODEs, everywhere there is S we divide by N which is the population N was 1 now we put the real value for it so 66500000. I have made the following modifications:

On the R side I defined n <- 66500000

then on the stan side I have done the following on the function block:

functions {
  
  real[] SI(real t,
            real[] y,
            real[] params,
            real[] x_r,
            int[] x_i) {
      
      real dydt[8];

      // Variables of the ODEs

      real S = y[1];
      real E = y[2];
      real G = y[3];
      real P = y[4];
      real I = y[5];
      real A = y[6];
      real B = y[7];
      real C = y[8];
      



      
      // The parameters of the ODEs
      real phi = params[1];
      real pe = params[2];
      real pie = params[3];
      real pee = params[4]; 
  
      real beta = params[5];
      real mu = params[6]; 
      real betam = params[7]; 
      real tau = params[8]; 
      real alpha = params[9]; 
      real Ts = params[10]; 
      real Ta = params[11]; 
      real Tm = params[12]; 
      real Ti = params[13]; 
      
      
      dydt[1] = - (mu * beta * S * A - beta * S * I - beta * C * S - betam * B * S) / n;
      dydt[2] = (mu * beta * S * A + (1 - phi) * beta * S * I + (1 - phi) * beta * C * S + (1 - phi) * betam * B * S)/n - E / tau;
      dydt[3] = (pe * phi * beta * S * I + pie * phi * beta * C * S + pee * phi * betam * B * S)/n - G / tau;
      dydt[4] = ((1 - pe) * phi * beta * S * I + (1 - pie) * phi * beta * C * S + (1 - pee) * phi * betam * B * S)/n - P / tau;
      dydt[5] = alpha * E / tau - I / Ts;
      dydt[6] = (1 - alpha) * E / tau - A / Ta;
      dydt[7] = G / tau - B / Tm;
      dydt[8] = P / tau - C / Ti;
      
      return dydt;
    }
  
}

the first 4 ODEs have S so I divide them accordingly by n. Then in data block I added the following:

data {
  int<lower = 0, upper = 66500000> n //population
...
}

where ... means the rest are the same. Is this correct way of adding the parameter n?

Then in parameter block I commented out the part I have //real<lower = 0, upper = 1> S0; // Initial fraction of susceptible and in transformed parameter block I modified the initial conditions as:

transformed parameters{
...
  y0[1] = (1 - 0.02 - 0.001)*66500000;
  y0[2] = 0.02*66500000;
  y0[3] = 0;
  y0[4] = 0;
  y0[5] = 0.001*66500000;
  y0[6] = 0;
  y0[7] = 0;
  y0[8] = 0;
 ...
  
}

now in

stan_d = list(n_obs = length(datalist$cases),
              n_params = 13,
              n_difeq = 8,
              y = datalist$cases,
              t0 = 0,
              ts = sample_time)

y are integers. My full R + stan code is:

library(tidyverse)
library(deSolve)
library(dplyr)
library(ggplot2)
library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
datalist <- read.csv(file = 'xyzz.csv')



ggplot(data = datalist) + 
  geom_point(mapping = aes(x = date, y = cases)) + 
  labs(y = "Number of daily cases")



sample_time = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,
37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53,
54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87,
88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103,
104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116)
n <- 66500000




# The Stan model statement:
cat(
  '
functions {
  
  real[] SI(real t,
            real[] y,
            real[] params,
            real[] x_r,
            int[] x_i) {
      
      real dydt[8];

      // Variables of the ODEs

      real S = y[1];
      real E = y[2];
      real G = y[3];
      real P = y[4];
      real I = y[5];
      real A = y[6];
      real B = y[7];
      real C = y[8];
      



      
      // The parameters of the ODEs
      real phi = params[1];
      real pe = params[2];
      real pie = params[3];
      real pee = params[4]; 
  
      real beta = params[5];
      real mu = params[6]; 
      real betam = params[7]; 
      real tau = params[8]; 
      real alpha = params[9]; 
      real Ts = params[10]; 
      real Ta = params[11]; 
      real Tm = params[12]; 
      real Ti = params[13]; 
      
      
      dydt[1] = - (mu * beta * S * A - beta * S * I - beta * C * S - betam * B * S) / n;
      dydt[2] = (mu * beta * S * A + (1 - phi) * beta * S * I + (1 - phi) * beta * C * S + (1 - phi) * betam * B * S)/n - E / tau;
      dydt[3] = (pe * phi * beta * S * I + pie * phi * beta * C * S + pee * phi * betam * B * S)/n - G / tau;
      dydt[4] = ((1 - pe) * phi * beta * S * I + (1 - pie) * phi * beta * C * S + (1 - pee) * phi * betam * B * S)/n - P / tau;
      dydt[5] = alpha * E / tau - I / Ts;
      dydt[6] = (1 - alpha) * E / tau - A / Ta;
      dydt[7] = G / tau - B / Tm;
      dydt[8] = P / tau - C / Ti;
      
      return dydt;
    }
  
}

data {
  int<lower = 0, upper = 66500000> n //population
  int<lower = 1> n_obs; // Number of days sampled
  int<lower = 1> n_params; // Number of model parameters
  int<lower = 1> n_difeq; // Number of differential equations in the system
  //int<lower = 1> n_sample; 
  int y[n_obs]; // The binomially distributed data
  real t0; // Initial time point (zero)
  real ts[n_obs]; // Time points that were sampled
  
}

transformed data {
  real x_r[0];
  int x_i[0];
}

parameters {
 real<lower = 0, upper = 1> phi;
 real<lower = 0, upper = 1> pe;
 real<lower = 0, upper = 1> pie;
 real<lower = 0, upper = 1> pee;
 real<lower = 0> beta;
 real<lower = 0> mu;
 real<lower = 0>betam;
 real<lower = 0> tau;
 real<lower = 0> alpha;
 real<lower = 0> Ts;
 real<lower = 0> Ta;
 real<lower = 0> Tm;
 real<lower = 0> Ti;
 //real<lower = 0, upper = 1> S0; // Initial fraction of susceptible
}

transformed parameters{
  real y_hat[n_obs, n_difeq]; // Output from the ODE solver
  real y0[n_difeq]; // Initial conditions
  real<lower=0, upper=1> params1[4];
  real<lower = 0> params2[9]; 

  y0[1] = (1 - 0.02 - 0.001)*66500000;
  y0[2] = 0.02*66500000;
  y0[3] = 0;
  y0[4] = 0;
  y0[5] = 0.001*66500000;
  y0[6] = 0;
  y0[7] = 0;
  y0[8] = 0;
  
  y_hat = integrate_ode_rk45(SI, y0, t0, ts, {phi, pe, pie, pee, beta, mu, betam, tau, alpha, Ts, Ta, Tm, Ti}, x_r, x_i);
  //for(i in 1:n_obs){
    //for(j in 1:n_difeq){
       //y_hat[i, j] = 1e-6 + (1-2e-6)*y_hat[i, j];
    //}
  //}
  
}

model {
  phi ~ normal(0, 1);  //phi
  pe ~ normal(0, 1); //pe
  pie ~ normal(0, 1); //pie
  pee ~ normal(0, 1); //pee
  
  beta ~ normal(0.5, 0.9); //beta
  mu ~ normal(0.2, 0.55); //mu
  betam ~ normal(0, 1);  //beta_m
  tau ~ normal(1, 20); //tau
  alpha ~ normal(0.1, 0.99); //alpha
  Ts ~ normal(3, 20); //Ts
  Ta ~ normal(1, 20); //Ta
  Tm ~ normal(1, 20); //Tm
  Ti ~ normal(0.5, 8); //Ti
  //S0 ~ normal(0.5, 0.5); //constrained to be 0-1.
  
  y ~ poisson(y_hat[,5]); //y_hat[,5] are the fractions infected from the ODE solver
}

generated quantities {
  real R_0; 
  R_0 = beta * Ta * (alpha * Ts * (1 - phi) / Ta + mu * (1 - alpha));
  
}

', 
file = "SI_fit.stan", sep="", fill=T)

# FITTING

# For stan model we need the following variables:

stan_d = list(n_obs = length(datalist$cases),
              n_params = 13,
              n_difeq = 8,
              y = datalist$cases,
              t0 = 0,
              ts = sample_time)

# Which parameters to monitor in the model:
params_monitor = c("y_hat", "y0", "params", "R_0")

# Test / debug the model:
test = stan("SI_fit.stan",
            data = stan_d,
            pars = params_monitor,
            chains = 1, iter = 10)

# Fit and sample from the posterior

#mod = stan(fit = test,
#           data = stan_d,
#           pars = params_monitor,
#           chains = 3,
#           warmup = 500,
#           iter = 1000,
#           control = list(adapt_delta = 0.99))
mod = stan(fit = test,
           data = stan_d,
           pars = params_monitor,
           chains = 3,
           warmup = 500,
           iter = 1000)

# You should do some MCMC diagnostics, including:
#traceplot(mod, pars="lp__")
#traceplot(mod, pars=c("params", "y0"))
#summary(mod)$summary[,"Rhat"]

# Extract the posterior samples to a structured list:
posts <- extract(mod)

which is not running, I wonder if the problem is because I have defined n on R side, or wether I made a mistake defining it on stan? namely my modification in data block. If I can do this everything would be integer and I should be able to run the naive likelihood I imagine.

I think your best bet is to look at the case study here and think through how they’ve defined everything: https://mc-stan.org/users/documentation/case-studies/boarding_school_case_study.html . Their data is in total number of people infected and they’re modeling this with an SIR model.

Think about what are the units for I(t)? And what are the units for y[t]? Are they integers representing numbers of people? Or are they real numbers representing millions of people? Or are they real numbers representing some fraction of 66.5 million people?

1 Like

Thanks a lot for your suggestions @bbbales2. They are very helpful. Following your advice I did the followings:

First of all I reconstructed the model given by the link you have shared and I got exactly as they have the code for that is (I am adding this not to lose generality as their stan file is separated whereas mine is included in the same file and it works just fine):

library(outbreaks)
library(tidyverse)
head(influenza_england_1978_school)
ggplot(data = influenza_england_1978_school) + 
  geom_point(mapping = aes(x = date, y = in_bed)) + 
  labs(y = "Number of students in bed")



library(rstan)
library(gridExtra)
rstan_options (auto_write = TRUE)
options (mc.cores = parallel::detectCores ())
set.seed(3) # for reproductibility

cat(
  '
  functions {
  real[] sir(real t, real[] y, real[] theta, 
             real[] x_r, int[] x_i) {

      real S = y[1];
      real I = y[2];
      real R = y[3];
      real N = x_i[1];
      
      real beta = theta[1];
      real gamma = theta[2];
      
      real dS_dt = -beta * I * S / N;
      real dI_dt =  beta * I * S / N - gamma * I;
      real dR_dt =  gamma * I;
      
      return {dS_dt, dI_dt, dR_dt};
  }
}
data {
  int<lower=1> n_days;
  real y0[3];
  real t0;
  real ts[n_days];
  int N;
  int cases[n_days];
}
transformed data {
  real x_r[0];
  int x_i[1] = { N };
}
parameters {
  real<lower=0> gamma;
  real<lower=0> beta;
  real<lower=0> phi_inv;
}
transformed parameters{
  real y[n_days, 3];
  real phi = 1. / phi_inv;
  {
    real theta[2];
    theta[1] = beta;
    theta[2] = gamma;

    y = integrate_ode_rk45(sir, y0, t0, ts, theta, x_r, x_i);
  }
}
model {
  //priors
  beta ~ normal(2, 1);
  gamma ~ normal(0.4, 0.5);
  phi_inv ~ exponential(5);
  
  //sampling distribution
  //col(matrix x, int n) - The n-th column of matrix x. Here the number of infected people 
  cases ~ neg_binomial_2(col(to_matrix(y), 2), phi);
}
generated quantities {
  real R0 = beta / gamma;
  real recovery_time = 1 / gamma;
  real pred_cases[n_days];
  pred_cases = neg_binomial_2_rng(col(to_matrix(y), 2), phi);
}
', 
file = "sir_negbin.stan", sep="", fill=T)

# time series of cases
cases <- influenza_england_1978_school$in_bed  # Number of students in bed
# total count
N <- 763;
# times
n_days <- length(cases) 
t <- seq(0, n_days, by = 1)
t0 = 0 
t <- t[-1]
#initial conditions
i0 <- 1
s0 <- N - i0
r0 <- 0
y0 = c(S = s0, I = i0, R = r0)
# data for Stan
data_sir <- list(n_days = n_days, y0 = y0, t0 = t0, ts = t, N = N, cases = cases)
# number of MCMC steps
niter <- 2000

model <- stan_model("sir_negbin.stan")

fit_sir_negbin <- sampling(model,
                data = data_sir,
                iter = niter,
                chains = 4)

pars=c('beta', 'gamma', "R0", "recovery_time")

print(fit_sir_negbin, pars = pars)

stan_dens(fit_sir_negbin, pars = pars, separate_chains = TRUE)

In the next step I did a naive modelling, namely I imported my data instead of the influenza’s to just check wether the model would run, it did indeed while the fit was not good (at least I know my data is workable). The code for that is as follow:

library(ggplot2)
datalist <- read.csv(file = 'xyzz.csv')
head(datalist)
ggplot(data = datalist) + 
  geom_point(mapping = aes(x = day, y = cases0)) + 
  labs(y = "Number of daily infected people")



library(rstan)
library(gridExtra)
rstan_options (auto_write = TRUE)
options (mc.cores = parallel::detectCores ())
set.seed(3) # for reproductibility



cat(
  '
  functions {
  real[] sir(real t, real[] y, real[] theta, 
             real[] x_r, int[] x_i) {

      real S = y[1];
      real I = y[2];
      real R = y[3];
      real N = x_i[1];
      
      real beta = theta[1];
      real gamma = theta[2];
      
      real dS_dt = -beta * I * S / N;
      real dI_dt =  beta * I * S / N - gamma * I;
      real dR_dt =  gamma * I;
      
      return {dS_dt, dI_dt, dR_dt};
  }
}
data {
  int<lower=1> n_days;
  real y0[3];
  real t0;
  real ts[n_days];
  int N;
  int cases[n_days];
}
transformed data {
  real x_r[0];
  int x_i[1] = { N };
}
parameters {
  real<lower=0> gamma;
  real<lower=0> beta;
  real<lower=0> phi_inv;
}
transformed parameters{
  real y[n_days, 3];
  real phi = 1. / phi_inv;
  {
    real theta[2];
    theta[1] = beta;
    theta[2] = gamma;

    y = integrate_ode_rk45(sir, y0, t0, ts, theta, x_r, x_i);
  }
}
model {
  //priors
  beta ~ normal(2, 1);
  gamma ~ normal(0.4, 0.5);
  phi_inv ~ exponential(5);
  
  //sampling distribution
  //col(matrix x, int n) - The n-th column of matrix x. Here the number of infected people 
  cases ~ neg_binomial_2(col(to_matrix(y), 2), phi);
}
generated quantities {
  real R0 = beta / gamma;
  real recovery_time = 1 / gamma;
  real pred_cases[n_days];
  pred_cases = neg_binomial_2_rng(col(to_matrix(y), 2), phi);
}
', 
file = "sir_negbin.stan", sep="", fill=T)



# time series of cases
cases <- datalist$cases0  # Number of students in bed
# total count
N <- 66500000;
# times
n_days <- length(cases) 
t <- seq(0, n_days, by = 1)
t0 = 0 
t <- t[-1]
#initial conditions
i0 <- 1
s0 <- N - i0
r0 <- 0
y0 = c(S = s0, I = i0, R = r0)
# data for Stan
data_sir <- list(n_days = n_days, y0 = y0, t0 = t0, ts = t, N = N, cases = cases)
# number of MCMC steps
niter <- 2000



model <- stan_model("sir_negbin.stan")

fit_sir_negbin <- sampling(model,
                data = data_sir,
                iter = niter,
                chains = 4)

pars=c('beta', 'gamma', "R0", "recovery_time")

print(fit_sir_negbin, pars = pars)

stan_dens(fit_sir_negbin, pars = pars, separate_chains = TRUE)


smr_pred <- cbind(as.data.frame(summary(
  fit_sir_negbin, pars = "pred_cases", probs = c(0.05, 0.5, 0.95))$summary), t, cases)
colnames(smr_pred) <- make.names(colnames(smr_pred)) # to remove % in the col names

ggplot(smr_pred, mapping = aes(x = t)) +
  geom_ribbon(aes(ymin = X5., ymax = X95.), fill = "orange", alpha = 0.6) +
  geom_line(mapping = aes(x = t, y = X50.)) + 
  geom_point(mapping = aes(y = cases)) +
  labs(x = "Day", y = "Number of daily infected people")

The data is here: xyzz.csv (2.1 KB) and you can see from the last part of the code that the fit is not great but as I mentioned thankfully it works. In the final step I went line by line and in similar manner employed my model:

library(ggplot2)
datalist <- read.csv(file = 'xyzz.csv')
head(datalist)
ggplot(data = datalist) + 
  geom_point(mapping = aes(x = day, y = cases0)) + 
  labs(y = "Number of daily infected people")



library(rstan)
library(gridExtra)
rstan_options (auto_write = TRUE)
options (mc.cores = parallel::detectCores ())
set.seed(3) # for reproductibility



cat(
  '
  functions {
  real[] sir(real t, real[] y, real[] theta, 
             real[] x_r, int[] x_i) {

      
      real S = y[1];
      real E = y[2];
      real G = y[3];
      real P = y[4];
      real I = y[5];
      real A = y[6];
      real B = y[7];
      real C = y[8];
      real N = x_i[1];
      
      real phi = theta[1];
      real pe = theta[2];
      real pie = theta[3];
      real pee = theta[4];
      real beta = theta[5];
      real mu = theta[6]; 
      real betam = theta[7]; 
      real tau = theta[8]; 
      real alpha = theta[9]; 
      real Ts = theta[10]; 
      real Ta = theta[11]; 
      real Tm = theta[12]; 
      real Ti = theta[13];
      
      
      real dS_dt = - mu * beta * S / N * A - beta * S / N * I - beta * C * S / N - betam * B * S / N;
      real dE_dt = mu * beta * S / N * A + (1 - phi) * beta * S / N * I + (1 - phi) * beta * C * S / N + (1 - phi) * betam * B * S / N - E / tau;
      real dG_dt = pe * phi * beta * S / N * I + pie * phi * beta * C * S / N + pee * phi * betam * B * S / N - G / tau;
      real dP_dt = (1 - pe) * phi * beta * S / N * I + (1 - pie) * phi * beta * C * S / N + (1 - pee) * phi * betam * B * S / N - P / tau;
      real dI_dt = alpha * E / tau - I / Ts;
      real dA_dt = (1 - alpha) * E / tau - A / Ta;
      real dB_dt = G / tau - B / Tm;
      real dC_dt = P / tau - C / Ti;
      
      return {dS_dt, dE_dt, dG_dt, dP_dt, dI_dt, dA_dt, dB_dt, dC_dt};
  }
}
data {
  int<lower=1> n_days;
  real y0[8];
  real t0;
  real ts[n_days];
  int N;
  int cases[n_days];
}
transformed data {
  real x_r[0];
  int x_i[1] = { N };
}
parameters {
  real<lower = 0, upper = 1> phi;
  real<lower = 0, upper = 1> pe;
  real<lower = 0, upper = 1> pie;
  real<lower = 0, upper = 1> pee;
  real<lower = 0> beta;
  real<lower = 0> mu;
  real<lower = 0> betam;
  real<lower = 0> tau;
  real<lower = 0> alpha;
  real<lower = 0> Ts;
  real<lower = 0> Ta;
  real<lower = 0> Tm;
  real<lower = 0> Ti;
  real<lower=0> phi_inv;
}
transformed parameters{
  real y[n_days, 8];
  real phi = 1. / phi_inv;
  {
    real theta[13];
    theta[1] = phi;
    theta[2] = pe;
    theta[3] = pie;
    theta[4] = pee;
    theta[5] = beta;
    theta[6] = mu; 
    theta[7] = betam; 
    theta[8] = tau; 
    theta[9] = alpha; 
    theta[10] = Ts; 
    theta[11] = Ta; 
    theta[12] = Tm; 
    theta[13] = Ti;

    y = integrate_ode_rk45(sir, y0, t0, ts, theta, x_r, x_i);
  }
}
model {
  //priors
  phi ~ normal(0, 1); 
  pe ~ normal(0, 1); 
  pie ~ normal(0, 1); 
  pee ~ normal(0, 1); 
  beta ~ normal(0.5, 0.9); 
  mu ~ normal(0.2, 0.55); 
  betam ~ normal(0, 1);  
  tau ~ normal(1, 20); 
  alpha ~ normal(0.1, 0.99); 
  Ts ~ normal(3, 20); 
  Ta ~ normal(1, 20); 
  Tm ~ normal(1, 20); 
  Ti ~ normal(0.5, 8);
  phi_inv ~ exponential(5);
  
  //sampling distribution
  //col(matrix x, int n) - The n-th column of matrix x. Here the number of infected people 
  cases ~ neg_binomial_5(col(to_matrix(y), 5), phi);
}
generated quantities {
  real R0 = beta * Ta * (alpha * Ts * (1 - phi) / Ta + mu * (1 - alpha));
  real pred_cases[n_days];
  pred_cases = neg_binomial_5_rng(col(to_matrix(y), 5), phi);
}

', 
file = "sir_negbin.stan", sep="", fill=T)



# time series of cases
cases <- datalist$cases0  # Number of students in bed
# total count
N <- 66500000;
# times
n_days <- length(cases) 
t <- seq(0, n_days, by = 1)
t0 = 0 
t <- t[-1]
#initial conditions
i0 <- N / 10000
s0 <- N
e0 <- N / 1000
g0 <- 0
p0 <- 0
a0 <- N / 100000
b0 <- 0
c0 <- 0

y0 = c(S = s0, E = e0, G = g0, P = p0, I = i0, A = a0, B = b0, C = c0)
# data for Stan
data_sir <- list(n_days = n_days, y0 = y0, t0 = t0, ts = t, N = N, cases = cases)
# number of MCMC steps
niter <- 2000



model <- stan_model("sir_negbin.stan")

fit_sir_negbin <- sampling(model,
                data = data_sir,
                iter = niter,
                chains = 4)

pars=c('phi', 'pe', "pie", "pee", "beta", "mu", "betam", "tau", "alpha", "Ts", "Ta", "Tm", "Ta", "R0")

print(fit_sir_negbin, pars = pars)

stan_dens(fit_sir_negbin, pars = pars, separate_chains = TRUE)



smr_pred <- cbind(as.data.frame(summary(
  fit_sir_negbin, pars = "pred_cases", probs = c(0.05, 0.5, 0.95))$summary), t, cases)
colnames(smr_pred) <- make.names(colnames(smr_pred)) # to remove % in the col names

ggplot(smr_pred, mapping = aes(x = t)) +
  geom_ribbon(aes(ymin = X5., ymax = X95.), fill = "orange", alpha = 0.6) +
  geom_line(mapping = aes(x = t, y = X50.)) + 
  geom_point(mapping = aes(y = cases)) +
  labs(x = "Day", y = "Number of daily infected people")

Now let me go through it, I have 8 ODEs with the dI_dt being the infected ones and the rest are other compartments. dI_dt is the 5th ODE so accordingly in model{} block I have:

  //sampling distribution
  //col(matrix x, int n) - The n-th column of matrix x. Here the number of infected people 
  cases ~ neg_binomial_5(col(to_matrix(y), 5), phi);

where you see 5 as index for the infected compartment. This was

 cases ~ neg_binomial_2(col(to_matrix(y), 2), phi);

for the original model as I(t) was the second ODE. In this model I am following the logic of the link in terms of defining the population i.e. N = S+E+G+P+I+A+B+C I have also defined the initial conditions accordingly. Yet when I run this it stops at the level of compiling the model namely

model <- stan_model("sir_negbin.stan")

I know that the likelihood might not be correct but I wonder why does the code not get compiled. I appreciate your help.

Thanks for the script. This is the error that I’m getting when I go to compile the last model:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:
Duplicate declaration of variable, name=phi; attempt to redeclare as real in transformed parameter; previously declared as real in parameter
 error in 'model668664f0eb3f_sir_negbin' at line 74, column 25
  -------------------------------------------------
    72: transformed parameters{
    73:   real y[n_days, 8];
    74:   real phi = 1. / phi_inv;
                                ^
    75:   {
  -------------------------------------------------

Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  failed to parse Stan model 'sir_negbin' due to the above error.

Out of curiosity, are you getting this error? Or is rstan giving you something else? If it’s giving you something else, what operating system are you using?

1 Like

I was getting the

Error in close.connection(zz)

But from your run I think the problem was at line 74 real phi as I have had phi as parameter of ODE, so I changed the name of it to phi. The full code is:

library(ggplot2)
datalist <- read.csv(file = 'xyzz.csv')
head(datalist)
ggplot(data = datalist) + 
  geom_point(mapping = aes(x = day, y = cases0)) + 
  labs(y = "Number of daily infected people")



library(rstan)
library(gridExtra)
rstan_options (auto_write = TRUE)
options (mc.cores = parallel::detectCores ())
set.seed(3) # for reproductibility



cat(
  '
  functions {
  real[] sir(real t, real[] y, real[] theta, 
             real[] x_r, int[] x_i) {

      
      real S = y[1];
      real E = y[2];
      real G = y[3];
      real P = y[4];
      real I = y[5];
      real A = y[6];
      real B = y[7];
      real C = y[8];
      real N = x_i[1];
      
      real phi = theta[1];
      real pe = theta[2];
      real pie = theta[3];
      real pee = theta[4];
      real beta = theta[5];
      real mu = theta[6]; 
      real betam = theta[7]; 
      real tau = theta[8]; 
      real alpha = theta[9]; 
      real Ts = theta[10]; 
      real Ta = theta[11]; 
      real Tm = theta[12]; 
      real Ti = theta[13];
      
      
      real dS_dt = - mu * beta * S / N * A - beta * S / N * I - beta * C * S / N - betam * B * S / N;
      real dE_dt = mu * beta * S / N * A + (1 - phi) * beta * S / N * I + (1 - phi) * beta * C * S / N + (1 - phi) * betam * B * S / N - E / tau;
      real dG_dt = pe * phi * beta * S / N * I + pie * phi * beta * C * S / N + pee * phi * betam * B * S / N - G / tau;
      real dP_dt = (1 - pe) * phi * beta * S / N * I + (1 - pie) * phi * beta * C * S / N + (1 - pee) * phi * betam * B * S / N - P / tau;
      real dI_dt = alpha * E / tau - I / Ts;
      real dA_dt = (1 - alpha) * E / tau - A / Ta;
      real dB_dt = G / tau - B / Tm;
      real dC_dt = P / tau - C / Ti;
      
      return {dS_dt, dE_dt, dG_dt, dP_dt, dI_dt, dA_dt, dB_dt, dC_dt};
  }
}
data {
  int<lower=1> n_days;
  real y0[8];
  real t0;
  real ts[n_days];
  int N;
  int cases[n_days];
}
transformed data {
  real x_r[0];
  int x_i[1] = { N };
}
parameters {
  real<lower = 0, upper = 1> phi;
  real<lower = 0, upper = 1> pe;
  real<lower = 0, upper = 1> pie;
  real<lower = 0, upper = 1> pee;
  real<lower = 0> beta;
  real<lower = 0> mu;
  real<lower = 0> betam;
  real<lower = 0> tau;
  real<lower = 0> alpha;
  real<lower = 0> Ts;
  real<lower = 0> Ta;
  real<lower = 0> Tm;
  real<lower = 0> Ti;
  real<lower=0> phi_inv;
}
transformed parameters{
  real y[n_days, 8];
  real phii = 1. / phi_inv;
  {
    real theta[13];
    theta[1] = phi;
    theta[2] = pe;
    theta[3] = pie;
    theta[4] = pee;
    theta[5] = beta;
    theta[6] = mu; 
    theta[7] = betam; 
    theta[8] = tau; 
    theta[9] = alpha; 
    theta[10] = Ts; 
    theta[11] = Ta; 
    theta[12] = Tm; 
    theta[13] = Ti;

    y = integrate_ode_rk45(sir, y0, t0, ts, theta, x_r, x_i);
  }
}
model {
  //priors
  phi ~ normal(0, 1); 
  pe ~ normal(0, 1); 
  pie ~ normal(0, 1); 
  pee ~ normal(0, 1); 
  beta ~ normal(0.5, 0.9); 
  mu ~ normal(0.2, 0.55); 
  betam ~ normal(0, 1);  
  tau ~ normal(1, 20); 
  alpha ~ normal(0.1, 0.99); 
  Ts ~ normal(3, 20); 
  Ta ~ normal(1, 20); 
  Tm ~ normal(1, 20); 
  Ti ~ normal(0.5, 8);
  phi_inv ~ exponential(5);
  
  //sampling distribution
  //col(matrix x, int n) - The n-th column of matrix x. Here the number of infected people 
  cases ~ neg_binomial_5(col(to_matrix(y), 5), phii);
}
generated quantities {
  real R0 = beta * Ta * (alpha * Ts * (1 - phi) / Ta + mu * (1 - alpha));
  real pred_cases[n_days];
  pred_cases = neg_binomial_5_rng(col(to_matrix(y), 5), phii);
}

', 
file = "sir_negbin.stan", sep="", fill=T)



# time series of cases
cases <- datalist$cases0  # Number of students in bed
# total count
N <- 66500000;
# times
n_days <- length(cases) 
t <- seq(0, n_days, by = 1)
t0 = 0 
t <- t[-1]
#initial conditions
i0 <- N / 10000
s0 <- N
e0 <- N / 1000
g0 <- 0
p0 <- 0
a0 <- N / 100000
b0 <- 0
c0 <- 0

y0 = c(S = s0, E = e0, G = g0, P = p0, I = i0, A = a0, B = b0, C = c0)
# data for Stan
data_sir <- list(n_days = n_days, y0 = y0, t0 = t0, ts = t, N = N, cases = cases)
# number of MCMC steps
niter <- 2000



model <- stan_model("sir_negbin.stan")

fit_sir_negbin <- sampling(model,
                data = data_sir,
                iter = niter,
                chains = 4)

pars=c('phi', 'pe', "pie", "pee", "beta", "mu", "betam", "tau", "alpha", "Ts", "Ta", "Tm", "Ta", "R0")

print(fit_sir_negbin, pars = pars)

stan_dens(fit_sir_negbin, pars = pars, separate_chains = TRUE)



smr_pred <- cbind(as.data.frame(summary(
  fit_sir_negbin, pars = "pred_cases", probs = c(0.05, 0.5, 0.95))$summary), t, cases)
colnames(smr_pred) <- make.names(colnames(smr_pred)) # to remove % in the col names

ggplot(smr_pred, mapping = aes(x = t)) +
  geom_ribbon(aes(ymin = X5., ymax = X95.), fill = "orange", alpha = 0.6) +
  geom_line(mapping = aes(x = t, y = X50.)) + 
  geom_point(mapping = aes(y = cases)) +
  labs(x = "Day", y = "Number of daily infected people")

where now in transformed parameter block we have real phii = 1. / phi_inv; subsequently, in model block we have:

 cases ~ neg_binomial_5(col(to_matrix(y), 5), phii);

and in generated quantities:

pred_cases = neg_binomial_5_rng(col(to_matrix(y), 5), phii);

yet it is not compiling unfortunately. I really appreciate your time and help on this @bbbales2.