SEIR model fitting

Hi,
I am new to stan, I am trying to replicate this, yet for an SEIR model with two infected compartments, one for asymptomatic and one for symptomatic. I have including my file here, the issue is when I run it I get the following error:
Error in stanc(file = file, model_code = model_code, model_name = model_name, : c++ exception (unknown reason)

here is my code:
seir_test.R (8.4 KB)

and here is my stan part:

functions {
  real[] SI(real t,
            real[] y,
            real[] params,
            real[] x_r,
            int[] x_i) {
      
      real dydt[5];
      
      dydt[1] = - params[1] * y[1] * y[3] - params[2] * params[1] * y[1] * y[4];
      dydt[2] = params[1] * y[1] * y[3] + params[2] * params[1] * y[1] * y[4] - y[2] / params[3];
      dydt[3] = params[4] * y[2] / params[3] - y[3] / params[5];
      dydt[4] = (1 - params[4]) * y[2] / params[3] - y[4] / params[6];
      dydt[5] = y[3] / params[5] + y[4] / params[6];
      
      return dydt;
    }
  
}

data {
  int<lower = 1> n_obs; // Number of days sampled
  int<lower = 1> n_params; // Number of model parameters
  int<lower = 1> n_difeq; // Number of differential equations in the system
  int<lower = 1> n_sample; // Number of hosts sampled at each time point.
  int<lower = 1> n_fake; // This is to generate "predicted"/"unsampled" data
  
  int y[n_obs]; // The binomially distributed data
  real t0; // Initial time point (zero)
  real ts[n_obs]; // Time points that were sampled
  
  real fake_ts[n_fake]; // Time points for "predicted"/"unsampled" data
}

transformed data {
  real x_r[0];
  int x_i[0];
}

parameters {
  real<lower = 0> params[n_params]; // Model parameters
  real<lower = 0, upper = 1> S0; // Initial fraction of hosts susceptible
}

transformed parameters{
  real y_hat[n_obs, n_difeq]; // Output from the ODE solver
  real y0[n_difeq]; // Initial conditions for both S and I

  y0[2] = 1 / 1000;
  y0[3] = 1 / 10000;
  y0[4] = 1 / 100000;   
  y0[2] = 1 - Is0;
  y0[5] = 0;
  
  y_hat = integrate_ode_rk45(SI, y0, t0, ts, params, x_r, x_i);
  
}

model {
  params[1] ~ uniform(0.5, 0.9); //constrained to be positive
  params[2] ~ uniform(0.2, 0.55); //constrained to be positive
  params[3] ~ uniform(1, 20); //constrained to be positive
  params[4] ~ uniform(0.1, 0.99); //constrained to be positive
  params[5] ~ uniform(1, 4); //constrained to be positive
  params[6] ~ uniform(1, 2); //constrained to be positive
  S0 ~ normal(0.5, 0.5); //constrained to be 0-1.
  
  y ~ binomial(n_sample, y_hat[, 2]); //y_hat[,2] are the fractions infected from the ODE solver
  
}

generated quantities {
  // Generate predicted data over the whole time series:
  real fake_I[n_fake, n_difeq];
  
  fake_I = integrate_ode_rk45(SI, y0, t0, fake_ts, params, x_r, x_i);
  
}

If you are going to use R 3.6.x, then you need to install rstan from source via

install.packages("rstan", type = "source")

However, upgrading to R 4.0 is better because the binaries work.

The problem is it runs fine when I run the original code from the link, namely (https://jrmihalj.github.io/estimating-transmission-by-fitting-mechanistic-models-in-Stan/) and this is with my current version of rstan and rstudio. I presume there is something wrong with the stan part of my code.

True but if you follow Ben’s advice you’ll get a more informative error message.

One possible problem is the uniform priors. If you use uniform priors on a parameter, you also need to constraint them to that range by using the <lower = , upper => syntax. The easiest adaptation would probably be to work with normal priors that approximate the range of the uniform priors.

CRAN’s R packages are not versioned so following Ben’s advice is the way to go here. The rstan package can only track the latest R version and there have been breaking changes. Stan should never just crash with an “unknown reason” c++ exception which makes helping difficult unless you update.

As a bonus the error messages tend to be very helpful and specific so if you are new to Stan they are especially helpful.

Thank you all for the replies. So I have done @bgoodri’s suggestion, namely, I installed rstan from source now. I further simplified my problem by just having 4 ODEs than 5 to see if it works. When I run this now I get the following error:

Error in close.connection(zz) : cannot close 'message' sink connection

5. close.connection(zz)

4. close(zz)

3. stanc(file = file, model_code = model_code, model_name = model_name, verbose = verbose, obfuscate_model_name = obfuscate_model_name, allow_undefined = allow_undefined, isystem = isystem)

2. stan_model(file, model_name = model_name, model_code = model_code, stanc_ret = NULL, boost_lib = boost_lib, eigen_lib = eigen_lib, save_dso = save_dso, verbose = verbose)

1. stan("SI_fit.stan", data = stan_d, pars = params_monitor, chains = 1, iter = 10) 

My stan code is:

functions {
  
  // This largely follows the deSolve package, but also includes the x_r and x_i variables.
  // These variables are used in the background.
  
  real[] SI(real t,
            real[] y,
            real[] params,
            real[] x_r,
            int[] x_i) {
      
      real dydt[4];
      
      dydt[1] = - params[1] * y[1] * y[3];
      dydt[2] = params[1] * y[1] * y[3] - params[2] * y[2];
      dydt[3] = params[2] * y[2] - params[3] * y[3];
      dydt[4] = params[3] * y[3];
      
      return dydt;
    }
  
}

data {
  int<lower = 1> n_obs; // Number of days sampled
  int<lower = 1> n_params; // Number of model parameters
  int<lower = 1> n_difeq; // Number of differential equations in the system
  int<lower = 1> n_sample; // Number of hosts sampled at each time point.
  int<lower = 1> n_fake; // This is to generate "predicted"/"unsampled" data
  
  int y[n_obs]; // The binomially distributed data
  real t0; // Initial time point (zero)
  real ts[n_obs]; // Time points that were sampled
  
  real fake_ts[n_fake]; // Time points for "predicted"/"unsampled" data
}

transformed data {
  real x_r[0];
  int x_i[0];
}

parameters {
  real<lower = 0> params[n_params]; // Model parameters
  real<lower = 0, upper = 1> S0; // Initial fraction of hosts susceptible
}

transformed parameters{
  real y_hat[n_obs, n_difeq]; // Output from the ODE solver
  real y0[n_difeq]; // Initial conditions for both S and I

  y0[1] = S0;
  y0[2] = 1 - S0 - I0;
  y0[3] = 1 - S0 - E0;
  y0[4] = 0;
  
  y_hat = integrate_ode_rk45(SI, y0, t0, ts, params, x_r, x_i);
  
}

model {
  params ~ normal(0, 2); //constrained to be positive
  S0 ~ normal(0.5, 0.5); //constrained to be 0-1.
  
  y ~ binomial(n_sample, y_hat[, 2]); //y_hat[,2] are the fractions infected from the ODE solver
  
}

generated quantities {
  // Generate predicted data over the whole time series:
  real fake_I[n_fake, n_difeq];
  
  fake_I = integrate_ode_rk45(SI, y0, t0, fake_ts, params, x_r, x_i);
  
}

which is just a modification to the link I attached earlier (let me reiterate that I can run the code from the link without any problem). I am also attaching my R code:

```{r}
# Load some required packages
##############
library(deSolve)
library(dplyr)
library(ggplot2)
library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
##############

# To simulate the data, we need to assign initial conditions.
# In practice, these will likely be unknown, but can be estimated from the data.

E0 = 0.02
I0 = 0.001    # initial fraction infected
S0 = 1 - I0 - E0 # initial fraction susceptible
R0 = 0

# Assign transmission and pathogen-induced death rates:
beta = 0.60
gamma = 0.10
mu = 0.15

# We will use the package deSolve to integrate, which requires certain data structures.
# Store parameters and initial values
# Parameters must be stored in a named list.
params <- list(beta = beta,
               gamma = gamma,
               mu = mu)

# Initial conditions are stored in a vector
inits <- c(S0, E0, I0, R0)

# Create a time series over which to integrate.
# Here we have an epidemic that is observed over t_max number of days (or weeks or etc).
t_min = 0
t_max = 50
times = t_min:t_max

# We must create a function for the system of ODEs.
# See the 'ode' function documentation for further insights.
SIR <- function(t, y, params) {
  with(as.list(c(params, y)), {
    
    dS = - beta * y[1] * y[3]
    
    dE = beta * y[1] * y[3] - gamma * y[2]
    
    dI = gamma * y[2] - mu * y[3]
    
    dR = mu * y[3]
    
    res <- c(dS,dE,dI,dR)
    list(res)
  })
}

# Run the integration:
out <- ode(inits, times, SIR, params, method="ode45")

# Store the output in a data frame:
out <- data.frame(out)
colnames(out) <- c("time", "S", "E", "I", "R")

# quick plot of the epidemic
plot(NA,NA, xlim = c(t_min, t_max), ylim=c(0, 1), xlab = "Time", ylab="Fraction of Host Population")
lines(out$S ~ out$time, col="black")
lines(out$E ~ out$time, col="orange")
lines(out$I ~ out$time, col="red")
lines(out$R ~ out$time, col="blue")
legend(x = 30, y = 0.8, legend = c("Susceptible", "Exposed", "Infected", "Recovered"), 
       col = c("black", "orange", "red", "blue"), lty = c(1, 1), bty="n")
```

```{r}
sample_days = 20 # number of days sampled throughout the epidemic
sample_n = 25 # number of host individuals sampled per day

# Choose which days the samples were taken. 
# Ideally this would be daily, but we all know that is difficult.
sample_time = sort(sample(1:t_max, sample_days, replace=F))

# Extract the "true" fraction of the population that is infected on each of the sampled days:
sample_propinf = out[out$time %in% sample_time, 3]

# Generate binomially distributed data.
# So, on each day we sample a given number of people (sample_n), and measure how many are infected.
# We expect binomially distributed error in this estimate, hence the random number generation.
sample_y = rbinom(sample_days, sample_n, sample_propinf)
```

```{r}
# The Stan model statement:
cat(
  '
functions {
  
  // This largely follows the deSolve package, but also includes the x_r and x_i variables.
  // These variables are used in the background.
  
  real[] SI(real t,
            real[] y,
            real[] params,
            real[] x_r,
            int[] x_i) {
      
      real dydt[4];
      
      dydt[1] = - params[1] * y[1] * y[3];
      dydt[2] = params[1] * y[1] * y[3] - params[2] * y[2];
      dydt[3] = params[2] * y[2] - params[3] * y[3];
      dydt[4] = params[3] * y[3];
      
      return dydt;
    }
  
}

data {
  int<lower = 1> n_obs; // Number of days sampled
  int<lower = 1> n_params; // Number of model parameters
  int<lower = 1> n_difeq; // Number of differential equations in the system
  int<lower = 1> n_sample; // Number of hosts sampled at each time point.
  int<lower = 1> n_fake; // This is to generate "predicted"/"unsampled" data
  
  int y[n_obs]; // The binomially distributed data
  real t0; // Initial time point (zero)
  real ts[n_obs]; // Time points that were sampled
  
  real fake_ts[n_fake]; // Time points for "predicted"/"unsampled" data
}

transformed data {
  real x_r[0];
  int x_i[0];
}

parameters {
  real<lower = 0> params[n_params]; // Model parameters
  real<lower = 0, upper = 1> S0; // Initial fraction of hosts susceptible
}

transformed parameters{
  real y_hat[n_obs, n_difeq]; // Output from the ODE solver
  real y0[n_difeq]; // Initial conditions for both S and I

  y0[1] = S0;
  y0[2] = 1 - S0 - I0;
  y0[3] = 1 - S0 - E0;
  y0[4] = 0;
  
  y_hat = integrate_ode_rk45(SI, y0, t0, ts, params, x_r, x_i);
  
}

model {
  params ~ normal(0, 2); //constrained to be positive
  S0 ~ normal(0.5, 0.5); //constrained to be 0-1.
  
  y ~ binomial(n_sample, y_hat[, 3]); //y_hat[,3] are the fractions infected from the ODE solver
  
}

generated quantities {
  // Generate predicted data over the whole time series:
  real fake_I[n_fake, n_difeq];
  
  fake_I = integrate_ode_rk45(SI, y0, t0, fake_ts, params, x_r, x_i);
  
}

', 
file = "SI_fit.stan", sep="", fill=T)

# FITTING

# For stan model we need the following variables:

stan_d = list(n_obs = sample_days,
              n_params = length(params),
              n_difeq = length(inits),
              n_sample = sample_n,
              n_fake = length(1:t_max),
              y = sample_y,
              t0 = 0,
              ts = sample_time,
              fake_ts = c(1:t_max))

# Which parameters to monitor in the model:
params_monitor = c("y_hat", "y0", "params", "fake_I")

# Test / debug the model:
test = stan("SI_fit.stan",
            data = stan_d,
            pars = params_monitor,
            chains = 1, iter = 10)

# Fit and sample from the posterior
mod = stan(fit = test,
           data = stan_d,
           pars = params_monitor,
           chains = 3,
           warmup = 500,
           iter = 1500)

# You should do some MCMC diagnostics, including:
#traceplot(mod, pars="lp__")
#traceplot(mod, pars=c("params", "y0"))
#summary(mod)$summary[,"Rhat"]

# These all check out for my model, so I'll move on.

# Extract the posterior samples to a structured list:
posts <- extract(mod)
```

the problem error happens when

test = stan("SI_fit.stan",
            data = stan_d,
            pars = params_monitor,
            chains = 1, iter = 10)

is triggered. Thank you for the guidance.

So as an update I tried to use: debug(rstan::stanc) as it was suggested when tackling Error in close.connection(zz) when I do the debug I get:

debugging in: stanc(file = file, model_code = model_code, model_name = model_name, 
    verbose = verbose, obfuscate_model_name = obfuscate_model_name, 
    allow_undefined = allow_undefined, isystem = isystem)
debug: {
    model_name2 <- deparse(substitute(model_code))
    if (is.null(attr(model_code, "model_name2"))) 
        attr(model_code, "model_name2") <- model_name2
    model_code <- get_model_strcode(file, model_code)
    if (missing(model_name) || is.null(model_name)) 
        model_name <- attr(model_code, "model_name2")
    if (verbose) 
        cat("\nTRANSLATING MODEL '", model_name, "' FROM Stan CODE TO C++ CODE NOW.\n", 
            sep = "")
    SUCCESS_RC <- 0
    EXCEPTION_RC <- -1
    PARSE_FAIL_RC <- -2
    model_cppname <- legitimate_model_name(model_name, obfuscate_name = obfuscate_model_name)
    tf <- tempfile(fileext = ".parser")
    zz <- base::file(tf, open = "wt")
    on.exit(close(zz), add = TRUE)
    sink(zz, type = "message")
    r <- .Call(CPP_stanc280, model_code, model_cppname, allow_undefined, 
        isystem)
    sink(type = "message")
    close(zz)
    on.exit(NULL)
    r$model_name <- model_name
    r$model_code <- model_code
    if (is.null(r)) {
        stop(paste("failed to run stanc for model '", model_name, 
            "' and no error message provided", sep = ""))
    }
    else if (r$status == PARSE_FAIL_RC) {
        stop(paste("failed to parse Stan model '", model_name, 
            "' and no error message provided"), sep = "")
    }
    else if (r$status == EXCEPTION_RC) {
        lapply(r$msg, function(x) message(x))
        error_msg <- paste("failed to parse Stan model '", model_name, 
            "' due to the above error.", sep = "")
        stop(error_msg)
    }
    if (r$status == SUCCESS_RC && verbose) 
        cat("successful in parsing the Stan model '", model_name, 
            "'.\n", sep = "")
    msg <- readLines(tf)
    msg <- grep("Unknown variable", msg, value = TRUE, invert = TRUE)
    msg <- grep("aliasing", msg, value = TRUE, invert = TRUE)
    if (length(msg) > 2L) {
        cat(msg, sep = "\n")
    }
    else {
        try(file.remove(tf), silent = TRUE)
    }
    r$status = !as.logical(r$status)
    return(r)
}

I wonder how can I interpret this and know where the error is happening? any help would be great.

This just tells me that

SYNTAX ERROR, MESSAGE(S) FROM PARSER:
Variable "I0" does not exist.
 error in 'model73704ad25491_SEIR' at line 53, column 21
  -------------------------------------------------
    51: 
    52:   y0[1] = S0;
    53:   y0[2] = 1 - S0 - I0;
                            ^
    54:   y0[3] = 1 - S0 - E0;
  -------------------------------------------------

Thank you @bgoodri I changed it and now it works. Could you tell me how you found the SYNTAX ERROR from the debug? so I know it for the future references. Thanks a lot.

You probably just need to install rstan from source with

install.packages("rstan", type = "source")

Hi llia, could you tell me how do you fix the I0 issue? Did you add a definition of I0 in the parameters block and keep others unchanged? I am now running a similar project and also new in stan. I would appreciate if you could share your revised codes!

1 Like