 # Estimating parameters of a SEIR model

Recently I have opened a topic here seeking a solution to the error I was getting while running stan. So far, I have managed to polish my code using the advice I got from that topic. Here is my stan code:

``````cat(
'
functions {

real[] SI(real t,
real[] y,
real[] params,
real[] x_r,
int[] x_i) {

real dydt;

real phi = params;
real pe = params;
real pie = params;
real pee = params;

real beta = params;
real mu = params;
real betam = params;
real tau = params;
real alpha = params;
real Ts = params;
real Ta = params;
real Tm = params;
real Ti = params;

dydt = - mu * beta * y * y - beta * y * y - beta * y * y - betam * y * y;
dydt = mu * beta * y * y + (1 - phi) * beta * y * y + (1 - phi) * beta * y * y + (1 - phi) * betam * y * y - y / tau;
dydt = pe * phi * beta * y * y + pie * phi * beta * y * y + pee * phi * betam * y * y - y / tau;
dydt = (1 - pe) * phi * beta * y * y + (1 - pie) * phi * beta * y * y + (1 - pee) * phi * betam * y * y - y / tau;
dydt = alpha * y / tau - y / Ts;
dydt = (1 - alpha) * y / tau - y / Ta;
dydt = y / tau - y / Tm;
dydt = y / tau - y / Ti;

return dydt;
}

}

data {
int<lower = 1> n_obs; // Number of days sampled
int<lower = 1> n_params; // Number of model parameters
int<lower = 1> n_difeq; // Number of differential equations in the system
int<lower = 1> n_sample;
int y[n_obs]; // The binomially distributed data
real t0; // Initial time point (zero)
real ts[n_obs]; // Time points that were sampled

}

transformed data {
real x_r;
int x_i;
}

parameters {
real<lower=0, upper=1> params1;
real<lower = 0> params2;
real<lower = 0, upper = 1> S0; // Initial fraction of susceptible
}

transformed parameters{
real y_hat[n_obs, n_difeq]; // Output from the ODE solver
real y0[n_difeq]; // Initial conditions for both S and I

y0 = 1 - 0.02 - 0.001;
y0 = 0.02;
y0 = 0;
y0 = 0;
y0 = 0.001;
y0 = 0;
y0 = 0;
y0 = 0;

y_hat = integrate_ode_rk45(SI, y0, t0, ts, append_array(params1, params2), x_r, x_i);
y_hat = 1e-6 + (1-2e-6)*y_hat;

}

model {
phi ~ normal(0, 1);  //phi
pe ~ normal(0, 1); //pe
pie ~ normal(0, 1); //pie
pee ~ normal(0, 1); //pee

beta ~ normal(0.5, 0.9); //beta
mu ~ normal(0.2, 0.55); //mu
betam ~ normal(0, 1);  //beta_m
tau ~ normal(1, 20); //tau
alpha ~ normal(0.1, 0.99); //alpha
Ts ~ normal(3, 20); //Ts
Ta ~ normal(1, 20); //Ta
Tm ~ normal(1, 20); //Tm
Ti ~ normal(0.5, 8); //Ti
S0 ~ normal(0.5, 0.5); //constrained to be 0-1.

y ~ binomial(n_sample, y_hat[,5]); //y_hat[,5] are the fractions infected from the ODE solver

}

generated quantities {
real R_0;
R_0 = beta * Ta * (alpha * Ts * (1 - phi) / Ta + mu * (1 - alpha));

}

',
file = "SI_fit.stan", sep="", fill=T)
``````

As you can see I have 8 ODEs and 13 parameters to estimate, 4 of these parameters namely `phi`, `pe`, `pie` and `pee` are probabilities and so should be between 0 and 1. I was suggested to distinguish my parameters - those that are probabilities and those which are not, so I now have:

``````parameters {
real<lower=0, upper=1> params1;
real<lower = 0> params2;
...
}
``````

and then in transformed parameters my y_hat is:

``````transformed parameters{
...

y_hat = integrate_ode_rk45(SI, y0, t0, ts, append_array(params1, params2), x_r, x_i);
y_hat = 1e-6 + (1-2e-6)*y_hat;

}
``````

where I am appending params1 (probabilities) and params2 (the rest of the parameters). As you can see in the stan code now I have 2 names for my parameters: `params` then in parameters I separate them into `params1` and `params2` I wonder if this makes any issue on running the stan code as it simply stucks at running `stan()`. I am adding my whole code to this topic so one can run it at ease. Any help would be much appreciated by this stan newbie.

I am also dubious on my definition of model on R I defined:

``````stan_d = list(n_obs = N,
n_params = 13,
n_difeq = 8,
n_sample = 66650000,
y = cases,
t0 = 0,
ts = sample_time)
``````

where N is the length of my list in days, I wonder if I need the n_sample which is the total population?

Issue1.R (4.6 KB)

I didn’t quite get what the problem is. Can you provide more detail, perhaps showing the output of the `stan()` call?

Thanks for your time, so in the beginning of the stan function I only have `params` defined. I wonder if separating params into `params1` and `params2` is allowed the way I did it? I mean I have not seen such indexing in other languages. I have 13 parameters and separated them as `params1` (which are probabilities) and `params2`. So I wonder how stan is keeping track of them? As later on at `stan_d()` I have n_params = 13 without separation. This is my confusion.

As for the rest of the code when I run it I get:
`Error in close.connection(zz) : cannot close 'message' sink connection`
happening at `test = stan()` when I run debugging I get,

``````function (file, model_code = "", model_name = "anon_model",
verbose = FALSE, obfuscate_model_name = TRUE, allow_undefined = FALSE,
isystem = c(if (!missing(file)) dirname(file), getwd()))
{
model_name2 <- deparse(substitute(model_code))
if (is.null(attr(model_code, "model_name2")))
attr(model_code, "model_name2") <- model_name2
model_code <- get_model_strcode(file, model_code)
if (missing(model_name) || is.null(model_name))
model_name <- attr(model_code, "model_name2")
if (verbose)
cat("\nTRANSLATING MODEL '", model_name, "' FROM Stan CODE TO C++ CODE NOW.\n",
sep = "")
SUCCESS_RC <- 0
EXCEPTION_RC <- -1
PARSE_FAIL_RC <- -2
model_cppname <- legitimate_model_name(model_name, obfuscate_name = obfuscate_model_name)
tf <- tempfile(fileext = ".parser")
zz <- base::file(tf, open = "wt")
sink(zz, type = "message")
r <- .Call(CPP_stanc280, model_code, model_cppname, allow_undefined,
isystem)
sink(type = "message")
close(zz)
on.exit(NULL)
r\$model_name <- model_name
r\$model_code <- model_code
if (is.null(r)) {
stop(paste("failed to run stanc for model '", model_name,
"' and no error message provided", sep = ""))
}
else if (r\$status == PARSE_FAIL_RC) {
stop(paste("failed to parse Stan model '", model_name,
"' and no error message provided"), sep = "")
}
else if (r\$status == EXCEPTION_RC) {
lapply(r\$msg, function(x) message(x))
error_msg <- paste("failed to parse Stan model '", model_name,
"' due to the above error.", sep = "")
stop(error_msg)
}
if (r\$status == SUCCESS_RC && verbose)
cat("successful in parsing the Stan model '", model_name,
"'.\n", sep = "")
msg <- grep("Unknown variable", msg, value = TRUE, invert = TRUE)
msg <- grep("aliasing", msg, value = TRUE, invert = TRUE)
if (length(msg) > 2L) {
cat(msg, sep = "\n")
}
else {
try(file.remove(tf), silent = TRUE)
}
r\$status = !as.logical(r\$status)
return(r)
}
``````

and then:

``````function (file, model_name = "anon_model", model_code = "",
stanc_ret = NULL, boost_lib = NULL, eigen_lib = NULL, save_dso = TRUE,
verbose = FALSE, auto_write = rstan_options("auto_write"),
obfuscate_model_name = TRUE, allow_undefined = FALSE, includes = NULL,
isystem = c(if (!missing(file)) dirname(file), getwd()))
{
if (is.null(stanc_ret)) {
model_name2 <- deparse(substitute(model_code))
if (is.null(attr(model_code, "model_name2")))
attr(model_code, "model_name2") <- model_name2
if (missing(model_name))
model_name <- NULL
if (missing(file)) {
tf <- tempfile()
writeLines(model_code, con = tf)
file <- file.path(dirname(tf), paste0(tools::md5sum(tf),
".stan"))
if (!file.exists(file))
file.rename(from = tf, to = file)
else file.remove(tf)
}
else file <- normalizePath(file)
stanc_ret <- stanc(file = file, model_code = model_code,
model_name = model_name, verbose = verbose, obfuscate_model_name = obfuscate_model_name,
allow_undefined = allow_undefined, isystem = isystem)
model_re <- "(^[[:alnum:]]{2,}.*\$)|(^[A-E,G-S,U-Z,a-z].*\$)|(^[F,T].+)"
if (!is.null(model_name))
if (!grepl(model_re, model_name))
stop("model name must match ", model_re)
S4_objects <- apropos(model_re, mode = "S4", ignore.case = FALSE)
if (length(S4_objects) > 0) {
e <- environment()
stanfits <- sapply(mget(S4_objects, envir = e, inherits = TRUE),
FUN = is, class2 = "stanfit")
stanmodels <- sapply(mget(S4_objects, envir = e,
inherits = TRUE), FUN = is, class2 = "stanmodel")
if (any(stanfits))
for (i in names(which(stanfits))) {
obj <- get_stanmodel(get(i, envir = e, inherits = TRUE))
if (identical(obj@model_code, stanc_ret\$model_code))
return(obj)
}
if (any(stanmodels))
for (i in names(which(stanmodels))) {
obj <- get(i, envir = e, inherits = TRUE)
if (identical(obj@model_code, stanc_ret\$model_code))
return(obj)
}
}
mtime <- file.info(file)\$mtime
file.rds <- gsub("stan\$", "rds", file)
md5 <- tools::md5sum(file)
if (!file.exists(file.rds)) {
file.rds <- file.path(tempdir(), paste0(md5, ".rds"))
}
if (!file.exists(file.rds) || (mtime.rds <- file.info(file.rds)\$mtime) <
as.POSIXct(packageDescription("rstan")\$Date) ||
!is(obj <- readRDS(file.rds), "stanmodel") || !is_sm_valid(obj) ||
(!identical(stanc_ret\$model_code, obj@model_code) &&
is.null(message("hash mismatch so recompiling; make sure Stan code ends with a blank line"))) ||
avoid_crash(obj@dso@.CXXDSOMISC\$module) && is.null(message("recompiling to avoid crashing R session"))) {
}
else return(invisible(obj))
}
if (!is.list(stanc_ret)) {
stop("stanc_ret needs to be the returned object from stanc.")
}
m <- match(c("cppcode", "model_name", "status"), names(stanc_ret))
if (any(is.na(m))) {
stop("stanc_ret does not have element `cppcode', `model_name', and `status'")
}
else {
if (!stanc_ret\$status)
stop("stanc_ret is not a successfully returned list from stanc")
}
if (.Platform\$OS.type != "windows") {
CXX <- get_CXX()
if (!is.null(attr(CXX, "status")) || nchar(CXX) == 0) {
WIKI <- "https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started"
WIKI))
}
else if (grepl("69", CXX, fixed = TRUE))
warning("You may need to launch Xcode once to accept its license")
}
else CXX <- "g++"
model_cppname <- stanc_ret\$model_cppname
model_name <- stanc_ret\$model_name
model_code <- stanc_ret\$model_code
model_cppcode <- stanc_ret\$cppcode
inc <- paste("#define STAN__SERVICES__COMMAND_HPP", "#include <boost/integer/integer_log2.hpp>\n",
"#include <rstan/rstaninc.hpp>\n", if (is.null(includes))
model_cppcode
else sub("(class[[:space:]]+[A-Za-z_][A-Za-z0-9_]*[[:space:]]*)",
paste(includes, "\\1"), model_cppcode), get_Rcpp_module_def_code(model_cppname),
sep = "")
if (verbose && interactive())
cat("COMPILING THE C++ CODE FOR MODEL '", model_name,
"' NOW.\n", sep = "")
if (verbose)
cat(system_info(), "\n")
if (!is.null(boost_lib)) {
old.boost_lib <- rstan_options(boost_lib = boost_lib)
on.exit(rstan_options(boost_lib = old.boost_lib))
}
if (!file.exists(rstan_options("boost_lib")))
if (!is.null(eigen_lib)) {
old.eigen_lib <- rstan_options(eigen_lib = eigen_lib)
on.exit(rstan_options(eigen_lib = old.eigen_lib), add = TRUE)
}
if (!file.exists(rstan_options("eigen_lib")))
dso <- cxxfunctionplus(signature(), body = paste(" return Rcpp::wrap(\"",
model_name, "\");", sep = ""), includes = inc, plugin = "rstan",
save_dso = save_dso | auto_write, module_name = paste("stan_fit4",
model_cppname, "_mod", sep = ""), verbose = verbose)
if (FALSE && grepl("#include", model_code, fixed = TRUE)) {
model_code <- scan(text = model_code, what = character(),
sep = "\n", quiet = TRUE)
model_code <- gsub("#include /", "#include ", model_code,
fixed = TRUE)
model_code <- gsub("#include (.*\$)", "#include \"\\1\"",
model_code)
unprocessed <- tempfile(fileext = ".stan")
processed <- tempfile(fileext = ".stan")
on.exit(file.remove(c(unprocessed, processed)))
writeLines(model_code, con = unprocessed)
ARGS <- paste("-E -nostdinc -x c++ -P -C", paste("-I",
isystem, " ", collapse = ""), "-o", processed, unprocessed)
pkgbuild::with_build_tools(system2(CXX, args = ARGS),
required = rstan_options("required") && identical(Sys.getenv("WINDOWS"),
"TRUE") && !identical(Sys.getenv("R_PACKAGE_SOURCE"),
""))
if (file.exists(processed))
model_code <- paste(readLines(processed), collapse = "\n")
}
obj <- new("stanmodel", model_name = model_name, model_code = model_code,
dso = dso, mk_cppmodule = mk_cppmodule, model_cpp = list(model_cppname = model_cppname,
model_cppcode = model_cppcode))
if (missing(file) || (file.access(dirname(file), mode = 2) !=
0) || !isTRUE(auto_write)) {
tf <- tempfile()
writeLines(model_code, con = tf)
file <- file.path(tempdir(), paste0(tools::md5sum(tf),
".stan"))
if (!file.exists(file))
file.rename(from = tf, to = file)
else file.remove(tf)
saveRDS(obj, file = gsub("stan\$", "rds", file))
}
else if (isTRUE(auto_write)) {
file <- gsub("stan\$", "rds", file)
if (file.exists(file)) {
rds <- try(readRDS(file), silent = TRUE)
if (!is(rds, "stanmodel"))
warning(rds, " exists but is not a 'stanmodel' so not overwriting")
else saveRDS(obj, file = file)
}
else saveRDS(obj, file = file)
}
invisible(obj)
}
``````

this is using `debug(rstan::stanc)`

Yep, seems to be perfectly valid syntax.

Your program has a few issues, however. First, you need to define things like `phi`, `beta`, etc, in the parameters block. I’ve done that for you this time around. Secondly, you cannot add the numerical correction

``````  y_hat = integrate_ode_rk45(SI, y0, t0, ts, {phi, pe, pie, pee, beta, mu, betam, tau, alpha, Ts, Ta, Tm, Ti}, x_r, x_i);
y_hat = 1e-6 + (1-2e-6)*y_hat;
``````

I have hacked this monstrosity

``````  y_hat = integrate_ode_rk45(SI, y0, t0, ts, {phi, pe, pie, pee, beta, mu, betam, tau, alpha, Ts, Ta, Tm, Ti}, x_r, x_i);
for(i in 1:n_obs){
for(j in 1:n_difeq){
y_hat[i, j] = 1e-6 + (1-2e-6)*y_hat[i, j];
}
}
``````

and it will suffice. Folks like @bbbales2 and @nhuurre will know more efficient/elegant ways of adding the numerical correction.

Full program:

``````functions {

real[] SI(real t,
real[] y,
real[] params,
real[] x_r,
int[] x_i) {

real dydt;

real phi = params;
real pe = params;
real pie = params;
real pee = params;

real beta = params;
real mu = params;
real betam = params;
real tau = params;
real alpha = params;
real Ts = params;
real Ta = params;
real Tm = params;
real Ti = params;

dydt = - mu * beta * y * y - beta * y * y - beta * y * y - betam * y * y;
dydt = mu * beta * y * y + (1 - phi) * beta * y * y + (1 - phi) * beta * y * y + (1 - phi) * betam * y * y - y / tau;
dydt = pe * phi * beta * y * y + pie * phi * beta * y * y + pee * phi * betam * y * y - y / tau;
dydt = (1 - pe) * phi * beta * y * y + (1 - pie) * phi * beta * y * y + (1 - pee) * phi * betam * y * y - y / tau;
dydt = alpha * y / tau - y / Ts;
dydt = (1 - alpha) * y / tau - y / Ta;
dydt = y / tau - y / Tm;
dydt = y / tau - y / Ti;

return dydt;
}

}

data {
int<lower = 1> n_obs; // Number of days sampled
int<lower = 1> n_params; // Number of model parameters
int<lower = 1> n_difeq; // Number of differential equations in the system
int<lower = 1> n_sample;
int y[n_obs]; // The binomially distributed data
real t0; // Initial time point (zero)
real ts[n_obs]; // Time points that were sampled

}

transformed data {
real x_r;
int x_i;
}

parameters {
real<lower = 0, upper = 1> phi;
real<lower = 0, upper = 1> pe;
real<lower = 0, upper = 1> pie;
real<lower = 0, upper = 1> pee;
real<lower = 0> beta;
real<lower = 0> mu;
real<lower = 0>betam;
real<lower = 0> tau;
real<lower = 0> alpha;
real<lower = 0> Ts;
real<lower = 0> Ta;
real<lower = 0> Tm;
real<lower = 0> Ti;
real<lower = 0, upper = 1> S0; // Initial fraction of susceptible
}

transformed parameters{
real y_hat[n_obs, n_difeq]; // Output from the ODE solver
real y0[n_difeq]; // Initial conditions for both S and I
real<lower=0, upper=1> params1;
real<lower = 0> params2;

y0 = 1 - 0.02 - 0.001;
y0 = 0.02;
y0 = 0;
y0 = 0;
y0 = 0.001;
y0 = 0;
y0 = 0;
y0 = 0;

y_hat = integrate_ode_rk45(SI, y0, t0, ts, {phi, pe, pie, pee, beta, mu, betam, tau, alpha, Ts, Ta, Tm, Ti}, x_r, x_i);
for(i in 1:n_obs){
for(j in 1:n_difeq){
y_hat[i, j] = 1e-6 + (1-2e-6)*y_hat[i, j];
}
}

}

model {
phi ~ normal(0, 1);  //phi
pe ~ normal(0, 1); //pe
pie ~ normal(0, 1); //pie
pee ~ normal(0, 1); //pee

beta ~ normal(0.5, 0.9); //beta
mu ~ normal(0.2, 0.55); //mu
betam ~ normal(0, 1);  //beta_m
tau ~ normal(1, 20); //tau
alpha ~ normal(0.1, 0.99); //alpha
Ts ~ normal(3, 20); //Ts
Ta ~ normal(1, 20); //Ta
Tm ~ normal(1, 20); //Tm
Ti ~ normal(0.5, 8); //Ti
S0 ~ normal(0.5, 0.5); //constrained to be 0-1.

y ~ binomial(n_sample, y_hat[,5]); //y_hat[,5] are the fractions infected from the ODE solver

}

generated quantities {
real R_0;
R_0 = beta * Ta * (alpha * Ts * (1 - phi) / Ta + mu * (1 - alpha));

}
``````

Hope this helps.

PS: I found all of those problems using Rstudio’s Stan code checking. I have no idea what the errors you report are. What interface are you using?

4 Likes

This is probably relevant: https://mc-stan.org/users/documentation/case-studies/boarding_school_case_study.html . Not the exact model, but something similar.

3 Likes

Thanks a lot for this. It surely does help. Now I understood how I should have organised the parameters. Now when I run the program however I encounter the following:

``````hash mismatch so recompiling; make sure Stan code ends with a blank line
no parameter params; sampling not done
``````

at `test = stan()`and then

``````starting worker pid=3016 on localhost:11025 at 19:22:06.591
starting worker pid=3030 on localhost:11025 at 19:22:06.781
starting worker pid=3044 on localhost:11025 at 19:22:06.967
no parameter params; sampling not done
no parameter params; sampling not done
no parameter params; sampling not done
some chains had errors; consider specifying chains = 1 to debughere are whatever error messages were returned
[]
Stan model 'SI_fit' does not contain samples.

[]
Stan model 'SI_fit' does not contain samples.

[]
Stan model 'SI_fit' does not contain samples.
``````

at `mod = stan()`. I wonder how I many address this?

As for the RStudio I’m using version 1.2.5033, and my R is version 3.6.3 and my rstan is version 2.19.3. I am running this on MacOs Catalina unfortunately which I know has issues. I don’t get the errors as you do for me it only shows `connection error` which I shared. I tried to upgrade my R to version 4 before yet after I have had problems installing rstan correctly. So I had to downgrade to what I am using now to at least be able to run the code.

Thanks a lot I must check this.

Start by specifying `nchains=1` as instructed in the error message. This should give you more informative errors.

So my latest update is as follow:

I am following the link shared by @bbbales2. My data contains daily number of confirmed cases of covid-19 in the UK. My stan code is as follow:

`````` functions {

real[] sir(real t,
real[] y,
real[] theta,
real[] x_r,
int[] x_i) {

// I renamed my ODEs variables here:
real S = y;
real E = y;
real G = y;
real P = y;
real I = y;
real A = y;
real B = y;
real C = y;
real N = x_i;

// Here are the list of my parameters
real phi = theta;
real pe = theta;
real pie = theta;
real pee = theta;
real beta = theta;
real mu = theta;
real betam = theta;
real tau = theta;
real alpha = theta;
real Ts = theta;
real Ta = theta;
real Tm = theta;
real Ti = theta;

// here are the ODEs. I notice that in the link they define them as real dS_dt rather than dydt
real dS_dt = - mu * beta * S * A - beta * S * I - beta * C * S - betam * B * S;
real dE_dt = mu * beta * S * A + (1 - phi) * beta * S * I + (1 - phi) * beta * C * S + (1 - phi) * betam * B * S - E / tau;
real dG_dt = pe * phi * beta * S * I + pie * phi * beta * C * S + pee * phi * betam * B * S - G / tau;
real dP_dt = (1 - pe) * phi * beta * S * I + (1 - pie) * phi * beta * C * S + (1 - pee) * phi * betam * B * S - P / tau;
real dI_dt = alpha * E / tau - I / Ts;
real dA_dt = (1 - alpha) * E / tau - A / Ta;
real dB_dt = G / tau - B / Tm;
real dC_dt = P / tau - C / Ti;

return {dS_dt, dE_dt, dG_dt, dP_dt, dI_dt, dA_dt, dB_dt, dC_dt};
}

}

data {
int<lower=1> n_days;
real y0;
real t0;
real ts[n_days];
int N;
int cases[n_days];

}

transformed data {
real x_r;
int x_i = { N };
}

parameters {
real<lower = 0, upper = 1> phi;
real<lower = 0, upper = 1> pe;
real<lower = 0, upper = 1> pie;
real<lower = 0, upper = 1> pee;
real<lower = 0> beta;
real<lower = 0> mu;
real<lower = 0>betam;
real<lower = 0> tau;
real<lower = 0> alpha;
real<lower = 0> Ts;
real<lower = 0> Ta;
real<lower = 0> Tm;
real<lower = 0> Ti;
}

transformed parameters{
real y_hat[n_days, 8];
real<lower=0, upper=1> theta1;
real<lower = 0> theta2;

y_hat = integrate_ode_rk45(sir, y0, t0, ts, {phi, pe, pie, pee, beta, mu, betam, tau, alpha, Ts, Ta, Tm, Ti}, x_r, x_i);
for(i in 1:n_days){
for(j in 1:8){
y_hat[i, j] = 1e-6 + (1-2e-6)*y_hat[i, j];
}
}

}

model {
phi ~ normal(0, 1);  //phi
pe ~ normal(0, 1); //pe
pie ~ normal(0, 1); //pie
pee ~ normal(0, 1); //pee

beta ~ normal(0.5, 0.9); //beta
mu ~ normal(0.2, 0.55); //mu
betam ~ normal(0, 1);  //beta_m
tau ~ normal(1, 20); //tau
alpha ~ normal(0.1, 0.99); //alpha
Ts ~ normal(3, 20); //Ts
Ta ~ normal(1, 20); //Ta
Tm ~ normal(1, 20); //Tm
Ti ~ normal(0.5, 8); //Ti

cases ~ binomial(col(to_matrix(y_hat, 5))); //y_hat[,5] are the fractions infected from the ODE solver

}

generated quantities {
real R_0;
R_0 = beta * Ta * (alpha * Ts * (1 - phi) / Ta + mu * (1 - alpha));

}
``````

But I can not run this again, as it says:

`Error in close.connection(zz) : cannot close 'message' sink connection`

when compiling the stan file. My R code is here along with other files. I wonder what am I doing wrong? any help is much appreciated. Could this be a problem at model block? I am defining the likelihood as binomial for infections which is the 5th ODE.

I also wonder why am I getting error as above only? while others seem to get pinpointed on where the error is happening?

mn.stan (3.0 KB) rcode1.R (982 Bytes) xyzz.csv (1.8 KB)

There seems to be a syntax error on this line:

``````cases ~ binomial(col(to_matrix(y_hat, 5)));
``````

I changed the likelihood to a normal (just as a placeholder to get things to compile), and then I got these errors when I ran the model:

`````` "Error in sampler\$call_sampler(args_list[[i]]) : "
 "  Exception: Max number of iterations exceeded (1000000).  (in 'modelda7739d7107_mn' at line 87)"
error occurred during calling the sampler; sampling not done
here are whatever error messages were returned
``````

Which indicate that the sampler wasn’t able to initialize itself cause the ODE solver failed. Are you getting these errors as well?

Thank you @bbbales2. So I guess I know where the error could happen. Let me go through it step by step. First unfortunately the only error I get is the
`Error in close.connection(zz) : cannot close 'message' sink connection`
and I assume this is because I have Catalina and R version 3.6.3 and when I try to update this to R version 4 I encounter many other issues, to begin on the updated version I can not get rstan running. This is why unfortunately I am using the older version.

Having said that if you look at the link you’ve shared the way they define the likelihood is as follow:

``````cases ~ neg_binomial_2(col(to_matrix(y), 2), phi);
``````

this is negative binomial. Yet what’s important there is a parameter `phi` involved which I assume it takes the noise into account. I didn’t have that and I am not sure if I must have this in defining likelihood.

Secondly I think this is equivalent of my first stan code which got corrected by @maxbiostat, in there we have:

``````y ~ binomial(n_sample, y_hat[,5]);
``````

again there is a parameter `n_sample` which was defined in `data` block.

In the first version of stan code corrected by @maxbiostat I was running:

``````test = stan("SI_fit.stan",
data = stan_d,
pars = params_monitor,
chains = 1, iter = 10)

test = stan("SI_fit.stan",
data = stan_d,
pars = params_monitor,
chains = 1, iter = 10)

mod = stan(fit = test,
data = stan_d,
pars = params_monitor,
chains = 3,
warmup = 500,
iter = 1000,
``````

on R and I was getting the error:

``````no parameter params; sampling not done
Stan model 'SI_fit' does not contain samples.
``````

So I guess the issue boils down to `n_sample` or adding a parameter for the noise. I am not sure how to do this right, any suggestions on this would be kind? Note that in above I defined `n_sample` as the UK population size and it did not work

Also note that there is a typo, following the link you’ve shared it should be:

``````cases ~ binomial(col(to_matrix(y_hat), 5));
``````

Oof, yeah, I think this is worth a separate thread. I do not know the Catalina/R issues (Linux here).

Usually the question with binary data is is your data the outcomes of N trials? Or is it the count of something occurring at some rate?

That sorta leads the discussion of what likelihood we’ll pick. The first we’ll usually go bernoulli/binomial. The second looks like a poisson, but usually when you have a poisson you can do better by just fitting a negative binomial (because reality wasn’t actually exactly poisson or whatever).

Thank you @bbbales2 for your reply. So my data is a vector as follow:

``````0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 1, 0, 0, 5, 0, 1, 0, \
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 2, 5, 3, 13, 4, 11, 34, 30, \
48, 43, 67, 48, 61, 74, 0, 342, 342, 0, 403, 407, 676, 63, 1294, \
1035, 665, 967, 1427, 1452, 2129, 2885, 2546, 2433, 2619, 3009, 4324, \
4244, 4450, 3735, 5903, 3802, 3634, 5491, 4344, 8681, 5233, 5288, \
4342, 5252, 4603, 4617, 5599, 5525, 5850, 4676, 4301, 4451, 4583, \
5386, 4913, 4463, 4309, 3996, 4076, 6032, 6201, 4806, 4339, 3985, \
4406, 6111, 5614, 4649, 3896, 3923, 3877, 3403, 3242, 3446, 3560, 3450
``````

each entry is daily number of infected people to Covid-19. So there is no trial per day as each day has one number. This would be bell shaped which can be described by binomial or poission indeed. So I imagine one would not need `n_sample` is that so?

and to make it complete, this data is the number of infections as I mentioned and so the 5th ODE is the one we need to fit them into.

The bell shape here is on the time axis.

The likelihood is for describing each outcome. So you have a model for the number of underlying infections (I[t]) and you have you measurements (y[t]).

Your outcomes aren’t trials, so you might say that you expect your measurements to be Poisson distributed about I[t]:

``````y[t] ~ poisson(I[t]);
``````

Usually this model doesn’t fit super well. Like, if you take your data, fit it, and then check the fit vs. the data like in the case study it won’t be great.

So then move to a negative binomial. In that case you also need to fit an overdispersion parameter. That parameter might seem a bit weird, so just generate data in R to get a handle on what it does:

Here’s the R docs and the stan docs.

I think `rnbinom(n, size = phi, mu = mu)` is the parameterization you want to convert from Stan parameters (`mu` and `phi`) to R parameters (`mu` and `size`). Might want to double check that though.

Have a look at the references in the case study and also the references Riou 2020 and Flaxman 2020. Look at how they do things. As I understand it there’s a lot going on in covid-19 data analysis that goes beyond picking a likelihood and an SEIR model – but you can probably get a better sense of what’s going on by looking at those papers and seeing how they do this.

1 Like

Thank you @bbbales2 for your guidance. My issue at the moment is that I can not get the stan run for basic binomial. Let me go through it part by part so maybe you or someone could help me:

First of all I have my function, there are 8 ODEs and the 5th one is the infection which I want to fit my data into. For this I have:

``````functions {

real[] SI(real t,
real[] y,
real[] params,
real[] x_r,
int[] x_i) {

real dydt;

// Variables of the ODEs

real S = y;
real E = y;
real G = y;
real P = y;
real I = y;
real A = y;
real B = y;
real C = y;

// The parameters of the ODEs
real phi = params;
real pe = params;
real pie = params;
real pee = params;

real beta = params;
real mu = params;
real betam = params;
real tau = params;
real alpha = params;
real Ts = params;
real Ta = params;
real Tm = params;
real Ti = params;

dydt = - mu * beta * S * A - beta * S * I - beta * C * S - betam * B * S;
dydt = mu * beta * S * A + (1 - phi) * beta * S * I + (1 - phi) * beta * C * S + (1 - phi) * betam * B * S - E / tau;
dydt = pe * phi * beta * S * I + pie * phi * beta * C * S + pee * phi * betam * B * S - G / tau;
dydt = (1 - pe) * phi * beta * S * I + (1 - pie) * phi * beta * C * S + (1 - pee) * phi * betam * B * S - P / tau;
dydt = alpha * E / tau - I / Ts;
dydt = (1 - alpha) * E / tau - A / Ta;
dydt = G / tau - B / Tm;
dydt = P / tau - C / Ti;
return dydt;
}
``````

In this block I have defined 13 parameters of the ODEs with first 4 being probabilities. Then the data block is:

``````data {
int<lower = 1> n_obs; // Number of days sampled
int<lower = 1> n_params; // Number of model parameters
int<lower = 1> n_difeq; // Number of differential equations in the system
int y[n_obs]; // The binomially distributed data
real t0; // Initial time point (zero)
real ts[n_obs]; // Time points that were sampled
}
``````

I have labelled them so one can understand what is what easily. My transformed data is the usual:

``````transformed data {
real x_r;
int x_i;
}
``````

then I have my parameters:

``````parameters {
real<lower = 0, upper = 1> phi;
real<lower = 0, upper = 1> pe;
real<lower = 0, upper = 1> pie;
real<lower = 0, upper = 1> pee;
real<lower = 0> beta;
real<lower = 0> mu;
real<lower = 0>betam;
real<lower = 0> tau;
real<lower = 0> alpha;
real<lower = 0> Ts;
real<lower = 0> Ta;
real<lower = 0> Tm;
real<lower = 0> Ti;
real<lower = 0, upper = 1> S0; // Initial fraction of susceptible
}
``````

In this block you can see that `S0` is between 0 and 1, this is because I am assuming that total population is equal to 1. So at the end my data plot would show the “portion of people who are infected”. I then have my transformed data :

``````transformed parameters{
real y_hat[n_obs, n_difeq]; // Output from the ODE solver
real y0[n_difeq]; // Initial conditions
real<lower=0, upper=1> params1;
real<lower = 0> params2;

y0 = 1 - 0.02 - 0.001;
y0 = 0.02;
y0 = 0;
y0 = 0;
y0 = 0.001;
y0 = 0;
y0 = 0;
y0 = 0;
y_hat = integrate_ode_rk45(SI, y0, t0, ts, {phi, pe, pie, pee, beta, mu, betam, tau, alpha, Ts, Ta, Tm, Ti}, x_r, x_i);
}
}

}
``````

As you can see from the initial conditions they are all between 0 and 1 and susceptible individuals are about 0.98 of the population (which is 1). Another note is that I have `real<lower=0, upper=1> params1;` for the probabilities and `real<lower = 0> params2; ` for the rest of the parameters. The integrator then keeps track of list of the parameters (this is a neat way suggested by @maxbiostat ).

My model finally is:

``````model {
phi ~ normal(0, 1);  //phi
pe ~ normal(0, 1); //pe
pie ~ normal(0, 1); //pie
pee ~ normal(0, 1); //pee

beta ~ normal(0.5, 0.9); //beta
mu ~ normal(0.2, 0.55); //mu
betam ~ normal(0, 1);  //beta_m
tau ~ normal(1, 20); //tau
alpha ~ normal(0.1, 0.99); //alpha
Ts ~ normal(3, 20); //Ts
Ta ~ normal(1, 20); //Ta
Tm ~ normal(1, 20); //Tm
Ti ~ normal(0.5, 8); //Ti
S0 ~ normal(0.5, 0.5); //constrained to be 0-1.

y ~ binomial(y_hat[,5]); //y_hat[,5] are the fractions infected from the ODE solver

}
``````

As you can see I defined `y ~ binomial(y_hat[,5])` where `y_hat[,5]` are the infected solution of the ODEs. As you mentioned this might not be accurate in the fit, but I am just first trying to get it run, hence I am sharing my method with you.

Finally, on the stan side, I have the following generated quantity:

``````generated quantities {
real R_0;
R_0 = beta * Ta * (alpha * Ts * (1 - phi) / Ta + mu * (1 - alpha));
``````

On the R side I have the list of the variables needed to run the stan:

``````stan_d = list(n_obs = length(datalist\$cases),
n_params = 13,
n_difeq = 8,
y = datalist\$cases/66500000,
t0 = 0,
ts = sample_time)
``````

where `n_obs` is the number of dates or in my case the length of the cases. `n_params` is the number of parameters of the ODEs, ` n_difeq` is the number of ODEs, `y` is the number of infected normalised by population so they would be between 0 and 1. and finally `ts` is the following:

``````sample_time = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,
37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53,
54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87,
88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103,
104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116)
``````

I then run the following and test would fail,

``````# Which parameters to monitor in the model:
params_monitor = c("y_hat", "y0", "params", "R_0")

# Test / debug the model:
test = stan("SI_fit.stan",
data = stan_d,
pars = params_monitor,
chains = 1, iter = 10)
``````

I am running this code on Rstudio and my full R code looks as follow (this includes the stan part):

``````# importing the libraries
library(tidyverse)
library(deSolve)
library(dplyr)
library(ggplot2)
library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

# plotting the data
ggplot(data = datalist) +
geom_point(mapping = aes(x = date, y = cases)) +
labs(y = "Number of daily cases")

sample_time = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,
37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53,
54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87,
88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103,
104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116)

# The Stan model statement:
cat(
'
functions {

real[] SI(real t,
real[] y,
real[] params,
real[] x_r,
int[] x_i) {

real dydt;

// Variables of the ODEs

real S = y;
real E = y;
real G = y;
real P = y;
real I = y;
real A = y;
real B = y;
real C = y;

// The parameters of the ODEs
real phi = params;
real pe = params;
real pie = params;
real pee = params;

real beta = params;
real mu = params;
real betam = params;
real tau = params;
real alpha = params;
real Ts = params;
real Ta = params;
real Tm = params;
real Ti = params;

dydt = - mu * beta * S * A - beta * S * I - beta * C * S - betam * B * S;
dydt = mu * beta * S * A + (1 - phi) * beta * S * I + (1 - phi) * beta * C * S + (1 - phi) * betam * B * S - E / tau;
dydt = pe * phi * beta * S * I + pie * phi * beta * C * S + pee * phi * betam * B * S - G / tau;
dydt = (1 - pe) * phi * beta * S * I + (1 - pie) * phi * beta * C * S + (1 - pee) * phi * betam * B * S - P / tau;
dydt = alpha * E / tau - I / Ts;
dydt = (1 - alpha) * E / tau - A / Ta;
dydt = G / tau - B / Tm;
dydt = P / tau - C / Ti;

return dydt;
}

}

data {
int<lower = 1> n_obs; // Number of days sampled
int<lower = 1> n_params; // Number of model parameters
int<lower = 1> n_difeq; // Number of differential equations in the system
int y[n_obs]; // The binomially distributed data
real t0; // Initial time point (zero)
real ts[n_obs]; // Time points that were sampled

}

transformed data {
real x_r;
int x_i;
}

parameters {
real<lower = 0, upper = 1> phi;
real<lower = 0, upper = 1> pe;
real<lower = 0, upper = 1> pie;
real<lower = 0, upper = 1> pee;
real<lower = 0> beta;
real<lower = 0> mu;
real<lower = 0>betam;
real<lower = 0> tau;
real<lower = 0> alpha;
real<lower = 0> Ts;
real<lower = 0> Ta;
real<lower = 0> Tm;
real<lower = 0> Ti;
real<lower = 0, upper = 1> S0; // Initial fraction of susceptible
}

transformed parameters{
real y_hat[n_obs, n_difeq]; // Output from the ODE solver
real y0[n_difeq]; // Initial conditions
real<lower=0, upper=1> params1;
real<lower = 0> params2;

y0 = 1 - 0.02 - 0.001;
y0 = 0.02;
y0 = 0;
y0 = 0;
y0 = 0.001;
y0 = 0;
y0 = 0;
y0 = 0;

y_hat = integrate_ode_rk45(SI, y0, t0, ts, {phi, pe, pie, pee, beta, mu, betam, tau, alpha, Ts, Ta, Tm, Ti}, x_r, x_i);
//for(i in 1:n_obs){
//for(j in 1:n_difeq){
//y_hat[i, j] = 1e-6 + (1-2e-6)*y_hat[i, j];
//}
// }

}

model {
phi ~ normal(0, 1);  //phi
pe ~ normal(0, 1); //pe
pie ~ normal(0, 1); //pie
pee ~ normal(0, 1); //pee

beta ~ normal(0.5, 0.9); //beta
mu ~ normal(0.2, 0.55); //mu
betam ~ normal(0, 1);  //beta_m
tau ~ normal(1, 20); //tau
alpha ~ normal(0.1, 0.99); //alpha
Ts ~ normal(3, 20); //Ts
Ta ~ normal(1, 20); //Ta
Tm ~ normal(1, 20); //Tm
Ti ~ normal(0.5, 8); //Ti
S0 ~ normal(0.5, 0.5); //constrained to be 0-1.

y ~ binomial(y_hat[,5]); //y_hat[,5] are the fractions infected from the ODE solver

}

generated quantities {
real R_0;
R_0 = beta * Ta * (alpha * Ts * (1 - phi) / Ta + mu * (1 - alpha));

}

',
file = "SI_fit.stan", sep="", fill=T)

# FITTING

# For stan model we need the following variables:

stan_d = list(n_obs = length(datalist\$cases),
n_params = 13,
n_difeq = 8,
y = datalist\$cases/66500000,
t0 = 0,
ts = sample_time)

# Which parameters to monitor in the model:
params_monitor = c("y_hat", "y0", "params", "R_0")

# Test / debug the model:
test = stan("SI_fit.stan",
data = stan_d,
pars = params_monitor,
chains = 1, iter = 10)

# Fit and sample from the posterior
mod = stan(fit = test,
data = stan_d,
pars = params_monitor,
chains = 3,
warmup = 500,
iter = 1000,

# Extract the posterior samples to a structured list:
posts <- extract(mod)

``````

I am adding the data file that I am using as well in here:
xyzz.csv (1.8 KB)
this is used in `datalist <- read.csv(file = 'xyzz.csv')`. Through the discussion that we had and as much as I could perfectly look into other models I think my stan code looks ok but I could have missed something with my newbie eyes or I am not sure if I am making any mistake on the R side where I define the `stan_d = list()` which are the required parameters for stan. It’s also noteworthy that I am using this reference for my code and indeed I can run the code perfectly fine from the link but I can not run mine unfortunately. I’ll be grateful if I eventually can understand where is my mistake happening.

1 Like

With the model you wrote, `poisson` is probably the simplest likelihood, so:

``````y ~ poisson(y_hat[,5]);
``````

Would be the thing to try.

If you really want to use binomial, you’re missing an argument, and the units of `y_hat[, 5]` (number of infections), don’t make sense.

A binomial takes two parameters. `n` number of trials and `p` probability of successes, and it describes the distribution of the number of successes in those `n` trials.

1 Like

Thank you @bbbales2. So, for the time being, I changed the likelihood to Poisson and when I run it I get the following error:

``````Error in new_CppObject_xp(fields\$.module, fields\$.pointer, ...) :
Exception: int variable contained non-int values; processing stage=data initialization; variable name=y; base type=int  (in 'model70544a99286_SI_fit' at line 62)
[origin: unknown original type]
failed to create the sampler; sampling not done
Error in new_CppObject_xp(fields\$.module, fields\$.pointer, ...) :
Exception: int variable contained non-int values; processing stage=data initialization; variable name=y; base type=int  (in 'model70544a99286_SI_fit' at line 62)
[origin: unknown original type]
failed to create the sampler; sampling not done
Stan model 'SI_fit' does not contain samples.
``````

I checked this and I came across this post, I wonder if this error happens due to the way I defined t0 and ts in data block:

``````  real t0; // Initial time point (zero)
real ts[n_obs]; // Time points that were sampled
``````

or the way I defined `sample_time` namely:

``````sample_time = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,
37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53,
54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87,
88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103,
104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116)
``````

which I used for running the stan code in:

``````stan_d = list(n_obs = length(datalist\$cases),
n_params = 13,
n_difeq = 8,
y = datalist\$cases/66500000,
t0 = 0,
ts = sample_time)
``````

So the problem is with `y`. It is declared as int in the model, but apparently some of the input values are not ints.

1 Like

Indeed so in data block I have rightly so:

``````int y[n_obs];
``````

and in

``````stan_d = list(n_obs = length(datalist\$cases),
n_params = 13,
n_difeq = 8,
y = datalist\$cases/66500000,
t0 = 0,
ts = sample_time)
``````

I have checked my y using:

``````typeof(datalist\$cases/66500000)
``````

for which I get `double` which is double precision. Indeed division by 66.5 million makes the numbers very small and close to 0 so when I do `as.integer(datalist\$cases/66500000)` I have list of 0s. Is there a way to overcome this on stan side?