Thank you @bbbales2 for your guidance. My issue at the moment is that I can not get the stan run for basic binomial. Let me go through it part by part so maybe you or someone could help me:
First of all I have my function, there are 8 ODEs and the 5th one is the infection which I want to fit my data into. For this I have:
functions {
real[] SI(real t,
real[] y,
real[] params,
real[] x_r,
int[] x_i) {
real dydt[8];
// Variables of the ODEs
real S = y[1];
real E = y[2];
real G = y[3];
real P = y[4];
real I = y[5];
real A = y[6];
real B = y[7];
real C = y[8];
// The parameters of the ODEs
real phi = params[1];
real pe = params[2];
real pie = params[3];
real pee = params[4];
real beta = params[5];
real mu = params[6];
real betam = params[7];
real tau = params[8];
real alpha = params[9];
real Ts = params[10];
real Ta = params[11];
real Tm = params[12];
real Ti = params[13];
dydt[1] = - mu * beta * S * A - beta * S * I - beta * C * S - betam * B * S;
dydt[2] = mu * beta * S * A + (1 - phi) * beta * S * I + (1 - phi) * beta * C * S + (1 - phi) * betam * B * S - E / tau;
dydt[3] = pe * phi * beta * S * I + pie * phi * beta * C * S + pee * phi * betam * B * S - G / tau;
dydt[4] = (1 - pe) * phi * beta * S * I + (1 - pie) * phi * beta * C * S + (1 - pee) * phi * betam * B * S - P / tau;
dydt[5] = alpha * E / tau - I / Ts;
dydt[6] = (1 - alpha) * E / tau - A / Ta;
dydt[7] = G / tau - B / Tm;
dydt[8] = P / tau - C / Ti;
return dydt;
}
In this block I have defined 13 parameters of the ODEs with first 4 being probabilities. Then the data block is:
data {
int<lower = 1> n_obs; // Number of days sampled
int<lower = 1> n_params; // Number of model parameters
int<lower = 1> n_difeq; // Number of differential equations in the system
int y[n_obs]; // The binomially distributed data
real t0; // Initial time point (zero)
real ts[n_obs]; // Time points that were sampled
}
I have labelled them so one can understand what is what easily. My transformed data is the usual:
transformed data {
real x_r[0];
int x_i[0];
}
then I have my parameters:
parameters {
real<lower = 0, upper = 1> phi;
real<lower = 0, upper = 1> pe;
real<lower = 0, upper = 1> pie;
real<lower = 0, upper = 1> pee;
real<lower = 0> beta;
real<lower = 0> mu;
real<lower = 0>betam;
real<lower = 0> tau;
real<lower = 0> alpha;
real<lower = 0> Ts;
real<lower = 0> Ta;
real<lower = 0> Tm;
real<lower = 0> Ti;
real<lower = 0, upper = 1> S0; // Initial fraction of susceptible
}
In this block you can see that S0
is between 0 and 1, this is because I am assuming that total population is equal to 1. So at the end my data plot would show the “portion of people who are infected”. I then have my transformed data :
transformed parameters{
real y_hat[n_obs, n_difeq]; // Output from the ODE solver
real y0[n_difeq]; // Initial conditions
real<lower=0, upper=1> params1[4];
real<lower = 0> params2[9];
y0[1] = 1 - 0.02 - 0.001;
y0[2] = 0.02;
y0[3] = 0;
y0[4] = 0;
y0[5] = 0.001;
y0[6] = 0;
y0[7] = 0;
y0[8] = 0;
y_hat = integrate_ode_rk45(SI, y0, t0, ts, {phi, pe, pie, pee, beta, mu, betam, tau, alpha, Ts, Ta, Tm, Ti}, x_r, x_i);
}
}
}
As you can see from the initial conditions they are all between 0 and 1 and susceptible individuals are about 0.98 of the population (which is 1). Another note is that I have real<lower=0, upper=1> params1[4];
for the probabilities and real<lower = 0> params2[9];
for the rest of the parameters. The integrator then keeps track of list of the parameters (this is a neat way suggested by @maxbiostat ).
My model finally is:
model {
phi ~ normal(0, 1); //phi
pe ~ normal(0, 1); //pe
pie ~ normal(0, 1); //pie
pee ~ normal(0, 1); //pee
beta ~ normal(0.5, 0.9); //beta
mu ~ normal(0.2, 0.55); //mu
betam ~ normal(0, 1); //beta_m
tau ~ normal(1, 20); //tau
alpha ~ normal(0.1, 0.99); //alpha
Ts ~ normal(3, 20); //Ts
Ta ~ normal(1, 20); //Ta
Tm ~ normal(1, 20); //Tm
Ti ~ normal(0.5, 8); //Ti
S0 ~ normal(0.5, 0.5); //constrained to be 0-1.
y ~ binomial(y_hat[,5]); //y_hat[,5] are the fractions infected from the ODE solver
}
As you can see I defined y ~ binomial(y_hat[,5])
where y_hat[,5]
are the infected solution of the ODEs. As you mentioned this might not be accurate in the fit, but I am just first trying to get it run, hence I am sharing my method with you.
Finally, on the stan side, I have the following generated quantity:
generated quantities {
real R_0;
R_0 = beta * Ta * (alpha * Ts * (1 - phi) / Ta + mu * (1 - alpha));
On the R side I have the list of the variables needed to run the stan:
stan_d = list(n_obs = length(datalist$cases),
n_params = 13,
n_difeq = 8,
y = datalist$cases/66500000,
t0 = 0,
ts = sample_time)
where n_obs
is the number of dates or in my case the length of the cases. n_params
is the number of parameters of the ODEs, n_difeq
is the number of ODEs, y
is the number of infected normalised by population so they would be between 0 and 1. and finally ts
is the following:
sample_time = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,
37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53,
54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87,
88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103,
104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116)
I then run the following and test would fail,
# Which parameters to monitor in the model:
params_monitor = c("y_hat", "y0", "params", "R_0")
# Test / debug the model:
test = stan("SI_fit.stan",
data = stan_d,
pars = params_monitor,
chains = 1, iter = 10)
I am running this code on Rstudio and my full R code looks as follow (this includes the stan part):
# importing the libraries
library(tidyverse)
library(deSolve)
library(dplyr)
library(ggplot2)
library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
datalist <- read.csv(file = 'xyzz.csv')
# plotting the data
ggplot(data = datalist) +
geom_point(mapping = aes(x = date, y = cases)) +
labs(y = "Number of daily cases")
sample_time = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,
37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53,
54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87,
88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103,
104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116)
# The Stan model statement:
cat(
'
functions {
real[] SI(real t,
real[] y,
real[] params,
real[] x_r,
int[] x_i) {
real dydt[8];
// Variables of the ODEs
real S = y[1];
real E = y[2];
real G = y[3];
real P = y[4];
real I = y[5];
real A = y[6];
real B = y[7];
real C = y[8];
// The parameters of the ODEs
real phi = params[1];
real pe = params[2];
real pie = params[3];
real pee = params[4];
real beta = params[5];
real mu = params[6];
real betam = params[7];
real tau = params[8];
real alpha = params[9];
real Ts = params[10];
real Ta = params[11];
real Tm = params[12];
real Ti = params[13];
dydt[1] = - mu * beta * S * A - beta * S * I - beta * C * S - betam * B * S;
dydt[2] = mu * beta * S * A + (1 - phi) * beta * S * I + (1 - phi) * beta * C * S + (1 - phi) * betam * B * S - E / tau;
dydt[3] = pe * phi * beta * S * I + pie * phi * beta * C * S + pee * phi * betam * B * S - G / tau;
dydt[4] = (1 - pe) * phi * beta * S * I + (1 - pie) * phi * beta * C * S + (1 - pee) * phi * betam * B * S - P / tau;
dydt[5] = alpha * E / tau - I / Ts;
dydt[6] = (1 - alpha) * E / tau - A / Ta;
dydt[7] = G / tau - B / Tm;
dydt[8] = P / tau - C / Ti;
return dydt;
}
}
data {
int<lower = 1> n_obs; // Number of days sampled
int<lower = 1> n_params; // Number of model parameters
int<lower = 1> n_difeq; // Number of differential equations in the system
int y[n_obs]; // The binomially distributed data
real t0; // Initial time point (zero)
real ts[n_obs]; // Time points that were sampled
}
transformed data {
real x_r[0];
int x_i[0];
}
parameters {
real<lower = 0, upper = 1> phi;
real<lower = 0, upper = 1> pe;
real<lower = 0, upper = 1> pie;
real<lower = 0, upper = 1> pee;
real<lower = 0> beta;
real<lower = 0> mu;
real<lower = 0>betam;
real<lower = 0> tau;
real<lower = 0> alpha;
real<lower = 0> Ts;
real<lower = 0> Ta;
real<lower = 0> Tm;
real<lower = 0> Ti;
real<lower = 0, upper = 1> S0; // Initial fraction of susceptible
}
transformed parameters{
real y_hat[n_obs, n_difeq]; // Output from the ODE solver
real y0[n_difeq]; // Initial conditions
real<lower=0, upper=1> params1[4];
real<lower = 0> params2[9];
y0[1] = 1 - 0.02 - 0.001;
y0[2] = 0.02;
y0[3] = 0;
y0[4] = 0;
y0[5] = 0.001;
y0[6] = 0;
y0[7] = 0;
y0[8] = 0;
y_hat = integrate_ode_rk45(SI, y0, t0, ts, {phi, pe, pie, pee, beta, mu, betam, tau, alpha, Ts, Ta, Tm, Ti}, x_r, x_i);
//for(i in 1:n_obs){
//for(j in 1:n_difeq){
//y_hat[i, j] = 1e-6 + (1-2e-6)*y_hat[i, j];
//}
// }
}
model {
phi ~ normal(0, 1); //phi
pe ~ normal(0, 1); //pe
pie ~ normal(0, 1); //pie
pee ~ normal(0, 1); //pee
beta ~ normal(0.5, 0.9); //beta
mu ~ normal(0.2, 0.55); //mu
betam ~ normal(0, 1); //beta_m
tau ~ normal(1, 20); //tau
alpha ~ normal(0.1, 0.99); //alpha
Ts ~ normal(3, 20); //Ts
Ta ~ normal(1, 20); //Ta
Tm ~ normal(1, 20); //Tm
Ti ~ normal(0.5, 8); //Ti
S0 ~ normal(0.5, 0.5); //constrained to be 0-1.
y ~ binomial(y_hat[,5]); //y_hat[,5] are the fractions infected from the ODE solver
}
generated quantities {
real R_0;
R_0 = beta * Ta * (alpha * Ts * (1 - phi) / Ta + mu * (1 - alpha));
}
',
file = "SI_fit.stan", sep="", fill=T)
# FITTING
# For stan model we need the following variables:
stan_d = list(n_obs = length(datalist$cases),
n_params = 13,
n_difeq = 8,
y = datalist$cases/66500000,
t0 = 0,
ts = sample_time)
# Which parameters to monitor in the model:
params_monitor = c("y_hat", "y0", "params", "R_0")
# Test / debug the model:
test = stan("SI_fit.stan",
data = stan_d,
pars = params_monitor,
chains = 1, iter = 10)
# Fit and sample from the posterior
mod = stan(fit = test,
data = stan_d,
pars = params_monitor,
chains = 3,
warmup = 500,
iter = 1000,
control = list(adapt_delta = 0.99))
# Extract the posterior samples to a structured list:
posts <- extract(mod)
I am adding the data file that I am using as well in here:
xyzz.csv (1.8 KB)
this is used in datalist <- read.csv(file = 'xyzz.csv')
. Through the discussion that we had and as much as I could perfectly look into other models I think my stan code looks ok but I could have missed something with my newbie eyes or I am not sure if I am making any mistake on the R side where I define the stan_d = list()
which are the required parameters for stan. It’s also noteworthy that I am using this reference for my code and indeed I can run the code perfectly fine from the link but I can not run mine unfortunately. I’ll be grateful if I eventually can understand where is my mistake happening.