Hello,
I am trying to perform parameter estimation using simulated data based on an SIB (Susceptible, Infected, Bacteria) model used to describe the spread of cholera. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC29087/
As far as I can tell, the model used to simulate the data is described in exactly the same way as in the STAN model. However, when I run the model, the parameters are way off those used in the data simulation process. What is also interesting, is that if I take the parameters estimated by the STAN model, and input them into my data simulation model - the SIB plots produced are completely different (see post analysis section). I feel this implies some discrepancies in the two models - but I can’t spot the difference!
import pystan
import numpy as np
import time
import matplotlib.pyplot as plt
import pandas as pd
######################### STAN Model ########################################
model = """
functions{
real f(real B) {
real force;
force = B/(B+1000000);
return force;
}
real[] dz_dt(real t, //time
real[] z, // state
real[] theta, // parameters
real[] x_r, // data (real)
int[] x_i) { // data (integer)
real S = z[1]; //Unpack state variables
real I = z[2];
real B = z[3];
real N = 10000; //Known parameters
real mu = 0.0001;
real net_b = -0.33;
real r = 0.2;
real a = theta[1]; //Unknown parameters
real e = theta[2];
real dS_dt = mu*(N-S) - a*f(B)*S;
real dI_dt = a*f(B) - r*I;
real dB_dt = B*(net_b) + e*I;
return {dS_dt, dI_dt, dB_dt};
}
}
data{
int<lower=0> M; //Number of measurements
real<lower = 0> ts[M]; //Measurement times > 0
real<lower = 0> y_init[3]; //Measured initial conditions
real<lower = 0> z_init[3]; //True initial conditions
real<lower = 0> y[M,3]; //Data
}
parameters{
real<lower=0> theta[2]; //theta = [a,e,net_b,r]
real<lower=0> sigma; //Error scale
}
transformed parameters{
real z[M,3]
= integrate_ode_rk45(dz_dt, z_init, 0.0, ts, theta,
rep_array(0.0,0), rep_array(0,0),
1e-6, 1e-5, 1e3);
}
model{
//Priors
theta[1] ~ gamma(3,0.6); //a
theta[2] ~ gamma(10,1); //e
for (k in 1:3) {
y_init ~ normal(z_init[k], sigma);
y[,k] ~ normal(z[,k], sigma);
}
}
"""
sm = pystan.StanModel(model_code = model)
######################## Simulated data ##############################
N = 10000 #Population size
mu = 0.0001 #Birth/death rate
a = 1 #Exposure rate
K = 10**6 #Critical bacterial concentration
r = 0.2 #Recovery rate
net_b = -0.33 #Net bacterial death
e = 10 #Bacterial shedding rate (per person per day)
init_inf = 10 #Initial number of infected individuals
sigma = 50 #Noise levels
theta = [a,e] #Assumed unknown parameters
num_days = 500
M = num_days -1
t = np.arange(1,num_days)
y_init = [N, init_inf, 0] #Initial conditions (measured)
def f(B):
"Force of infection as a function of bacterial concentration"
f = B/(10**6+B)
return f
def sir_smooth(theta, num_days, init_inf, N):
"Deterministic SIB simulation"
#Unpack variables
a = theta[0]
e = theta[1]
net_b = -0.33
r = 0.2
S,I,B= np.zeros(num_days+1), np.zeros(num_days+1), np.zeros(num_days+1) #Initialize arrays
S[0] = N
I[0] = init_inf
for t in range(num_days):
B[t+1] = B[t] + B[t]*net_b + e*I[t]
S[t+1] = S[t] + mu*(N-S[t]) - a*f(B[t])*S[t]
I[t+1] = I[t] + a*f(B[t])*S[t] - r * I[t]
sim = np.array([S[1:num_days+1],I[1:num_days+1],B[1:num_days+1]])
return (sim)
def sir_noise(theta,sigma, num_days, init_inf, N):
"Simulates a noisy SIR run of length 'size'; parameters, R0 and D_inf; noise, sigma and initial conditions, y0, as a proportion of total population"
S_noise = np.zeros(num_days)
I_noise = np.zeros(num_days)
B_noise = np.zeros(num_days)
#Calculates theoretical model without noise
smooth_model = sir_smooth(theta, num_days, init_inf, N)
S_sm,I_sm,B_sm = smooth_model[0], smooth_model[1], smooth_model[2]
#Adds noise
for t in range(num_days):
S_noise[t] = max(0.0, S_sm[t]+np.random.normal(0,sigma))
I_noise[t] = max(0.0, I_sm[t]+np.random.normal(0,sigma))
B_noise[t] = max(0.0, B_sm[t]+np.random.normal(0,sigma))
sim = np.array([S_noise,I_noise,B_noise])
return (sim)
data_smooth = sir_smooth(theta, num_days, init_inf, N)
data_noise = sir_noise(theta, sigma, num_days, init_inf, N)
data_tr = data_noise[:,1:num_days].transpose()
stan_data = {'M': M, 'ts':t, 'y_init': y_init, 'z_init': y_init, 'y':data_tr}
################################## Run MCMC #######################################
fit = sm.sampling(data = stan_data, chains = 2, iter = 1000, n_jobs=1)
##################################### Post Analysis ##################################
summary_dict = fit.summary()
df = pd.DataFrame(summary_dict['summary'],
columns=summary_dict['summary_colnames'],
index=summary_dict['summary_rownames'])
a_mean, e_mean,sigma_mean = df['mean']['theta[1]'], df['mean']['theta[2]'],df['mean']['sigma']
print("a: Mean =", round(a_mean,2), ' N_eff=', round(df['n_eff']['theta[1]'],1),
' Rhat=',round(df['Rhat']['theta[1]'],4))
print("e: Mean =", round(e_mean,2), ' N_eff=', round(df['n_eff']['theta[2]'],1),
' Rhat=',round(df['Rhat']['theta[2]'],4))
print("Sigma: Mean =", round(sigma_mean,2), ' N_eff=', round(df['n_eff']['sigma'],1),
' Rhat=',round(df['Rhat']['sigma'],4))
stan_theta = [a_mean, e_mean]
stan_sim = sir_smooth(stan_theta, num_days, init_inf, N)
stan_S = pd.concat([pd.Series(y_init[0]),df['mean']['z[1,1]':'z[499,1]']])
stan_I = pd.concat([pd.Series(y_init[1]),df['mean']['z[1,2]':'z[499,2]']])
stan_B = pd.concat([pd.Series(y_init[2]),df['mean']['z[1,3]':'z[499,3]']])
#Extract trace
a_trace = fit['theta[1]']
e_trace = fit['theta[2]']
sigma_trace = fit['sigma']
lp_trace = fit['lp__']
fig, (ax1, ax2, ax3) = plt.subplots(3,1, dpi = 300)
fig.suptitle('Trace plots', fontsize = 10)
fig.tight_layout(pad = 2)
ax1.plot(a_trace, linewidth = 0.5)
ax1.set_title('a', fontsize = 10)
ax2.plot(e_trace, linewidth = 0.5)
ax2.set_title('e', fontsize = 10)
ax3.plot(sigma_trace, linewidth = 0.5)
ax3.set_title('Sigma', fontsize = 10)
########################Visualise Results###################################
tplus = np.arange(0,num_days)
fig, ax1 = plt.subplots(1,1, dpi = 300)
#Plot Simulated Data
ax1.plot(tplus, data_noise[0], label='S simulated', color = 'r')
ax1.plot(tplus, data_noise[1], label='I simulated', c = 'g')
ax1.plot(tplus, data_noise[2], label = 'B simulated', c = 'b')
ax1.set_ylim(0,10000)
ax1.plot(tplus, stan_S, label = 'S STAN modelled', c = 'lightcoral')
ax1.plot(tplus, stan_I, label = 'I STAN modelled', c = 'lightgreen')
ax1.plot(tplus, stan_B, label = 'B STAN modelled', c = 'lightblue')
ax1.plot(tplus, stan_sim[0], label = 'S STAN simulated', c = 'darkred', ls = '--')
ax1.plot(tplus, stan_sim[1], label = 'I STAN simulated', c = 'darkgreen', ls = '--')
ax1.plot(tplus, stan_sim[2], label = 'B STAN simulated', c = 'darkblue', ls = '--')
# ax1.set_title('Simulated Data')
ax1.set(xlabel = 'time', ylabel = 'Population')
ax1.legend()
Please see also, a section of the traceback:
(stan_env) C:>python C:\Users\dms228\codeco.py
INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_3f726eb1c89580ddcfbd473502a7442d NOW.
Gradient evaluation took 0.003 seconds
1000 transitions using 10 leapfrog steps per transition would take 30 seconds.
Adjust your expectations accordingly!
Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: integrate_ode_rk45: parameter vector[1] is inf, but must be finite! (in 'unknown file name' at line 49)
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: integrate_ode_rk45: parameter vector[1] is inf, but must be finite! (in 'unknown file name' at line 49)
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: integrate_ode_rk45: parameter vector[1] is inf, but must be finite! (in 'unknown file name' at line 49)
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: integrate_ode_rk45: parameter vector[1] is inf, but must be finite! (in 'unknown file name' at line 49)
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: integrate_ode_rk45: parameter vector[1] is inf, but must be finite! (in 'unknown file name' at line 49)
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: Max number of iterations exceeded (1000). (in 'unknown file name' at line 49)
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: Max number of iterations exceeded (1000). (in 'unknown file name' at line 49)
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: Max number of iterations exceeded (1000). (in 'unknown file name' at line 49)
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Iteration: 1 / 1000 [ 0%] (Warmup)
Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: integrate_ode_rk45: parameter vector[1] is inf, but must be finite! (in 'unknown file name' at line 49)
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: integrate_ode_rk45: parameter vector[1] is inf, but must be finite! (in 'unknown file name' at line 49)
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: normal_lpdf: Location parameter[1] is nan, but must be finite! (in 'unknown file name' at line 61)
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Iteration: 100 / 1000 [ 10%] (Warmup)
Iteration: 200 / 1000 [ 20%] (Warmup)
Iteration: 300 / 1000 [ 30%] (Warmup)
Iteration: 400 / 1000 [ 40%] (Warmup)
Iteration: 500 / 1000 [ 50%] (Warmup)
Iteration: 501 / 1000 [ 50%] (Sampling)
Iteration: 600 / 1000 [ 60%] (Sampling)
Iteration: 700 / 1000 [ 70%] (Sampling)
Iteration: 800 / 1000 [ 80%] (Sampling)
Iteration: 900 / 1000 [ 90%] (Sampling)
Iteration: 1000 / 1000 [100%] (Sampling)
Elapsed Time: 10.18 seconds (Warm-up)
10.42 seconds (Sampling)
20.6 seconds (Total)
Gradient evaluation took 0.001 seconds
1000 transitions using 10 leapfrog steps per transition would take 10 seconds.
Adjust your expectations accordingly!
Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: integrate_ode_rk45: parameter vector[1] is inf, but must be finite! (in 'unknown file name' at line 49)
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: integrate_ode_rk45: parameter vector[1] is inf, but must be finite! (in 'unknown file name' at line 49)
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: integrate_ode_rk45: parameter vector[1] is inf, but must be finite! (in 'unknown file name' at line 49)
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: integrate_ode_rk45: parameter vector[1] is inf, but must be finite! (in 'unknown file name' at line 49)
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: integrate_ode_rk45: parameter vector[1] is inf, but must be finite! (in 'unknown file name' at line 49)
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: Max number of iterations exceeded (1000). (in 'unknown file name' at line 49)
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: Max number of iterations exceeded (1000). (in 'unknown file name' at line 49)
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: Max number of iterations exceeded (1000). (in 'unknown file name' at line 49)
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Iteration: 1 / 1000 [ 0%] (Warmup)
Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: integrate_ode_rk45: parameter vector[1] is inf, but must be finite! (in 'unknown file name' at line 49)
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: integrate_ode_rk45: parameter vector[1] is inf, but must be finite! (in 'unknown file name' at line 49)
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: integrate_ode_rk45: parameter vector[1] is inf, but must be finite! (in 'unknown file name' at line 49)
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Iteration: 100 / 1000 [ 10%] (Warmup)
Iteration: 200 / 1000 [ 20%] (Warmup)
Iteration: 300 / 1000 [ 30%] (Warmup)
Iteration: 400 / 1000 [ 40%] (Warmup)
Iteration: 500 / 1000 [ 50%] (Warmup)
Iteration: 501 / 1000 [ 50%] (Sampling)
Iteration: 600 / 1000 [ 60%] (Sampling)
Iteration: 700 / 1000 [ 70%] (Sampling)
Iteration: 800 / 1000 [ 80%] (Sampling)
Iteration: 900 / 1000 [ 90%] (Sampling)
Iteration: 1000 / 1000 [100%] (Sampling)
Elapsed Time: 8.814 seconds (Warm-up)
8.247 seconds (Sampling)
17.061 seconds (Total)
WARNING:pystan:Maximum (flat) parameter count (1000) exceeded: skipping diagnostic tests for n_eff and Rhat.
To run all diagnostics call pystan.check_hmc_diagnostics(fit)
a: Mean = 76.61 N_eff= 310.3 Rhat= 0.9991
e: Mean = 52.39 N_eff= 360.1 Rhat= 1.0
Sigma: Mean = 2540.71 N_eff= 595.8 Rhat= 0.9993