Algorithm producing consistent, but inaccurate parameter estimation (integrate_ode)

Debbie · March 27, 2020, 2:10pm

Hello,

I am trying to perform parameter estimation using simulated data based on an SIB (Susceptible, Infected, Bacteria) model used to describe the spread of cholera. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC29087/

As far as I can tell, the model used to simulate the data is described in exactly the same way as in the STAN model. However, when I run the model, the parameters are way off those used in the data simulation process. What is also interesting, is that if I take the parameters estimated by the STAN model, and input them into my data simulation model - the SIB plots produced are completely different (see post analysis section). I feel this implies some discrepancies in the two models - but I can’t spot the difference!

import pystan
import numpy as np
import time
import matplotlib.pyplot as plt
import pandas as pd

######################### STAN Model ######################################## 

model = """
functions{
real f(real B) { 
    real force;
    force = B/(B+1000000);
    return force;
}

real[] dz_dt(real t,      //time
           real[] z,      // state
           real[] theta,  // parameters
           real[] x_r,    // data (real)
           int[] x_i) {   // data (integer)
             
    real S = z[1];        //Unpack state variables
    real I = z[2];
    real B = z[3];
    
    real N = 10000;       //Known parameters
    real mu = 0.0001;   
    real net_b = -0.33;    
    real r = 0.2;    
    real a = theta[1];    //Unknown parameters
    real e = theta[2];



    real dS_dt = mu*(N-S) - a*f(B)*S;
    real dI_dt = a*f(B) - r*I;
    real dB_dt = B*(net_b) + e*I;
    return {dS_dt, dI_dt, dB_dt};
}
}

data{
     int<lower=0> M;             //Number of measurements
     real<lower = 0> ts[M];      //Measurement times > 0
     real<lower = 0> y_init[3];  //Measured initial conditions
     real<lower = 0> z_init[3];  //True initial conditions
     real<lower = 0> y[M,3];     //Data
}

parameters{
    real<lower=0> theta[2];     //theta = [a,e,net_b,r]
    real<lower=0> sigma;        //Error scale
}

transformed parameters{
    real z[M,3]
    = integrate_ode_rk45(dz_dt, z_init, 0.0, ts, theta, 
                         rep_array(0.0,0), rep_array(0,0),
                         1e-6, 1e-5, 1e3);
}

model{
//Priors
      theta[1] ~ gamma(3,0.6);        //a    
      theta[2] ~ gamma(10,1);        //e
      for (k in 1:3) {
              y_init ~ normal(z_init[k], sigma);
              y[,k] ~ normal(z[,k], sigma);
      }   
}
"""
sm = pystan.StanModel(model_code = model)

######################## Simulated data ##############################

N = 10000           #Population size
mu = 0.0001         #Birth/death rate
a = 1               #Exposure rate
K = 10**6           #Critical bacterial concentration
r = 0.2             #Recovery rate
net_b = -0.33  #Net bacterial death
e = 10              #Bacterial shedding rate (per person per day)
init_inf = 10       #Initial number of infected individuals

sigma = 50          #Noise levels

theta = [a,e]     #Assumed unknown parameters

num_days = 500
M = num_days -1
t = np.arange(1,num_days)

y_init = [N, init_inf, 0]       #Initial conditions (measured)

def f(B):
    "Force of infection as a function of bacterial concentration"
    f = B/(10**6+B)
    return f

def sir_smooth(theta, num_days, init_inf, N):
    "Deterministic SIB simulation"
    
    #Unpack variables
    a = theta[0]
    e = theta[1]
    net_b = -0.33
    r = 0.2

    S,I,B= np.zeros(num_days+1), np.zeros(num_days+1), np.zeros(num_days+1)     #Initialize arrays

    S[0] = N
    I[0] = init_inf
           
    for t in range(num_days):
        B[t+1] = B[t] + B[t]*net_b + e*I[t]    
        S[t+1] = S[t] + mu*(N-S[t]) - a*f(B[t])*S[t]
        I[t+1] = I[t] + a*f(B[t])*S[t] - r * I[t]
    
    sim = np.array([S[1:num_days+1],I[1:num_days+1],B[1:num_days+1]])
        
    return (sim)

def sir_noise(theta,sigma, num_days, init_inf, N):
    "Simulates a noisy SIR run of length 'size'; parameters, R0 and D_inf; noise, sigma and initial conditions, y0, as a proportion of total population"

    S_noise = np.zeros(num_days)
    I_noise = np.zeros(num_days)
    B_noise = np.zeros(num_days)

#Calculates theoretical model without noise
    smooth_model = sir_smooth(theta, num_days, init_inf, N)
    S_sm,I_sm,B_sm = smooth_model[0], smooth_model[1], smooth_model[2]
    
#Adds noise
    for t in range(num_days):
        S_noise[t] = max(0.0,  S_sm[t]+np.random.normal(0,sigma)) 
        I_noise[t] = max(0.0, I_sm[t]+np.random.normal(0,sigma))
        B_noise[t] = max(0.0, B_sm[t]+np.random.normal(0,sigma))
        
    sim = np.array([S_noise,I_noise,B_noise])
        
    return (sim)

data_smooth = sir_smooth(theta, num_days, init_inf, N)
data_noise = sir_noise(theta, sigma, num_days, init_inf, N)
data_tr = data_noise[:,1:num_days].transpose()


stan_data = {'M': M, 'ts':t, 'y_init': y_init, 'z_init': y_init, 'y':data_tr}

################################## Run MCMC #######################################
fit = sm.sampling(data = stan_data, chains = 2, iter = 1000, n_jobs=1)


##################################### Post Analysis ##################################


summary_dict = fit.summary()
df = pd.DataFrame(summary_dict['summary'], 
                  columns=summary_dict['summary_colnames'], 
                  index=summary_dict['summary_rownames'])

a_mean, e_mean,sigma_mean = df['mean']['theta[1]'], df['mean']['theta[2]'],df['mean']['sigma']

print("a:       Mean =", round(a_mean,2), '    N_eff=', round(df['n_eff']['theta[1]'],1), 
      '     Rhat=',round(df['Rhat']['theta[1]'],4))
print("e:       Mean =", round(e_mean,2), '   N_eff=', round(df['n_eff']['theta[2]'],1), 
      '     Rhat=',round(df['Rhat']['theta[2]'],4))


print("Sigma:   Mean =", round(sigma_mean,2), '  N_eff=', round(df['n_eff']['sigma'],1), 
      '     Rhat=',round(df['Rhat']['sigma'],4))

stan_theta = [a_mean, e_mean]

stan_sim = sir_smooth(stan_theta, num_days, init_inf, N)


stan_S = pd.concat([pd.Series(y_init[0]),df['mean']['z[1,1]':'z[499,1]']])
stan_I = pd.concat([pd.Series(y_init[1]),df['mean']['z[1,2]':'z[499,2]']])
stan_B = pd.concat([pd.Series(y_init[2]),df['mean']['z[1,3]':'z[499,3]']])

#Extract trace
a_trace = fit['theta[1]']
e_trace = fit['theta[2]']

sigma_trace = fit['sigma']
lp_trace = fit['lp__']

fig, (ax1, ax2, ax3) = plt.subplots(3,1, dpi = 300)
fig.suptitle('Trace plots', fontsize = 10)
fig.tight_layout(pad = 2)

ax1.plot(a_trace, linewidth = 0.5)
ax1.set_title('a', fontsize = 10)

ax2.plot(e_trace, linewidth = 0.5)
ax2.set_title('e', fontsize = 10)


ax3.plot(sigma_trace, linewidth = 0.5)
ax3.set_title('Sigma', fontsize = 10)


########################Visualise Results###################################
tplus = np.arange(0,num_days)
fig, ax1 = plt.subplots(1,1, dpi = 300)

#Plot Simulated Data
ax1.plot(tplus, data_noise[0], label='S simulated', color = 'r')
ax1.plot(tplus, data_noise[1], label='I simulated', c = 'g')
ax1.plot(tplus, data_noise[2], label = 'B simulated', c = 'b')
ax1.set_ylim(0,10000)
ax1.plot(tplus, stan_S, label = 'S STAN modelled', c = 'lightcoral')
ax1.plot(tplus, stan_I, label = 'I STAN modelled', c = 'lightgreen')
ax1.plot(tplus, stan_B, label = 'B STAN modelled', c = 'lightblue')
ax1.plot(tplus, stan_sim[0], label = 'S STAN simulated', c = 'darkred', ls = '--')
ax1.plot(tplus, stan_sim[1], label = 'I STAN simulated', c = 'darkgreen', ls = '--')
ax1.plot(tplus, stan_sim[2], label = 'B STAN simulated', c = 'darkblue', ls = '--')
# ax1.set_title('Simulated Data')
ax1.set(xlabel = 'time', ylabel = 'Population')
ax1.legend()

Please see also, a section of the traceback:
(stan_env) C:>python C:\Users\dms228\codeco.py
INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_3f726eb1c89580ddcfbd473502a7442d NOW.

Gradient evaluation took 0.003 seconds
1000 transitions using 10 leapfrog steps per transition would take 30 seconds.
Adjust your expectations accordingly!


Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: integrate_ode_rk45: parameter vector[1] is inf, but must be finite!  (in 'unknown file name' at line 49)

If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: integrate_ode_rk45: parameter vector[1] is inf, but must be finite!  (in 'unknown file name' at line 49)

If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: integrate_ode_rk45: parameter vector[1] is inf, but must be finite!  (in 'unknown file name' at line 49)

If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: integrate_ode_rk45: parameter vector[1] is inf, but must be finite!  (in 'unknown file name' at line 49)

If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: integrate_ode_rk45: parameter vector[1] is inf, but must be finite!  (in 'unknown file name' at line 49)

If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: Max number of iterations exceeded (1000).  (in 'unknown file name' at line 49)

If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: Max number of iterations exceeded (1000).  (in 'unknown file name' at line 49)

If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: Max number of iterations exceeded (1000).  (in 'unknown file name' at line 49)

If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Iteration:   1 / 1000 [  0%]  (Warmup)
Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: integrate_ode_rk45: parameter vector[1] is inf, but must be finite!  (in 'unknown file name' at line 49)

If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: integrate_ode_rk45: parameter vector[1] is inf, but must be finite!  (in 'unknown file name' at line 49)

If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: normal_lpdf: Location parameter[1] is nan, but must be finite!  (in 'unknown file name' at line 61)

If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Iteration: 100 / 1000 [ 10%]  (Warmup)
Iteration: 200 / 1000 [ 20%]  (Warmup)
Iteration: 300 / 1000 [ 30%]  (Warmup)
Iteration: 400 / 1000 [ 40%]  (Warmup)
Iteration: 500 / 1000 [ 50%]  (Warmup)
Iteration: 501 / 1000 [ 50%]  (Sampling)
Iteration: 600 / 1000 [ 60%]  (Sampling)
Iteration: 700 / 1000 [ 70%]  (Sampling)
Iteration: 800 / 1000 [ 80%]  (Sampling)
Iteration: 900 / 1000 [ 90%]  (Sampling)
Iteration: 1000 / 1000 [100%]  (Sampling)

 Elapsed Time: 10.18 seconds (Warm-up)
               10.42 seconds (Sampling)
               20.6 seconds (Total)


Gradient evaluation took 0.001 seconds
1000 transitions using 10 leapfrog steps per transition would take 10 seconds.
Adjust your expectations accordingly!


Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: integrate_ode_rk45: parameter vector[1] is inf, but must be finite!  (in 'unknown file name' at line 49)

If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: integrate_ode_rk45: parameter vector[1] is inf, but must be finite!  (in 'unknown file name' at line 49)

If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: integrate_ode_rk45: parameter vector[1] is inf, but must be finite!  (in 'unknown file name' at line 49)

If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: integrate_ode_rk45: parameter vector[1] is inf, but must be finite!  (in 'unknown file name' at line 49)

If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: integrate_ode_rk45: parameter vector[1] is inf, but must be finite!  (in 'unknown file name' at line 49)

If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: Max number of iterations exceeded (1000).  (in 'unknown file name' at line 49)

If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: Max number of iterations exceeded (1000).  (in 'unknown file name' at line 49)

If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: Max number of iterations exceeded (1000).  (in 'unknown file name' at line 49)

If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Iteration:   1 / 1000 [  0%]  (Warmup)
Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: integrate_ode_rk45: parameter vector[1] is inf, but must be finite!  (in 'unknown file name' at line 49)

If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: integrate_ode_rk45: parameter vector[1] is inf, but must be finite!  (in 'unknown file name' at line 49)

If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: integrate_ode_rk45: parameter vector[1] is inf, but must be finite!  (in 'unknown file name' at line 49)

If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Iteration: 100 / 1000 [ 10%]  (Warmup)
Iteration: 200 / 1000 [ 20%]  (Warmup)
Iteration: 300 / 1000 [ 30%]  (Warmup)
Iteration: 400 / 1000 [ 40%]  (Warmup)
Iteration: 500 / 1000 [ 50%]  (Warmup)
Iteration: 501 / 1000 [ 50%]  (Sampling)
Iteration: 600 / 1000 [ 60%]  (Sampling)
Iteration: 700 / 1000 [ 70%]  (Sampling)
Iteration: 800 / 1000 [ 80%]  (Sampling)
Iteration: 900 / 1000 [ 90%]  (Sampling)
Iteration: 1000 / 1000 [100%]  (Sampling)

 Elapsed Time: 8.814 seconds (Warm-up)
               8.247 seconds (Sampling)
               17.061 seconds (Total)

WARNING:pystan:Maximum (flat) parameter count (1000) exceeded: skipping diagnostic tests for n_eff and Rhat.
To run all diagnostics call pystan.check_hmc_diagnostics(fit)
a:       Mean = 76.61     N_eff= 310.3      Rhat= 0.9991
e:       Mean = 52.39    N_eff= 360.1      Rhat= 1.0
Sigma:   Mean = 2540.71   N_eff= 595.8      Rhat= 0.9993

Debbie · March 31, 2020, 10:01am

I found the error!

Maanit_Mehra · May 11, 2020, 8:31pm

@Debbie: Could you share what worked for you? I seem to be working through a similar error.

Topic		Replies	Views
Stan model estimating parameters well, but data noise (sigma) poorly Modeling	1	458	April 16, 2020
Exception: integrate_ode_rk45: parameter vector[1] is inf, but must be finite! Modeling ode	1	1273	April 14, 2020
Stan model provides different estimate from actual process Modeling fitting-issues	2	508	April 5, 2022
Toy ODE model problem -- posterior predictives not consistent with real values Modeling fitting-issues , specification	14	1047	June 4, 2019
Help fitting ODEs in STAN Modeling fitting-issues	9	988	October 22, 2019

Algorithm producing consistent, but inaccurate parameter estimation (integrate_ode)

Related topics