Explaining how Rstan estimates parameter using ODE and data.!

Boyh · March 24, 2021, 11:50pm

Please can anyone be kind enough to explain the process that Rstan uses to estimate parameters given a differential equation and collected data. A link to a page or journal article will be appreciated. Thanks

wds15 · March 25, 2021, 8:09am

Like here:

Bayesian Model of Planetary Motion: exploring ideas for a modeling workflow when dealing with ordinary differential equations and multimodality | planetary_motion.utf8
Upgrading to the new ODE interface
Bayesian workflow for disease transmission modeling in Stan

?

srossell · March 25, 2021, 8:25am

Hi,

I wrote a blogpost with a very simple example.
https://towardsdatascience.com/bayesian-inference-and-differential-equations-cb646b50f148

I hope it will be useful for you

Boyh · March 27, 2021, 12:57pm

Thank you @srossell , thank you @wds15, I have been working with Bayesian workflow for disease transmission modeling in Stan to estimate the SIR model parameter values on Covid of a country using the same priors as used in the work , but I keep getting the messages belows

"1: The largest R-hat is 1.73, indicating chains have not mixed.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hat
2: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess
3: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess
"
I ran the chain up to 7000 iterations and visited the suggested links, yet the chains are still not mixing. I am a maths student however and I am trying to master Bayern and Rstan but is really not working. Thanks

srossell · March 28, 2021, 12:35pm

Hi Boyh,

Your question is too open. If you are using Grinsztajn et al’s Stan program, it may be best to approach one of the authors. If you’ve edited their code, then make incremental changes to Grinsztajn et al’s code. Hopefully, in that way you’ll identify you issue.

Good luck

Funko_Unko · March 28, 2021, 9:39pm

Hmmmm, that paper / model was written at the beginning of the pandemic, wasn’t it?

Maybe it works better with early data, and has to be extended to work up to current events?

Boyh · March 29, 2021, 8:42am

yes @Funko_Unko It is and I also using it for the data at the beginning of the pandemic., but I keep getting the followings below

1: The largest R-hat is 1.73, indicating chains have not mixed.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hat
2: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess
3: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess

I used the prior model used, adjusted it and even increased the iteration number to 7000 (just for 216 observable data), but the chains ain’t still mixing well. R hat used for determining the accuracy of the whole process is still greater than 1.
I do not know what else to do. Thanks

maxbiostat · March 29, 2021, 1:21pm

Let’s back up a bit. Can you post your model and perhaps some data so we can have a look? The resources @wds15 posted are also worth checking out.

Boyh · March 29, 2021, 3:45pm

Attached below is the data and codes. Thank you
covid_data.csv (3.3 KB)

//saved as seir.stan
functions {
  real[] sir(real t, real[] y, real[] theta, 
             real[] x_r, int[] x_i) {

      real S = y[1];
      real I = y[2];
      real R = y[3];
      int N = x_i[1];
      
      real beta = theta[1];
      real gamma = theta[2];
      
      real dS_dt = -beta * I * S / N;
      real dI_dt = beta * I * S / N - gamma * I;
      real dR_dt =  gamma * I;
      
      return {dS_dt, dI_dt, dR_dt};
  }
}
data {
  int<lower=1> n_days;
  real t0;
  real y0[3];
  real ts[n_days];
  int N;
  int cases[n_days];
}
transformed data {
  real x_r[0];
  int x_i[1] = { N };
}
parameters {
  real<lower=0> gamma;
  real<lower=0> beta;
  real<lower=0> phi_inv;
  real<lower=0, upper=1> p_reported; // proportion of infected (symptomatic) people reported
}
transformed parameters{
  real y[n_days, 3];
  real incidence[n_days - 1];
  real phi = 1. / phi_inv;
  // initial compartement values
  real theta[2] = {beta, gamma};
  y = integrate_ode_rk45(sir, y0, t0, ts, theta, x_r, x_i);
  for (i in 1:n_days-1){
    incidence[i] =  (y[i, 1] - y[i+1, 1]) * p_reported;
  }
}
model {
  //priors
  beta ~ normal(2, 1);
  gamma ~ normal(0.9, 1);
  phi_inv ~ exponential(5);
  p_reported ~ beta(1, 2);
  //sampling distribution
  //col(matrix x, int n) - The n-th column of matrix x. Here the number of infected people 
  cases[1:(n_days-1)] ~ neg_binomial_2(incidence, phi);
}
generated quantities {
  real R0 = beta / gamma;
  real recovery_time = 1 / gamma;
  real pred_cases[n_days-1];
  pred_cases = neg_binomial_2_rng(incidence, phi);
}

setwd("~/R")

nga <- read.csv("covid_data.csv", header=TRUE, sep=",")


# Nigeria population
N <- 211400708;

# time series of cases
cases <- nga$Number_of_Reported_Cases  # Number of cases

# times
n_days <- length(cases) 
t <- seq(0, n_days, by = 1)
t0 = 0 
t <- t[-1]

#initial conditions
i0 <- 5
s0 <- N - i0
r0 <- 0
y0 = c(S = s0, I = i0, R = r0)

# data for Stan
data_sir <- list(n_days = n_days, y0 = y0, t0 = t0, ts = t, N = N, cases = cases)

# number of MCMC steps
niter <- 2000

model <- stan_model("seir.stan")


#pass the data to stan and run the model
fit_seir <- sampling(model,
                           data = data_sir,
                           iter = niter,
                           chains = 4, 
                           seed = 0,
                           control = list(max_treedepth = 15, adapt_delta=0.99))

traceplot(fit_seir, pars = c('beta', 'gamma', "R0", "recovery_time"))
pars=c('beta', 'gamma', "R0", "recovery_time")
print(fit_seir, pars = pars)
stan_dens(fit_seir, pars = pars, separate_chains = TRUE)

maxbiostat · March 29, 2021, 6:52pm

Thanks. I’m a bit swapped for the coming days, so will tag @charlesm93 just in case he has some free time.

Boyh · March 31, 2021, 9:08pm

@charlesm93 , I know you must be really busy, please can you spend few mins and look into my codes, it really matters to me. I have done everything I can. Thank you.

charlesm93 · April 1, 2021, 12:50am

Hi,
Can you print the trace plots, including the warmup phase, and the summary of the fit? Please include lp__ in the parameters, that is

pars = c('beta', 'gamma', 'R0', 'recovery_time', 'lp__')

Once I see these, I’ll make further comments.

maxbiostat · April 1, 2021, 12:54am

I can.

I used cmdstanr running cmdstan-2.26.1 though.

EDIT: And I further excluded a point (cases[107]) which was suspiciously 0 in the middle of the epidemic.

charlesm93 · April 1, 2021, 12:57am

Thanks, can you also include the warmup phase?

maxbiostat · April 1, 2021, 1:03am

Sorry, you did ask for that and I forgot. Here’s what I get when using inc_warmup=TRUE.

Looks weird. What gives?

charlesm93 · April 1, 2021, 1:06am

I think you didn’t use the save_warmup option if you fitted the model with cmdstanr. So it’s returning the same iterations.

maxbiostat · April 1, 2021, 1:22am

Used a few more iterations this time.

Seems like a case of the unidentifiability-itis.
EDIT: Wrong diagnosis. Could be a case of multimodality-itis, but we’re waiting on the lab results [OK, I’m done with the disease analogy now].

Funko_Unko · April 1, 2021, 8:16am

For unidentifiability we would like the posterior predictive plots to look similar though, don’t we? Seems unlikely that they do.

maxbiostat · April 1, 2021, 11:06am

They lead to the same R_0, which ultimately controls the dynamics. Here are traceplots of selected points of the posterior predictive:

Funko_Unko · April 1, 2021, 11:19am

I can see exactly nothing on the latter two plots.

It does look though like that one chain consistently predicts much fewer cases for day 1 and day 20, where it should be in the order of zero.

Topic		Replies	Views
Estimating parameters of a SEIR model Modeling fitting-issues , specification , epidemiology	28	3069	May 23, 2020
SIRD model with terrible chain mixing for starters Modeling	20	1527	January 22, 2022
Getting two errors "Rejecting initial value" and "1000 transitions using 10 leapfrog steps per transition would take" while running stan Modeling fitting-issues , specification	10	1051	May 13, 2020
Model fitting and sampling issue: Only 1 chain sampling properly Modeling rstan , fitting-issues , performance	14	2480	September 20, 2022
Toy ODE model problem -- posterior predictives not consistent with real values Modeling fitting-issues , specification	14	1063	June 4, 2019

Explaining how Rstan estimates parameter using ODE and data.!

Related topics