Attempt to generate prior predictive samples from SIR model

ziqing-ang · May 29, 2020, 4:23pm

I am trying to pick up PyStan for my research project involving ODE. I have found this very instructive paper:
https://mc-stan.org/users/documentation/case-studies/boarding_school_case_study.html#2_using_simulated_data_to_understand_our_model
but they used Stan via R, so I am trying to reproduce part of their work using PyStan. Specifically my current goal is just to produce some prior predictive samples like the 3rd figure of Section 2 in the paper. I attach below the parameters I fed to the model and the ‘stan’ code that I’ve slightly modified to achieve my goal.

N = 763
t = np.array(range(1,31))
t0 = 0
i0 = 1
r0 = 0
y0 = [N-i0, i0, r0]
n_days = 30

simu_3 = dict(n_days = n_days, y0=y0, t0=t0, ts=t, N=N)

sir_stan = '''
functions {
  real[] sir(real t, real[] y, real[] theta, 
             real[] x_r, int[] x_i) {

      real S = y[1];
      real I = y[2];
      real R = y[3];
      real N = x_i[1];
      
      real beta = theta[1];
      real gamma = theta[2];
      
      real dS_dt = -beta * I * S / N;
      real dR_dt =  gamma * I;
      real dI_dt = -(dS_dt + dR_dt);
      
      return {dS_dt, dI_dt, dR_dt};
  }
}

data {
  int<lower=1> n_days;
  real<lower=0> y0[3];
  real t0;
  real ts[n_days];
  int N;
}

transformed data {
  real x_r[0];
  int x_i[1] = { N };
}

parameters {
  real<lower=0> gamma;
  real<lower=0> beta;
  real<lower=0> phi_inv;
}
transformed parameters{
  real y[n_days, 3];
  real phi = 1. / phi_inv;
  {
    real theta[2];
    theta[1] = beta;
    theta[2] = gamma;

    // Solve ODE at current parameter values
    y = integrate_ode_rk45(sir, y0, t0, ts, theta, x_r, x_i);
  }
  // Adding a small +ve value to ensure strict positivity of z
  for (k in 1:3){
      for (n in 1:n_days){
          y[n,k] = y[n,k] + 0.00001;
          }
    }
}
model {
  //priors
  beta ~ normal(2, 1);
  gamma ~ normal(0.4, 0.5);
  phi_inv ~ exponential(5);
  
}
generated quantities {
  real R0 = beta / gamma;
  real recovery_time = 1 / gamma;
  real pred_cases[n_days];
  pred_cases = neg_binomial_2_rng(col(to_matrix(y), 2), phi);
  // col(to_matrix(y),2) means second column of the matrix y, that is I(t)
}

'''

Notice that I’ve added the part in the transformed parameters block because I had some errors indicating that my input to the negative_binomial_2_rng has negative location parameter, so I forced it to be positive in order to debug the problem. However when the sampling is done, and I proceed to check one of my sample of y, I was expecting time series of S , I , R for 30 days. But the result was that, the values somewhat indicate that they ‘remained the same’ throughout the 30 days and so I thought it was no wonder that it complained about negative values since ‘I’ never increased but stayed the same throughout the 30 days for all the samples.

Is there anything that I should further modify but that I missed to achieve my goal of getting prior predictive samples from this model? Thanks.

nhuurre · May 29, 2020, 5:02pm

I don’t why you’re seeing a flat line. I copied your code and ran

fit = model.sampling(data=simu_3)
y = fit.extract()['y']
for i in range(50):
    plt.plot(pd.DataFrame(y[i]),c="k",alpha=0.1)
plt.plot(y[50]);

The result:I don’t why you’re seeing a flat line. I copied your code and ran

fit = model.sampling(data=simu_3)
y = fit.extract()['y']
for i in range(50):
    pp.plot(pd.DataFrame(y[i]),c="k",alpha=0.1)
pp.plot(y[50]);

The result looks fine
SIR

ziqing-ang · May 30, 2020, 1:49am

Hi! Thanks for the prompt reply. I see where the issue arises now. The following is the line I used for sampling, perhaps ‘Fixed_param’ caused the problem?

fit = sm.sampling(data=simu_3, iter=1000, warmup=0, chains=1, 
                  refresh=1000, seed= 12345, algorithm='Fixed_param')

I’m fairly new to stan, so I didn’t know exactly what to do. I’ve arrived to this line after seeing various examples and tutorials and I have observed that ‘Fixed_param’ has been used to generate samples from the generated quantities block without MCMC. What may have caused the issue?

Edit: I’ve did some trial and error with some other models, seems like I can only use ‘Fixed_param’ for models without parameters in the parameters block.

nhuurre · May 30, 2020, 6:00am

That’s right–but without MCMC the model block doesn’t even run and parameters are just constant (usually, you would call sampling() with the init argument.)

ziqing-ang · May 30, 2020, 6:30am

That’s very helpful. So does that mean for this particular example, when algorithm='Fixed_param' is used without stating the init argument, some random initial values for the parameters will be chosen and is fixed throughout all the 1000 iterations since the model block couldn’t run, which may have caused the flat lines?

I have also tried putting the parameters as well as the items in the transformed quantities block into the generated quantities block to try and do without the model block and generate the parameter values using _rng for each distribution but I’ve gotten some syntax error during compilation saying something like “variable “real” does not exist”. What is it usually that is put within the model block vs the generated quantities block?

nhuurre · May 30, 2020, 6:47am

Yes, exactly.

That’s an annoying bug. The Stan compiler requires all variable declarations to be at the start of the block, with nothing in between them. So this is fine

generated quantities {
  real beta;
  real gamma;
  beta = normal_rng(1,1);
  gamma = normal_rng(1,1);
}

but this is a syntax error

generated quantities {
  real beta;
  beta = normal_rng(1,1);
  real gamma; // syntax error on this line
  gamma = normal_rng(1,1);
}

This has been fixed in the latest CmdStan but I don’t know if PyStan is there yet.

The usual rule is to put everything you can in generated quantities because it’s the most efficient. Things that are “upstream” from data have to be in model block. Like if you have

data {
  real x;
}
parameters {
  real m;
}
model {
  x ~ normal(m, 1);
}

You can’t do x = normal_rng(m, 1) because that’s not fitting the observed data, it just a new prediction from the prior.

ziqing-ang · May 30, 2020, 8:12am

This is very well explained. Never expect that would raise a syntax error! Thanks a bunch.

Topic		Replies	Views
Pystan sampling from the posterior predictive PyStan	8	3074	October 30, 2020
Generated Quantities for prediction data General	6	1095	July 28, 2020
Issue with Prior Distributions in ODE Model Modeling	2	511	February 13, 2020
Generate predictions from pyStan after model fit in Stan 3 Modeling	4	422	July 25, 2023
How to reuse PyStan model to predict new dataset PyStan	2	2748	February 18, 2018

Attempt to generate prior predictive samples from SIR model

Related topics