 # Attempt to generate prior predictive samples from SIR model

I am trying to pick up PyStan for my research project involving ODE. I have found this very instructive paper:
https://mc-stan.org/users/documentation/case-studies/boarding_school_case_study.html#2_using_simulated_data_to_understand_our_model
but they used Stan via R, so I am trying to reproduce part of their work using PyStan. Specifically my current goal is just to produce some prior predictive samples like the 3rd figure of Section 2 in the paper. I attach below the parameters I fed to the model and the ‘stan’ code that I’ve slightly modified to achieve my goal.

``````N = 763
t = np.array(range(1,31))
t0 = 0
i0 = 1
r0 = 0
y0 = [N-i0, i0, r0]
n_days = 30

simu_3 = dict(n_days = n_days, y0=y0, t0=t0, ts=t, N=N)
``````
``````sir_stan = '''
functions {
real[] sir(real t, real[] y, real[] theta,
real[] x_r, int[] x_i) {

real S = y;
real I = y;
real R = y;
real N = x_i;

real beta = theta;
real gamma = theta;

real dS_dt = -beta * I * S / N;
real dR_dt =  gamma * I;
real dI_dt = -(dS_dt + dR_dt);

return {dS_dt, dI_dt, dR_dt};
}
}

data {
int<lower=1> n_days;
real<lower=0> y0;
real t0;
real ts[n_days];
int N;
}

transformed data {
real x_r;
int x_i = { N };
}

parameters {
real<lower=0> gamma;
real<lower=0> beta;
real<lower=0> phi_inv;
}
transformed parameters{
real y[n_days, 3];
real phi = 1. / phi_inv;
{
real theta;
theta = beta;
theta = gamma;

// Solve ODE at current parameter values
y = integrate_ode_rk45(sir, y0, t0, ts, theta, x_r, x_i);
}
// Adding a small +ve value to ensure strict positivity of z
for (k in 1:3){
for (n in 1:n_days){
y[n,k] = y[n,k] + 0.00001;
}
}
}
model {
//priors
beta ~ normal(2, 1);
gamma ~ normal(0.4, 0.5);
phi_inv ~ exponential(5);

}
generated quantities {
real R0 = beta / gamma;
real recovery_time = 1 / gamma;
real pred_cases[n_days];
pred_cases = neg_binomial_2_rng(col(to_matrix(y), 2), phi);
// col(to_matrix(y),2) means second column of the matrix y, that is I(t)
}

'''

``````

Notice that I’ve added the part in the `transformed parameters` block because I had some errors indicating that my input to the `negative_binomial_2_rng` has negative location parameter, so I forced it to be positive in order to debug the problem. However when the sampling is done, and I proceed to check one of my sample of `y`, I was expecting time series of S , I , R for 30 days. But the result was that, the values somewhat indicate that they ‘remained the same’ throughout the 30 days and so I thought it was no wonder that it complained about negative values since ‘I’ never increased but stayed the same throughout the 30 days for all the samples. Is there anything that I should further modify but that I missed to achieve my goal of getting prior predictive samples from this model? Thanks.

1 Like

I don’t why you’re seeing a flat line. I copied your code and ran

``````fit = model.sampling(data=simu_3)
y = fit.extract()['y']
for i in range(50):
plt.plot(pd.DataFrame(y[i]),c="k",alpha=0.1)
plt.plot(y);
``````

The result:I don’t why you’re seeing a flat line. I copied your code and ran

``````fit = model.sampling(data=simu_3)
y = fit.extract()['y']
for i in range(50):
pp.plot(pd.DataFrame(y[i]),c="k",alpha=0.1)
pp.plot(y);
``````

The result looks fine 4 Likes

Hi! Thanks for the prompt reply. I see where the issue arises now. The following is the line I used for sampling, perhaps ‘Fixed_param’ caused the problem?

``````fit = sm.sampling(data=simu_3, iter=1000, warmup=0, chains=1,
refresh=1000, seed= 12345, algorithm='Fixed_param')
``````

I’m fairly new to stan, so I didn’t know exactly what to do. I’ve arrived to this line after seeing various examples and tutorials and I have observed that ‘Fixed_param’ has been used to generate samples from the `generated quantities` block without MCMC. What may have caused the issue?

Edit: I’ve did some trial and error with some other models, seems like I can only use ‘Fixed_param’ for models without parameters in the `parameters` block.

That’s right–but without MCMC the `model` block doesn’t even run and `parameters` are just constant (usually, you would call `sampling()` with the `init` argument.)

1 Like

That’s very helpful. So does that mean for this particular example, when `algorithm='Fixed_param'` is used without stating the `init` argument, some random initial values for the parameters will be chosen and is fixed throughout all the 1000 iterations since the `model` block couldn’t run, which may have caused the flat lines?

I have also tried putting the parameters as well as the items in the `transformed quantities` block into the `generated quantities` block to try and do without the `model` block and generate the parameter values using `_rng` for each distribution but I’ve gotten some syntax error during compilation saying something like “variable “real” does not exist”. What is it usually that is put within the `model` block vs the `generated quantities` block?

Yes, exactly.

That’s an annoying bug. The Stan compiler requires all variable declarations to be at the start of the block, with nothing in between them. So this is fine

``````generated quantities {
real beta;
real gamma;
beta = normal_rng(1,1);
gamma = normal_rng(1,1);
}
``````

but this is a syntax error

``````generated quantities {
real beta;
beta = normal_rng(1,1);
real gamma; // syntax error on this line
gamma = normal_rng(1,1);
}
``````

This has been fixed in the latest CmdStan but I don’t know if PyStan is there yet.

The usual rule is to put everything you can in generated quantities because it’s the most efficient. Things that are “upstream” from data have to be in model block. Like if you have

``````data {
real x;
}
parameters {
real m;
}
model {
x ~ normal(m, 1);
}
``````

You can’t do `x = normal_rng(m, 1)` because that’s not fitting the observed data, it just a new prediction from the prior.

2 Likes

This is very well explained. Never expect that would raise a syntax error! Thanks a bunch.