# Attempt to generate prior predictive samples from SIR model

I am trying to pick up PyStan for my research project involving ODE. I have found this very instructive paper:
https://mc-stan.org/users/documentation/case-studies/boarding_school_case_study.html#2_using_simulated_data_to_understand_our_model
but they used Stan via R, so I am trying to reproduce part of their work using PyStan. Specifically my current goal is just to produce some prior predictive samples like the 3rd figure of Section 2 in the paper. I attach below the parameters I fed to the model and the âstanâ code that Iâve slightly modified to achieve my goal.

``````N = 763
t = np.array(range(1,31))
t0 = 0
i0 = 1
r0 = 0
y0 = [N-i0, i0, r0]
n_days = 30

simu_3 = dict(n_days = n_days, y0=y0, t0=t0, ts=t, N=N)
``````
``````sir_stan = '''
functions {
real[] sir(real t, real[] y, real[] theta,
real[] x_r, int[] x_i) {

real S = y[1];
real I = y[2];
real R = y[3];
real N = x_i[1];

real beta = theta[1];
real gamma = theta[2];

real dS_dt = -beta * I * S / N;
real dR_dt =  gamma * I;
real dI_dt = -(dS_dt + dR_dt);

return {dS_dt, dI_dt, dR_dt};
}
}

data {
int<lower=1> n_days;
real<lower=0> y0[3];
real t0;
real ts[n_days];
int N;
}

transformed data {
real x_r[0];
int x_i[1] = { N };
}

parameters {
real<lower=0> gamma;
real<lower=0> beta;
real<lower=0> phi_inv;
}
transformed parameters{
real y[n_days, 3];
real phi = 1. / phi_inv;
{
real theta[2];
theta[1] = beta;
theta[2] = gamma;

// Solve ODE at current parameter values
y = integrate_ode_rk45(sir, y0, t0, ts, theta, x_r, x_i);
}
// Adding a small +ve value to ensure strict positivity of z
for (k in 1:3){
for (n in 1:n_days){
y[n,k] = y[n,k] + 0.00001;
}
}
}
model {
//priors
beta ~ normal(2, 1);
gamma ~ normal(0.4, 0.5);
phi_inv ~ exponential(5);

}
generated quantities {
real R0 = beta / gamma;
real recovery_time = 1 / gamma;
real pred_cases[n_days];
pred_cases = neg_binomial_2_rng(col(to_matrix(y), 2), phi);
// col(to_matrix(y),2) means second column of the matrix y, that is I(t)
}

'''

``````

Notice that Iâve added the part in the `transformed parameters` block because I had some errors indicating that my input to the `negative_binomial_2_rng` has negative location parameter, so I forced it to be positive in order to debug the problem. However when the sampling is done, and I proceed to check one of my sample of `y`, I was expecting time series of S , I , R for 30 days. But the result was that, the values somewhat indicate that they âremained the sameâ throughout the 30 days and so I thought it was no wonder that it complained about negative values since âIâ never increased but stayed the same throughout the 30 days for all the samples.

Is there anything that I should further modify but that I missed to achieve my goal of getting prior predictive samples from this model? Thanks.

1 Like

I donât why youâre seeing a flat line. I copied your code and ran

``````fit = model.sampling(data=simu_3)
y = fit.extract()['y']
for i in range(50):
plt.plot(pd.DataFrame(y[i]),c="k",alpha=0.1)
plt.plot(y[50]);
``````

The result:I donât why youâre seeing a flat line. I copied your code and ran

``````fit = model.sampling(data=simu_3)
y = fit.extract()['y']
for i in range(50):
pp.plot(pd.DataFrame(y[i]),c="k",alpha=0.1)
pp.plot(y[50]);
``````

The result looks fine

4 Likes

Hi! Thanks for the prompt reply. I see where the issue arises now. The following is the line I used for sampling, perhaps âFixed_paramâ caused the problem?

``````fit = sm.sampling(data=simu_3, iter=1000, warmup=0, chains=1,
refresh=1000, seed= 12345, algorithm='Fixed_param')
``````

Iâm fairly new to stan, so I didnât know exactly what to do. Iâve arrived to this line after seeing various examples and tutorials and I have observed that âFixed_paramâ has been used to generate samples from the `generated quantities` block without MCMC. What may have caused the issue?

Edit: Iâve did some trial and error with some other models, seems like I can only use âFixed_paramâ for models without parameters in the `parameters` block.

Thatâs rightâbut without MCMC the `model` block doesnât even run and `parameters` are just constant (usually, you would call `sampling()` with the `init` argument.)

1 Like

Thatâs very helpful. So does that mean for this particular example, when `algorithm='Fixed_param'` is used without stating the `init` argument, some random initial values for the parameters will be chosen and is fixed throughout all the 1000 iterations since the `model` block couldnât run, which may have caused the flat lines?

I have also tried putting the parameters as well as the items in the `transformed quantities` block into the `generated quantities` block to try and do without the `model` block and generate the parameter values using `_rng` for each distribution but Iâve gotten some syntax error during compilation saying something like âvariable ârealâ does not existâ. What is it usually that is put within the `model` block vs the `generated quantities` block?

Yes, exactly.

Thatâs an annoying bug. The Stan compiler requires all variable declarations to be at the start of the block, with nothing in between them. So this is fine

``````generated quantities {
real beta;
real gamma;
beta = normal_rng(1,1);
gamma = normal_rng(1,1);
}
``````

but this is a syntax error

``````generated quantities {
real beta;
beta = normal_rng(1,1);
real gamma; // syntax error on this line
gamma = normal_rng(1,1);
}
``````

This has been fixed in the latest CmdStan but I donât know if PyStan is there yet.

The usual rule is to put everything you can in generated quantities because itâs the most efficient. Things that are âupstreamâ from data have to be in model block. Like if you have

``````data {
real x;
}
parameters {
real m;
}
model {
x ~ normal(m, 1);
}
``````

You canât do `x = normal_rng(m, 1)` because thatâs not fitting the observed data, it just a new prediction from the prior.

2 Likes

This is very well explained. Never expect that would raise a syntax error! Thanks a bunch.