Pystan sampling from the posterior predictive

I love how clean and expressive the PyStan API is.

sample = sm.sampling(data=data).to_dataframe()

I was just wondering if there’s a way to have the sampler sample from the likelihood conditional on each simulated parameter as well, in order to get posterior predictive observations added to this dataframe?


You need to calculate them in the generated quantities block.

So for a simple model

data {
    int<lower=1> N;
    vector[N] y;
}
parameters {
    real mu;
}
model {
    y ~ normal(mu, 1);
}
generated quantities {
    vector[N] yhat;
    for (n in 1:N) {
        yhat[n] = normal_rng(mu, 1);
    }
}
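Conceptually, generated quantities draws one replicated dataset per posterior draw. A minimal NumPy sketch of the same idea (the posterior draws of mu are simulated here for illustration; in practice they come out of the sampler):

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these are 4000 posterior draws of mu from the sampler.
mu_draws = rng.normal(loc=2.0, scale=0.1, size=4000)

N = 5  # number of observed datapoints, as in the Stan model
# One replicated dataset per posterior draw: yhat[s, n] ~ normal(mu_draws[s], 1)
yhat = rng.normal(loc=mu_draws[:, None], scale=1.0, size=(mu_draws.size, N))

print(yhat.shape)  # one row per posterior draw, one column per datapoint
```

Each column of `yhat` then holds the posterior predictive distribution for that datapoint.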

Perfect, thanks! Are vector expressions valid here too?

Some _rng functions are vectorised, so you don’t need an explicit loop.


I have a question about that line; I'm not sure if it's just example code. I understand that the posterior predictive distribution has the same mean as the posterior distribution, but it shouldn't have the same variance.

Thanks!

The standard deviation is hardcoded in the model (so yeah, this is just an example model).

To fit it against the data, you need to define a parameter (sigma, for example) and then use that (with the same structure as mu).
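The point about the variance can be checked numerically: the posterior predictive variance is (approximately) the posterior variance of mu plus the likelihood variance, so yhat is wider than mu. A small NumPy sketch with simulated posterior draws:

```python
import numpy as np

rng = np.random.default_rng(1)

sigma = 1.0                                    # likelihood sd (a parameter in a full model)
mu_draws = rng.normal(2.0, 0.3, size=200_000)  # pretend posterior draws of mu
yhat = rng.normal(mu_draws, sigma)             # one predictive draw per posterior draw

# Same mean, but predictive variance ≈ Var(mu) + sigma^2.
print(round(mu_draws.var(), 2))  # ~0.09
print(round(yhat.var(), 2))      # ~1.09
```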

Here’s a slightly more complex model based on the original answer that does univariate OLS with prediction. In addition to the N observed datapoints you pass in for the regression fit, you also pass in K additional points that you’d like predictions for (they can be the same points but don’t have to be). Then you can get point estimates via StanModel.optimizing, or simulated observations via StanModel.sampling that can be used to form Bayesian prediction intervals.

data {
    int<lower=0> N; // number of observations
    vector[N] x;
    vector[N] y;
    int<lower=0> K; // number of predicted values
    vector[K] x_pred; // values we want predictions for
}
parameters {
    real alpha;
    real beta;
    real<lower=0> sigma;
}
model {
    y ~ normal(alpha + beta * x, sigma);
}
generated quantities {
    vector[K] yhat;
    for (n in 1:K) {
        yhat[n] = normal_rng(alpha + beta * x_pred[n], sigma);
    }
}
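Once you have the yhat draws, prediction intervals are just percentiles across draws. A hedged NumPy sketch (the posterior draws of alpha, beta, and sigma are simulated here; in practice you'd take them from the sampler's output, e.g. the fit dataframe):

```python
import numpy as np

rng = np.random.default_rng(2)

# Pretend posterior draws for the regression parameters.
S = 4000
alpha = rng.normal(1.0, 0.05, S)
beta = rng.normal(2.0, 0.05, S)
sigma = np.abs(rng.normal(0.5, 0.02, S))

x_pred = np.array([0.0, 1.0, 2.0])  # K = 3 points we want predictions for

# yhat[s, k]: one simulated observation per posterior draw and prediction point,
# mirroring normal_rng(alpha + beta * x_pred[k], sigma) in generated quantities.
yhat = rng.normal(alpha[:, None] + beta[:, None] * x_pred, sigma[:, None])

# 90% Bayesian prediction interval at each prediction point
lo, hi = np.percentile(yhat, [5, 95], axis=0)
print(lo.shape, hi.shape)  # one lower and upper bound per prediction point
```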

Hi, sorry to jump on this so late.

Shouldn’t yhat now also somehow contain many values for each to-be-predicted value? I am only starting out with Stan, but yhat[n] = normal_rng(alpha + beta * x_pred[n], sigma); looks like a single value is predicted.

Yes, only a single value is predicted here per draw; across all posterior draws, each yhat[n] accumulates many values. You could also add another dimension and predict more values per draw.