Pystan sampling from the posterior predictive

I love how clean and expressive the PyStan API is.

sample = sm.sampling(data=data).to_dataframe()

I was just wondering if there’s a way to have the sampler sample from the likelihood conditional on each simulated parameter as well, in order to get posterior predictive observations added to this dataframe?


You need to calculate them in the generated quantities block.

So for a simple model

data {
    int<lower=1> N;
    vector[N] y;
}
parameters {
    real mu;
}
model {
    y ~ normal(mu, 1);
}
generated quantities {
    vector[N] yhat;
    for (n in 1:N) {
        yhat[n] = normal_rng(mu, 1);
    }
}
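Conceptually, generated quantities draws one replicated dataset per posterior draw. A minimal NumPy sketch of the same idea (the posterior draws of mu are simulated here for illustration; in practice they come out of the sampler):

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these are 4000 posterior draws of mu from the sampler.
mu_draws = rng.normal(loc=2.0, scale=0.1, size=4000)

N = 5  # number of observed datapoints, as in the Stan model
# One replicated dataset per posterior draw: yhat[s, n] ~ normal(mu_draws[s], 1)
yhat = rng.normal(loc=mu_draws[:, None], scale=1.0, size=(mu_draws.size, N))

print(yhat.shape)  # one row per posterior draw, one column per datapoint
```

Each column of `yhat` then holds the posterior predictive distribution for that datapoint.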

Perfect, thanks! Are vector expressions valid here too?

Some _rng functions are vectorised, so you don’t need an explicit loop.


I have a question about that line; I'm not sure if it's just example code. I understand that the posterior predictive distribution has the same mean as the posterior distribution, but it shouldn't have the same variance.

Thanks!

The standard deviation is hardcoded in the model (so yeah, this is just an example model).

To fit it against the data, you need to define a parameter (sigma, for example) and then use that (with the same structure as mu).
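The point about the variance can be checked numerically: the posterior predictive variance is (approximately) the posterior variance of mu plus the likelihood variance, so yhat is wider than mu. A small NumPy sketch with simulated posterior draws:

```python
import numpy as np

rng = np.random.default_rng(1)

sigma = 1.0                                    # likelihood sd (a parameter in a full model)
mu_draws = rng.normal(2.0, 0.3, size=200_000)  # pretend posterior draws of mu
yhat = rng.normal(mu_draws, sigma)             # one predictive draw per posterior draw

# Same mean, but predictive variance ≈ Var(mu) + sigma^2.
print(round(mu_draws.var(), 2))  # ~0.09
print(round(yhat.var(), 2))      # ~1.09
```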

Here’s a slightly more complex model based on the original answer that does univariate OLS with prediction. In addition to the N observed datapoints you pass in for the regression fit, you also pass in K additional points that you’d like predictions for (they can be the same points but don’t have to be). Then you can get point estimates via StanModel.optimizing, or simulated observations via StanModel.sampling that can be used to form Bayesian prediction intervals.

data {
    int<lower=0> N; // number of observations
    vector[N] x;
    vector[N] y;
    int<lower=0> K; // number of predicted values
    vector[K] x_pred; // values we want predictions for
}
parameters {
    real alpha;
    real beta;
    real<lower=0> sigma;
}
model {
    y ~ normal(alpha + beta * x, sigma);
}
generated quantities {
    vector[K] yhat;
    for (n in 1:K) {
        yhat[n] = normal_rng(alpha + beta * x_pred[n], sigma);
    }
}
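Once you have the yhat draws, prediction intervals are just percentiles across draws. A hedged NumPy sketch (the posterior draws of alpha, beta, and sigma are simulated here; in practice you'd take them from the sampler's output, e.g. the fit dataframe):

```python
import numpy as np

rng = np.random.default_rng(2)

# Pretend posterior draws for the regression parameters.
S = 4000
alpha = rng.normal(1.0, 0.05, S)
beta = rng.normal(2.0, 0.05, S)
sigma = np.abs(rng.normal(0.5, 0.02, S))

x_pred = np.array([0.0, 1.0, 2.0])  # K = 3 points we want predictions for

# yhat[s, k]: one simulated observation per posterior draw and prediction point,
# mirroring normal_rng(alpha + beta * x_pred[k], sigma) in generated quantities.
yhat = rng.normal(alpha[:, None] + beta[:, None] * x_pred, sigma[:, None])

# 90% Bayesian prediction interval at each prediction point
lo, hi = np.percentile(yhat, [5, 95], axis=0)
print(lo.shape, hi.shape)  # one lower and upper bound per prediction point
```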

Hi, sorry to jump on this so late.

Shouldn’t yhat now also somehow contain many values for each to-be-predicted value? I am only starting out with Stan, but yhat[n] = normal_rng(alpha + beta * x_pred[n], sigma); looks like a single value is predicted.

Yes, only a single value is predicted here per draw; across all posterior draws, each yhat[n] accumulates many values. You could also add another dimension and predict more values per draw.