I love how clean and expressive the PyStan API is.
sample = sm.sampling(data=data).to_dataframe()
I was just wondering: is there a way to have the sampler also draw from the likelihood conditional on each simulated parameter, so that posterior predictive observations get added to this dataframe?
I have a question about that line — I'm not sure whether it's just example code. I understand that the posterior predictive distribution has the same mean as the posterior distribution, but it shouldn't have the same variance: predictive draws add the observation noise on top of the parameter uncertainty.
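To make that variance point concrete, here is a small Monte Carlo sketch (the posterior numbers are made up for illustration, not from any fit in this thread): for a normal model with known observation noise, predictive draws keep the posterior mean but pick up the likelihood variance on top of the posterior variance.

```python
import numpy as np

rng = np.random.default_rng(0)

post_mean, post_sd = 2.0, 0.5   # assumed posterior for mu (hypothetical)
noise_sd = 1.5                  # assumed known observation noise sigma

# Posterior draws of the parameter, then one predictive draw per parameter draw.
mu_draws = rng.normal(post_mean, post_sd, size=200_000)
y_rep = rng.normal(mu_draws, noise_sd)

print(np.var(mu_draws))  # ~ post_sd**2              = 0.25
print(np.var(y_rep))     # ~ post_sd**2 + noise_sd**2 = 2.5
```

The means agree (both ≈ 2.0), but the predictive variance is the posterior variance plus the noise variance.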
Here’s a slightly more complex model based on the original answer that does univariate OLS with prediction. In addition to the N observed data points you pass in for the regression fit, you also pass in K additional points that you’d like predictions for (they can be the same, but don’t have to be). Then you can get point estimates via StanModel.optimizing, or simulated observations via StanModel.sampling that can be used to form Bayesian prediction intervals.
data {
  int<lower=0> N;    // number of observed data points
  vector[N] x;
  vector[N] y;
  int<lower=0> K;    // number of predicted values
  vector[K] x_pred;  // values we want predictions for
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  y ~ normal(alpha + beta * x, sigma);
}
generated quantities {
  vector[K] yhat;
  for (n in 1:K) {
    yhat[n] = normal_rng(alpha + beta * x_pred[n], sigma);
  }
}
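Note that normal_rng draws a single value per call, but the generated quantities block runs once per posterior draw, so after sampling the fit holds one simulated observation per iteration for each x_pred entry. A minimal NumPy sketch of that mechanism (the posterior draws below are fabricated for illustration, not from a real fit):

```python
import numpy as np

rng = np.random.default_rng(1)

# Pretend these are S = 4000 posterior draws of alpha, beta, sigma.
S = 4000
alpha = rng.normal(1.0, 0.1, size=S)
beta = rng.normal(2.0, 0.1, size=S)
sigma = np.abs(rng.normal(0.5, 0.05, size=S))

x_pred = np.array([0.0, 1.0, 2.0])  # K = 3 prediction points

# Equivalent of the generated quantities loop, executed once per draw s:
#   yhat[s, n] = normal_rng(alpha[s] + beta[s] * x_pred[n], sigma[s])
mu = alpha[:, None] + beta[:, None] * x_pred[None, :]
yhat = rng.normal(mu, sigma[:, None])

print(yhat.shape)  # (4000, 3): S simulated observations per prediction point

# 95% prediction interval for each x_pred value:
lo, hi = np.percentile(yhat, [2.5, 97.5], axis=0)
```

So each x_pred point ends up with a full distribution of simulated observations, from which the prediction intervals are read off.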
Shouldn’t yhat now also contain many values for each to-be-predicted point? I’m only starting out with Stan, but yhat[n] = normal_rng(alpha + beta * x_pred[n], sigma); looks like only a single value is predicted.