Prediction for a new observation

Assem · March 11, 2018, 5:05pm

data {
int N;// number of observations
int n[N];//trials
int y[N];//successes
int M;
matrix[N,M] x;//predictor matrix
}
parameters {
vector[M] Beta;
real sigma;//between-observation variance
real gamma[N];//random effect

}
transformed parameters {
vector[N] mu;// probability
for (i in 1:N) {
mu[i] =inv_logit(dot_product(x[i],Beta)+gamma[i]);
}
}
model {
Beta~normal(0,1000);//priors are vague
sigma ~ inv_gamma(0.001,0.001);
gamma~normal(0,sigma);
y~binomial(n,mu);
}

Hello everyone!
I would like to calculate mu for a new observation.
I just tried to add predictors x of a new data and calculate it in generated quantities part, however, it takes long time if every time I will run the program.
Is there any way to calculate it using posterior draws?!
Thanks in advance

Bob_Carpenter · March 13, 2018, 9:36pm

We’re working on allowing new generated quantities blocks to be run with previous draws. It’s all coded up at the C++ level and is wending its way through the interfaces.

For now, you either have to refit or do the predictions in another language with the draws, like R or Python.

Assem · March 15, 2018, 5:36am

Ok thanks!

I am wondering if there is any example how to make predictions in R using posterior draws?!
I am new in statistic and R, so still some questions make me confused

Bob_Carpenter · March 15, 2018, 5:49am

If you have this in your data and generated quantities block for a linear regression:

data {
  ...
  vector[N_new] x_new;
...
generated quantities {
  vector[N_new] y_new;
  for (n in 1:N_new)
    y_new[n] = normal_rng(x_new[n] * beta, sigma);
}

you’d just write it out in R like this:

mu_draws <- extract(fit)$mu
sigma_draws <- extract(fit)$sigma
M <- length(mu_draws)
y <- matrix(NA, M, N_new)
for (n in 1:N)
  for (m in 1:M)
    y_new[m, n] <- rnorm(1, x_new[n] * beta[m], sigma[m]);

And now th e value of y_new in R is the same (statistically, not exactly) as the y_new you’d get from extract(fit)$y_new if you’d fit it in Stan.

Assem · March 15, 2018, 10:48am

Thanks!
However, my problem is in my case y and n are given, and the parameter of interest is mu.
I was thinking just to use the mean of estimated Beta values and to use it directly in logistic regression formula or binomial distribution formula…
but I guess it is not acceptable to calculate it in this way

May be I am not getting smth

Bob_Carpenter · March 16, 2018, 4:14am

The problem with doing that is you lose the uncertainty on the downstream predictions. Bayesian inference requires you to propagate that uncertainty in parameter estimation through predictions.

It works the same way no matter what it is you’re doing. You just pull the calculations outside and apply them to each draw. It’s clunky, but it’s what you need to do to get the correct downstream uncertainty.

Assem · March 16, 2018, 4:49am

let us say, the model named as model_fit

is it ok if, I will extract first:
extr<-extract(model_fit)
then create a test set:
x_test<-c(1,2,1,3,1,60)
then:
pred<-apply(extr$Beta,1,function(beta) x_test * beta)
and just because of inv_logit in my code:
final<-exp(pred)/(1+exp(pred))
will it give me the correct estimation?!

Thanks in advance

Bob_Carpenter · March 16, 2018, 4:54am

Yes, that should work. I’d probably write an inverse logit function with only one exponentiation:

ilogit <- function(u) 1 / (1 + exp(-u))

and just throw that into the apply with everything else. I’m often surprised at what does and doesn’t work in R, so no guarantees!

You don’t have to take my word for it. Run some simulation studies and you can see this for yourself. Simulate data accoding to the model (simulate parameters from the priors, then data from the parameters) and check whether the predictions are calibrated (i.e., binomial(N, 0.5) of the true values fall within the posterior 50% interval).

Assem · March 16, 2018, 7:37am

Ok, thank you for quick answers!

Topic		Replies	Views
Best way to do prediction on new data? R rstan stanfit Modeling	16	14705	October 22, 2019
Generated quantities for prediction Modeling specification , posterior-predictive	9	667	June 5, 2024
Predictions by using the generated quantities block in Stan General	7	6105	March 15, 2018
Using the posterior predictive distribution of my model to estimate the probability that a future observation is greater than 15 Modeling	8	2428	May 19, 2018
Prediction for a new observation in a new group Modeling	2	536	February 4, 2019

Prediction for a new observation

Related topics