Prediction for a new observation


#1

data {
int N;// number of observations
int n[N];//trials
int y[N];//successes
int M;
matrix[N,M] x;//predictor matrix
}
parameters {
vector[M] Beta;
real sigma;//between-observation variance
real gamma[N];//random effect

}
transformed parameters {
vector[N] mu;// probability
for (i in 1:N) {
mu[i] =inv_logit(dot_product(x[i],Beta)+gamma[i]);
}
}
model {
Beta~normal(0,1000);//priors are vague
sigma ~ inv_gamma(0.001,0.001);
gamma~normal(0,sigma);
y~binomial(n,mu);
}

Hello everyone!
I would like to calculate mu for a new observation.
I just tried to add predictors x of a new data and calculate it in generated quantities part, however, it takes long time if every time I will run the program.
Is there any way to calculate it using posterior draws?!
Thanks in advance


#2

We’re working on allowing new generated quantities blocks to be run with previous draws. It’s all coded up at the C++ level and is wending its way through the interfaces.

For now, you either have to refit or do the predictions in another language with the draws, like R or Python.


#3

Ok thanks!

I am wondering if there is any example how to make predictions in R using posterior draws?!
I am new in statistic and R, so still some questions make me confused


#4

If you have this in your data and generated quantities block for a linear regression:

data {
  ...
  vector[N_new] x_new;
...
generated quantities {
  vector[N_new] y_new;
  for (n in 1:N_new)
    y_new[n] = normal_rng(x_new[n] * beta, sigma);
}

you’d just write it out in R like this:

mu_draws <- extract(fit)$mu
sigma_draws <- extract(fit)$sigma
M <- length(mu_draws)
y <- matrix(NA, M, N_new)
for (n in 1:N)
  for (m in 1:M)
    y_new[m, n] <- rnorm(1, x_new[n] * beta[m], sigma[m]);

And now th e value of y_new in R is the same (statistically, not exactly) as the y_new you’d get from extract(fit)$y_new if you’d fit it in Stan.


#5

Thanks!
However, my problem is in my case y and n are given, and the parameter of interest is mu.
I was thinking just to use the mean of estimated Beta values and to use it directly in logistic regression formula or binomial distribution formula…
but I guess it is not acceptable to calculate it in this way

May be I am not getting smth


#6

The problem with doing that is you lose the uncertainty on the downstream predictions. Bayesian inference requires you to propagate that uncertainty in parameter estimation through predictions.

It works the same way no matter what it is you’re doing. You just pull the calculations outside and apply them to each draw. It’s clunky, but it’s what you need to do to get the correct downstream uncertainty.


#7

let us say, the model named as model_fit

is it ok if, I will extract first:
extr<-extract(model_fit)
then create a test set:
x_test<-c(1,2,1,3,1,60)
then:
pred<-apply(extr$Beta,1,function(beta) x_test * beta)
and just because of inv_logit in my code:
final<-exp(pred)/(1+exp(pred))
will it give me the correct estimation?!

Thanks in advance


#8

Yes, that should work. I’d probably write an inverse logit function with only one exponentiation:

ilogit <- function(u) 1 / (1 + exp(-u))

and just throw that into the apply with everything else. I’m often surprised at what does and doesn’t work in R, so no guarantees!

You don’t have to take my word for it. Run some simulation studies and you can see this for yourself. Simulate data accoding to the model (simulate parameters from the priors, then data from the parameters) and check whether the predictions are calibrated (i.e., binomial(N, 0.5) of the true values fall within the posterior 50% interval).


#9

Ok, thank you for quick answers!