Prediction using point estimates in rstan

Hi,

I’m newish to Stan and I’m trying to do prediction on a much larger data set by extracting the coefficients from the Stan model and translating them to SQL. It’s a bernoulli_logit hierarchical model with two levels, where the second level has 19 regions. I’m also including a generated quantities block for prediction so I can compare against my manually calculated predictions. The coefficients I extract by taking the posterior mean match the printed coefficients, but when I do the manual calculation, 1 / (1 + exp(-(B * X))), I get different results. Is there an obvious step I’m missing? Specifically, the manually calculated predictions seem to be more extreme (closer to 0 or 1).

Thanks for any help!

Michael

Your description doesn’t quite make sense. What are you comparing these transformed coefficients to?

Sorry, what I meant was that the coefficients I found by taking the posterior means match the coefficients reported by print(model, pars = c("beta")).

Either:

  • you are summarizing before transforming, which is the wrong order (transform, then summarize), or
  • you have a bug.

You don’t provide enough info to tell which is going on.

I think I might be summarizing before transforming (I’m also using variational Bayes for speed). Does this look wrong? The beta_coef matches the printed coefficients above for all J regions.

beta <- extract(model, "beta")
# sum the 1000 draws divided by 1000, i.e. the posterior mean of each coefficient
beta_mat <- as.matrix(beta$beta, ncol = K * J, nrow = 1000, byrow = T) / 1000
beta_mat <- matrix(beta_mat, ncol = K * J)
beta_coef <- matrix(colSums(beta_mat), ncol = J)

Then:
prediction <- 1 / (1 + exp(-as.matrix(df[, full_vars_w_intercept1, with = F]) %*% beta_coef[, ]))

It helps if you supply your model. Otherwise we have to guess, as I’m about to do.

This doesn’t make sense. The model just defines a log density. There are then several inference algorithms that let you extract point estimates, including optimization and MCMC sampling.

If you’re using MCMC sampling to compute a posterior, then plugging in the posterior mean won’t give you the same as the mean of the posterior prediction because there are non-linear functions involved. That is, the posterior mean of the parameters is just that—it’s the average parameter value weighted by posterior density. The posterior mean of the prediction is the prediction averaged over all values of parameters (weighted the same way).
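
As a quick illustration of the difference (toy numbers, nothing to do with your model; just simulated draws of a single coefficient):

inv_logit <- function(z) 1 / (1 + exp(-z))
set.seed(1)
beta_draws <- rnorm(1000, mean = 0.5, sd = 1.5)   # pretend posterior draws of one coefficient
x <- 2                                            # a single predictor value

inv_logit(mean(beta_draws) * x)   # plug-in: transform the posterior mean of the parameter
mean(inv_logit(beta_draws * x))   # full Bayes: posterior mean of the transformed prediction

The second number is pulled toward 0.5 relative to the first, which is exactly the “more extreme” pattern you see when you plug the averaged coefficients into the inverse logit yourself.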

In general, you don’t want to use point estimates to do prediction if you can do full Bayes because they tend not to be well calibrated.

Thanks Bob, that’s the conclusion I was coming to: prediction using the mean of the parameters is not the same as the mean of the posterior prediction, which makes sense. I’m now using the full sampling to generate the posterior predictions for all individuals and then taking the mean over those draws as a point estimate for each observation, which I think is the track I want to be on.

Stan computes that posterior mean over the draws automatically in all of the interfaces. So you just write the prediction down in the generated quantities block and the posterior mean gets reported for you.
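
In rstan that looks something like this (assuming the generated quantities block declares a vector of predictions called y_pred; the name is just illustrative):

pred_draws <- rstan::extract(model, pars = "y_pred")$y_pred   # draws x N matrix
point_pred <- colMeans(pred_draws)                            # posterior mean prediction per observation

# or read the same means straight off the fit summary
summary(model, pars = "y_pred")$summary[, "mean"]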

One other thing to keep in mind: add any “noise” in the model to the prediction using an RNG if you want to capture the appropriate uncertainty in the posterior predictive distribution. It’ll change posterior intervals even when it has little (or, in some cases, no) effect on the posterior mean.
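
For a Bernoulli outcome that means simulating a 0/1 draw per posterior draw rather than stopping at the probability; in the Stan program itself that’s what bernoulli_logit_rng does in generated quantities. A rough sketch of the same thing in R, where beta_draws (draws x K) and X (N x K) are illustrative names:

inv_logit <- function(z) 1 / (1 + exp(-z))
p_draws <- inv_logit(beta_draws %*% t(X))              # draws x N matrix of probabilities
y_rep <- matrix(rbinom(length(p_draws), 1, p_draws),   # one simulated outcome per draw and observation
                nrow = nrow(p_draws))
colMeans(y_rep)   # posterior predictive mean, close to colMeans(p_draws),
                  # but the draws in y_rep also carry the outcome-level noise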

In cases where it isn’t feasible to run all the observations through Stan itself, what is the best practice for generating the full posterior for all observations? For example, I’m using RStan and can only load a few million rows into memory, but I have a universe of several hundred million observations that I’d like to generate the posterior for. Right now I’m coding all the models up in SQL, but that seems tedious and pretty computationally intensive.

I think you want to be using the bigmemory R package or one of the packages related to it:
https://cran.r-project.org/web/packages/bigmemory/index.html

Thanks, I’ll check that out.

Usually the CLT kicks in well before several hundred million observations unless you have long-tailed data, sparse structure, or lots of subgroup structure. It doesn’t sound like you have that.

I meant for the prediction step: I have several hundred million observations without a dependent variable (outcome) that I’d like the model to generate posterior predictions for.

Not sure what you mean by dependent variable here, but you’re not going to have an easy time doing 100M+ predictions with the full posterior in Stan just due to output size. What you’ll probably need to do is compute running means (and/or variances with Welford’s algorithm) and not store all of the draws for the predictive inferences.
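
Something along these lines (a sketch only; beta_draws and X are the same illustrative names as above, and in practice you’d also stream X in row chunks from disk or the database):

inv_logit <- function(z) 1 / (1 + exp(-z))
n_draws <- nrow(beta_draws)
mean_pred <- rep(0, nrow(X))   # running posterior mean of the prediction, one per observation
m2_pred <- rep(0, nrow(X))     # running sum of squared deviations (Welford)

for (d in seq_len(n_draws)) {
  p <- inv_logit(as.vector(X %*% beta_draws[d, ]))   # predictions under draw d
  delta <- p - mean_pred
  mean_pred <- mean_pred + delta / d
  m2_pred <- m2_pred + delta * (p - mean_pred)
}

var_pred <- m2_pred / (n_draws - 1)   # per-observation posterior variance of the prediction

Only the running mean and variance per observation ever live in memory, never the full draws-by-observations matrix of predictions.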