In-Sample Predictions in R with Stan

Hi. I have a two-stage HLM script for Stan and want to get the point estimates for the in-sample data. The goal is to have a vector of in-sample predictions that I can compare with the actual data to compute the mean squared error, which seems the most intuitive way to compare the in-sample accuracy of my models.

Q1: How can I get the point estimates from a Stan model, or with an R function?
Q2: This is a frequentist way to do it. Is there an alternative way to measure the predictive power of a model in a Bayesian manner? If yes, what are the methods?

Here’s a minimal working example:

R script:

## Load Data

data("radon", package = "rstanarm")
data <- radon

# County Categorical Variable

county_name <- as.vector(data$county) # create vector of county indications for each obs
county_list <- unique(county_name) # create list of counties, drop duplicates
J <- length(county_list) # number of counties in Minnesota

county <- rep(NA, length(county_name)) # county index for each obs
for (i in 1:J) {
  county[county_name == county_list[i]] <- i
}

# Seperate variables
logradon <- data$log_radon
loguranium <- data$log_uranium[!duplicated(data$county)] # one uranium value per county
floorind <- data$floor
N <- length(logradon)
county_i <- county
county_j <- 1:J

# Data list for Stan
datalist <- list(Y = logradon,
                 X = floorind,
                 U = loguranium,
                 N = N,
                 J = J,
                 county_i = county_i,
                 county_j = county_j)

Stan script:

// HLM 2

data {
	int<lower=1> N; // number of obs
	int<lower=1> J; // Number of counties
	vector[N] Y; // outcome data (log radon)
	vector[N] X; // input data, layer 1 (floor indication)
	vector[J] U; // input data, layer 2 (uranium)
	int<lower=1> county_i[N]; //county ID variable for x
	int<lower=1> county_j[J]; //county ID variable for u
}

parameters {
  real beta; // fixed slope
	real gamma_zero; // 2nd level fixed intercept 
	real gamma_one; // 2nd level fixed slope
	real<lower=0> sigma_alpha; // prior on noise of 2nd layer model
	real<lower=0> sigma_y; // prior on noise of 1st layer model
}

model { // Model specifications
  vector[N] mu_y;
  vector[J] alpha; // county j varying intercept 
   // 2nd Layer
	for(j in 1:J)
	alpha[county_j[j]] ~ normal(gamma_zero + gamma_one * U[county_j[j]], sigma_alpha);
  // 1st Layer
	for(i in 1:N)
	mu_y[i] = alpha[county_i[i]] + beta * X[i]; 
  // Likelihood
  Y ~ normal(mu_y, sigma_y);
  // Priors and hyperparameters
  sigma_y ~ cauchy(0, 2.5);
  sigma_alpha ~ cauchy(0, 2.5);
  beta ~ normal(0, 5);
  gamma_zero ~ normal(0, 5);
  gamma_one ~ normal(0, 5);
}

Q1: In our Bayesian workflows we avoid point estimates like crazy. We typically report the 50% uncertainty interval instead.
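To make that concrete: once the model produces posterior predictive draws (e.g. a Y_pred in generated quantities), both the point estimate and the interval are one-liners over the draws matrix. A minimal base-R sketch, where the simulated `draws` matrix is a stand-in for what `rstan::extract(fit)$Y_pred` would return (one row per draw, one column per observation):

```r
# Stand-in for posterior predictive draws: (draws x N) matrix
set.seed(1)
S <- 4000; N <- 6
draws <- matrix(rnorm(S * N, mean = 1.5, sd = 0.7), nrow = S, ncol = N)

# Point estimate per observation: posterior mean (median also common)
y_hat <- colMeans(draws)

# 50% uncertainty interval per observation (rows: 25% and 75% quantiles)
ci50 <- apply(draws, 2, quantile, probs = c(0.25, 0.75))

round(y_hat, 2)
round(ci50, 2)
```

With the real fit, you would simply swap the simulated matrix for the extracted `Y_pred` draws; everything downstream is identical.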

Q2: I first run simulated data through the model to make sure I can recover my true/known parameters. Then, when modelling, I toss in a generated quantities block like this:

generated quantities {
  real Y_pred[N];
  for (i in 1:N) {
    Y_pred[i] = normal_rng(alpha * (1 - exp(-(beta * (x[i] - x0)))), sigma); // posterior predictive draw
  }
}

for my model ;) You will need something else in there.
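And once Y_pred draws exist, the in-sample MSE from the original question falls out directly. A sketch in base R, with `y` and `draws` simulated as stand-ins (in practice `y = datalist$Y` and `draws = rstan::extract(fit)$Y_pred`):

```r
# Simulated stand-ins for observed data and posterior predictive draws
set.seed(42)
S <- 4000; N <- 100
y <- rnorm(N)
draws <- matrix(rnorm(S * N, mean = rep(y, each = S), sd = 0.5), nrow = S)

# Frequentist-style single number: MSE of the posterior-mean predictions
y_hat <- colMeans(draws)
mse <- mean((y - y_hat)^2)

# More Bayesian: keep the uncertainty and compute the MSE draw by draw,
# giving a whole distribution of the error rather than one number
mse_draws <- rowMeans(sweep(draws, 2, y)^2)
quantile(mse_draws, c(0.25, 0.5, 0.75))
```

Reporting the quantiles of `mse_draws` instead of the single `mse` is one way to keep the error measure in the same uncertainty-aware spirit as the rest of the workflow.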


A couple of resources that help to address this:
Summary of Betancourt’s talk.

The full Betancourt treatment.

A paper on such things.

A super compact version.
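On Q2 concretely, the resources above converge on scoring models by their log predictive density rather than squared error. If the Stan program also writes a pointwise log_lik in generated quantities (a hypothetical addition to the model above, e.g. `log_lik[i] = normal_lpdf(Y[i] | mu_y[i], sigma_y);`), the in-sample version (lppd) is a few lines of base R, and the same matrix feeds `loo::loo()` for the leave-one-out estimate. A sketch with a simulated matrix standing in for `rstan::extract(fit)$log_lik`:

```r
# Stand-in for the (draws x N) pointwise log-likelihood matrix
set.seed(7)
S <- 4000; N <- 50
log_lik <- matrix(rnorm(S * N, mean = -1, sd = 0.3), nrow = S)

# lppd: for each observation, average the likelihood over posterior draws,
# then sum the logs across observations
lppd <- sum(log(colMeans(exp(log_lik))))
lppd
# loo::loo(log_lik) would give the PSIS-LOO estimate; loo::loo_compare()
# then ranks competing models on expected log predictive density
```

Unlike MSE, this scoring respects the whole predictive distribution, not just its centre, which is why it is the usual Bayesian answer to "which model predicts better".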


I tried this, but it only works for my pooled, unpooled and single-level HLM models. For the code above I get very weird posterior predictive distributions that are worse than those of the other three models (it should be the other way around according to this paper). Really strange…

If I can get some time today I’ll run your model on my end and start taking a look to see what’s up.


That would be nice. I get a posterior predictive distribution, but it doesn’t seem to be better than the first model’s, and it is even about as bad as the unpooled and pooled models’, which I don’t understand. Then again, when calculating the RMSE, the HLM 2 model is only slightly worse than the HLM 1 model and better than the unpooled and completely pooled models. So my takeaway is that they are different performance measures, but I cannot really interpret the posterior predictive distribution, i.e. when it is good and when it is bad (given that the distributions of all models already look fairly good and the differences are tiny…).