Hi. I’ve a two-staged HLM script for Stan and want to get the point estimates for in-sample data. The goal is to have a vector of in-sample predictions, so i can compare it with the actual data to compute the mean squared error. Seems to be the most intuitive way to compare the in-sample accuracy of my models.
Q1: How can I get the point estimates from a Stan model or with a R function?
Q2: This is a frequentist way to do it. Is there an alternative why to measure the predictive power of model in a Bayesian manner? If yes what are the methods?
Here’s a minimal viable code:
R script:
## Load Data
data("radon", package = "rstanarm")
data <- radon
# County Categorical Variable
county_name <- as.vector(data$county) # create vector of county indications for each obs
county_list <- unique(county_name) # create list of counties, drop duplicates
J <- length(county_list) # number counties in Minessota
county <- rep (NA, J)
for (i in 1:J){
county[county_name==county_list[i]] <- i
}
# Seperate variables
logradon <- data$log_radon
loguranium <- data$log_uranium
loguranium <- loguranium[!duplicated(loguranium)]
floorind <- data$floor
N <- length(logradon)
county_i <- county
county_j <- 1:85
# Lists for STAN
datalist <- list(Y = logradon,
X = floorind,
U = loguranium,
N = N,
J = J,
county_i = county_i,
county_j = county_j)
Stan script:
// HLM 2
data {
int<lower=1> N; // number of obs
int<lower=1> J; // Number of counties
vector[N] Y; // outcome data (log radon)
vector[N] X; // Inputa data Layer 1(room indication)
vector[J] U; // Input data Layer 2 (uranium)
int<lower=1> county_i[N]; //county ID variable for x
int<lower=1> county_j[J]; //county ID variable for u
}
parameters {
real beta; // fixed slope
real gamma_zero; // 2nd level fixed intercept
real gamma_one; // 2nd level fixed slope
real<lower=0> sigma_alpha; // prior on noise of 2nd layer model
real<lower=0> sigma_y;// prior on noise of 1st layer model
}
model { // Model specifications
vector[N] mu_y;
vector[J] alpha; // county j varying intercept
// 2nd Layer
for(j in 1:J)
alpha[county_j[j]] ~ normal(gamma_zero + gamma_one * U[county_j[j]], sigma_alpha);
// 1st Layer
for(i in 1:N)
mu_y[i] = alpha[county_i[i]] + beta * X[i];
// Priors and Hyperparams
Y ~ normal(mu_y, sigma_y);
sigma_y ~ cauchy (0, 2.5);
sigma_alpha ~ cauchy(0,2.5);
beta ~ normal(0, 5);
gamma_zero ~ normal(0, 5);
gamma_one ~ normal(0, 5);
}