Calculate log-predictive density

Hi.

I am trying to calculate the log-predictive density for my predictive distributions. Before I have calculated the RMSE and the percentiles to evaluate my model, but I would also like to calculate the log-predictive density. But I do not know how I should specify it and where to put it. I have the following code for my stan model and my for loop

write("// Stan model for simple linear regression

data {
 int < lower = 1 > N; // Sample size
// real<lower=0> nu;
 vector[N] x1; // Predictor
 vector[N] x2; // Predictor
 vector[N] y; // Outcome
 
 real x1_new;
 real x2_new;
}

parameters {
 real beta1; // Slope (regression coefficients)
 real beta2; // Slope (regression coefficients)
 real <lower = 0> sigma; // Error SD
 real <lower = 0> nu_raw; 
}
 
transformed parameters {
  real < lower = 2 > nu;
  nu = nu_raw + 2;
}

model {
 nu_raw ~ gamma(2, 0.1);
 y ~ student_t(nu, x1 + x2 * beta2, sigma);
}
generated quantities {
    real y_new;
    y_new = student_t_rng(nu, x1_new + x2_new * beta2, sigma);
 }
 // The posterior predictive distribution",

"stan_model1.stan")

stanc("stan_model1.stan")

stan_model1 <- "stan_model1.stan"

  
fit <- stan(file = stan_model1, data = stan_data, warmup = 2000, iter = 4000, chains = 4, cores = 2, thin = 1)

fit

kvaobs <- numeric(length(y))
kva_val95 <- numeric(length(y))
kva_val5 <- numeric(length(y))
rmse_val <- numeric(length(y))
for(i in 1:length(x1)){
  x1_new <- x1[i]
  x1_loo <- x1[-i]
  x2_new <- x2[i]
  x2_loo <- x2[-i]
  y_loo <- y[-i]
  stan_data <- list(N = length(y_loo), x1 = x1_loo, x1_new = x1_new, x2 = x2_loo, x2_new = x2_new, y = y_loo, nu = nu)
  
  # Modell
  mod <- stan(file = "stan_model1.stan", data = stan_data)
  # Get predictive distribution
  y_pred <- extract(object = mod, pars = "y_new")$y_new
  
  E_predicted <- mean(y_pred)
  
  # RMSE
  rmse_val[i] <- (E_predicted - y[i])^2
  
  #Percentiler
  kva_val5[i] <- quantile(y_pred, probs = 0.05)
  kva_val95[i] <- quantile(y_pred, probs = 0.95)
  kvaobs[i] <- ecdf(y_pred)(y[i])
  
}

Do I need to specify a new stan model and log the model? I have tried with a couple of things without success and therefore I would appreciate if someone have any advice how to solve this.

2 Likes

Hi,
the log-predictive density can be calculated by computing the density of the likelihood at your observed data given the parameters in the individual draws. You can compute it within your stan program as discussed e.g. at Writing Stan programs for use with the loo package • loo but you can also compute it in R - conceptually the R code would be something like:

for all samples s {
  for all data n {
    mu = x1[n] + x2[n] * beta2[s]
    log_lik[s, y] = dt(x = (y[n] - mu) / sigma[s], df = nu[s],  log = TRUE)
  }
}

(where I am scaling and shifting the posterior manuallz by sigma because R doesn’t have the scaled/shifted t distribution). Please check it is correct, I do make mistakes :-)

Best of luck with your model!

Great! Thanks for the input. I solved it a couple of days ago with the following code. My stan model:

write("// Stan model for simple linear regression

data {
 int < lower = 1 > N; // Sample size
// real<lower=0> nu;
 vector[N] x1; // Predictor
 vector[N] x2; // Predictor
 vector[N] y; // Outcome
 
 real x1_new;
 real x2_new;
 
 real y_new;
}

parameters {
 real beta1; // Slope (regression coefficients)
 real <lower = 0> sigma; // Error SD
 real <lower = 0> nu_raw;
 
}
 
transformed parameters {
  real < lower = 2 > nu;
  nu = nu_raw + 2;
}

model {
 nu_raw ~ gamma(2, 0.1);
 y ~ student_t(nu, x1 + x2 * beta1, sigma);
}
generated quantities {
    real log_lik;
    real y_pred;
    y_pred = student_t_rng(nu, x1_new + x2_new * beta1, sigma);
    log_lik = student_t_lpdf(y_new |nu, x1_new + x2_new * beta1, sigma);
 }
 // The posterior predictive distribution",

"stan_model1.stan")

And my loop for lpd, I have excluded the code for the other calculations.

for(i in 1:length(x1)){
  x1_new <- x1[i]
  x1_loo <- x1[-i]
  x2_new <- x2[i]
  x2_loo <- x2[-i]
  y_new <- y[i]
  y_loo <- y[-i]
  stan_data <- list(N = length(y_loo), x1 = x1_loo, x1_new = x1_new, x2 = x2_loo, x2_new = x2_new, 
                    y = y_loo, y_new = y_new, nu = nu)
  
  # Modell
  mod <- stan(file = "stan_model1.stan", data = stan_data)
  #lpd
  ll <- extract(object = mod, pars = "log_lik")$log_lik
  E_ll <- mean(ll)
  lpd[i] <- E_ll
  
 
  
}

1 Like