Calculate log-predictive density

manneolfvingsson · April 30, 2021, 9:16am

Hi.

I am trying to calculate the log-predictive density for my predictive distributions. Before I have calculated the RMSE and the percentiles to evaluate my model, but I would also like to calculate the log-predictive density. But I do not know how I should specify it and where to put it. I have the following code for my stan model and my for loop

write("// Stan model for simple linear regression

data {
 int < lower = 1 > N; // Sample size
// real<lower=0> nu;
 vector[N] x1; // Predictor
 vector[N] x2; // Predictor
 vector[N] y; // Outcome
 
 real x1_new;
 real x2_new;
}

parameters {
 real beta1; // Slope (regression coefficients)
 real beta2; // Slope (regression coefficients)
 real <lower = 0> sigma; // Error SD
 real <lower = 0> nu_raw; 
}
 
transformed parameters {
  real < lower = 2 > nu;
  nu = nu_raw + 2;
}

model {
 nu_raw ~ gamma(2, 0.1);
 y ~ student_t(nu, x1 + x2 * beta2, sigma);
}
generated quantities {
    real y_new;
    y_new = student_t_rng(nu, x1_new + x2_new * beta2, sigma);
 }
 // The posterior predictive distribution",

"stan_model1.stan")

stanc("stan_model1.stan")

stan_model1 <- "stan_model1.stan"

  
fit <- stan(file = stan_model1, data = stan_data, warmup = 2000, iter = 4000, chains = 4, cores = 2, thin = 1)

fit

kvaobs <- numeric(length(y))
kva_val95 <- numeric(length(y))
kva_val5 <- numeric(length(y))
rmse_val <- numeric(length(y))
for(i in 1:length(x1)){
  x1_new <- x1[i]
  x1_loo <- x1[-i]
  x2_new <- x2[i]
  x2_loo <- x2[-i]
  y_loo <- y[-i]
  stan_data <- list(N = length(y_loo), x1 = x1_loo, x1_new = x1_new, x2 = x2_loo, x2_new = x2_new, y = y_loo, nu = nu)
  
  # Modell
  mod <- stan(file = "stan_model1.stan", data = stan_data)
  # Get predictive distribution
  y_pred <- extract(object = mod, pars = "y_new")$y_new
  
  E_predicted <- mean(y_pred)
  
  # RMSE
  rmse_val[i] <- (E_predicted - y[i])^2
  
  #Percentiler
  kva_val5[i] <- quantile(y_pred, probs = 0.05)
  kva_val95[i] <- quantile(y_pred, probs = 0.95)
  kvaobs[i] <- ecdf(y_pred)(y[i])
  
}

Do I need to specify a new stan model and log the model? I have tried with a couple of things without success and therefore I would appreciate if someone have any advice how to solve this.

martinmodrak · May 6, 2021, 9:41am

Hi,
the log-predictive density can be calculated by computing the density of the likelihood at your observed data given the parameters in the individual draws. You can compute it within your stan program as discussed e.g. at Writing Stan programs for use with the loo package • loo but you can also compute it in R - conceptually the R code would be something like:

for all samples s {
  for all data n {
    mu = x1[n] + x2[n] * beta2[s]
    log_lik[s, y] = dt(x = (y[n] - mu) / sigma[s], df = nu[s],  log = TRUE)
  }
}

(where I am scaling and shifting the posterior manuallz by sigma because R doesn’t have the scaled/shifted t distribution). Please check it is correct, I do make mistakes :-)

Best of luck with your model!

manneolfvingsson · May 6, 2021, 12:13pm

Great! Thanks for the input. I solved it a couple of days ago with the following code. My stan model:

write("// Stan model for simple linear regression

data {
 int < lower = 1 > N; // Sample size
// real<lower=0> nu;
 vector[N] x1; // Predictor
 vector[N] x2; // Predictor
 vector[N] y; // Outcome
 
 real x1_new;
 real x2_new;
 
 real y_new;
}

parameters {
 real beta1; // Slope (regression coefficients)
 real <lower = 0> sigma; // Error SD
 real <lower = 0> nu_raw;
 
}
 
transformed parameters {
  real < lower = 2 > nu;
  nu = nu_raw + 2;
}

model {
 nu_raw ~ gamma(2, 0.1);
 y ~ student_t(nu, x1 + x2 * beta1, sigma);
}
generated quantities {
    real log_lik;
    real y_pred;
    y_pred = student_t_rng(nu, x1_new + x2_new * beta1, sigma);
    log_lik = student_t_lpdf(y_new |nu, x1_new + x2_new * beta1, sigma);
 }
 // The posterior predictive distribution",

"stan_model1.stan")

And my loop for lpd, I have excluded the code for the other calculations.

for(i in 1:length(x1)){
  x1_new <- x1[i]
  x1_loo <- x1[-i]
  x2_new <- x2[i]
  x2_loo <- x2[-i]
  y_new <- y[i]
  y_loo <- y[-i]
  stan_data <- list(N = length(y_loo), x1 = x1_loo, x1_new = x1_new, x2 = x2_loo, x2_new = x2_new, 
                    y = y_loo, y_new = y_new, nu = nu)
  
  # Modell
  mod <- stan(file = "stan_model1.stan", data = stan_data)
  #lpd
  ll <- extract(object = mod, pars = "log_lik")$log_lik
  E_ll <- mean(ll)
  lpd[i] <- E_ll
  
 
  
}

Topic		Replies	Views
How to compute the predictive likelihood using stan General	35	2336	May 15, 2019
Use Posterior_predict in rstanarm to generate probabilites for each observation in a logistic regression model Modeling	6	1352	February 22, 2019
Calculating elpd manually for observations with high pareto k diagnostic Modeling loo	2	697	December 17, 2021
Positive values for elpd_loo? Modeling cognitive-science	4	1843	June 12, 2020
Sample steps General	8	636	February 21, 2024

Calculate log-predictive density

Related topics