I have assembled a minimal example to reproduce the performance issue.
Please download the following file (it contains the R objects needed to run the example): loo_subsample_example.RData - Google Drive
The code is:
library(glue)
library(loo)
library(posterior)  # for as_draws_matrix()

load('loo_subsample_example.RData')

# Pointwise log-likelihood function for loo_subsample(): for observation i,
# sum the year/day/hour location and dispersion effects across draws and
# evaluate the negative binomial log density.
llfun_hierarchical <- function(data_i, draws, log = TRUE) {
  year_mu <- glue("year_mu[{data_i$year}]")
  year_phi <- glue("year_phi[{data_i$year}]")
  days_mu <- glue("days_mu[{data_i$day}]")
  days_phi <- glue("days_phi[{data_i$day}]")
  hours_mu <- glue("hours_mu[{data_i$hour}]")
  hours_phi <- glue("hours_phi[{data_i$hour}]")

  # print(dim(draws))
  # print(draws[, c(year_mu, days_mu, hours_mu)])

  mu_sum <- rowSums(
    as.matrix(draws[, c(year_mu, days_mu, hours_mu)])
  )
  phi_sum <- rowSums(
    as.matrix(draws[, c(year_phi, days_phi, hours_phi)])
  )
  dnbinom(data_i$score, mu = mu_sum, size = phi_sum, log = log)
}
hierarchical_params <- as_draws_matrix(hierarchical_draws)
# loo_i(1, llfun_hierarchical, data = as.data.frame(data), draws = hierarchical_params)

log_p <- hierarchical_fit$lp()
log_g <- hierarchical_fit$lp_approx()

loo_hierarchical <- loo_subsample(
  llfun_hierarchical,
  data = as.data.frame(data),
  draws = hierarchical_params,
  log_p = log_p,
  log_g = log_g,
  observations = 100,
  cores = 25
)
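To make the slowdown concrete, the call can be wrapped in system.time(); this is just a sketch using the objects defined above (I am not reproducing my actual timings here):

# Rough wall-clock measurement of the loo_subsample() call above
timing <- system.time(
  loo_subsample(
    llfun_hierarchical,
    data = as.data.frame(data),
    draws = hierarchical_params,
    log_p = log_p,
    log_g = log_g,
    observations = 100,
    cores = 25
  )
)
print(timing)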
UPD: answer to the next two posts below by @mans_magnusson
Sorry, I have hit the limit on the maximum number of replies here, so I can't write more messages for the next 9 hours. Seems to be a rather strange restriction :(
I have not profiled the code yet, but I will look in that direction. The log-likelihood evaluation seems to be the prime bottleneck suspect. Is the likelihood computation parallelized? If not, we might have 200k iterations running in sequence.
UPD 2
Just ran the profiler; here are the results:
[profiler screenshot from the original post]
glue takes some time, but I don't think it is the major issue. Maybe the problem is in the lapply? I think we should be able to parallelize this part to remove the bottleneck.
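For anyone who wants to reproduce the profile, here is a minimal sketch using profvis (the choice of profvis is my assumption; any R profiler such as Rprof would do):

library(profvis)

# Interactive flame graph showing where loo_subsample() spends its time.
# profvis only samples the main R process, so cores = 1 keeps the
# sequential bottleneck visible in the profile.
profvis({
  loo_subsample(
    llfun_hierarchical,
    data = as.data.frame(data),
    draws = hierarchical_params,
    log_p = log_p,
    log_g = log_g,
    observations = 100,
    cores = 1
  )
})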
UPD 3
Here is the relevant code: loo/loo_subsample.R at e197aa7c5ac56881ad67dae3f90479af96d83c0a · stan-dev/loo · GitHub. I think it should be parallelizable without major issues. I see that parLapply is used elsewhere in the package, so it might be a suitable drop-in replacement; a sketch of what that could look like is below.
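For illustration, a minimal sketch of a parLapply-based version of that loop (.llfun, data, point_est, N, and logMeanExp are the names used in the linked loo_subsample.R; the cluster setup is my own assumption):

library(parallel)

cl <- makeCluster(cores)  # PSOCK cluster, so it would also work on Windows

# Compute the pointwise lpds at the point estimate in parallel.
# Inside the package function, the loop's local variables are captured by
# the closure; a top-level test would additionally need clusterExport().
lpds <- unlist(parLapply(cl, seq_len(N), function(i) {
  ll_i <- as.vector(.llfun(data_i = data[i, , drop = FALSE], draws = point_est))
  logMeanExp(ll_i)
}))

stopCluster(cl)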
UPD 4
Indeed, this was the problem. I looked at how parallel computation is implemented elsewhere in the package, swapped lapply for mclapply, and the function finished in a few seconds. I might work on a PR for this. Here is a short example of the fix in loo_subsample.R:
# Compute the lpds at the point estimate, parallelized across observations
lpds <- unlist(mclapply(X = seq_len(N), mc.cores = cores, FUN = function(i) {
  ll_i <- .llfun(data_i = data[i, , drop = FALSE], draws = point_est)
  ll_i <- as.vector(ll_i)
  lpd_i <- logMeanExp(ll_i)
  lpd_i
}))
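One caveat worth noting for the PR: mclapply relies on forking, so it does not actually parallelize on Windows (only mc.cores = 1 is supported there). A parLapply/PSOCK code path, like the one sketched in UPD 3, would be needed for full cross-platform support.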
UPD 5
Sent a pull request here: Added parallel likelihood computation by kdubovikov · Pull Request #171 · stan-dev/loo · GitHub
