Fewer generated quantities than parameter samples?

This is a follow-up to Faster / better loading of sampler diagnostics in cmdstanr?.

I’ve successfully split out the generated quantities section from the model fitting, but now I’m wondering if it is possible to generate fewer samples via the $generated_quantities() call than iter_sampling originally supplied in the call to $sample(). The specific use case is that I might want more iterations to achieve convergence, but actually need fewer generated quantities in application. From looking at the source code, it doesn’t seem that there’s a way to do it, but on the off chance there is… If not, may I humbly submit a feature request?

Thanks again!

1 Like

In order to do that you need to write the draws to CSV, which you can do with the following function:

draws_to_csv <- function (draws) {
  # make dummy sampler diagnostics, needed because cmdstan requires those columns
  sampler_diagnostic_names <- c("accept_stat__", "stepsize__", "treedepth__", "n_leapfrog__", "divergent__", "energy__")
  n <- posterior::niterations(draws)
  n_chains <- posterior::nchains(draws)
  dummy_sampler_diag <- rep(0, n * length(sampler_diagnostic_names) * n_chains)
  dim(dummy_sampler_diag) <- c(n, n_chains, length(sampler_diagnostic_names))
  dummy_sampler_diag <- posterior::as_draws_array(dummy_sampler_diag)
  posterior::variables(dummy_sampler_diag) <- sampler_diagnostic_names
  
  # the columns must also be in order lp__, sampler_diagnostics, parameters
  variables <- posterior::variables(draws)
  draws <- posterior::subset_draws(
    posterior::bind_draws(draws, dummy_sampler_diag, along = "variable"),
    variable = c("lp__", sampler_diagnostic_names, variables[variables != "lp__"])
  )
  variables <- posterior::variables(draws)
  chains <- posterior::chain_ids(draws)
  # we generate file names for temporary CSV files
  paths <- cmdstanr:::generate_file_names(basename = "fittedParams", ids = chains)
  paths <- file.path(tempdir(), paths)
  chain <- 1
  for (path in paths) {
    # write iterations (required by cmdstan) and variables names
    write(
      paste0("# num_samples = ", n, "\n", paste0(cmdstanr:::unrepair_variable_names(variables), collapse = ",")),
      file = path,
      append = FALSE
    )
    utils::write.table(
      posterior::subset_draws(draws, chain = chain),
      sep = ",",
      file = path,
      col.names = FALSE,
      row.names = FALSE,
      append = TRUE
    )
    chain <- chain + 1
  }
  paths
}

Example of how to use:

library(cmdstanr)

stan_file_bernoulli <- file.path(cmdstan_path(), "examples", "bernoulli", "bernoulli.stan")
data_bernoulli <- file.path(cmdstan_path(), "examples", "bernoulli", "bernoulli.data.json")

stan_ppc_file_bernoulli <- "~/Desktop/cmdstanr/tests/testthat/resources/stan/bernoulli_ppc.stan"
data_ppc_bernoulli <- "~/Desktop/cmdstanr/tests/testthat/resources/data/bernoulli_ppc.data.json"

mod <- cmdstan_model(stan_file_bernoulli)
fit <- mod$sample(data_bernoulli, chains = 4, iter_sampling = 2000)


draws <- posterior::subset_draws(fit$draws(), iteration = 1:10)

mod_ppc <- cmdstan_model(stan_ppc_file_bernoulli)
fit_gq <- mod_ppc$generate_quantities(fitted_params = draws_to_csv(draws), data_ppc_bernoulli)
print(dim(fit_gq$draws())) #should be 10 4 11

All these gymnastics with the draws are required because cmdstan is very peculiar on how the CSV must be organized. But @mitzimorris is working on that (https://github.com/stan-dev/cmdstan/issues/943) so that should be much simpler soon.

3 Likes

Also thanks for the feedback!

Please open an issue if you feel fittedParams in $generate_quantities() should support the draws array. Based on your post I think it makes sense to support this.

2 Likes

Yeah I agree! @nerutenbeck If you want to open an issue with a feature request that would be great.

1 Like

I opened an issue here: https://github.com/stan-dev/cmdstanr/issues/342

Thanks @rok_cesnovar and @jonah!

3 Likes

As @mitzimorris pointed out, what I really need to do is remember that you control the number of iterations to achieve convergence with iter_warmup and there is no need to go overkill on iter_sampling. So specifying iter_sampling carefully plus use of posterior::thin_draws()will get the behavior I want.