I have a model that produces CSV files of tens of GB in size. This is fine, except that creating the resulting object in R is slow and sometimes impossible due to the size. However, I don’t need to process all the variables at once, so I realized that I can use something like the following to read in only a subset of the samples:
(Not sure if there is a cleaner way that doesn’t rely on unexported functionality.)
My question is: when I run model$sample(...), is there a way to tell $sample that it should not even try to create and return the results to R (which would cause an error due to lack of memory)?
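Roughly something like this (a sketch; the file names and the variable name `theta` are placeholders for my actual output files and parameters):

```r
library(cmdstanr)

# Paths to the CmdStan output CSVs from a previous run (placeholders)
csv_files <- c("output_1.csv", "output_2.csv", "output_3.csv", "output_4.csv")

# Read only the variables I actually need instead of the whole fit;
# ::: is used because I wasn't sure this is exported in my cmdstanr version
partial <- cmdstanr:::read_cmdstan_csv(csv_files, variables = c("theta"))
theta_draws <- partial$post_warmup_draws
```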
I don’t think this functionality is exposed anywhere by cmdstanr, but I agree that you have a good use case for it, and perhaps it would be a nice feature. If you want to use the package internals to do this, check out run_cmdstan in cmdstanr/R/run.R at master · stan-dev/cmdstanr · GitHub
I don’t think $sample reads in all the samples by default; you have to call $draws for that. Perhaps it is the calculation of diagnostics that runs out of RAM? If so, you can set diagnostics = FALSE if you know what you’re doing.
If you have auxiliary variables for which you don’t need to save draws at all, define them in the model block or in a local block (an extra pair of {}) in transformed parameters.
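For example, something like this (a minimal sketch with a made-up toy model; the point is just that `resid` lives only inside the local block and is never written to the CSVs, while `ss` still is):

```r
library(cmdstanr)

# Sketch of the local-block trick:
# `resid` is declared inside the extra {} in transformed parameters,
# so it can be used there but its draws are never saved to the CSVs;
# `ss` is declared at the top level, so it is saved as usual.
code <- write_stan_file("
data {
  int<lower=1> N;
  vector[N] y;
}
parameters {
  real mu;
  real<lower=0> sigma;
}
transformed parameters {
  real ss;
  {
    vector[N] resid = y - mu;  // local: not saved
    ss = dot_self(resid);      // saved
  }
}
model {
  y ~ normal(mu, sigma);
}
")
mod <- cmdstan_model(code)
```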
Working with such large models with R and cmdstanr can be frustrating. Perhaps you will find my package Stanislaw useful. It extracts subsets of draws directly from CmdStan CSVs and can also calculate posterior summaries much faster than $summary.
This is right: it shouldn’t read in all the draws until you ask it to do something that requires them (e.g. $draws(), $summary(), printing, etc.). For turning off reading in the diagnostics I would use diagnostics = "" or diagnostics = NULL, although FALSE might also work; I haven’t tested it.
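For example (a sketch; `mod` and `data_list` stand in for your actual model and data):

```r
# Skip the automatic post-sampling diagnostic checks
fit <- mod$sample(
  data = data_list,
  diagnostics = NULL  # or diagnostics = ""
)
```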
Thanks, indeed I was mixing this up with my experience with rstan fit objects; the out-of-memory issue actually happens later in the batch jobs when using fit$save_object().
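I could probably avoid that by saving only the draws I actually need instead of the whole fit object, e.g. (a sketch, with `theta` again a placeholder):

```r
# Instead of fit$save_object(file = "fit.rds"), which serializes the full
# fit object (all draws included), save only the subset I need:
sub <- cmdstanr:::read_cmdstan_csv(fit$output_files(), variables = c("theta"))
saveRDS(sub$post_warmup_draws, file = "theta_draws.rds")
```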
I prefer not to define these variables inside a local {} or in the model block, as I also need them later in generated quantities, although I could of course just recompute them there, as it probably doesn’t matter much in terms of overall computing time.
I have this dilemma often. Yes, it does not matter much in terms of time, but such duplicated code invites bugs and can rarely be refactored into a function. I’d love it if the language had a decorator you could apply to a variable to exclude its draws from the CSVs.
Yes, and I’m still interested in that feature (the most recent idea was that you would annotate the variable with @silent). If I remember correctly, the primary concern was that it interacts badly with things like standalone generated quantities: a model that has a silenced variable can’t have its results loaded back in for further processing by the same model. But that also seems like an obvious and “fair” tradeoff.
It would be great to have something like @silent in Stan code; I too find repeating code in multiple places just to avoid saving some auxiliary stuff annoying and “dangerous”.
However, disregarding the issue of extra variables in the output CSVs, I think there’s a simple solution for avoiding reading everything into R: just add a variables argument to as_cmdstan_fit() and pass it through to read_cmdstan_csv(), which already accepts a variables argument?
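Something like this (a sketch of the proposed interface only; as far as I know as_cmdstan_fit() does not have a variables argument yet):

```r
# Proposed, not existing: a variables argument on as_cmdstan_fit()
# that is simply forwarded to read_cmdstan_csv()
fit <- as_cmdstan_fit(csv_files, variables = c("theta"))
```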
I’m going to try this out as soon as I get a chance; busy pre-Christmas though. Do we have alternative output formats to CSV? It hasn’t been a limiting factor for me in the past, but it is such a storage-inefficient format.
Unfortunately we’re still only using CSV. There have been various proposals for other formats (a change that would need to happen in CmdStan itself), but as far as I know we haven’t had a developer take on that project yet.