Yes, it is a dataset of 50,000 groups which I am working through using hierarchical modelling, but I am really only interested in the population parameters, plus maybe a few individual parameters to show some individual predictions. The issue is that the non-centered parameterisation generates a bunch of individual parameters in the transformed parameters block and in generated quantities.
Yes, I know, but I am using cmdstan because I am working with map_rect.
Thanks for the feedback anyway!
There haven't been any posts here for a long time, but I think the issue still exists. I have been using Stan a lot to analyze RNA-sequencing data, and since I also save the log_lik for model comparison, I have files of up to 18 GB. As suggested further up, I preprocess the files with bash (through system calls from R) and then read in the relevant preprocessed, smaller files with fread via Rscript to avoid the memory issues in RStudio. I was wondering whether there would be any interest in formalizing this a bit more and making it available in the form of an R package for the analysis of large cmdstan files (including bayesplot and loo, for example). I would be happy to participate. I already have the custom routines for my own project, but I think it would be great to generalize them. Since Stan can be parallelized, sampling time has become less of an issue, but large files still cause memory problems in R.
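For concreteness, here is a minimal sketch of that preprocessing pattern, assuming a single-chain CmdStan file called output_1.csv and that only the log_lik.* columns are wanted (both names are placeholders, not from my actual project): awk strips the '#' comment lines CmdStan interleaves, keeps only the matching columns, and fread then reads the much smaller result.

```r
library(data.table)

# awk: skip CmdStan's '#' comment lines, detect the header row, remember which
# columns start with "log_lik.", and print only those columns (header included).
cmd <- "
awk -F, '
  !/^#/ {
    if (!header) {
      header = 1
      for (i = 1; i <= NF; i++) if ($i ~ /^log_lik\\./) keep[i] = 1
    }
    first = 1
    for (i = 1; i <= NF; i++) if (keep[i]) {
      printf \"%s%s\", (first ? \"\" : \",\"), $i
      first = 0
    }
    printf \"\\n\"
  }
' output_1.csv > log_lik_1.csv"

system(cmd)                         # bash/awk does the column extraction
log_lik <- fread("log_lik_1.csv")   # the reduced file fits comfortably in memory
```

The same pattern works for any variable prefix; wrapping it in a function that takes the variable name and file paths is essentially what such a package would formalize.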
That sounds like it would be a great contribution, but before you dive in maybe take a look at what I've done here; I just spent the weekend getting during-sampling CmdStan-CSV-to-NetCDF4 conversion working. I'm still working on the post-sampling access scheme, and I'm also hoping to eventually be able to truncate the CSVs as they're being read, for better storage efficiency. (Note also the important omissions at present.)
Thanks for the prompt answer. That doesn't quite seem to be what I have in mind. I am assuming that storage is not an issue, or at least not a major one. In that case it would be sufficient to process the cmdstan output; no modification to Stan would be required. Maybe the main problem is that Stan writes all output into a single file. Instead of post-processing the cmdstan output, one could maybe add an option to Stan that allows certain variables (like log_lik) to be saved in separate files. Then the files should usually be small enough to be read into R with fread.
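To illustrate the downstream step, here is a hedged sketch of what consuming such a per-variable file could look like, whether it was written by a hypothetical CmdStan option or produced by preprocessing as above; log_lik_1.csv is a placeholder name and the draws are assumed to come from a single chain.

```r
library(data.table)
library(loo)

# One small file per variable: draws in rows, observations in columns.
log_lik <- as.matrix(fread("log_lik_1.csv"))

# Standard loo workflow on the extracted matrix (single chain assumed here).
r_eff <- relative_eff(exp(log_lik), chain_id = rep(1, nrow(log_lik)))
print(loo(log_lik, r_eff = r_eff))
```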
Ah, then yes, that gets into modifying how cmdstan writes out data. There has been substantial discussion on that, and while everyone agrees that the current scheme is far from ideal, there's very little agreement on what to do as an alternative. @mitzimorris mentioned at a recent Stan Gathering that there might be something on the horizon in this domain, though.
I'll note that in the case of limited storage space, my proposal linked above to allow the csv files to be truncated would largely solve that, since the program consuming the csv could even use a user-supplied keep/toss-list if only a subset of quantities needs to be saved.