I have a very large Stan multi-state mark-recapture model (N = 200,000+). I calculate the log-likelihood for each individual by summing over all possible transitions for individuals with missing observations. The resulting stanfit object is very large, approximately 8 GB for 2000 post-warmup iterations. I calculate `log_lik` in the generated quantities block, and the result of `loo::extract_log_lik` is a 3 GB matrix.
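For scale, here is a quick back-of-the-envelope check of the matrix size. It assumes exactly S = 2000 draws, N = 200,000 individuals and 8-byte doubles (the real N is somewhat larger). It also shows the hypothetical total if each of the 8 cores received its own full copy. That is what happens when R's parallel package ships data to socket-cluster workers, which is the usual route on Windows.

```r
# Rough memory footprint of the pointwise log-likelihood matrix.
# Assumes exactly S = 2000 draws and N = 200,000 individuals
# (the post says "200,000+", so the real matrix is somewhat larger).
S <- 2000                 # post-warmup draws
N <- 200000               # individuals
bytes_per_double <- 8

one_copy_gb <- S * N * bytes_per_double / 1e9
one_copy_gb               # 3.2 (decimal GB), about 3 GiB

# Hypothetical: every one of 8 socket-cluster workers gets its own copy,
# in addition to the copy held by the main R session
n_workers <- 8
all_copies_gb <- one_copy_gb * (n_workers + 1)
all_copies_gb             # 28.8
```

This is arithmetic only. It does not account for the 8 GB stanfit object, the transposed copy, or any intermediate objects loo creates.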
The standard `loo` methods fail with the error:

```
Error in serialize(data, node$con) : error writing to connection
```
I imagine this is because I’m running out of memory, and I ran into similar problems with earlier versions of `loo`. Previously, I solved this by using the function method with a boringly trivial function in R:
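```r
# Trivial log-likelihood function for loo's function method:
# data_i is one row of the transposed log_lik matrix (all S draws for
# individual i) and draws is the vector of 1's, so the product simply
# returns the log-likelihood values already computed in Stan
ll_fun <- function(data_i, draws) {
  data_i * draws
}
```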
Here `data` is the transposed `log_lik` matrix (N x S), so each `data_i` holds one individual's S draws, and `draws` is an S x 1 vector of 1’s. I’ve taken this approach because each individual's log-likelihood depends on the relatively small data of an observed K-occasion capture history, but potentially on a very large number of parameters (thousands). I have a fairly efficient algorithm to calculate the log-likelihood, but it’s easier for me to write it as a block in the model section than to write a function that takes a large number of arguments. So I don’t have a custom Stan function to calculate the log-likelihoods.

I mention this because in a similar thread on this topic, it was suggested that the function method of computing loo is more memory efficient, but every example I’ve seen computes `log_lik` in R, outside of Stan, from the data and the parameter draws. I don’t want to do that: the list of parameter matrices would be almost as large as the `log_lik` matrix, and I have already calculated `log_lik` and gone to the trouble of storing it in memory.
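To illustrate what the hack returns, here is a toy base-R sketch. It slices the transposed matrix one row at a time, roughly the way loo's function method does (`data[i, , drop = FALSE]`; this slicing is an assumption worth checking against the loo source for your version). The loo 2.x documentation describes the function's return value as the log-likelihood of observation i at each of the S draws, i.e. a vector of length S. The hack instead returns a 1 x S matrix. Whether that shape alone explains the 2.0.0 failure is an assumption, not something confirmed here. `ll_fun_vec` is a hypothetical name for a variant that returns a plain vector.

```r
# Mimic (roughly) how loo's function method passes one observation to the
# log-likelihood function: data[i, , drop = FALSE]. Toy sizes only.
set.seed(1)
S <- 4; N <- 3
log_lik <- matrix(rnorm(S * N), nrow = S, ncol = N)  # S x N, like extract_log_lik()
ll_t <- t(log_lik)                                   # N x S: one row per individual
draws <- rep(1, S)                                   # the vector of 1's

# The original hack
ll_fun <- function(data_i, draws) {
  data_i * draws
}
out <- ll_fun(ll_t[1, , drop = FALSE], draws)
dim(out)  # 1 4: a one-row matrix (1 x S), not S values in a column

# Hypothetical variant returning a plain length-S vector for observation i
ll_fun_vec <- function(data_i, draws) {
  as.vector(data_i)
}
out_vec <- ll_fun_vec(ll_t[1, , drop = FALSE], draws)
length(out_vec)                   # 4
all.equal(out_vec, log_lik[, 1])  # TRUE
```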
This hack worked with `loo` prior to version 2.0.0, but it no longer works with loo 2.0.0. It runs for a while, with my CPU fan making noise to let me know it’s working hard, and then I get a memory warning from Windows shortly before RStudio crashes. I’m running on a machine with 8 cores and 64 GB of RAM.
So my question is: how do I hack the new version of loo to operate row-wise on the `log_lik` matrix extracted from the stanfit object when memory is an issue?
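Here is a minimal sketch of the function method in loo 2.x under these assumptions:

- The log-likelihood function returns a plain length-S vector.
- `cores = 1`, so nothing is serialized to socket-cluster workers. The `serialize(data, node$con)` error comes from R's parallel package sending data to such a worker.
- A simulated matrix stands in for the real `extract_log_lik` output. It has 2 chains of 500 draws, assumed ordered chain by chain, and N = 50. All of these are hypothetical toy values.

`relative_eff()` and `loo()` are loo generics. The argument names used for their function methods (`data`, `draws`, `chain_id`, `r_eff`, `cores`) should be checked against the installed version's documentation.

```r
library(loo)

# Toy stand-in for the S x N matrix from loo::extract_log_lik(fit)
# (hypothetical sizes; draws assumed ordered chain by chain)
set.seed(123)
n_chains <- 2; iter <- 500
S <- n_chains * iter; N <- 50
log_lik <- matrix(dnorm(rnorm(S * N), log = TRUE), nrow = S, ncol = N)
chain_id <- rep(seq_len(n_chains), each = iter)

# One row per individual, so each call sees all S draws for observation i
ll_t <- t(log_lik)
rm(log_lik)  # in the real case, drop the S x N copy to free memory

# Return observation i's S log-likelihood values as a plain vector;
# 'draws' is required by loo's interface but unused here
ll_fun_vec <- function(data_i, draws) {
  as.vector(data_i)
}
dummy_draws <- matrix(1, nrow = S, ncol = 1)

# Relative efficiencies, computed one observation at a time, serially
r_eff <- relative_eff(ll_fun_vec, chain_id = chain_id, data = ll_t,
                      draws = dummy_draws, cores = 1)

# PSIS-LOO, one observation at a time, without parallel workers
loo_fit <- loo(ll_fun_vec, data = ll_t, draws = dummy_draws,
               r_eff = r_eff, cores = 1)
print(loo_fit)
```

Running serially over 200,000+ individuals trades speed for memory. Whether the real case fits in 64 GB also depends on what else (for example the 8 GB stanfit object) is still loaded in the session.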