I was wondering if there is any way to automatically stop the stan() function if it runs too long? I am studying the effect of different combinations of the parameters kappa, gamma and t0 on the run time of the NUTS sampler for particular models. However, I first want to discard some combinations which take too much time.
None of the classical timing tools in R work… I didn’t find any that can kill a function if it takes too much time…
Well, you can abort the computation, but you don’t get anything back. You could start with a smaller number of iterations if you think the computation might take a long time, but the new diagnostics will (rightly) claim that the effective sample size in the tail (or bulk) is too low to make reliable inferences. Basically, you have to let it warm up long enough to get the adaptation pretty much right before you can figure anything out.
Thank you, bgoodri :) It may sound strange, but I’m not really interested in inference.
I want to build a loop in which, at each iteration, the parameters (kappa, gamma, t0) change and stan() is run. At each iteration, if stan() takes too much time (say, more than one hour), I want to move on to the next iteration, and so on.
I am doing a simulation study. It is quite academic.
First I simulate data from a fully known random process (with known parameter values), something like 100 datasets simulated from that process.
Then I use stan(), testing different values of kappa, gamma and t0, to see how close I can get to the known parameter values of the random process.
Run time is also important to me. That is why I want to discard some combinations of kappa, gamma and t0.
I have already noticed that some combinations take a lot of time for my model, and I don’t want to test all the combinations.
Fundamentally, the real problem is that there is no way to know the reasonable range of values that t0 and gamma can take. The theory behind these parameters is quite limited. That is why I have to spend so much time on simulation…
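For what it's worth, one way to sketch the loop described above is to run each fit in a separate R process and kill that process when a time limit is hit, since in-process timers generally cannot interrupt compiled sampler code. The sketch below assumes the callr package is installed and uses its timeout argument; run_with_timeout, fit_one, run_grid, "model.stan" and stan_data are hypothetical names for illustration, and the grid values are arbitrary.

```r
# Sketch only: each fit runs in a child R process so it can be killed on timeout.
# Assumes the 'callr' package is installed; model file and data are placeholders.

run_with_timeout <- function(fun, args = list(), timeout_sec = 3600) {
  start <- Sys.time()
  value <- tryCatch(
    callr::r(fun, args = args, timeout = timeout_sec),
    error = function(e) e  # a timeout (or any failure in the child) arrives as an error
  )
  list(
    ok      = !inherits(value, "error"),
    value   = value,
    elapsed = as.numeric(difftime(Sys.time(), start, units = "secs"))
  )
}

# Runs inside the child process, so everything it needs is passed as arguments.
fit_one <- function(model_file, stan_data, kappa, gamma, t0) {
  rstan::stan(
    file = model_file, data = stan_data, chains = 1,
    control = list(adapt_kappa = kappa, adapt_gamma = gamma, adapt_t0 = t0)
  )
}

# Loop over all combinations; a slow combination is recorded as not ok and skipped.
run_grid <- function(grid, model_file, stan_data, timeout_sec = 3600) {
  results <- vector("list", nrow(grid))
  for (i in seq_len(nrow(grid))) {
    results[[i]] <- run_with_timeout(
      fit_one,
      args = list(model_file, stan_data, grid$kappa[i], grid$gamma[i], grid$t0[i]),
      timeout_sec = timeout_sec
    )
    if (!results[[i]]$ok) message("Combination ", i, " skipped (timeout or error)")
  }
  results
}

# Arbitrary example grid of adaptation settings.
grid <- expand.grid(kappa = c(0.6, 0.75, 0.9), gamma = c(0.05, 0.5), t0 = c(10, 100))
# results <- run_grid(grid, "model.stan", stan_data)  # placeholders, not run here
```

Note that the elapsed time measured this way includes process start-up and any model compilation, so it is only a rough guide to sampling time.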
BTW, if you’re doing this kind of thing on a cluster just use the csv file output option with either rstan or cmdstan and lean on the scheduler to kill jobs beyond a time limit. You can read the truncated .csv files with read.csv in R or data.table::fread if you strip the headers.
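As a rough sketch of the header-stripping step: Stan's CSV output puts its metadata on lines starting with #, and a job killed by the scheduler may leave a half-written final row. The helper below (read_partial_stan_csv is a hypothetical name) drops the comment lines, conservatively discards the last data row, and keeps only rows with the same number of fields as the column header.

```r
# Sketch: read a possibly truncated Stan CSV file using base R only.
read_partial_stan_csv <- function(path, drop_last = TRUE) {
  lines <- readLines(path, warn = FALSE)
  # Drop blank lines and the '#' comment lines that hold Stan's metadata.
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  if (length(lines) == 0) return(NULL)
  # The final row may have been cut off mid-write, so discard it to be safe.
  if (drop_last && length(lines) >= 2) lines <- lines[-length(lines)]
  # Keep only rows whose field count matches the column-name row.
  n_fields <- lengths(strsplit(lines, ",", fixed = TRUE))
  lines <- lines[n_fields == n_fields[1]]
  read.csv(text = lines, check.names = FALSE)
}
```

With drop_last = TRUE a cleanly finished file loses its last draw, which seems a fair price for not having to detect a partially written number.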
We encourage people to use simulation-based calibration to test posterior coverage (aka calibration), though we recommend more like 1000 simulations than 100. But we want to generate the data from the priors and fit. Otherwise, we can’t say much about the model other than that sometimes it works and sometimes it doesn’t. Maybe you can plot the sets of parameter values where it doesn’t work.
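To make the "generate from the priors and fit" recipe concrete, here is a minimal sketch on a toy model where the posterior is known exactly, so no Stan run is needed. The normal model, the function name sbc_ranks and all settings are assumptions for illustration; with a real model, step 3 would be the stan() fit.

```r
# SBC sketch on a toy conjugate model (a stand-in for a real Stan fit).
# Model assumed here: theta ~ N(0, 1), y_i ~ N(theta, 1).
set.seed(1)

sbc_ranks <- function(n_sims = 1000, n_obs = 10, n_draws = 99) {
  vapply(seq_len(n_sims), function(i) {
    theta_true <- rnorm(1, 0, 1)          # 1. draw the parameter from the prior
    y <- rnorm(n_obs, theta_true, 1)      # 2. simulate data given that draw
    post_mean <- sum(y) / (n_obs + 1)     # 3. "fit": exact posterior for this model
    post_sd <- sqrt(1 / (n_obs + 1))
    draws <- rnorm(n_draws, post_mean, post_sd)
    sum(draws < theta_true)               # 4. rank of the true value among the draws
  }, integer(1))
}

ranks <- sbc_ranks()
```

If the fitting procedure is calibrated, the ranks should look roughly uniform on 0 to n_draws, for example in hist(ranks). Clear U shapes or a slope in the histogram point to a problem with the fit.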
The usual problem is priors that are unreasonably broad in that they generate unreasonable data that causes numerical issues with floating point (not conceptual issues with the algorithm per se).
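A small illustration of that floating-point point, using an assumed example that is not from this thread: a log-odds parameter with a very wide normal prior. The function name frac_saturated and both prior scales are hypothetical.

```r
# Sketch: how often does a prior on a log-odds parameter imply a probability
# that is exactly 0 or 1 in double precision (so that a log term becomes -Inf)?
set.seed(1)

frac_saturated <- function(prior_sd, n = 1e4) {
  eta <- rnorm(n, 0, prior_sd)  # prior draws of the log-odds
  p <- plogis(eta)              # implied success probability
  mean(p == 1 | p == 0)         # fraction where log(p) or log(1 - p) is -Inf
}

broad  <- frac_saturated(100)   # very wide prior
narrow <- frac_saturated(1.5)   # moderate prior
```

Under the wide prior a sizeable share of draws saturate, while under the moderate prior none do, even though nothing is conceptually wrong with the model in either case.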