Hello all. I have been using Stan with great success to prototype and test a particular model with small simulated datasets. I’m now at a point where I would like to apply this model to datasets far larger than Stan was designed to deal with (feel free to correct me!) and was hoping for some pointers to the technical details underlying the current state of the dynamic HMC sampler Stan uses.
For reference, my model corresponds to an unusual regression problem with random residuals, random effects, and a random design matrix, with stochastic dependence between the latter two. The generative model is of the form:
where closed-form expressions are available for the densities, and the marginal likelihood of X, even though it is defined on a matrix-valued random variable, is (fortunately) easy to compute. The issue is that the dimensions are large: if X is $n \times m$, I have n on the order of hundreds of thousands and m on the order of tens of millions.
To apply this model in practice, I expect to have to manually implement a dynamic HMC sampler (though again, feel free to disabuse me of this notion if you think a problem of this scale can be tackled using Stan) with the particularities of my problem in mind (e.g., replacing deterministic matrix operations with randomized approximations, using explicit gradients rather than AD). With this in mind, I was hoping to find out:
Where can I find the technical details of the variety of dynamic HMC sampler Stan is currently using? I’ve read Betancourt’s wonderful introduction, but my understanding is that there have been multiple tweaks to the sampler in the meantime. Is there a document that presents the current sampler in its entirety?
A little bit of reassurance that I’m not reinventing the wheel wouldn’t hurt :)
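For concreteness, here is a minimal sketch of the kind of building block a hand-written sampler would start from: a single static HMC transition with an explicit gradient and a unit metric. It is not Stan's dynamic sampler (no tree building, no adaptation of step size or metric), and the names `leapfrog`, `hmc_step`, `log_p`, and `grad_log_p` are illustrative assumptions, as is the standard normal target standing in for the actual model.

```python
import numpy as np

def leapfrog(theta, r, grad_log_p, step_size, n_steps):
    """Leapfrog integration of H(theta, r) = -log p(theta) + 0.5 * r.r (unit metric)."""
    theta = theta.copy()
    r = r + 0.5 * step_size * grad_log_p(theta)      # initial half step for momentum
    for i in range(n_steps):
        theta = theta + step_size * r                # full step for position
        g = grad_log_p(theta)
        # full momentum step between position updates, half step at the very end
        r = r + (step_size if i < n_steps - 1 else 0.5 * step_size) * g
    return theta, r

def hmc_step(theta, log_p, grad_log_p, step_size, n_steps, rng):
    """One static HMC transition with a Metropolis accept/reject correction."""
    r0 = rng.standard_normal(theta.shape)            # resample momentum
    theta_new, r_new = leapfrog(theta, r0, grad_log_p, step_size, n_steps)
    h0 = -log_p(theta) + 0.5 * (r0 @ r0)             # energy at the start
    h1 = -log_p(theta_new) + 0.5 * (r_new @ r_new)   # energy at the end
    if np.log(rng.uniform()) < h0 - h1:
        return theta_new, True
    return theta, False

# Placeholder target (an assumption for illustration): a standard normal,
# with its gradient written out by hand instead of obtained through AD.
def log_p(theta):
    return -0.5 * (theta @ theta)

def grad_log_p(theta):
    return -theta

rng = np.random.default_rng(0)
theta = np.zeros(5)
draws = []
for _ in range(200):
    theta, accepted = hmc_step(theta, log_p, grad_log_p, 0.2, 10, rng)
    draws.append(theta)
draws = np.array(draws)
```

In a setting like mine, `grad_log_p` is where the randomized matrix approximations would enter, which is exactly why I would like to know how the current sampler is specified before building the dynamic parts on top of this.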