So I have a question about multithreading. My understanding is that the only way to make use of multithreaded NUTS sampling is to divide the data into shards and use CmdStan + map_rect(). Correct?
In any event, I’ve noticed that when I use ADVI via the vb() function, my Mac’s Activity Monitor clearly shows it’s using multiple threads. When I use the stan() function, however, it compiles on a single core and runs only one thread; if I run multiple chains, I get one thread per chain.
I don’t know enough about the math libraries under the hood to figure this out, but why is it that vb() has multithreading support while stan() using NUTS does not? I’m thinking that many applications (IRT, for example, which I do a lot of work in) would benefit from this kind of embarrassingly parallel computation.