I use a Stan model to do inference over hundreds of thousands of small data parcels individually. That is, for each parcel the model runs and reports its results, and I run as many fits in parallel as I can for speed. This is, unsurprisingly, slow; however, it does seem a lot faster than combining the data into a single model with N groups that share no parameters. I guess this is also unsurprising?
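
For concreteness, here's a minimal sketch of the workflow I mean, assuming cmdstanpy and a process pool; the model file name, data layout, and worker count are placeholders:

```python
# Sketch of the per-parcel workflow described above (cmdstanpy assumed).
# "parcel_model.stan" and the data dicts are hypothetical placeholders.
from concurrent.futures import ProcessPoolExecutor
from cmdstanpy import CmdStanModel

# Compile once up front; workers reuse the executable instead of recompiling.
EXE = CmdStanModel(stan_file="parcel_model.stan").exe_file

def fit_parcel(parcel):
    # Re-wrap the compiled executable inside each worker process
    # (CmdStanModel objects don't pickle cleanly across processes).
    model = CmdStanModel(exe_file=EXE)
    fit = model.sample(data=parcel, chains=2, show_progress=False)
    return fit.summary()

if __name__ == "__main__":
    parcels = [...]  # one Stan data dict per parcel
    with ProcessPoolExecutor(max_workers=8) as pool:
        summaries = list(pool.map(fit_parcel, parcels))
```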
My main question is: given the context above, are there any recommendations for how I might make the repeated application of the same model much faster? I can't use optimizing in this instance (there's no continuous gradient) and vb is far too unreliable.