This is a good point. This might be a bit niggly, but I think what people are specifying is that the operation is safe to perform in parallel; i.e. input is required from the user to know that the code does not have data interdependencies between iterations. In an ideal world, I think the compiler would figure this out, but this is a very tricky problem and compilers need to be conservative in the face of uncertainty.
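To make "data interdependencies between iterations" concrete, here is a minimal C++ sketch. The function names are made up for illustration and are not from Stan. The first loop has independent iterations, so it would be safe to run in parallel; the second reads the result of the previous iteration, which is the kind of thing a compiler has to rule out before parallelizing on its own.

```cpp
#include <cstddef>
#include <vector>

// Independent iterations: out[i] depends only on in[i], so the
// iterations could run in any order, or in parallel.
std::vector<double> square_each(const std::vector<double>& in) {
  std::vector<double> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) {
    out[i] = in[i] * in[i];
  }
  return out;
}

// Loop-carried dependence: out[i] reads out[i - 1], so running the
// iterations in parallel without restructuring would give wrong results.
std::vector<double> running_sum(const std::vector<double>& in) {
  std::vector<double> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) {
    out[i] = in[i] + (i > 0 ? out[i - 1] : 0.0);
  }
  return out;
}
```

Both loops look nearly identical syntactically, which is part of why proving independence automatically is hard in general.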
In the case of GPU functions, there is no correctness at stake - just some differences in performance.
Advanced users always want the abstraction to leak :P Our job is to navigate that tradeoff and pick something we think makes sense for most of our users.
I think a flag deals with 3, and for 5 I think we could do something like you mentioned below, adding -D STAN_GPU. A flag also makes the error handling of the API simpler, equivalent to the _gpu() case you write about. For whatever that's worth. Agreed that failing early is pretty much always preferred.
@Bob_Carpenter, what do you think about this _gpu() thing vs some kind of -D STAN_GPU compiler flag? My perspective seems to lean relatively heavily towards simple language+complicated compiler compared with Dan and maybe with you as well, but it's up to you.
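For what it's worth, here is a rough C++ sketch of what the flag option could look like at the library level. STAN_GPU is the flag name from this thread; multiply_cpu, multiply_gpu, and multiply are hypothetical stand-ins I made up for illustration, not actual Stan Math functions, and the bodies just return a label instead of doing any work.

```cpp
#include <string>

// Hypothetical CPU implementation; always available.
inline std::string multiply_cpu() { return "cpu"; }

#ifdef STAN_GPU
// Hypothetical GPU implementation; only compiled when the
// build passes -D STAN_GPU.
inline std::string multiply_gpu() { return "gpu"; }
#endif

// The user-facing name stays the same either way. The flag decides at
// compile time which implementation it forwards to, so the language
// itself needs no separate _gpu() variant.
inline std::string multiply() {
#ifdef STAN_GPU
  return multiply_gpu();
#else
  return multiply_cpu();
#endif
}
```

With this shape, a build without the flag never references GPU code at all, so users without GPUs keep the existing behavior. The tradeoff is that the choice is made once per build rather than per call, which is exactly the bit of control an explicit _gpu() function would leave in the user's hands.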
Sorry, Discourse somehow totally omitted your reply from ~20 days ago when I replied to Dan yesterday. It only appeared this morning, sitting between my reply and Dan's. Pretty weird. Anyway, I was mostly saying I'd be willing to take over just in case you had started work and found it to be time consuming (something that would be totally understandable!). I'm happy to be your point person and help guide you through the remaining issues, though as you can see some of the issues require some community discussion and decision-making.
I think we'd love to get GPU support integrated as much as we can while retaining existing performance and behavior for users who don't have GPUs. Though it sounds like from a project management perspective it's probably good to talk to Rok and see if their plans will totally eclipse this at some point within a short timeframe, and if you still want to put the effort into code that might get replaced at some point. I think certainly some of our users would find incremental advances in performance fairly valuable.