Stan on GPU: looking for model+dataset examples for empirical evaluation of speedups

MPI is coming, so the question is not silly at all. See here:

The first implementation can speed things up dramatically, but it is not truly user-friendly yet: you have to cram your data into the format we expect so that it can be distributed easily via MPI.
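
To give a rough idea of what that cramming can look like, here is a minimal sketch in Python. It assumes the expected format is rectangular: every shard gets a real-valued array of the same length, plus a small integer array recording how many entries are real data rather than padding. That layout is my assumption for illustration, not a spec, and `pack_shards`, `x_r`, and `x_i` are hypothetical names.

```python
def pack_shards(groups, pad_value=0.0):
    """Pad ragged per-group data into equally sized shards.

    Hypothetical helper: returns (x_r, x_i), where x_r[k] is the padded
    real data for shard k and x_i[k] holds its true length.
    """
    width = max(len(g) for g in groups)  # length of the longest group
    x_r = [list(g) + [pad_value] * (width - len(g)) for g in groups]
    x_i = [[len(g)] for g in groups]  # lets each shard ignore its padding
    return x_r, x_i


# Three groups with different numbers of observations.
groups = [[1.2, 0.7, 3.1], [2.5], [0.4, 1.9]]
x_r, x_i = pack_shards(groups)
```

Each shard's likelihood would then read only the first `x_i[k][0]` entries of `x_r[k]`, so the padding never enters the model.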