I think we can apply the same design principle, yes. So in essence:
- the gpu/mpi function gets called with the immutable data and a uid
- if the uid has not yet been seen, then the immutable data is distributed for MPI/sent to the GPU
- the distribution ensures that the uid will be recognized next time the function is called
- after the first call of the function, we assume that the same uid implies the same data
The only downside is that we will end up with more and more singletons floating around in our code-base. I think the GPU stuff can even be nested inside MPI calls when done this way.
A GPU version of cov_exp_quad_cholesky should be super handy for GPs, I think.