Cost of Transferring Memory to a GPU Kernel

Has anyone benchmarked how much it costs to transfer a block of memory to a GPU kernel? I’m asking specifically about the cost of the memory transfer itself.

I have some autodiff benchmarks on a few complex models, and I’m wondering whether it’s worth experimenting with moving the autodiff expression tree onto a GPU.

I’d suppose not: even for some complex gradient computations the evaluation time is on the order of milliseconds, and I’m guessing the memory transfer would cost more than evaluating the expression tree. That’s only a guess, though.
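
To make that guess concrete, here is a rough sketch of the comparison I have in mind, using the usual "fixed latency plus bytes over bandwidth" model for a transfer. The latency and bandwidth constants are placeholder assumptions, not measurements from any particular card or bus, and the function names are made up for illustration.

```python
# Back-of-the-envelope model: transfer time = fixed latency + bytes / bandwidth.
# Both constants are placeholder assumptions, NOT measured values.
ASSUMED_LATENCY_S = 10e-6       # per-transfer overhead in seconds (assumed)
ASSUMED_BANDWIDTH_BPS = 10e9    # sustained host-to-device bytes per second (assumed)


def estimated_transfer_seconds(num_bytes,
                               latency_s=ASSUMED_LATENCY_S,
                               bandwidth_bps=ASSUMED_BANDWIDTH_BPS):
    """Estimate the time to copy num_bytes to the device under the linear model."""
    return latency_s + num_bytes / bandwidth_bps


def break_even_bytes(compute_seconds,
                     latency_s=ASSUMED_LATENCY_S,
                     bandwidth_bps=ASSUMED_BANDWIDTH_BPS):
    """Transfer size at which the estimated copy time equals compute_seconds."""
    return max(0.0, (compute_seconds - latency_s) * bandwidth_bps)


if __name__ == "__main__":
    # Print the estimated copy time for a few transfer sizes.
    for size in (1_000, 1_000_000, 100_000_000):
        ms = estimated_transfer_seconds(size) * 1e3
        print(f"{size:>12} bytes -> {ms:.3f} ms (estimated)")
    # Size at which the copy alone would take as long as a 1 ms evaluation.
    print(f"break-even for 1 ms of compute: {break_even_bytes(1e-3):.0f} bytes")
```

Plugging in measured latency and bandwidth for a real setup would show whether a millisecond-scale gradient evaluation leaves any room for the copy, which is exactly the kind of number I’m hoping someone already has.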

Any benchmarks on this?