Nystrom approximation slows down Gaussian process regression?

Sorting is unlikely to be the bottleneck. It's indeed O(N log N), since we just call the standard C++ sort function, and there's no autodiff overhead.

That's a very clever approach to data subsampling. The lower and upper bounds on u are critical here. It's not something we really intended to let slip through, because if you subsample the data, you're not going to converge. Michael also wrote an arXiv paper on why subsampling is a bad idea for HMC in general.
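Here's a minimal sketch of why the trick gives random subsamples, assuming (my reading, not confirmed in the thread) that u is declared with lower=0 and upper=1, so it is uniform a priori, and that idx is the first K entries of u's sort indices (in Stan, presumably something like `sort_indices_asc(u)`). The helper `subsample_indices` is hypothetical and just mimics that in Python:

```python
import random

def subsample_indices(u, k):
    """Return the (0-based) indices of the k smallest entries of u.

    If u holds i.i.d. Uniform(0, 1) draws, every size-k subset of
    range(len(u)) is equally likely, i.e. sampling without replacement.
    """
    # argsort via sorted(): O(N log N), same cost as sorting u itself
    order = sorted(range(len(u)), key=lambda i: u[i])
    return order[:k]

random.seed(1)
N, K = 10, 3
u = [random.random() for _ in range(N)]  # stand-in for the bounded parameter u
idx = subsample_indices(u, K)
print(idx)
```

Note that Python indices are 0-based, whereas Stan's are 1-based.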

The problem one normally runs into with this kind of hack for getting discrete sampling is that there is no information flow from idx back to u (it's like a "cut" in BUGS). But here you don't care about that information flow; you really do want random sampling, which this should give you.
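To make the "no information flow" point concrete: idx is a piecewise-constant function of u, so small perturbations of u leave it unchanged and its derivative is zero almost everywhere. A gradient-based sampler like HMC therefore gets no gradient signal from idx back to u. A quick check in Python, with a hypothetical `argsort` helper and made-up values for u:

```python
def argsort(u):
    # indices that would sort u in ascending order
    return sorted(range(len(u)), key=lambda i: u[i])

u = [0.42, 0.17, 0.93, 0.58]
eps = 1e-6
base = argsort(u)

# Nudge each coordinate of u up and down by eps; the sort order never changes,
# so any finite-difference derivative of idx with respect to u is zero.
unchanged = all(
    argsort(u[:j] + [u[j] + s] + u[j + 1:]) == base
    for j in range(len(u))
    for s in (eps, -eps)
)
print(base, unchanged)  # prints "[1, 0, 3, 2] True"
```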