Making a function more efficient?

Hello! I am currently fitting a bunch of Hilbert Space Gaussian Processes (HSGPs) with partial pooling (similar to this PyMC example).

I am adapting/extending @avehtari’s Birthday example code to allow for fitting many HSGPs at once.

My question is how best to “vectorize” the diagSPD_periodic() function defined below.

The current function accounts for a single sigma and a single lengthscale as follows:

  vector diagSPD_periodic(real sigma, real lengthscale, int M) {
    real a = inv_square(lengthscale);
    vector[M] q = exp(log(sigma)
                      + 0.5 * (log(2) - a
                               + to_vector(log_modified_bessel_first_kind(
                                   linspaced_int_array(M, 1, M), a))));
    return append_row(q, q);
  }
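(As a quick sanity check, the log-space expression above is algebraically the same as the direct spectral-density formula sigma * sqrt(2 * exp(-a) * I_m(a)). Here is a NumPy/SciPy sketch of my own, not part of the Stan code, that confirms this numerically; scipy's `iv` plays the role of the modified Bessel function of the first kind.)

```python
import numpy as np
from scipy.special import iv  # modified Bessel function of the first kind, I_m(a)

def diag_spd_periodic(sigma, lengthscale, M):
    """Log-space form, mirroring the Stan function above."""
    a = 1.0 / lengthscale**2
    m = np.arange(1, M + 1)
    q = np.exp(np.log(sigma) + 0.5 * (np.log(2) - a + np.log(iv(m, a))))
    return np.concatenate([q, q])

def diag_spd_periodic_direct(sigma, lengthscale, M):
    """Direct form: sigma * sqrt(2 * exp(-a) * I_m(a))."""
    a = 1.0 / lengthscale**2
    m = np.arange(1, M + 1)
    q = sigma * np.sqrt(2.0 * np.exp(-a) * iv(m, a))
    return np.concatenate([q, q])

q1 = diag_spd_periodic(0.7, 1.3, 10)
q2 = diag_spd_periodic_direct(0.7, 1.3, 10)
assert np.allclose(q1, q2)
```

(Note that `np.log(iv(...))` can underflow for large orders, which is exactly why the Stan code stays in log space via log_modified_bessel_first_kind.)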

but now I would like to pass in a vector of sigma and a vector of lengthscale and return a matrix corresponding to the computation for each integer against each element of the vector.

My attempt is:

  matrix diagSPD_periodic_M(vector sigma, vector lengthscale, int M) {
    int num_gps = num_elements(sigma);
    vector[num_gps] a = inv_square(lengthscale);
    matrix[num_gps, M] linspaced_matrix = rep_matrix(linspaced_row_vector(M, 1, M), num_gps);
    matrix[num_gps, M] expanded_parameter = rep_matrix(a, M);
    matrix[num_gps, M] q = exp(
      log(rep_matrix(sigma, M))
      + 0.5 * (log(2) - expanded_parameter
               + log_modified_bessel_first_kind(linspaced_matrix, expanded_parameter))
    );
    return append_col(q, q);
  }

but it still seems inefficient?
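(To convince myself the matrix version computes the right thing, here is a NumPy/SciPy analogue of my own, using broadcasting in place of rep_matrix, with a row-by-row check against the scalar version:)

```python
import numpy as np
from scipy.special import iv  # I_m(a); Stan's log_modified_bessel_first_kind works in log space

def diag_spd_periodic_vec(sigma, lengthscale, M):
    """Vectorized over GPs via broadcasting; sigma and lengthscale are 1-D arrays."""
    sigma = np.asarray(sigma)
    a = 1.0 / np.asarray(lengthscale)**2           # shape (N,)
    m = np.arange(1, M + 1)                        # shape (M,)
    log_q = (np.log(sigma)[:, None]
             + 0.5 * (np.log(2) - a[:, None]
                      + np.log(iv(m[None, :], a[:, None]))))
    q = np.exp(log_q)                              # shape (N, M)
    return np.concatenate([q, q], axis=1)          # shape (N, 2 * M)

sigma = np.array([0.5, 1.0, 2.0])
rho = np.array([0.8, 1.3, 2.1])
Q = diag_spd_periodic_vec(sigma, rho, 8)

# each row should match the scalar computation for that GP
for s, r, row in zip(sigma, rho, Q):
    a = 1.0 / r**2
    m = np.arange(1, 9)
    q = np.exp(np.log(s) + 0.5 * (np.log(2) - a + np.log(iv(m, a))))
    assert np.allclose(row, np.concatenate([q, q]))
```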

My question is: is this the best way to “vectorize” this function?

NOTE: I would also be willing to augment the C++ code to allow for passing in containers of different lengths and returning a matrix of the appropriate size (for the above code I adapted the code from the manual here).

I don’t know about that, but I want to check: do you really need different sigmas and lengthscales? sigma and lengthscale define the prior on smoothness, but the posterior smoothness can still differ, so it is perfectly fine to use the same prior for many functions unless you assume very big differences in smoothness. Because sigmas and lengthscales are only weakly informed by the likelihood, the differences in smoothness need to be really big before there would be any benefit in using different sigmas and lengthscales. Using the same GP prior for many functions would also make the code much faster.


I refactored down to this:

matrix diagSPD_periodic_M(vector sigma, vector lengthscale, int M) {
  int N = num_elements(sigma);
  vector[N] a = inv_square(lengthscale);
  matrix[N, M] linspaced_matrix = rep_matrix(linspaced_row_vector(M, 1, M), N);
  matrix[N, M] a_matrix = rep_matrix(a, M);  // a is still the Bessel argument
  matrix[N, M] expanded_parameter = rep_matrix(log(2) - a, M);
  matrix[N, M] q
      = diag_pre_multiply(sigma,  // scale row n by sigma[n]
                          sqrt(exp(expanded_parameter
                                   + log_modified_bessel_first_kind(linspaced_matrix,
                                                                    a_matrix))));
  return append_col(q, q);
}

I used that exp(a + b) = exp(a) * exp(b) and that exp(0.5 * a) = exp(a)^0.5 = sqrt(exp(a)). I’m not sure the second step helps—I was hoping to reduce further.
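(Those two identities, applied to this computation, check out numerically; a NumPy/SciPy sketch of mine, with scipy's `iv` standing in for the modified Bessel function:)

```python
import numpy as np
from scipy.special import iv  # I_m(a)

rng = np.random.default_rng(0)
a = rng.uniform(0.5, 3.0, size=(5, 1))   # column of inv_square(lengthscale) values
m = np.arange(1, 9)                      # Bessel orders 1..M

# exp(log(2) - a + log I_m(a)) == 2 * exp(-a) * I_m(a)
log_form = np.exp(np.log(2) - a + np.log(iv(m, a)))
direct = 2.0 * np.exp(-a) * iv(m, a)
assert np.allclose(log_form, direct)

# exp(0.5 * x) == sqrt(exp(x))
x = np.log(2) - a + np.log(iv(m, a))
assert np.allclose(np.exp(0.5 * x), np.sqrt(np.exp(x)))
```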

I’ve never used our Bessel functions and the doc seemed a bit sketchy on what vectorized meant in this case. But I always worry when there is a matrix fill and then an operation—it’s usually better to pull that into the fill, as I did with the log(2) in expanded_parameter above. A lot of our vectorized functions can just take a scalar and will broadcast it up virtually if it needs to be reused, which is more efficient, especially when there is autodiff involved. So I was wondering if you could somehow operate on the row vector and then broadcast that up to q.
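(The "pull the operation into the fill" point is easy to see outside Stan, too: transforming the length-N vector before replicating costs N subtractions instead of N * M. A NumPy analogue of rep_matrix, ignoring the extra autodiff overhead Stan would add on top:)

```python
import numpy as np

N, M = 1000, 200
a = np.random.default_rng(1).uniform(0.5, 3.0, size=N)

# fill first, then operate: the subtraction touches all N * M elements
fill_then_op = np.log(2) - np.tile(a[:, None], (1, M))

# operate first, then fill: the subtraction touches only N elements
op_then_fill = np.tile((np.log(2) - a)[:, None], (1, M))

# same result either way
assert np.allclose(fill_then_op, op_then_fill)
```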

But however you slice this, all the Bessel functions will be costly.


Thanks for this Bob! I wondered about the fill too. I think a long time ago I asked how the usage of functions like rep_vector, rep_matrix, etc. affected the autodiff stack and if I remember correctly, it puts a bunch of copies on there. So I initially wanted to try to do this with just a call to the Bessel function using array and vector containers for linspaced_matrix and a, respectively. That, of course, would have required a change to the API of log_modified_bessel_first_kind though, so I think the matrix approach is the best we can do.

In any case, I am using Aki’s suggestion of reusing the same basis functions for all of the HSGPs but questions like these really help my understanding of Stan’s autodiff!
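(For what it's worth, the shared-prior route reduces to computing the spectral densities once and reusing them for every GP. A NumPy sketch with hypothetical names and shapes; `PHI` stands for the shared HSGP basis matrix and `beta` for each GP's basis coefficients:)

```python
import numpy as np
from scipy.special import iv  # I_m(a)

def diag_spd_periodic(sigma, lengthscale, M):
    """Spectral densities sigma * sqrt(2 * exp(-a) * I_m(a)), duplicated for the cos/sin bases."""
    a = 1.0 / lengthscale**2
    m = np.arange(1, M + 1)
    q = sigma * np.sqrt(2.0 * np.exp(-a) * iv(m, a))
    return np.concatenate([q, q])

rng = np.random.default_rng(2)
n_obs, M, n_gps = 50, 10, 7
PHI = rng.normal(size=(n_obs, 2 * M))    # shared basis functions (placeholder values)
beta = rng.normal(size=(2 * M, n_gps))   # one column of coefficients per GP

spd = diag_spd_periodic(1.0, 1.5, M)     # computed once, shared by all GPs
f = PHI @ (spd[:, None] * beta)          # latent functions, shape (n_obs, n_gps)
```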

Thanks!


Yes, it puts a bunch of copies into the matrix. The upside is that the copies are only pointers (technically instances of the PIMPL pattern in the C++ code), so there aren’t any autodiff variables being copied. It’s just more pointer chasing and indirection in the arithmetic and a quadratically sized intermediate.

What we really need to do is turn these construction functions into comprehensions so that we can assign to our more efficient underlying matrix implementation.