Speeding up a spherical harmonic model

I am attempting to optimise a model which uses spherical harmonics to parameterise the success probability of binomials on the sphere. Transforming from the alm parameters to the logit-probability x involves two steps which are computationally expensive (the bracketed lines in the transformed parameters block). This operation will be the building block of a much more complicated model, so I’d like to check that I haven’t missed a trick. Can anyone suggest a different way of calculating this expression which is more Stan-friendly?

data {
    int<lower=0> J;        // number of pixels
    int<lower=0> N;        // number of HEALPix isolatitude rings
    int<lower=0> H;        // number of harmonics
    int<lower=0> L;        // 2 * max l of harmonics + 1
    matrix[N,H] lambda;    // spherical harmonics decomposed
    matrix[J,L] azimuth;   // spherical harmonics decomposed
    int indices[J];        // map N->J
    int lower[L];          // map H->L
    int upper[L];          // map H->L
    int k[J];              // successes per pixel
    int n[J];              // trials per pixel
}
parameters {
    vector[H] alm;
}
transformed parameters {
    vector[J] x;
    {
        matrix[N,L] F;
        for (i in 1:L) {
            F[:,i] = lambda[:,lower[i]:upper[i]] * alm[lower[i]:upper[i]];
        }
        x = rows_dot_product(F[indices],azimuth);
    }
}
model {
    alm ~ std_normal();
    k ~ binomial_logit(n, x);
}
matrix[N,L] F;
for (i in 1:L) {
  F[:,i] = lambda[:,lower[i]:upper[i]] * alm[lower[i]:upper[i]];
}

The only thing that sticks out to me is whether it would be possible to zero-pad the lambda matrix and build a matrix out of alm so that you can do:

matrix[N, L] F = lambda * alm_matrix; // alm_matrix would be an HxL matrix

Matrix-matrix multiplication is more efficient than a sequence of matrix-vector multiplications. The gains from big matrix operations are large enough that you can usually afford some zero padding, even if multiplying by zero seems like kind of a waste.
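To make the equivalence concrete, here is a NumPy sketch (small hypothetical dimensions, made-up data) showing that the per-column loop and the single product against a zero-padded alm matrix give the same F:

```python
import numpy as np

# Hypothetical small dimensions for illustration.
N, H, L = 4, 6, 3
rng = np.random.default_rng(0)
lam = rng.normal(size=(N, H))   # the "lambda" matrix from the model
alm = rng.normal(size=H)
# For each column i, the slice of harmonics it uses
# (Stan's lower/upper are 1-based inclusive; here 0-based half-open).
lower = [0, 2, 4]
upper = [2, 4, 6]

# Original loop: one matrix-vector product per column.
F_loop = np.empty((N, L))
for i in range(L):
    F_loop[:, i] = lam[:, lower[i]:upper[i]] @ alm[lower[i]:upper[i]]

# Zero-padded alternative: place each alm slice in its own column of an
# H x L matrix, then do a single matrix-matrix product.
Alm = np.zeros((H, L))
for i in range(L):
    Alm[lower[i]:upper[i], i] = alm[lower[i]:upper[i]]
F_mat = lam @ Alm

# The two versions agree; the question is only which is faster in Stan.
assert np.allclose(F_loop, F_mat)
```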

I can see how writing it the way you have it is more efficient in terms of raw operation count, though.


As someone who doesn't have much experience with the foundational compute side of things, I find things like this super surprising and neat.

Thanks for taking a look! I tried something along the lines you suggested, swapping these lines:

for (i in 1:L) {
    F[:,i] = lambda[:,lower[i]:upper[i]] * alm[lower[i]:upper[i]];
}

for these lines:

matrix[H,L] Alm = rep_matrix(0.0, H, L);
for (i in 1:L) {
    Alm[lower[i]:upper[i],i] = alm[lower[i]:upper[i]];
}
F = lambda * Alm;

It turned out that my original code was the faster of the two, but that wasn’t obvious to me until I tried it!

Your original suggestion was to pad lambda, but I couldn't find a way to make that work without turning it into a three-dimensional object. Have I missed something?


Nah nah, the second thing is what I meant. Hmm, if that isn’t any faster then I think I’m out of ideas. What you have looks good to me.

By the way, there is new profiling stuff in cmdstan 2.26: Profiling Stan programs with CmdStanR • cmdstanr . It can be handy for things like this so you know what are the expensive parts of the code.


Amazing! Profiling has revealed that the line:

x = rows_dot_product(F[indices],azimuth);

takes six times longer than the lines I was trying to optimise.
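For anyone unfamiliar with the call, rows_dot_product(A, B) returns a vector whose element j is the dot product of row j of A with row j of B. A NumPy sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 3))
B = rng.normal(size=(5, 3))

# Equivalent of Stan's rows_dot_product(A, B):
# element j is dot(A[j], B[j]).
x = np.sum(A * B, axis=1)

assert np.allclose(x, [A[j] @ B[j] for j in range(5)])
```

So in the model, x[j] is the dot product of the j-th (ring-indexed) row of F with the j-th row of azimuth, done for all J pixels at once.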


Nice!

Things like this are why we were so eager to add profiling.


I really appreciate it. I can’t imagine how difficult that must have been to implement!


Take a look at the rows in the rows_dot_product call to see if there's any redundancy there. If so, see here on computing on just the unique rows, then indexing into the result.
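The general pattern being suggested, sketched in NumPy with hypothetical data: when the rows fed to an expensive row-wise operation repeat, run the operation once on the unique rows and index into the result.

```python
import numpy as np

rng = np.random.default_rng(2)
unique_rows = rng.normal(size=(3, 4))        # e.g. 3 distinct rows
idx = np.array([0, 2, 2, 1, 0, 2, 1, 0])     # 8 uses, many repeats

def expensive(rows):
    # Stand-in for any costly row-wise computation.
    return np.sum(rows * rows, axis=1)

# Naive: compute on all 8 (partly duplicated) rows.
naive = expensive(unique_rows[idx])

# Cheaper: compute on the 3 unique rows, then gather.
cheap = expensive(unique_rows)[idx]

assert np.allclose(naive, cheap)
```

Note this only pays off when the *whole* row-wise computation is shared; here the azimuth rows differ per pixel, so the saving would come from any part of the calculation that depends on F[indices] alone.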