I am attempting to optimise a model which uses spherical harmonics to parameterise the success probability of binomials on the sphere. Transforming from the alm parameters to the logit-probability x involves two steps which are computationally expensive (the bracketed lines in the transformed parameters block). This operation will be the building block of a much more complicated model, so I’d like to check that I haven’t missed a trick. Can anyone suggest a different way of calculating this expression which is more Stan-friendly?
data {
int<lower=0> J; // number of pixels
int<lower=0> N; // number of HEALPix isolatitude rings
int<lower=0> H; // number of harmonics
int<lower=0> L; // 2 * max l of hamonics + 1
matrix[N,H] lambda; // spherical harmonics decomposed
matrix[J,L] azimuth; // spherical harmonics decomposed
int indices[J]; // map N->J
int lower[L]; // map H->L
int upper[L]; // map H->L
int k[J];
int n[J];
}
parameters {
vector[H] alm;
}
transformed parameters {
vector[J] x;
{
matrix[N,L] F;
for (i in 1:L) {
F[:,i] = lambda[:,lower[i]:upper[i]] * alm[lower[i]:upper[i]];
}
x = rows_dot_product(F[indices],azimuth);
}
}
model {
alm ~ std_normal();
k ~ binomial_logit(n, x);
}
matrix[N,L] F;
for (i in 1:L) {
F[:,i] = lambda[:,lower[i]:upper[i]] * alm[lower[i]:upper[i]];
}
The only thing that sticks out to me is would it be possible to zero-pad the lambda matrix and build a matrix out of alm so that you can do:
matrix[N, L] F = lambda * alm_matrix; // alm_matrix would be an HxL matrix
Matrix multiplication is more efficient than matrix vector multiplication. The gains to big matrix operations are big enough that usually you can afford some zero padding even if multiplying by zero seems like kindof a waste.
I can see how writing it like you have it is more efficient in terms of operations though
Thanks for taking a look! I tried something along the lines you suggested, swapping these lines:
for (i in 1:L) {
F[:,i] = lambda[:,lower[i]:upper[i]] * alm[lower[i]:upper[i]];
}
for these lines:
matrix[H,L] Alm = rep_matrix(0.0, H, L);
for (i in 1:L) {
Alm[lower[i]:upper[i],i] = alm[lower[i]:upper[i]];
}
F = lambda * Alm;
It turned out that my original code was the faster of the two, but that wasn’t obvious to me until I tried it!
Your original suggestion was to pad lambda, but I couldn’t find a way to make it work without turning it into a three dimensional object. Have I missed something?
Nah nah, the second thing is what I meant. Hmm, if that isn’t any faster then I think I’m out of ideas. What you have looks good to me.
By the way, there is new profiling stuff in cmdstan 2.26: Profiling Stan programs with CmdStanR • cmdstanr . It can be handy for things like this so you know what are the expensive parts of the code.
Take a look at the rows in the rows dot product call to see if there’s any redundancy there. If so, see here on computing on just the unique rows then indexing into the result.