Vectorising user-defined multivariate lpdf/lpmf functions

Hi! I’m a fairly new Stan user and this is my first time posting to the forum, so I hope I’m formulating everything correctly; apologies if not. (Also, thank you to everyone who asks and answers questions on this group; it’s an amazing resource!)

I’m using @bgoodri’s excellent parameterisation of the multivariate probit model (original available here).

To make the program integrate more easily with library(loo) and the new reduce_sum parallelisation, I tried to redefine the bulk of the model inside a single _lpmf function, but I can’t work out how to vectorise the function to improve computational performance.

The function as it stands is:

functions {
   real multiprobit_lpmf(
   int[] y,  // response data - integer array of 1s and 0s
   vector x, // covariate data
   matrix beta, // coefficients for covariates
   real[] u, // nuisance that absorbs inequality constraints
   int D,  // response dimension size
   matrix L_Omega // lower cholesky of correlation matrix
   ){
      vector[D] mu; // means
      vector[D] z; // latent continuous state
      vector[D] logLik; // log likelihood
      real prev;
      mu = beta * x;
      prev = 0;
     for (d in 1:D) { // Phi and inv_Phi may overflow and / or be numerically inaccurate
        real bound; // threshold at which utility = 0
        bound = Phi( -(mu[d] + prev) / L_Omega[d,d]  );
        if (y[d] == 1) {
          real t;
          t = bound + (1 - bound) * u[d];
          z[d] = inv_Phi(t);       // implies utility is positive
          logLik[d] = log1m(bound);  // Jacobian adjustment
        }
        else {
          real t;
          t = bound * u[d];
          z[d] = inv_Phi(t);     // implies utility is negative
          logLik[d]= log(bound);  // Jacobian adjustment
        }
        if (d < D) prev = L_Omega[d+1,1:d] * head(z, d);
        // Jacobian adjustments imply z is truncated standard normal
        // thus utility --- mu + L_Omega * z --- is truncated multivariate normal
      }
      return sum(logLik);
   }
}

And the data being used:

data {
  int<lower=1> K;  // number of covariates + intercept
  int<lower=1> D;  // response dimension size
  int<lower=0> N; // sample size 
  int<lower=0,upper=1> y[N,D]; // response data - integer array of 1s and 0s
  vector[K] x[N]; // covariate data 
}

At the moment the model is retrieving the simulation parameters well, but it would be nice to be able to remove the for loop, particularly when wrapping the function into partial_sum functions. The model block is currently:

model {
  L_Omega ~ lkj_corr_cholesky(1);
  to_vector(beta) ~ normal(0, 10);
  // implicit: u is iid standard uniform a priori
  { // likelihood
  for(n in 1:N){
        target += multiprobit_lpmf(y[n] | x[n], beta, u[n], D, L_Omega);
  }
  }
}
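
For context on the loo side: having everything in a single _lpmf makes the pointwise log-likelihood easy to expose in generated quantities. A minimal sketch, reusing the parameter names from the model block above:

generated quantities {
  vector[N] log_lik;  // pointwise log-likelihood for library(loo)
  for (n in 1:N)
    log_lik[n] = multiprobit_lpmf(y[n] | x[n], beta, u[n], D, L_Omega);
}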

Any advice on how to vectorise the function (or if this is possible/desirable) is greatly appreciated- thanks!


I am not optimistic that you can get much speedup from parallelization in this case because you have to solve for the first element of z, before you can solve for the second, third, … D-th.


Thanks for the quick reply. I am probably misunderstanding (so apologies), but the idea was to split over the N rows (rather than the D columns) to parallelise. Wouldn’t the \boldsymbol{Z_{n}}’s be conditionally independent?

They are conditionally independent in the statistical sense (before you condition on the data), but you still have to know the value of z_1 before you can solve for z_2, etc.
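
To make that concrete in the notation of the function above (this is just the for loop written out): for one observation, the d-th threshold is

\text{bound}_d = \Phi\!\left( -\,\frac{\mu_d + \sum_{j < d} L_{d,j}\, z_j}{L_{d,d}} \right),

and z_d = \Phi^{-1}(t_d), where t_d = \text{bound}_d + (1 - \text{bound}_d)\, u_d if y_d = 1 and t_d = \text{bound}_d\, u_d otherwise. So within a single observation each \text{bound}_d depends on the z_j computed before it, and the loop over d cannot be collapsed into one vectorised statement.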

I hadn’t realised that was how it worked, thank you!

Just in case anyone is interested, my experiment with reduce_sum just finished and the speed up is fairly minor as Ben predicted (~20% with 4 threads per chain, N=6266).

If you have a multivariate probit with D dimensions and N observations, surely you can still parallelise over the N observations, even though the calculation of the likelihood for each dimension within a single observation depends on the previous one? I.e., if Y is an array of vectors, each of length D, I can’t see a reason you couldn’t parallelise across each of the vectors in Y. I think that’s what @fergusjchadwick was asking.


Thanks @dpascall, that is exactly what I meant but I wasn’t phrasing it very clearly!

@wds15, do you have any thoughts on whether this is a good candidate for parallelisation? I’ve managed to get a modest speed up and I’m wondering whether it’s worth pursuing further, or whether there is something intrinsic to the model structure (as suggested by Ben) that stops this from being a good candidate.

The partial_sum function (for use with reduce_sum) that gave me the speed up is:

  real partial_sum(
      int[,] y_slice,
      int start,
      int end,
      vector[] x,
      matrix beta,
      real[,] u,
      int K,
      int D,
      matrix L_Omega
  ) {
    real ll[(end - start) + 1];
    vector[K] x_slice[(end - start) + 1];
    real u_slice[(end - start) + 1, D];
    x_slice = x[start:end];
    u_slice = u[start:end];
    for (i in 1:(end - start) + 1) {
      ll[i] = multiprobit_lpmf(y_slice[i] | x_slice[i], beta, u_slice[i], D, L_Omega);
    }
    return sum(ll);
  }
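
For reference, this partial sum gets plugged into the model block along these lines (the grainsize of 1 is just a placeholder I have not tuned):

model {
  L_Omega ~ lkj_corr_cholesky(1);
  to_vector(beta) ~ normal(0, 10);
  // implicit: u is iid standard uniform a priori
  target += reduce_sum(partial_sum, y, 1, x, beta, u, K, D, L_Omega);
}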

In particular, I was wondering if getting rid of the for loop inside the partial_sum function would improve things (and then how to do that!). Thanks for any input!!


If you can avoid the for loop and write it as one vectorized statement … do that!

Also: if x is data, then do not assign it to a temporary variable, but rather use it directly with some additional index coding.

Finally, is u a parameter? If so, then slice over u instead of the data y.

Thanks @wds15. I’ve rewritten the code to slice over u (it is a parameter) and it’s currently running. Why is it better to slice over a parameter?

Great catch on the temporary variable, feel a bit silly for putting that in now!

I think to vectorise I’ll need to overhaul the code quite a lot (if indeed it’s possible). I’ll post here with the results of slicing over u and vectorising if I manage to!

Why it’s better to slice over parameters: the sliced argument is only copied once in total (each partial sum sees just its own slice), whereas the shared arguments are copied for every partial-sum evaluation, which for parameters also adds work to the autodiff stack. So a large parameter array like u is the thing to slice over.

It’s well worth trying to vectorize this thing!


@wds15 Thanks to your advice (removing temporary variables and slicing over u) the model is now twice as fast as the non-parallelised version. Here is my new function code for anyone trying to do something similar:

    real partial_sum(
      real[,] u,     // sliced argument: the slice u[start:end], locally indexed from 1
      int start,
      int end,
      vector[] x,    // shared arguments: x and y are the full arrays, indexed globally
      matrix beta,
      int[,] y,
      int K,
      int D,
      matrix L_Omega
    ) {
      real ll[(end - start) + 1];
      for (i in start:end) {
        ll[i - start + 1] = multiprobit_lpmf(y[i] | x[i], beta, u[i - start + 1], D, L_Omega);
      }
      return sum(ll);
    }
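
And the call in the model block now slices over the parameter u rather than the data (the grainsize is again just a placeholder):

  target += reduce_sum(partial_sum, u, 1, x, beta, y, K, D, L_Omega);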

Now to try vectorising!


Interesting, I am also using bgoodri’s parameterization of the MVP - I have a fairly large N and tried to use reduce_sum around a month ago but got no speedup whatsoever. The version I’m currently using is a bit of an extension - it is a mixture model (so it definitely cannot be vectorised, as Stan doesn’t support vectorised mixtures yet), hierarchical, and the likelihood is a mixture of ordered probit and standard (binary) probit, so maybe that’s why it doesn’t benefit as much. However, I don’t recall trying to slice over u, only the data, so this might work.

@CerulloE, one thing that might be affecting it (you probably already know) is that you can only split across conditionally independent parts of the model, so you’d need to find a way of splitting that reflects your hierarchy, if that’s possible.
