Map_rect and distributions

When using map_rect, we usually write a function that evaluates the ll/lp for a chunk of the data. That often requires evaluating standard densities, like multi_normal_lpdf. However, we don’t really need the normalizing constants, and using y ~ multi_normal would probably be faster. So: how do I use ~ inside my ‘reduce’ function?
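(For context: ~ statements are only legal in the model block, so the mapped function has to return its log-probability contribution explicitly. Dropping the constants inside it seems to mean hand-coding the unnormalized kernel, which is what I’d like to avoid. A minimal sketch with a hypothetical worker chunk_lp, where the scale is passed as data so its log term really is a constant:)

functions {
  // hypothetical sketch: x_r[1] is a fixed (data) scale, phi[1] a location parameter
  vector chunk_lp(vector phi, vector beta_i, real[] x_r, int[] x_i) {
    // beta_i ~ normal(phi[1], x_r[1]);  // not allowed: ~ only works in the model block
    real lp = normal_lpdf(beta_i | phi[1], x_r[1]);                // includes the log(2*pi) and log(sigma) terms
    real lp_kernel = -0.5 * dot_self((beta_i - phi[1]) / x_r[1]);  // kernel only; valid because x_r[1] is data
    return [lp_kernel]';
  }
}

(As far as I can tell, the newer _lupdf suffixes only help inside the model block or user-defined _lpdf functions, not inside a plain map_rect worker, which has to return a vector.)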

EDIT:
Seems like this is already part of the developers’ discussion here: Parallel autodiff v4


Just to be clear, I don’t think anyone is trying to solve that problem over there.

What are your arguments to multi_normal_lpdf and which are data and which are computed as a function of parameters?

That was my impression too, unfortunately.

Here is the reduce function:

 vector model_mr(vector mu_L,
                 vector beta_i,
                 real[] xa, int[] xi) {

   // fetch dims
   int K = num_elements(beta_i);  // number of individual-level coefficients
   int S = size(xa);
   int n = S / K;                 // rows of this unit's (differenced) design matrix

   // set up lp components
   real lp;
   real ll;

   // compute; mu_L packs the hypermean (first K elements) followed by the
   // Cholesky factor of the covariance (next K*K elements, column-major)
   lp = multi_normal_cholesky_lpdf(beta_i | mu_L[1:K],
                                   to_matrix(mu_L[(K + 1):(K + K * K)], K, K));
   ll = sum(-1 * log(1 + exp(to_matrix(xa, n, K) * beta_i)));

   return [lp + ll]';
 }

Pretty straightforward: the ‘individual-level’ part of a hierarchical logit model (where xa is differenced with respect to the chosen alternative).
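To spell out what the ll line computes (assuming the binary-choice reading, with each row of xa being the covariate difference of the non-chosen minus the chosen alternative), every observed choice contributes

$$\log \Pr(\text{chosen}) = -\log\bigl(1 + \exp(x^\top \beta_i)\bigr),$$

which is exactly the term summed over the n rows of to_matrix(xa, n, K).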

It is called like this in the model:

 target += sum(map_rect(model_mr,
                        append_row(mu_beta, to_vector(chol_cov_beta)),
                        beta_i,
                        Xas, Xasint));

I see at least two issues here:
(1) multi_normal_cholesky_lpdf evaluates the normalizing constants even though they are not needed (my main question here).
(2) As ‘ncol(beta_i)’ grows, to_vector(chol_cov_beta) becomes really long, and this triggers a lot of inefficient copies (Bob already commented on this here: https://github.com/stan-dev/math/issues/669), and I don’t believe this is easy to work around.

Things I thought about:
(1) Chunk differently, i.e. not by respondent/unit but just chunk data into equal sizes closer to the number of cores used.
(2) The evaluation of lp could be done separately: I could transform between std_normal-distributed random effects and beta_i (see the sketch after this list). However, I have no idea how to do that in parallel… at least not without creating even more overhead.
(3) Write my own version of a multinormal density
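On option (2), here is a minimal sketch of the non-centered variant I have in mind (hypothetical names model_mr_nc and z; assuming the parameters block declares vector[K] z[N] instead of beta_i, with N the number of units, and mu_beta, chol_cov_beta, Xas, Xasint as in the original model). The worker rebuilds beta_i itself, so the prior can go through ~ (dropping its constants) in the model block and only the likelihood runs in parallel:

functions {
  vector model_mr_nc(vector mu_L, vector z_i, real[] xa, int[] xi) {
    int K = num_elements(z_i);
    int S = size(xa);
    int n = S / K;
    // rebuild the unit-level coefficients from the shared pack and the raw effects
    matrix[K, K] L = to_matrix(mu_L[(K + 1):(K + K * K)], K, K);
    vector[K] beta_i = mu_L[1:K] + L * z_i;  // non-centered transform
    // likelihood only; the prior on beta_i is implied by z ~ std_normal() plus the transform above
    // log1p_exp(x) is a numerically stabler equivalent of log(1 + exp(x))
    return [sum(-log1p_exp(to_matrix(xa, n, K) * beta_i))]';
  }
}
model {
  for (i in 1:N)
    z[i] ~ std_normal();  // prior handled here, so ~ drops its constants
  target += sum(map_rect(model_mr_nc,
                         append_row(mu_beta, to_vector(chol_cov_beta)),
                         z, Xas, Xasint));
}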

It looks like chol_cov_beta is a function of parameters, so we’d do all the extra expensive stuff anyway (computing the determinant).

If chol_cov_beta were data (just something you passed in), the constants would still be cheap for multi_normal_cholesky, because the determinant of a Cholesky factor is just the product of its diagonal elements.
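Concretely (with chol_cov_beta as the Cholesky factor L of the covariance), the log-determinant piece of the normalizing term would just be something like:

// log|Sigma| for Sigma = L * L' collapses to a sum over the diagonal of L
real log_det_Sigma = 2 * sum(log(diagonal(chol_cov_beta)));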

So I think you’re fine here.

Yeah, that’d be the way to verify this is fast.

Everything’s relative.

Suppose beta_i is an array with 1,000 or 10,000 elements. While the computation of chol_cov_beta is expensive, it is only one computation. However, passing its result 1,000 or 10,000 times, creating copies each time, isn’t going to help performance. The same is true for evaluating unnecessary constants 1,000 times.


Yeah, fair. A custom multi-normal lpdf makes sense then, I think. Pass in the (log) determinant as a scalar.
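A minimal sketch of that idea (the names multi_normal_cholesky_kernel and model_mr2 are made up): compute the log determinant once in the model block, append it to the shared parameter pack, and keep only the terms that actually vary inside the worker. Note that the log-determinant term itself cannot be dropped, since L depends on parameters; only the K/2 * log(2*pi) constant goes away.

functions {
  // unnormalized multi-normal (Cholesky) log density; log_det_L = sum(log(diagonal(L)))
  real multi_normal_cholesky_kernel(vector b, vector mu, matrix L, real log_det_L) {
    vector[num_elements(b)] z = mdivide_left_tri_low(L, b - mu);
    return -log_det_L - 0.5 * dot_self(z);
  }

  vector model_mr2(vector mu_L, vector beta_i, real[] xa, int[] xi) {
    int K = num_elements(beta_i);
    int S = size(xa);
    int n = S / K;
    // mu_L packs: hypermean (K), Cholesky factor (K*K), log determinant (1)
    real lp = multi_normal_cholesky_kernel(beta_i, mu_L[1:K],
                                           to_matrix(mu_L[(K + 1):(K + K * K)], K, K),
                                           mu_L[K + K * K + 1]);
    real ll = sum(-log1p_exp(to_matrix(xa, n, K) * beta_i));
    return [lp + ll]';
  }
}
model {
  real log_det_L = sum(log(diagonal(chol_cov_beta)));
  target += sum(map_rect(model_mr2,
                         append_row(append_row(mu_beta, to_vector(chol_cov_beta)), log_det_L),
                         beta_i, Xas, Xasint));
}

This avoids recomputing sum(log(diagonal(L))) in every worker; whether the saving is noticeable relative to the likelihood evaluation is something to benchmark.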

Well they aren’t unnecessary. But maybe there’s unnecessary duplicate computation.