Map_rect and distributions

When using map_rect, we usually write a function that evaluates the ll/lp for a chunk of the data. That often requires evaluating standard densities, like multi_normal_lpdf. However, we don’t really need the normalizing constants, and using y ~ multi_normal would probably be faster. So: how do I use ~ inside my ‘reduce’ function?
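(For context: ~ statements are only legal in the model block, so the mapped function has to return its log-probability contribution explicitly. Dropping the constants inside it seems to mean hand-coding the unnormalized kernel, which is what I’d like to avoid. A minimal sketch with a hypothetical worker chunk_lp, where the scale is passed as data so its log term really is a constant:)

functions {
  // hypothetical sketch: x_r[1] is a fixed (data) scale, phi[1] a location parameter
  vector chunk_lp(vector phi, vector beta_i, real[] x_r, int[] x_i) {
    // beta_i ~ normal(phi[1], x_r[1]);  // not allowed: ~ only works in the model block
    real lp = normal_lpdf(beta_i | phi[1], x_r[1]);                // includes the log(2*pi) and log(sigma) terms
    real lp_kernel = -0.5 * dot_self((beta_i - phi[1]) / x_r[1]);  // kernel only; valid because x_r[1] is data
    return [lp_kernel]';
  }
}

(As far as I can tell, the newer _lupdf suffixes only help inside the model block or user-defined _lpdf functions, not inside a plain map_rect worker, which has to return a vector.)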

EDIT:
Seems like this is already part of the developers’ discussion here: Parallel autodiff v4


Just to be clear, I don’t think anyone is trying to solve that problem over there.

What are your arguments to multi_normal_lpdf and which are data and which are computed as a function of parameters?

That was my impression too, unfortunately.

Here is the reduce function:

 vector model_mr(vector mu_L,
                 vector beta_i,
                 real[] xa, int[] xi) {

   // fetch dims
   int K = num_elements(beta_i);  // number of individual-level coefficients
   int S = size(xa);
   int n = S / K;                 // rows of this unit's (differenced) design matrix

   // set up lp components
   real lp;
   real ll;

   // compute; mu_L packs the hypermean (first K elements) followed by the
   // Cholesky factor of the covariance (next K*K elements, column-major)
   lp = multi_normal_cholesky_lpdf(beta_i | mu_L[1:K],
                                   to_matrix(mu_L[(K + 1):(K + K * K)], K, K));
   ll = sum(-1 * log(1 + exp(to_matrix(xa, n, K) * beta_i)));

   return [lp + ll]';
 }

Pretty straightforward: the ‘individual-level’ part of a hierarchical logit model (where xa is differenced with respect to the chosen alternative).
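To spell out what the ll line computes (assuming the binary-choice reading, with each row of xa being the covariate difference of the non-chosen minus the chosen alternative), every observed choice contributes

$$\log \Pr(\text{chosen}) = -\log\bigl(1 + \exp(x^\top \beta_i)\bigr),$$

which is exactly the term summed over the n rows of to_matrix(xa, n, K).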

It is called like this in the model:

 target += sum(map_rect(model_mr,
                        append_row(mu_beta, to_vector(chol_cov_beta)),
                        beta_i,
                        Xas, Xasint));

I see at least two issues here:
(1) multi_normal_cholesky_lpdf evaluates the normalizing constants even though they are not needed (my main question here).
(2) As ‘ncol(beta_i)’ grows, to_vector(chol_cov_beta) becomes really long, and this triggers a lot of inefficient copies (Bob already commented on this here: https://github.com/stan-dev/math/issues/669), and I don’t believe this is easy to work around.

Things I thought about:
(1) Chunk differently, i.e. not by respondent/unit but just chunk data into equal sizes closer to the number of cores used.
(2) The evaluation of lp could be done separately: I could transform between std_normal-distributed random effects and beta_i (see the sketch after this list). However, I have no idea how to do that in parallel… at least not without creating even more overhead.
(3) Write my own version of a multinormal density
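On option (2), here is a minimal sketch of the non-centered variant I have in mind (hypothetical names model_mr_nc and z; assuming the parameters block declares vector[K] z[N] instead of beta_i, with N the number of units, and mu_beta, chol_cov_beta, Xas, Xasint as in the original model). The worker rebuilds beta_i itself, so the prior can go through ~ (dropping its constants) in the model block and only the likelihood runs in parallel:

functions {
  vector model_mr_nc(vector mu_L, vector z_i, real[] xa, int[] xi) {
    int K = num_elements(z_i);
    int S = size(xa);
    int n = S / K;
    // rebuild the unit-level coefficients from the shared pack and the raw effects
    matrix[K, K] L = to_matrix(mu_L[(K + 1):(K + K * K)], K, K);
    vector[K] beta_i = mu_L[1:K] + L * z_i;  // non-centered transform
    // likelihood only; the prior on beta_i is implied by z ~ std_normal() plus the transform above
    // log1p_exp(x) is a numerically stabler equivalent of log(1 + exp(x))
    return [sum(-log1p_exp(to_matrix(xa, n, K) * beta_i))]';
  }
}
model {
  for (i in 1:N)
    z[i] ~ std_normal();  // prior handled here, so ~ drops its constants
  target += sum(map_rect(model_mr_nc,
                         append_row(mu_beta, to_vector(chol_cov_beta)),
                         z, Xas, Xasint));
}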

It looks like chol_cov_beta is a function of parameters, so we’d do all the extra expensive stuff anyway (computing the determinant).

If chol_cov_beta were data (just something you passed in), the constants would still be cheap for multi_normal_cholesky, because the determinant of a Cholesky factor is just the product of its diagonal elements.
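Concretely (with chol_cov_beta as the Cholesky factor L of the covariance), the log-determinant piece of the normalizing term would just be something like:

// log|Sigma| for Sigma = L * L' collapses to a sum over the diagonal of L
real log_det_Sigma = 2 * sum(log(diagonal(chol_cov_beta)));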

So I think you’re fine here.

Yeah, that’d be the way to verify this is fast.

Everything’s relative.

Suppose beta_i is an array with 1,000 or 10,000 elements. While the computation of chol_cov_beta is expensive, it is only one computation. However, passing its result 1,000 or 10,000 times, creating copies each time, isn’t going to help performance. The same is true for evaluating unnecessary constants 1,000 times.


Yeah, fair. A custom multi-normal lpdf makes sense then, I think. Pass in the (log) determinant as a scalar.
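A minimal sketch of that idea (the names multi_normal_cholesky_kernel and model_mr2 are made up): compute the log determinant once in the model block, append it to the shared parameter pack, and keep only the terms that actually vary inside the worker. Note that the log-determinant term itself cannot be dropped, since L depends on parameters; only the K/2 * log(2*pi) constant goes away.

functions {
  // unnormalized multi-normal (Cholesky) log density; log_det_L = sum(log(diagonal(L)))
  real multi_normal_cholesky_kernel(vector b, vector mu, matrix L, real log_det_L) {
    vector[num_elements(b)] z = mdivide_left_tri_low(L, b - mu);
    return -log_det_L - 0.5 * dot_self(z);
  }

  vector model_mr2(vector mu_L, vector beta_i, real[] xa, int[] xi) {
    int K = num_elements(beta_i);
    int S = size(xa);
    int n = S / K;
    // mu_L packs: hypermean (K), Cholesky factor (K*K), log determinant (1)
    real lp = multi_normal_cholesky_kernel(beta_i, mu_L[1:K],
                                           to_matrix(mu_L[(K + 1):(K + K * K)], K, K),
                                           mu_L[K + K * K + 1]);
    real ll = sum(-log1p_exp(to_matrix(xa, n, K) * beta_i));
    return [lp + ll]';
  }
}
model {
  real log_det_L = sum(log(diagonal(chol_cov_beta)));
  target += sum(map_rect(model_mr2,
                         append_row(append_row(mu_beta, to_vector(chol_cov_beta)), log_det_L),
                         beta_i, Xas, Xasint));
}

This avoids recomputing sum(log(diagonal(L))) in every worker; whether the saving is noticeable relative to the likelihood evaluation is something to benchmark.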

Well they aren’t unnecessary. But maybe there’s unnecessary duplicate computation.