# Map_rect and distributions

When using `map_rect`, we usually write a function that evaluates the log likelihood / log prior (ll/lp) contributions for a chunk of the data. That often requires evaluating some standard densities, like `multi_normal_lpdf`. However, we don’t really need the normalizing constants, and using `y ~ multi_normal` would probably be faster. But how do I use `~` inside my ‘reduce’ function?
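For context, in the model block a `~` statement is the same density increment with the constant terms dropped. A minimal illustration:

```stan
// These two statements yield the same posterior; the `~` form
// additionally drops terms that are constant in the parameters.
y ~ multi_normal(mu, Sigma);
target += multi_normal_lpdf(y | mu, Sigma);  // keeps the constants
```

The catch is that `~` statements are only allowed in the model block, not inside user-defined functions such as a `map_rect` reduce function.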

EDIT:
It seems this is already part of the developer discussion in the Parallel autodiff v4 thread.


Just to be clear, I don’t think anyone is trying to solve that problem over there.

What are your arguments to multi_normal_lpdf and which are data and which are computed as a function of parameters?

That was my impression too, unfortunately.

Here is the reduce function:

```stan
vector model_mr(vector mu_L,
                vector beta_i,
                real[] xa, int[] xi) {
  // fetch dimensions
  int K = num_elements(beta_i);
  int S = size(xa);
  int n = S / K;

  // set up lp components
  real lp;
  real ll;

  // prior: mu_L packs the mean (first K elements) followed by
  // the flattened Cholesky factor (next K * K elements)
  lp = multi_normal_cholesky_lpdf(beta_i | mu_L[1:K],
                                  to_matrix(mu_L[(K + 1):(K + K * K)], K, K));
  // logit likelihood on this shard's design matrix
  ll = sum(-1 * log(1 + exp(to_matrix(xa, n, K) * beta_i)));

  return [lp + ll]';
}
```

This is the pretty straightforward individual-level part of a hierarchical logit model (where `xa` is differenced with respect to the chosen alternative).

It is called like this in the model:

```stan
target += sum(map_rect(model_mr,
                       append_row(mu_beta, to_vector(chol_cov_beta)),
                       beta_i,
                       Xas, Xasint));
```
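For reference, `map_rect` expects its real and integer data arguments as two-dimensional arrays with one row per shard. The declarations behind `Xas` and `Xasint` would therefore look roughly like this (the names come from the post; the exact shapes are my assumption):

```stan
data {
  int<lower=1> J;        // number of shards (here: respondents)
  int<lower=1> n;        // choice tasks per respondent, assumed equal across shards
  int<lower=1> K;        // number of coefficients
  real Xas[J, n * K];    // each row: one respondent's flattened design matrix
}
transformed data {
  int Xasint[J, 0];      // no integer data is needed, so the rows are empty
}
```

Similarly, `beta_i` in the call is an array of vectors, `vector[K] beta_i[J]`, one per shard.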

I see at least two issues here:
(1) `multi_normal_cholesky_lpdf` evaluates the normalizing constants even when that is not necessary (my main question here).
(2) As the number of coefficients (`ncol(beta_i)`) grows, `to_vector(chol_cov_beta)` becomes really long, and this invokes a lot of inefficient copies (Bob already commented on this here: https://github.com/stan-dev/math/issues/669), and I believe this is not easy to work around.

Some workarounds I can think of:
(1) Chunk differently, i.e. not by respondent/unit but into equal-sized chunks whose number is closer to the number of cores used.
(2) Evaluate lp separately and transform between std_normal-distributed random effects and `beta_i`. However, I have no idea how to do that in parallel, at least not without creating more overhead.
(3) Write my own version of the multi-normal density (see the sketch below).
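For option (3), here is a minimal sketch of a hand-rolled multi-normal density that drops only the additive constant (the function name is made up). Note that the log-determinant term has to stay, because the Cholesky factor is a parameter here:

```stan
functions {
  // Multi-normal log density given the Cholesky factor L of the covariance,
  // with the constant -0.5 * K * log(2 * pi) dropped. The log-determinant
  // term is kept: L depends on parameters, so it is not constant.
  real multi_normal_cholesky_noconst(vector y, vector mu, matrix L) {
    vector[num_elements(y)] z = mdivide_left_tri_low(L, y - mu);
    return -sum(log(diagonal(L))) - 0.5 * dot_self(z);
  }
}
```

This could replace the `multi_normal_cholesky_lpdf` call inside `model_mr` directly, since it is an ordinary function call rather than a `~` statement.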

It looks like chol_cov_beta is a function of parameters, so we’d do all the extra expensive stuff anyway (computing the determinant).

If chol_cov_beta were data (just something you passed in), the constants should still be cheap for multi_normal_cholesky because the determinant of the Cholesky factor is just the product of its diagonal elements.

So I think you’re fine here.
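Concretely, since the covariance is `Sigma = L * L'` with `L` lower triangular, the log determinant reduces to a sum over the diagonal:

```stan
// log|Sigma| = log|L| + log|L'| = 2 * sum of the log diagonal of L
real log_det_Sigma = 2 * sum(log(diagonal(L)));
```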

Yeah, that’d be the way to verify this is fast.

Everything’s relative.

Suppose `beta_i` is an array with 1,000 or 10,000 elements. While the computation of `chol_cov_beta` is expensive, it is only one computation. However, passing its result 1,000 or 10,000 times while creating copies isn’t going to help performance. The same is true for evaluating the unnecessary constants 1,000 times.


Yeah, fair. A custom multi-normal lpdf makes sense then, I think. Pass in the determinant as a scalar.
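A sketch of what that could look like, with the log-determinant term computed once per log-density evaluation and appended to the shared parameter vector (the packing scheme is just one possibility):

```stan
// model block (sketch): compute the log-determinant term once and
// ship it to every shard along with the packed mean and Cholesky factor
{
  real log_det_L = sum(log(diagonal(chol_cov_beta)));
  target += sum(map_rect(model_mr,
                         append_row(append_row(mu_beta,
                                               to_vector(chol_cov_beta)),
                                    log_det_L),
                         beta_i, Xas, Xasint));
}
```

Inside `model_mr`, the scalar is then read back as `mu_L[K + K * K + 1]`, and the custom density skips its own `sum(log(diagonal(L)))`.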

Well, they aren’t unnecessary. But maybe there’s unnecessary duplicate computation.