That was my impression too, unfortunately.

Here is the reduce function:

```
vector model_mr(vector mu_L, vector beta_i,
                real[] xa, int[] xi) {
  // fetch dims
  int K = num_elements(beta_i);
  int S = size(xa);
  int n = S / K;
  // lp: prior term, ll: likelihood term
  real lp = multi_normal_cholesky_lpdf(beta_i | mu_L[1:K],
              to_matrix(mu_L[(K + 1):(K + K * K)], K, K));
  // log1p_exp(x) is a numerically stable log(1 + exp(x))
  real ll = -sum(log1p_exp(to_matrix(xa, n, K) * beta_i));
  return [lp + ll]';
}
```

Pretty straightforward ‘individual-level’ likelihood of a hierarchical logit model (where xa is differenced with respect to the chosen alternative).

It is called like this in the model:

```
target += sum(map_rect(model_mr,
                       append_row(mu_beta, to_vector(chol_cov_beta)),
                       beta_i, Xas, Xasint));
```

I see at least two issues here:

(1) `multi_normal_cholesky_lpdf` evaluates the normalizing constants even though that is not necessary here. (This is my main question.)

(2) As `ncol(beta_i)` grows, `to_vector(chol_cov_beta)` becomes really long, and this invokes a lot of inefficient copies (Bob already commented on this here: https://github.com/stan-dev/math/issues/669). I don't believe this is easy to work around.

Things I thought about:

(1) Chunk differently, i.e. not by respondent/unit, but into equal-sized shards whose number is closer to the number of cores used.
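Roughly what I have in mind (just a sketch, untested; the name `model_mr_chunked`, the shard size `J`, and the assumed layout of `xa` are made up for illustration): each shard would carry the stacked coefficients of J units in `beta_j`, with K passed through the integer data.

```
vector model_mr_chunked(vector mu_L, vector beta_j,
                        real[] xa, int[] xi) {
  int K = xi[1];                     // passed in via the int data
  int J = num_elements(beta_j) / K;  // units packed into this shard
  int n = size(xa) / (J * K);        // rows per unit
  vector[K] mu = mu_L[1:K];
  matrix[K, K] L = to_matrix(mu_L[(K + 1):(K + K * K)], K, K);
  real acc = 0;
  for (j in 1:J) {
    vector[K] b = segment(beta_j, (j - 1) * K + 1, K);
    real lp = multi_normal_cholesky_lpdf(b | mu, L);
    real ll = -sum(log1p_exp(
        to_matrix(xa[((j - 1) * n * K + 1):(j * n * K)], n, K) * b));
    acc += lp + ll;
  }
  return [acc]';
}
```

That at least amortizes the per-shard overhead over J units, though it does not fix the copying of `mu_L`.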

(2) The evaluation of lp could be done separately, outside map_rect, if I transform std_normal-distributed random effects into beta_i (a non-centered parameterization). However, I have no idea how to do that in parallel… at least not without creating more overhead.
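For reference, the transform I mean would look roughly like this (a sketch; `z` and the unit count `I` are names I'm inventing). It pulls the whole prior term out of map_rect and evaluates it vectorized:

```
parameters {
  matrix[K, I] z;  // standard-normal random effects, one column per unit
}
transformed parameters {
  // beta_i = mu_beta + L * z_i, done columnwise for all I units at once
  matrix[K, I] beta = rep_matrix(mu_beta, I) + chol_cov_beta * z;
}
model {
  // implies beta[, i] ~ multi_normal_cholesky(mu_beta, chol_cov_beta)
  to_vector(z) ~ std_normal();
  // map_rect then only needs to compute the likelihood term ll
}
```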

(3) Write my own version of the multi-normal density that skips the constant terms.
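I.e. something like this (a sketch, untested; the function name is made up). Since chol_cov_beta is a parameter, the log-determinant term has to stay, but the 0.5 * K * log(2 * pi) constant can be dropped:

```
real multi_normal_cholesky_noconst(vector y, vector mu, matrix L) {
  // solve L * z = y - mu by forward substitution
  vector[num_elements(y)] z = mdivide_left_tri_low(L, y - mu);
  // quadratic form plus log-determinant; the 0.5 * K * log(2 * pi)
  // constant is dropped
  return -0.5 * dot_self(z) - sum(log(diagonal(L)));
}
```

This could then replace the `multi_normal_cholesky_lpdf` call inside the reduce function.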