I am working on a custom likelihood function, and it works, but it is very slow. The main reason is that the normalization constant that ensures the likelihood integrates to 1 is expensive to compute. I have sped it up as much as I can by developing an approximation, but it is still much slower than is reasonable. I have identified one major bottleneck in the way brms/Stan computes the likelihood.

For any model, the likelihood is computed for every trial:

```
model {
  vector[N] mu = rep_vector(0.0, N);
  mu += Intercept;
  for (n in 1:N) {
    // add more terms to the linear predictor
    mu[n] += r_1_1[J_1[n]] * Z_1_1[n];
  }
  for (n in 1:N) {
    target += sdm_lpdf(Y[n] | mu[n]);
  }
}
```

The likelihood function is of the form f(y, mu)/Z(mu), where Z(mu) is the integration normalization constant. Whereas the numerator f(y, mu) changes from trial to trial, the denominator Z(mu) is the same for all trials that have the same design matrix.
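To make that structure concrete, here is a minimal Python sketch with a toy density (not the actual sdm likelihood): the log-likelihood splits into a per-trial numerator and a denominator log Z(mu) that depends on mu alone, so it can be computed once per value of mu and reused for every y.

```python
import math

# Toy stand-in for the real density (NOT the actual sdm likelihood):
# f(y, mu) = exp(mu * cos(y)) on [-pi, pi].
# The point is only the structure: lpdf(y | mu) = log f(y, mu) - log Z(mu),
# where the denominator Z depends on mu alone, not on y.

A, B, N_GRID = -math.pi, math.pi, 20000
H = (B - A) / N_GRID

def f(y, mu):
    return math.exp(mu * math.cos(y))

def log_Z(mu):
    # simple trapezoidal quadrature over the support
    total = 0.5 * (f(A, mu) + f(B, mu))
    total += sum(f(A + i * H, mu) for i in range(1, N_GRID))
    return math.log(total * H)

mu = 0.7
lz = log_Z(mu)  # computed once; reusable for every y with this mu

def lpdf(y, mu, lz):
    return math.log(f(y, mu)) - lz

# The normalized density integrates to 1 for any mu:
dens = [math.exp(lpdf(A + i * H, mu, lz)) for i in range(N_GRID + 1)]
mass = H * (0.5 * (dens[0] + dens[-1]) + sum(dens[1:-1]))
print(round(mass, 6))  # -> 1.0
```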

Thus, it would greatly speed things up if Z(mu) were calculated only for trials where the design matrix changes. Assume that we have 20 subjects and 1000 trials per subject, and that the data are sorted such that all trials from the same subject are in consecutive rows. Assume that this is the regression formula:

```
y ~ 1 + (1|id)
```

The approach I described computes Z only once per subject instead of 1000 times. To achieve this, I currently use the following R code to adjust the model code:

```
stan_model <- "
  real z;
  for (n in 1:N) {
    if (n == 1 || mu[n] != mu[n-1]) {
      z = -sdm_ldenom_chquad_adaptive(mu[n]);
    }
    target += z;
  }
"
stanvars <- stanvar(scode = stan_model, block = 'likelihood', position="end")
```

which results in the following model code:

```
model {
  vector[N] mu = rep_vector(0.0, N);
  mu += Intercept;
  for (n in 1:N) {
    // add more terms to the linear predictor
    mu[n] += r_1_1[J_1[n]] * Z_1_1[n];
  }
  // add numerator part of the likelihood
  for (n in 1:N) {
    target += sdm_lpdf(Y[n] | mu[n]);
  }
  real z; // variable to store denominator
  for (n in 1:N) {
    // recompute the denominator only when mu differs from the previous trial
    if (n == 1 || mu[n] != mu[n-1]) {
      z = -sdm_ldenom_chquad_adaptive(mu[n]);
    }
    target += z;
  }
}
```
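The caching trick can be sanity-checked outside Stan. A minimal Python sketch (using a cheap stand-in for `sdm_ldenom_chquad_adaptive`, and counting calls) confirms that the conditional loop reproduces the naive sum while evaluating the denominator only once per run of equal `mu` values:

```python
# Sketch: verify that the cached loop gives the same total as the naive loop
# while evaluating the expensive denominator only once per run of equal mu.
# `expensive_ldenom` is a cheap stand-in for sdm_ldenom_chquad_adaptive.

calls = 0

def expensive_ldenom(mu):
    global calls
    calls += 1
    return mu * mu  # arbitrary stand-in

# 3 "subjects", trials sorted so equal mu values are consecutive
mu_vals = [0.2] * 4 + [1.5] * 4 + [-0.8] * 4

# naive: recompute the denominator for every trial
naive = sum(-expensive_ldenom(m) for m in mu_vals)
naive_calls = calls

# cached: recompute only when mu changes from the previous trial
calls = 0
total = 0.0
z = 0.0
for n, m in enumerate(mu_vals):
    if n == 0 or m != mu_vals[n - 1]:
        z = -expensive_ldenom(m)
    total += z

print(naive == total, naive_calls, calls)  # -> True 12 3
```

With 20 subjects and 1000 trials each, the same logic drops the number of denominator evaluations from 20,000 to 20.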

This works nicely: the model recovers the correct parameters and runs much faster. I see two ways to speed it up further.

**1) Avoid a new loop**

Currently there is one for loop for computing the numerator, which is automatically generated by brms. Since I cannot insert custom code there, I had to create a separate loop for calculating the denominator. Is there a way to insert my code into the existing loop that brms generates for the likelihood? E.g., the result should be:

```
model {
  vector[N] mu = rep_vector(0.0, N);
  mu += Intercept;
  for (n in 1:N) {
    // add more terms to the linear predictor
    mu[n] += r_1_1[J_1[n]] * Z_1_1[n];
  }
  real z; // variable to store denominator
  for (n in 1:N) {
    // add numerator part of the likelihood
    target += sdm_lpdf(Y[n] | mu[n]);
    // recompute the denominator only when mu differs from the previous trial
    if (n == 1 || mu[n] != mu[n-1]) {
      z = -sdm_ldenom_chquad_adaptive(mu[n]);
    }
    target += z;
  }
}
```

Would that even speed things up? Or does the compiler recognize that the two loops have identical bounds and merge them, so that it doesn’t matter?

**2) Weight the denominator by the number of observations**

With the current implementation, even if I do not recompute z, I still have an addition operation for every trial:

```
target += z;
```

Instead, it would be even better for the code to recognize how many trials m share the same parameter values, so that each subject contributes only once:

```
target += z*m;
```

If we only have the single by-subject random intercept, this can be achieved with:

```
real z;
real mu_s;
real w;
for (n1 in 1:N_1) {
  mu_s = Intercept + r_1_1[n1];
  z = -sdm_ldenom_chquad_adaptive(mu_s);
  w = ....; // number of observations for this subject (computed here or passed as data from R)
  target += z*w;
}
```

But for more complicated designs this code becomes much more complex.
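For the general case, the weights can be computed in the data-preparation step and passed to Stan as data. The following is only a sketch under the assumption that the data are sorted so that rows sharing the same predictor combination are consecutive; `run_weights` is a hypothetical helper (in R the same run lengths could be obtained with `rle()`):

```python
from itertools import groupby

# Sketch: for rows sorted so that equal predictor combinations are
# consecutive, compute one representative row index and a weight
# (run length) per run. These would be passed to Stan as data.

def run_weights(keys):
    """Return (first_index, count) for each run of equal consecutive keys."""
    out = []
    i = 0
    for _key, grp in groupby(keys):
        m = len(list(grp))
        out.append((i, m))
        i += m
    return out

# 3 subjects with 4, 3, and 5 trials respectively
subject_id = [1] * 4 + [2] * 3 + [3] * 5
print(run_weights(subject_id))  # -> [(0, 4), (4, 3), (7, 5)]
```

The Stan denominator loop would then iterate over the runs only, adding `z * w` once per run instead of once per trial.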

So at this point my question is: can what I described be achieved more easily? Is there a built-in way in brms/Stan that I might have missed that accomplishes this?

Beyond my specific use case, this would be generally useful for any likelihood function whose normalization constant is expensive to compute (e.g. the Conway-Maxwell-Poisson distribution, paul-buerkner/brms issue #607 on GitHub).