[Edit: removed use of custom function beta_binomial2_lpmf, results remain unchanged]

I am trying to speed up a hierarchical *beta-binomial* regression through vectorization, but see much smaller performance improvements compared to when I use vectorization to speed up a *gaussian* model.

Focusing on the relevant code sections from the model block, here is the **original version of the gaussian model** (from a slightly modified brms-generated model):

```
vector[N] mu = temp_Intercept + Xc * b;
vector[N] sigma = temp_sigma_Intercept + rep_vector(0, N);
for (n in 1:N) {
mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_1_2[J_1[n]] * Z_1_2[n];
sigma[n] += r_2_sigma_1[J_2[n]] * Z_2_sigma_1[n];
sigma[n] = exp(sigma[n]);
}
target += normal_lpdf(Y | mu, sigma);
```

and here is the **vectorized version of the gaussian model**, where `N_1`

is the umber of groups and `grpsz`

, `sidx`

, and `eidx`

are vectors with group sizes, and the first and last row indices for a group:

```
for (j in 1:N_1) {
vector[grpsz[j]] mu = temp_Intercept +
Xc[sidx[j]:eidx[j],] * b + r_1_1[j] +
r_1_2[j] * Z_1_2[sidx[j]:eidx[j]];
real sigma = exp(temp_sigma_Intercept + r_2_sigma_1[j]);
target += normal_lpdf(Y[sidx[j]:eidx[j]] | mu, sigma);
}
```

For a hierarchical gaussian regression with varying intercept, slope and error variance, (5 groups of 100 observations) the vectorized model samples 1.7 times faster then the original model (I ran each model 10 times with identical data and seeds). The speed gain grows as the number of observations per group increases.

Following the pattern above I also modified this **original version of a beta-binomial regression** (from brms):

```
vector[N] mu = temp_Intercept + Xc * b;
vector[N] phi = temp_phi_Intercept + rep_vector(0, N);
for (n in 1:N) {
mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_1_2[J_1[n]] * Z_1_2[n];
phi[n] += r_2_phi_1[J_2[n]] * Z_2_phi_1[n];
phi[n] = exp(phi[n]);
mu[n] = inv_logit(mu[n]);
target += beta_binomial_lpmf(Y[n] | mu[n]*phi[n], (1-mu[n])*phi[n], trials[n]);
}
```

to this **vectorized version of a beta-binomial regression**:

```
for (j in 1:N_1) {
vector[grpsz[j]] mu = inv_logit(
temp_Intercept +
Xc[sidx[j]:eidx[j]] * b +
r_1_1[j] + r_1_2[j] * Z_1_2[sidx[j]:eidx[j]]);
real phi = exp(temp_phi_Intercept + r_2_phi_1[j]);
target += beta_binomial_lpmf(Y[sidx[j]:eidx[j]] | T , mu * phi, (1 - mu) * phi);
}
```

Differently than for the gaussian model, I donâ€™t see any speed up for the vectorized compared to the original beta-binomial model.

Am I doing something wrong here, or maybe vecorization in stan for `beta_binomial_lpmf`

is not implemented as efficiently as for `normal_lpdf`

?

Thanks in advance!

PS: I tested this on R with Rtools 3.4.0, rstan 2.17.3, and brms 2.4.0. Here is the relevant section of the sesson info:

R version 3.4.1 (2017-06-30)

Platform: x86_64-w64-mingw32/x64 (64-bit)

Running under: Windows >= 8 x64 (build 9200)