I have a long likelihood calculation over a couple of integer features, `tx` and `x`, that have a lot of duplicate observations. For example, I may have 1000 rows where `tx = 0` and `x = 0`. I found that I can speed up the calculation considerably by computing the log likelihood just once per feature pattern (e.g., `tx = 0, x = 0`) and multiplying it by the number of observations with that pattern. This also requires shrinking the parameter vectors `p` and `theta` by the same amount, so that they have one entry per feature pattern.
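The arithmetic behind the speedup can be sketched numerically (here `long_log_likelihood` is a toy stand-in, not the real model's likelihood): summing the log likelihood over duplicated rows equals computing it once per unique `(tx, x)` pattern and weighting by that pattern's count, provided the parameter values are shared within a pattern.

```python
import numpy as np

def long_log_likelihood(tx, x, theta, p):
    # toy placeholder for the real (expensive) log likelihood
    return tx * np.log(theta) + x * np.log(p)

rng = np.random.default_rng(0)
tx = rng.integers(0, 3, size=1000)
x = rng.integers(0, 3, size=1000)
theta, p = 0.3, 0.7  # shared parameter values for this sketch

# full-data sum: one term per row
full = sum(long_log_likelihood(t, v, theta, p) for t, v in zip(tx, x))

# reduced sum: one term per unique (tx, x) pattern, weighted by its count
patterns, n_custs = np.unique(np.column_stack([tx, x]), axis=0,
                              return_counts=True)
reduced = sum(n * long_log_likelihood(t, v, theta, p)
              for (t, v), n in zip(patterns, n_custs))

assert np.isclose(full, reduced)
```

The equality holds only because `theta` and `p` are constant within each pattern, which is exactly what the reduced parameterization assumes.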

As an illustration, I converted something like the following model block, which takes data with a large `N`:

```
model {
  p ~ beta(alpha, beta);       // vector<lower=0, upper=1>[N] p;
  theta ~ beta(gamma, delta);  // vector<lower=0, upper=1>[N] theta;
  for (n in 1:N) {             // where N is big
    real ll_lse = long_log_likelihood(tx[n], x[n], theta[n], p[n]);
    target += ll_lse;
  }
}
```

to the following (line `ll_lse *= n_custs[n];` added):

```
model {
  p ~ beta(alpha, beta);
  theta ~ beta(gamma, delta);
  // where now, N is small and sum(n_custs) == previous N
  for (n in 1:N) {
    real ll_lse = long_log_likelihood(tx[n], x[n], theta[n], p[n]);
    ll_lse *= n_custs[n];
    target += ll_lse;
  }
}
```

While I verified that `target` gets incremented by the same amount overall in both models from *this block*, I'm worried that I may not be fully accounting for the data transformation in the `p` and `theta` sampling statements.

I tried modifying `p ~ beta(alpha, beta);` to `target += beta_lpdf(p | alpha, beta) .* n_custs;`, but `beta_lpdf` with a vector argument returns a single real (the sum of the log densities) rather than a vector, so the element-wise product doesn't work.

I can un-vectorize it in a for-loop:

```
for (n in 1:N) {
  target += beta_lpdf(p[n] | alpha, beta) * n_custs[n];
  target += beta_lpdf(theta[n] | gamma, delta) * n_custs[n];
}
```

but since I can't print out the `target` variable, I'm not sure whether these programs are equivalent. Is there another factor I need to take into account for the model to run correctly on the reduced data?
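For what it's worth, here is a small numeric sketch (hypothetical values; SciPy's `beta.logpdf` standing in for Stan's `beta_lpdf`) of what that weighted loop computes: adding `n_custs[n] * beta_lpdf(p[n] | alpha, beta)` per pattern matches the prior increment of the original big model only under the assumption that each duplicate row there carried its own parameter entry pinned to the same value.

```python
import numpy as np
from scipy.stats import beta as beta_dist

alpha_, beta_ = 2.0, 3.0                # hypothetical prior hyperparameters
p_small = np.array([0.2, 0.5, 0.9])     # one entry per feature pattern
n_custs = np.array([1000, 250, 50])     # rows per pattern in the big model

# weighted prior increment from the reduced model's loop
weighted = np.sum(n_custs * beta_dist.logpdf(p_small, alpha_, beta_))

# prior increment of the expanded model, with p[n] repeated once per row
p_big = np.repeat(p_small, n_custs)
expanded = np.sum(beta_dist.logpdf(p_big, alpha_, beta_))

assert np.isclose(weighted, expanded)
```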