# Manually manipulating log likelihoods for duplicate observations

I have a long likelihood calculation for two integer features, `tx` and `x`, that have many duplicate observations. For example, I may have 1000 rows where `tx=0` and `x=0`. I found that I can speed up the calculation considerably by computing the log likelihood once per feature pattern (e.g., `tx=0`, `x=0`) and multiplying it by the number of observations with that pattern. This also requires shrinking the vector parameters `p` and `theta` to match, so that they have one entry per feature pattern instead of one per row.
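
In symbols, a rough sketch of the identity this shortcut relies on. The notation is introduced here only for illustration: $\ell$ stands for `long_log_likelihood`, $K$ is the number of distinct patterns, $n_k$ is `n_custs[k]`, and $k(i)$ is the pattern that row $i$ belongs to.

$$
\sum_{i=1}^{N_{\text{full}}} \ell\bigl(tx_i, x_i, \theta_{k(i)}, p_{k(i)}\bigr)
= \sum_{k=1}^{K} n_k \, \ell\bigl(tx^{(k)}, x^{(k)}, \theta_k, p_k\bigr)
$$

The two sides agree only under the assumption that every row with the same pattern shares the same `theta` and `p` entry, which is what reducing the parameter vectors to one entry per pattern amounts to.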

As an illustration, I converted something like the following model block that takes data with a large `N`:

```stan
model {
  p ~ beta(alpha, beta);       // vector<lower=0, upper=1.0>[N] p;
  theta ~ beta(gamma, delta);  // vector<lower=0, upper=1.0>[N] theta;

  for (n in 1:N) {  // where N is big
    real ll_lse;
    ll_lse = long_log_likelihood(tx[n], x[n], theta[n], p[n]);
    target += ll_lse;
  }
}
```

to (line `ll_lse *= n_custs[n]` added):

```stan
model {
  p ~ beta(alpha, beta);
  theta ~ beta(gamma, delta);

  // where now, N is small and sum(n_custs) == previous N
  for (n in 1:N) {
    real ll_lse;
    ll_lse = long_log_likelihood(tx[n], x[n], theta[n], p[n]);
    ll_lse *= n_custs[n];
    target += ll_lse;
  }
}
```

While I verified that `target` is incremented by the same total amount by this block in both models, I’m worried that I may not be fully accounting for the data transformation in the `p` and `theta` sampling statements.

I tried changing `p ~ beta(alpha, beta);` to `target += beta_lpdf(p | alpha, beta) .* n_custs;`, but `beta_lpdf` appears to return a single real (the sum over all elements) rather than a vector.

I can un-vectorize it in a for-loop:

```stan
for (n in 1:N) {
  target += beta_lpdf(p[n] | alpha, beta) * n_custs[n];
  target += beta_lpdf(theta[n] | gamma, delta) * n_custs[n];
}
```

but since I can’t print out the `target` variable, I’m not sure whether these programs are equivalent. Is there another factor I need to take into account for the model to run correctly on the reduced data?

You can always save the value in a temporary variable and print it, so that’s one way to verify.
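
A minimal sketch of that idea, applied to the weighted prior loop from the question. It is an illustration under some assumptions rather than the original model: `alpha` and `beta` are passed in as data, `n_custs` is declared as a vector of reals, `theta` and the likelihood are left out to keep it short, and the name `weighted_prior` is made up for this example.

```stan
data {
  int<lower=1> N;               // number of distinct feature patterns
  vector<lower=0>[N] n_custs;   // rows per pattern, stored as reals (assumption)
  real<lower=0> alpha;          // passed as data here for simplicity (assumption)
  real<lower=0> beta;
}
parameters {
  vector<lower=0, upper=1>[N] p;
}
model {
  real weighted_prior = 0;  // temporary accumulator, kept only for inspection

  for (n in 1:N) {
    weighted_prior += beta_lpdf(p[n] | alpha, beta) * n_custs[n];
  }

  print("weighted prior on p: ", weighted_prior);
  target += weighted_prior;  // same total increment as adding term by term
}
```

The `print` statement runs on every evaluation of the log density, so it is probably easiest to read with a very short run. Printing the same quantity from the full-data model at the same parameter values would show whether the two versions agree.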

The accumulated log density is available as `target()`, which you can pass to `print`.
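
As a rough sketch of how `target()` could be used here: record its value before a block, then print the difference afterwards to isolate that block's contribution. The same simplifying assumptions apply as in any stripped-down example (`alpha` and `beta` as data, `n_custs` as a vector, `theta` and the likelihood omitted), and `lp_before` is a made-up name.

```stan
data {
  int<lower=1> N;               // number of distinct feature patterns
  vector<lower=0>[N] n_custs;   // rows per pattern, stored as reals (assumption)
  real<lower=0> alpha;          // passed as data here for simplicity (assumption)
  real<lower=0> beta;
}
parameters {
  vector<lower=0, upper=1>[N] p;
}
model {
  real lp_before = target();  // log density accumulated before this block

  for (n in 1:N) {
    target += beta_lpdf(p[n] | alpha, beta) * n_custs[n];
  }

  // Difference = what the weighted prior loop added to the log density
  print("increment from weighted prior: ", target() - lp_before);
}
```

Taking a difference matters because, as far as I know, `target()` already contains the Jacobian terms for constrained parameters by the time the model block starts. Also note that `~` statements drop constant terms while `target += ..._lpdf(...)` keeps them, so increments from the two forms can differ by a constant even when the models are equivalent.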
