Slow Estimation of Log Likelihood

Hi all,

I am currently trying to implement a grid search on my hyperparameters. I am using loo as the criterion for model fit.
To calculate loo, I am generating log likelihood estimates in the generated quantities, like so:

generated quantities {
vector[N] log_lik;
for (n in 1:N){
log_lik[n] <- bernoulli_logit_lpmf(y[n] | w0[jj[n]] + w1[jj[n]] * s[n] - beta[ii[n]]);
}

variable lengths:
N=4388
w0=20
w1=20
beta=222

chains:2
iterations=1000

Unfortunately, estimation of the log_lik is very slow. Is there a way to accelerate this, for instance by skipping iterations?

Hi, what interface do you use?

log_lik shape is 2000, 4388

So it might be faster to do outside Stan.

I currently use rstan. So you mean taking the model fit and generating log_lik within R itself could be faster?

Yes, because moat of the time is probably used for moving that matrix from Stan (C++) to R.

That has only 8e6 points, so I don’t think eval takes too long.

thanks a lot!! So it does run much faster now that I do it within R. My only trouble is that it
gives me slightly different values from what I get from the generated values block. While many values overlap, some deviate. The means are -0.5345751 [generated values] vs. -0.5897133 [R].

Any idea why that would be? Using Using Leave-one-out cross-validation for large data • loo as an example, here’s what I did:

X = list(
ii = as.factor(d$ii), # 222
jj = as.factor(d$jj), # 41
s = as.factor(d$s)) # 2
%>% as.data.frame() %>% model.matrix(~ii+jj+s:jj-1,.,contrasts.arg = lapply(., contrasts, contrasts=FALSE)) # -1: no intercept

standf = list(y=d$y,X=X) %>% as.data.frame

draws = list(
beta = extract(model)$beta,
w0 = extract(model)$w0,
w1 = extract(model)$w1
)

llfun_long_rasch ← function(data_i, draws) {
x_df = data_i[, which(grepl(colnames(data_i), pattern = “X”))]
ii_ind = which(grepl(colnames(x_df), pattern = “X.ii”))
jj_ind = which(grepl(colnames(x_df), pattern = “X.jj[0-9][0-9]?$”))
s1jj_ind = which(grepl(colnames(x_df), pattern = “X.jj[0-9][0-9]?.s1”))
x_i = as.matrix(x_df)

logit_pred ← draws$w0 * t(x_i)[jj_ind,] + draws$w1 * t(x_i)[s1jj_ind,] - draws$beta * t(x_i)[ii_ind,] # I use %*% instead of * in R
dbinom(x = data_i$y, size = 1, prob = 1/(1 + exp(-logit_pred)), log = TRUE)
}

There might be some different algorithms used in R, so maybe you need to use some specific functions on R. Not an expert about R so maybe e.g. @bgoodri or @paul.buerkner can help.