Hi all,

In a post of mine a few weeks ago, others suggested I utilize `reduce_sum`

via cmdstanr to enable scaling of an IRT model I had built. I was able to implement `reduce_sum`

and the model does compile with multithreading support and run properly. However, it is now vastly slower (contrary to the goal!). After a little debugging and playing with the model, I discovered that the ‘grainsize’ was at fault and was able to recover the model, but only back to speeds equivalent to my original model which had not used `reduce_sum`

.

Initially, I had set created ‘grainsize’ within the data block and passed ‘grainsize = 1’ as part of my data list for the model. The choice of 1 was to enable the automatic calibration of the grainsize suggested in documentation. However, I do not believe this is happening given the massive increase in computational time as well as the reduction from setting a more reasonable grainsize of roughly 400.

My question is, what would cause the automatic calibration to break/otherwise not trigger (for lack of a better word), and do others have suggestions for how I might find a reasonable grainsize on my own to work around this issue? Also, is it more likely that this would be an error on my end in how I have constructed my implementation of `reduce_sum`

?

Edit 1: My problems may also stem from what I believe is a non-vectorized function used in the likelihood calculations of my model. The stan code was generated using `brms::make_stancode()`

. The function and likelihood bits are as follows:

```
functions {
/* cumulative-logit log-PDF for a single response
* Args:
* y: response category
* mu: latent mean parameter
* disc: discrimination parameter
* thres: ordinal thresholds
* Returns:
* a scalar to be added to the log posterior
*/
real cumulative_logit_lpmf(int y, real mu, real disc, vector thres) {
int nthres = num_elements(thres);
real p;
if (y == 1) {
p = inv_logit(disc * (thres[1] - mu));
} else if (y == nthres + 1) {
p = 1 - inv_logit(disc * (thres[nthres] - mu));
} else {
p = inv_logit(disc * (thres[y] - mu)) -
inv_logit(disc * (thres[y - 1] - mu));
}
return log(p);
}
}
model {
for (n in 1:N) {
target += cumulative_logit_lpmf(Y[n] | mu[n], disc[n], Intercept);
}
}
```

Is it possible that my issues with ‘grainsize’ are actually a result of the non-vectorize nature of the bulk of my model’s calculations?