Yes, that’s correct.
By all means, you should slice the theta and not the data y!
I implemented reduce_sum for a model based on categorical logit, and it ran crazy fast: the speedup was close to linear (roughly 4x as fast on 4 cores).
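In case it helps, here is a minimal sketch of what slicing theta can look like. It is not my actual model. The function name partial_sum, the sizes N and K, the grainsize data input, the standard normal prior, and the assumption that theta is an array of N logit vectors (one per observation) are all made up for illustration.

```stan
functions {
  // Partial log-likelihood for one slice of theta.
  // theta_slice holds elements start..end of the full theta array,
  // so y (passed whole as a shared argument) is indexed with an offset.
  real partial_sum(array[] vector theta_slice, int start, int end,
                   array[] int y) {
    real lp = 0;
    for (i in 1:size(theta_slice)) {
      lp += categorical_logit_lpmf(y[start + i - 1] | theta_slice[i]);
    }
    return lp;
  }
}
data {
  int<lower=1> N;                     // number of observations (assumed)
  int<lower=2> K;                     // number of categories (assumed)
  array[N] int<lower=1, upper=K> y;   // observed categories
  int<lower=1> grainsize;             // tuning knob for reduce_sum
}
parameters {
  array[N] vector[K] theta;           // per-observation logits (assumed shape)
}
model {
  for (n in 1:N) {
    theta[n] ~ std_normal();          // placeholder prior, for illustration only
  }
  // theta is the sliced (first) argument; y is passed along unsliced.
  target += reduce_sum(partial_sum, theta, grainsize, y);
}
```

The point of the sketch is only the argument order: the thing you want sliced goes first in reduce_sum, and everything else is passed after grainsize and indexed with start inside the partial sum function.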