Grainsize for a hierarchical model

Hello, I am using reduce_sum and within-chain parallelization for my hierarchical model. I have a few questions.

The Stan User's Guide suggests:

For instance, in a model with N=10000 and M = 4, start with grainsize = 25000, and sequentially try grainsize = 12500, grainsize = 6250, etc.
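The quoted procedure halves the grainsize at each trial. A minimal Python sketch of that candidate sequence follows. `candidate_grainsizes` is a hypothetical helper, not part of Stan, and each candidate would still have to be timed by actually running the model:

```python
def candidate_grainsizes(start, n_candidates):
    """Return `start`, then successive halvings of it (integer division).

    Hypothetical helper. It stops early if halving would reach 0, because
    reduce_sum needs a grainsize of at least 1.
    """
    sizes = []
    g = start
    while len(sizes) < n_candidates and g >= 1:
        sizes.append(g)
        g //= 2
    return sizes

print(candidate_grainsizes(25000, 3))  # [25000, 12500, 6250]
```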

  1. My data is in long format, with I*T rows. Should I treat I*T as N?

  2. Now suppose I slice the data over people (I, in my notation). Do I still use I*T as N, or should I use I?

EDIT: I have a follow-up question.

I ran my model without parallelization and it converged. I then ran it with parallelization and grainsize = 1, and the results were the same. Then I ran it again with a different grainsize. Surprisingly, there were quite a few more divergent transitions, but the magnitudes of the coefficients stayed the same. Can grainsize affect that? I use “start” and “end” extensively for indexing in my code.


Yes, in the sense that’s the amount of data you have. That said, it’s usually much more efficient to put the data in wide form for Stan, because then it’s easier to vectorize over I or T.
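For a balanced panel, where every person is observed at the same T time points, converting long format to wide format is just a reshape. Here is a NumPy sketch under that assumption, using made-up data. It also assumes the rows are sorted by person and then by time. An unbalanced panel would need padding or a different layout.

```python
import numpy as np

I, T = 3, 4  # hypothetical: 3 people, 4 time points each

# Long format: one row per (person, time), sorted by person, then by time.
person = np.repeat(np.arange(I), T)     # [0,0,0,0,1,1,1,1,2,2,2,2]
time = np.tile(np.arange(T), I)         # [0,1,2,3,0,1,2,3,0,1,2,3]
y_long = np.arange(I * T, dtype=float)  # made-up outcome values

# Wide format: an I x T matrix whose row i holds person i's T observations.
# This reshape is only valid for a balanced panel sorted as above.
y_wide = y_long.reshape(I, T)

# Each long row (person[k], time[k]) lands at y_wide[person[k], time[k]].
assert np.all(y_wide[person, time] == y_long)
print(y_wide.shape)  # (3, 4)
```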

You want to break the data up so that each person’s observations stay together. I’m afraid I’m not sure how it deals with arrays, but presumably it doesn’t matter what the elements are, so you’d use I.
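The slicing unit can be made concrete. The grainsize refers to elements of whatever array is sliced. If each element holds one person's observations, the relevant count is therefore I, not I*T.

Below is a sequential Python mock. `reduce_sum_mock`, `normal_lpdf`, the data, and the random-intercept likelihood are all hypothetical. The real reduce_sum runs slices in parallel and chooses its own partition, since grainsize is only a suggested slice size.

```python
import math

def reduce_sum_mock(partial_sum, xs, grainsize):
    """Sequential stand-in for Stan's reduce_sum, for illustration only.

    Cuts `xs` into consecutive slices of at most `grainsize` elements and adds
    partial_sum(slice, start, end) over them. Indices here are Python-style
    (0-based, end-exclusive); Stan passes 1-based, inclusive start/end.
    """
    total = 0.0
    for start in range(0, len(xs), grainsize):
        end = min(start + grainsize, len(xs))
        total += partial_sum(xs[start:end], start, end)
    return total

def normal_lpdf(y, mu, sigma):
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2

# Hypothetical data sliced over people: one list of observations per person,
# so len(people) == I == 3 and grainsize counts people, not rows.
people = [[1.0, 1.2], [0.4, 0.1, 0.3], [2.0, 2.3]]
alpha = [1.1, 0.2, 2.1]  # hypothetical person-level intercepts
sigma = 0.5

def partial_sum(people_slice, start, end):
    lp = 0.0
    for offset, ys in enumerate(people_slice):
        i = start + offset  # global person index
        lp += sum(normal_lpdf(y, alpha[i], sigma) for y in ys)
    return lp

full = partial_sum(people, 0, len(people))
for g in (1, 2, 3):
    # Any grainsize gives the same total (up to rounding), because no
    # person's observations are ever split across two slices.
    assert math.isclose(reduce_sum_mock(partial_sum, people, g), full)
```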

That shouldn’t happen if you start with the same random seed. The trajectories should be the same, up to small differences caused by floating-point arithmetic not being associative. If you have an example where this happens with the same seed, it’d be great if you could share it, since it may be a bug somewhere.
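The non-associativity point is easy to see in plain Python. Grouping the same terms into different partial sums, as different partitions of the work would, changes the result in the last bit.

This snippet does not show whether such tiny differences are enough to move the sampler onto a different trajectory, or to change divergence counts.

```python
# The same three terms, split into partial sums in two different ways.
terms = [0.1, 0.2, 0.3]
split_a = sum(terms[0:2]) + sum(terms[2:3])  # (0.1 + 0.2) + 0.3
split_b = sum(terms[0:1]) + sum(terms[1:3])  # 0.1 + (0.2 + 0.3)

print(split_a, split_b)    # 0.6000000000000001 0.6
print(split_a == split_b)  # False
```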

Hello, and thanks for replying. I’m afraid I can’t share the exact data or code because it’s something I’m developing for work, but I will dig further to see what (if anything) is wrong.

This is the second time I’ve tried to use reduce_sum. The first time, everything was in long format, and it successfully reduced the computation time.

I read that slicing over people, rather than over rows, could reduce it even further. But it is quite tedious to get the indexes right with long-format data, so I am considering recoding everything using wide-format data.
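Much of the index bookkeeping for slicing long-format data by person can be done once during data preparation, instead of inside the model. The sketch below computes per-person start and end row indices. They are 1-based and inclusive, to match Stan's indexing, and could be passed in as data.

`person_start_end` is a hypothetical helper. It assumes the rows are sorted by person and that person ids are coded 1..I. It also handles people with different numbers of rows.

```python
import numpy as np

def person_start_end(person_id):
    """1-based, inclusive start/end row indices for each person.

    Assumes `person_id` is sorted and takes the values 1..I (hypothetical
    coding); rows for person i are then start[i-1]..end[i-1].
    """
    person_id = np.asarray(person_id)
    if np.any(np.diff(person_id) < 0):
        raise ValueError("rows must be sorted by person")
    n_people = int(person_id.max())
    counts = np.bincount(person_id, minlength=n_people + 1)[1:]  # rows per person
    end = np.cumsum(counts)   # last row of each person (1-based)
    start = end - counts + 1  # first row of each person (1-based)
    return start, end

# Unbalanced example: person 1 has 3 rows, person 2 has 1, person 3 has 2.
start, end = person_start_end([1, 1, 1, 2, 3, 3])
print(start.tolist(), end.tolist())  # [1, 4, 5] [3, 4, 6]
```

With these arrays, person i's rows could be addressed in Stan with a range index such as `y[start[i]:end[i]]`.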

But if I put the data in wide format, wouldn’t I need to loop over I or T? I have always followed this example and put the data in long format. Could you provide more insight?
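Whether wide format forces a loop depends on whether the operations can be vectorized. The sketch below is a NumPy analogy only, not Stan code. It writes a random-intercept normal likelihood on an I-by-T matrix using broadcasting, with no explicit loop, and checks it against a double loop.

The model, names, and shapes are all assumptions for illustration. In Stan, the analogous step would rely on Stan's own vectorized functions, which this sketch does not show.

```python
import numpy as np

def loglik_wide(y, x, alpha, beta, sigma):
    """Log-likelihood of a hypothetical model y[i, t] ~ Normal(alpha[i] + beta * x[i, t], sigma).

    y, x: arrays of shape (I, T); alpha: shape (I,); beta, sigma: scalars.
    No explicit loop over people or time points.
    """
    mu = alpha[:, None] + beta * x  # broadcast person intercepts across T
    z = (y - mu) / sigma
    return np.sum(-0.5 * np.log(2 * np.pi) - np.log(sigma) - 0.5 * z**2)

def loglik_loop(y, x, alpha, beta, sigma):
    """Same quantity with an explicit double loop, for comparison."""
    total = 0.0
    for i in range(y.shape[0]):
        for t in range(y.shape[1]):
            z = (y[i, t] - alpha[i] - beta * x[i, t]) / sigma
            total += -0.5 * np.log(2 * np.pi) - np.log(sigma) - 0.5 * z**2
    return total

# Simulated (made-up) data for I = 4 people and T = 5 time points.
rng = np.random.default_rng(0)
I, T = 4, 5
x = rng.normal(size=(I, T))
alpha = rng.normal(size=I)
y = alpha[:, None] + 0.7 * x + rng.normal(scale=0.5, size=(I, T))

assert np.isclose(loglik_wide(y, x, alpha, 0.7, 0.5), loglik_loop(y, x, alpha, 0.7, 0.5))
```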