Multi-threading version of a simple linear mixed model: y ~ cond + (cond||subj)

Hi,
I’m having some trouble trying to come up with a map_rect version of a simple linear mixed model, such as y ~ cond + (cond||subj).

I used brms to build the Stan code and then modified it according to how I understood map_rect should be implemented. Strangely enough, when I run the multi-threading version I get CIs that are relatively close to brms’s values, but all the 95% CIs sit just above or below the true values of the parameters. And while the brms model has quite low n_eff (maybe @paul.buerkner wants to check what’s going on), the multi-threading version has a much higher n_eff. (And I assume they should be the same, or am I wrong?)

This is the output of the brms model:

# Inference for Stan model: lmm_nc_brms.
# 3 chains, each with iter=2000; warmup=1000; thin=1; 
# post-warmup draws per chain=1000, total post-warmup draws=3000.

#             mean se_mean   sd  2.5% 97.5% n_eff Rhat
# b_Intercept 6.01       0 0.01  5.99  6.03   107 1.01
# b[1]        0.49       0 0.01  0.47  0.51   104 1.05
# sigma       0.10       0 0.00  0.10  0.10  4714 1.00
# sd_1[1]     0.10       0 0.01  0.08  0.11   285 1.00
# sd_1[2]     0.10       0 0.01  0.09  0.11   242 1.02

# Samples were drawn using NUTS(diag_e) at Fri Nov 30 06:44:33 2018.
# For each parameter, n_eff is a crude measure of effective sample size,
# and Rhat is the potential scale reduction factor on split chains (at 
# convergence, Rhat=1).

(The true values are 6, .5, .1, .1, .1.)

This is the output of the multi-threading version:


# Inference for Stan model: lmm_nc.
# 3 chains, each with iter=2000; warmup=1000; thin=1; 
# post-warmup draws per chain=1000, total post-warmup draws=3000.

#             mean se_mean   sd  2.5% 97.5% n_eff Rhat
# b_Intercept 6.01       0 0.00  6.00  6.01  4775    1
# b[1]        0.49       0 0.00  0.49  0.49  4146    1
# sigma       0.17       0 0.00  0.16  0.17  4097    1
# sd_1[1]     0.03       0 0.02  0.00  0.09  1117    1
# sd_1[2]     0.03       0 0.02  0.00  0.09  1182    1

# Samples were drawn using NUTS(diag_e) at Fri Nov 30 06:52:32 2018.
# For each parameter, n_eff is a crude measure of effective sample size,
# and Rhat is the potential scale reduction factor on split chains (at 
# convergence, Rhat=1).

I’m attaching the R code that I used to generate the fake data, the brms model that I used as a basis, and my attempt at a model that uses multi-threading. I would really appreciate feedback so that I can understand where it went wrong.

lmm_multi_threading.R (2.0 KB)
lmm_nc_brms.stan (1.8 KB)
lmm_nc.stan (3.5 KB)

Thanks!
Bruno


I don’t have an answer, just two questions:

Are shards of unequal size now possible?

How do you make sure that theta is matched correctly with the data?
I tried to figure this out, but gave up because it took too much time. More comments in the code would probably help here. Matching theta (z_1) and the data could be an issue, given that the map_rect model does not recover the correct sd_1 values. (The fact that sigma is too high even though the population-level effects are estimated correctly in the map_rect model also suggests that something is off with the estimation of the random effects.)

PS: One would need to see the precise se_mean values to judge reliably whether your two models estimate the “same” parameters. But given that the output is rounded to 2 decimals, these are probably below .01, so the estimates for sigma and sd_1 are probably not the same (within the MCMC error) in the two models. So I am reasonably sure that the map_rect model is not quite right.

I think one could manage to do it by creating shards as large as the largest group, with one of the x_i values indicating how many of the x_r are actual data (and ignoring the rest). In any case, in this example all the shards have the same length.
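A rough sketch of that padding idea (hypothetical names, not code from the attached model; it assumes each shard is padded to the largest group size and that x_i[1] stores the count of real rows):

```stan
functions {
  vector mt(vector phi, vector theta, real[] x_r, int[] x_i) {
    int n = x_i[1];                        // number of actual observations in this shard
    vector[n] y = to_vector(x_r[1:n]);     // real data; the padded tail of x_r is ignored
    vector[n] cond = to_vector(x_r[(n + 1):(2 * n)]);
    // ... compute this shard's log-likelihood from y and cond ...
    return [0]';                           // placeholder; return the shard log-lik here
  }
}
```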

I don’t know! I thought that Stan would take care of that. (That’s how it looks in the example in the manual.)
I’m just declaring
vector[M_1] theta[N_1]; in the parameters section, where M_1 = 2 (two random effects) and N_1 is the number of shards (= number of subjects).

And then theta becomes the standardized random effects inside the map_rect function:
vector[M_1] z_1 = theta;
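Spelled out, the pattern I mean looks roughly like this (a sketch, not my actual model; the standard-normal prior on the standardized effects is added per shard):

```stan
functions {
  vector mt(vector phi, vector theta, real[] x_r, int[] x_i) {
    vector[2] z_1 = theta;                 // standardized random effects for this subject (M_1 = 2)
    real lp = normal_lpdf(z_1 | 0, 1);     // their prior contributes inside the shard
    // ... add this shard's observation log-likelihood to lp ...
    return [lp]';
  }
}
parameters {
  vector[M_1] theta[N_1];                  // one parameter vector per shard (= per subject)
}
```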

Yes, I’m quite sure it’s not right.

Count me as another who has tried and failed to do this kind of thing.
But this might be helpful - it is the clearest explanation I’ve seen on how to implement map_rect (although I’m still failing to do it correctly myself): https://github.com/rmcelreath/cmdstan_map_rect_tutorial

But my problem seems to be with theta, and Richard is not using it. The weird thing is that my code runs, and it almost gives the right answer…

Ah right, sorry. I also wrote some code that seemed to work on a toy model, but when I ran it on real data it was very, very wrong. I’ve resisted looking through your code since I’ve not figured this out myself; I’m afraid I’ll only make it worse 😅

Did you try debugging by print?

If you use your reduce function (mt) without a call to map_rect:

for (s in 1:N_1)
  dummy = mt(phi, theta[s], Y_X_r[s], n_i[s]);  // assign to a throwaway so the call is valid

You can use print statements in mt to check if your data look as expected when in the reduce function.

Ok, I did look at your code. Again, bear in mind that I’m also struggling with map_rect, but I think I see what’s wrong with your theta. Your theta is supposed to hold the parameters per shard, and in your transformed data you are defining shards as groups of rows delimited by pos and end. So for your theta, in the transformed parameters block you need to pack up your shard-specific parameters, i.e. z_1, as chunks that also correspond to the rows delimited by the matching pos and end values. But in your version you are not defining z_1 in the parameters block at all, only in the function block. What you need to do is define z_1 in the parameters block as in the brms version, chunk it in the transformed parameters block and stuff it into theta according to the rows defined by pos and end, and then in your function unpack it as you do the x_r.
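Something like this is what I mean (a hypothetical sketch with made-up dimensions; with one subject per shard, the pos/end bookkeeping for theta reduces to indexing by subject):

```stan
parameters {
  vector[N_1] z_1[M_1];            // standardized group-level effects, as in the brms code
}
transformed parameters {
  vector[M_1] theta[N_1];          // one packed parameter vector per shard
  for (s in 1:N_1) {
    theta[s][1] = z_1[1][s];       // subject s's intercept effect
    theta[s][2] = z_1[2][s];       // subject s's slope effect
  }
}
```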

So, anyhow, that is one thing I see. It may not be everything and I may be wrong (but I think I’m right). Hope that is helpful!

NB -> Above edited to actually make sense! 😃

But look at pages 242 and 243 of the new user guide. There the random effect is beta; it’s defined as a parameter, vector[2] beta[K];, but the likelihood is added only inside the map_rect function (end of page 241).
So I’m doing essentially the same: I call it theta in the parameters section (a vector of M_1 = 2 rows, in an array of size N_1 = number of subjects), and then I add its likelihood (calling it z_1 and z_2 there) inside the map_rect function.

Oh, sorry, then I don’t know in this case. It’s all very confusing to me. But in your model I can’t see how z_1 and z_2 can be defined like that inside the function each time. If z_1 and z_2 only exist inside the function, what is being passed through theta each iteration? Theta does not seem to relate to the variables in the other blocks either.

I don’t see anything wrong; the data for each shard are being passed fine. Regarding the potential issues with theta, I don’t really know what to look for.

Ok, so my model was almost right!! It was just a stupid curly bracket!!
The return of the map_rect function was inside the for loop that builds mu!!
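In other words, the shard function looked roughly like this (sketched from memory, with the linear predictor elided):

```stan
// Wrong: the return sits inside the loop, so the function exits after the first row
// for (n in 1:n_obs) {
//   mu[n] = ...;
//   return [normal_lpdf(y[n] | mu[n], sigma)]';
// }

// Right: fill mu for every row first, then return once after the loop
for (n in 1:n_obs)
  mu[n] = ...;                               // linear predictor for row n
return [normal_lpdf(y | mu, sigma)]';
```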

Sorry to waste your time!! :/


Wow, such a small thing. I’m surprised it didn’t cause any error.
Happy you sorted out your problem. Not happy that I apparently understand it less well than I thought :/