Multi-threading version of a simple linear mixed model: y ~ cond + (cond||subj)

Hi,
I’m having some trouble trying to come up with a map_rect version of a simple linear mixed model, such as y ~ cond + (cond||subj).

I used brms to build the Stan code and then modified it according to how I understood map_rect should be implemented. Strangely enough, when I run the multi-threading version I get CIs that are relatively close to brms’s values, but all the 95% CIs sit just above or below the true values of the parameters. And while the brms model has quite low n_eff (maybe @paul.buerkner wants to check what’s going on), the multi-threading version has a much higher n_eff. (And I assume they should be the same, or am I wrong?)

This is the output of the brms model:

# Inference for Stan model: lmm_nc_brms.
# 3 chains, each with iter=2000; warmup=1000; thin=1; 
# post-warmup draws per chain=1000, total post-warmup draws=3000.

#             mean se_mean   sd  2.5% 97.5% n_eff Rhat
# b_Intercept 6.01       0 0.01  5.99  6.03   107 1.01
# b[1]        0.49       0 0.01  0.47  0.51   104 1.05
# sigma       0.10       0 0.00  0.10  0.10  4714 1.00
# sd_1[1]     0.10       0 0.01  0.08  0.11   285 1.00
# sd_1[2]     0.10       0 0.01  0.09  0.11   242 1.02

# Samples were drawn using NUTS(diag_e) at Fri Nov 30 06:44:33 2018.
# For each parameter, n_eff is a crude measure of effective sample size,
# and Rhat is the potential scale reduction factor on split chains (at 
# convergence, Rhat=1).

(The true values are 6, .5, .1, .1, .1.)

This is the output of the multi-threading version:


# Inference for Stan model: lmm_nc.
# 3 chains, each with iter=2000; warmup=1000; thin=1; 
# post-warmup draws per chain=1000, total post-warmup draws=3000.

#             mean se_mean   sd  2.5% 97.5% n_eff Rhat
# b_Intercept 6.01       0 0.00  6.00  6.01  4775    1
# b[1]        0.49       0 0.00  0.49  0.49  4146    1
# sigma       0.17       0 0.00  0.16  0.17  4097    1
# sd_1[1]     0.03       0 0.02  0.00  0.09  1117    1
# sd_1[2]     0.03       0 0.02  0.00  0.09  1182    1

# Samples were drawn using NUTS(diag_e) at Fri Nov 30 06:52:32 2018.
# For each parameter, n_eff is a crude measure of effective sample size,
# and Rhat is the potential scale reduction factor on split chains (at 
# convergence, Rhat=1).

I’m attaching the R code that I used to generate the fake data, the brms model that I used as a basis, and my attempt at a model that uses multi-threading. I would really appreciate feedback so that I can understand where it went wrong.

lmm_multi_threading.R (2.0 KB)
lmm_nc_brms.stan (1.8 KB)
lmm_nc.stan (3.5 KB)

Thanks!
Bruno


I don’t have an answer, just two questions:

Are shards of unequal size now possible?

How do you make sure that theta is matched correctly with the data?
I tried to figure this out, but gave up because it took too much time. More comments in the code would probably help here. Matching theta (z_1) and the data could be an issue, given that the map_rect model does not recover the correct sd_1 values. (The fact that sigma is too high even though the population-level effects are estimated correctly in the map_rect model also suggests that something is off with the estimation of the random effects.)

PS: One would need to see the precise se_mean values to judge reliably whether your two models estimate the “same” parameters. But given that the output is rounded to 2 decimals, these are probably below .01, so the estimates for sigma and sd_1 are probably not the same (within the MCMC error) in the two models. So I am reasonably sure that the map_rect model is not quite right.

I think one could manage to do it by creating shards as large as the largest group, with one of the x_i values indicating how many of the x_r are actual data (and ignoring the rest). In any case, in this example all the shards have the same length.
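A rough sketch of that padding idea (hypothetical names, not code from the attached model; it assumes each shard is padded to the largest group size and that x_i[1] stores the count of real rows):

```stan
functions {
  vector mt(vector phi, vector theta, real[] x_r, int[] x_i) {
    int n = x_i[1];                        // number of actual observations in this shard
    vector[n] y = to_vector(x_r[1:n]);     // real data; the padded tail of x_r is ignored
    vector[n] cond = to_vector(x_r[(n + 1):(2 * n)]);
    // ... compute this shard's log-likelihood from y and cond ...
    return [0]';                           // placeholder; return the shard log-lik here
  }
}
```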

I don’t know! I thought that Stan would take care of that. (That’s how it looks in the example in the manual.)
I’m just declaring
vector[M_1] theta[N_1]; in the parameters section, where M_1 = 2 (two random effects) and N_1 is the number of shards (= number of subjects).

And then theta becomes the standardized random effects inside the map_rect function:
vector[M_1] z_1 = theta;
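Spelled out, the pattern I mean looks roughly like this (a sketch, not my actual model; the standard-normal prior on the standardized effects is added per shard):

```stan
functions {
  vector mt(vector phi, vector theta, real[] x_r, int[] x_i) {
    vector[2] z_1 = theta;                 // standardized random effects for this subject (M_1 = 2)
    real lp = normal_lpdf(z_1 | 0, 1);     // their prior contributes inside the shard
    // ... add this shard's observation log-likelihood to lp ...
    return [lp]';
  }
}
parameters {
  vector[M_1] theta[N_1];                  // one parameter vector per shard (= per subject)
}
```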

Yes, I’m quite sure it’s not right.

Count me as another who has tried and failed to do this kind of thing.
But this might be helpful - it is the clearest explanation I’ve seen on how to implement map_rect (although I’m still failing to do it correctly myself): https://github.com/rmcelreath/cmdstan_map_rect_tutorial

But my problem seems to be with theta, and Richard is not using it. The weird thing is that my code runs, and it almost gives the right answer…

Ah right, sorry. I also wrote some code that seemed to work on a toy model, but when I ran it on real data it was very, very wrong. I’ve resisted looking through your code since I’ve not figured this out myself; I’m afraid I’ll only make it worse 😅

Did you try debugging by print?

If you use your reduce function (mt) without a call to map_rect:

for (s in 1:N_1)
  dummy = mt(phi, theta[s], Y_X_r[s], n_i[s]);  // assign to a throwaway so the call is valid

You can use print statements in mt to check if your data look as expected when in the reduce function.

Ok, I did look at your code. Again, bear in mind that I’m also struggling with map_rect, but I think I see what’s wrong with your theta. Your theta is supposed to hold the parameters per shard, and in your transformed data you are defining shards as groups of rows delimited by pos and end. So for your theta, in the transformed parameters block you need to pack up your shard-specific parameters, i.e. z_1, as chunks that also correspond to the rows delimited by the matching pos and end values. But in your version you are not defining z_1 in the parameters block at all, only in the function block. What you need to do is define z_1 in the parameters block as in the brms version, chunk it in the transformed parameters block and stuff it into theta according to the rows defined by pos and end, and then in your function unpack it as you do the x_r.
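Something like this is what I mean (a hypothetical sketch with made-up dimensions; with one subject per shard, the pos/end bookkeeping for theta reduces to indexing by subject):

```stan
parameters {
  vector[N_1] z_1[M_1];            // standardized group-level effects, as in the brms code
}
transformed parameters {
  vector[M_1] theta[N_1];          // one packed parameter vector per shard
  for (s in 1:N_1) {
    theta[s][1] = z_1[1][s];       // subject s's intercept effect
    theta[s][2] = z_1[2][s];       // subject s's slope effect
  }
}
```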

So, anyhow, that is one thing I see. It may not be everything and I may be wrong (but I think I’m right). Hope that is helpful!

NB -> Above edited to actually make sense! 😃

But look at pages 242 and 243 of the new user guide. There the random effect is beta; it’s defined as a parameter, vector[2] beta[K];, but the likelihood is added only inside the map_rect function (end of page 241).
So I’m doing essentially the same: I call it theta in the parameters section (a vector of M_1 = 2 rows, in an array of size N_1 = number of subjects), and then I add its likelihood (calling it z_1 and z_2 there) inside the map_rect function.

Oh, sorry, then I don’t know in this case. It’s all very confusing to me. But in your model I can’t see how z_1 and z_2 can be defined like that inside the function each time. If z_1 and z_2 only exist inside the function, what is being passed through theta each iteration? Theta does not seem to relate to the variables in the other blocks either.

I don’t see anything wrong; the data for each shard are being passed fine. Regarding the potential issues with theta, I don’t really know what to look for.

Ok, so my model was almost right!! It was just a stupid curly bracket!!
The return of the map_rect function was inside the for loop that builds mu!!
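In other words, the shard function looked roughly like this (sketched from memory, with the linear predictor elided):

```stan
// Wrong: the return sits inside the loop, so the function exits after the first row
// for (n in 1:n_obs) {
//   mu[n] = ...;
//   return [normal_lpdf(y[n] | mu[n], sigma)]';
// }

// Right: fill mu for every row first, then return once after the loop
for (n in 1:n_obs)
  mu[n] = ...;                               // linear predictor for row n
return [normal_lpdf(y | mu, sigma)]';
```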

Sorry to waste your time!! :/


Wow, such a small thing. I’m surprised it didn’t cause any error.
Happy you sorted out your problem. Not happy that I apparently understand it less well than I thought :/