In my fairly complex model (see here), I have parameters which are 2 x k matrices, and for which I need to provide initial conditions. Specifically:
parameters{
// lots of other parameters
real
matrix[2,k] mu1;
matrix[2,k] mu2;
}
model{
// lots of stuff
for(i in 1:k){
col(mu1, i) ~ multi_normal(mu1_mean, mu1_Sigma);
col(mu2, i) ~ multi_normal(mu2_mean, mu2_Sigma);
}
}
Using rstan
, I include them in the inits
as follows:
mu1.init <- cbind(x1s, y1s)
mu2.init <- cbind(x1s, y1s)
sampling(mymodel, inits = list(mu1 = mu1.init, mu2 = mu2.init, ... [other parameters]))
Which doesn’t seem to work … i.e., all the other initial values are fed into it, but for mu1 and mu2 I get the [-2,2] default sampling, which is very far from a good value.
Any ideas why or how to fix?
Finally … I have seen it noted that
vector[k] mu1[2];
vector[k] mu2[2];
is a “better” (slightly more efficient?) way to store and sample with these parameters. In that case, I guess the comparable sampling instruction be:
for(i in 1:k){
mu1[i] ~ multi_normal(mu1_mean, mu1_Sigma);
mu2[i] ~ multi_normal(mu2_mean, mu2_Sigma);
}
(Yes?)
But how would you provide the initial value for that? As a 2 x k matrix? A k x 2 matrix? A list with k 2-length vectors?
Separately, the values of these two matrices are drawn from hyper-parameters mu1_mean
and mu1_Sigma
via the bivariate normal above. The initial values for those parameters are fine (e.g. mu1_mean ~ [-200, 800]
). I’m curious why it is that it even initializes those values so far away from the given distribution? I mean - I understand that each parameter gets initialized uniquely, but why not draw those from the distribution that its supposed to be drawn from anyway?
Thanks,
Elie