I am trying to write models in which
(1) parameters are indexed arrays, such as theta[age, sex, region]. In my particular problem there are A=90 ages, S=2 sexes, R=402 regions.
(2) there is prior information about likely parameter differences across different dimensions, such as
(a) theta[a,] - theta[a-1,] ~ N(0,sigma_age) for a=2…A
(b) theta[,1,] - theta[,2,] ~ N(0,sigma_sex)
(c) theta[,reg1] - theta[,reg2] ~ N(0,sigma_reg) for many (reg1,reg2) pairs
(d) etc.
Stan often frustratates my attempts at efficiency because operators like “-” and “+” are undefined for arrays. For example,
parameters {
real theta[A,2,R];
}transformed parameters {
real age_diff[A-1,2,R];
real sex_diff[A,R];for (a in 2:A) {
age_diff[a-1] = theta[a] - theta[a-1];
}sex_diff = theta[1:A, 1, 1:R] - theta[1:A, 2, 1:R];
}
is not allowed because “-” can’t be used for elementwise subtraction on arrays.
Similarly, it seems that distributional statements like
age_diff ~ normal(0, sigma_age);
sex_diff ~ normal(0, sigma_sex);
are not allowed because the left hand side cannot be an array. I end up converting with
to_array_1d(age_diff) ~ normal(0, sigma_age);
to_array_1d(sex_diff) ~ normal(0, sigma_sex);
but I’m concerned that the recasting via to_array_1d is slowing the program. Recasting is also necessary, as far as I can tell, for many intermediate calculations involving arrays and sub-arrays that I’ve omitted here.
Are there any standard work-arounds for these sorts of problems with multidimensional arrays? Does recasting to and from multidim arrays really slow things down? Are there especially efficient ways to write the distributional statements?
Thanks.