I am playing with an applied project that models the transition probability of a two-wave survey responses. The model is simple though the notation is a little bit complex. So I leave the model description in the attached pdf file.
It is a matrix-factorization type of model. Maybe because the data is too sparse (though the sample size is relatively small), or maybe because I have some stupid bugs in my stan code, but for onething it is extremely slow to fit it in rstan (considering the data size, and considering this is the simple version of the larger model that I want to fit):
Gradient evaluation took 0.045367 seconds
For another, I get the error for parallel chains
> Error in unserialize(socklist[[n]]) : error reading from connection
> In addition: Warning message:
> In readLines(file, warn = TRUE) :
> incomplete final line found on '/.....stan'
and occasionally followed with a R session crash, (which further prevents me from any debugging.) Basically it cannot return a non-empty stan-fit result.
It seems like a installation issue, but it isn’t, for I can run other stan codes without difficulty.
>R version 3.4.2 (2017-09-28)
>Platform: x86_64-apple-darwin15.6.0 (64-bit)
>Running under: macOS High Sierra 10.13
>rstan (Version 2.16.2, packaged: 2017-07-03 09:24:58 UTC, GitRev: 2e1f913d3ca3)
And besides that, I am wondering how to make my Stan code more efficient, that is I have difficulty in vectorization in some situation. I still have some full loops in the code.
(1) [Multiple Indexing in arrays].
In line 112 of my stan code, I am trying to expressing some multinomial likelihood in a full loop,
for( k in 1:K) for(n in 1:N)
target += log_p[Y1[n,k], k, group[n], Y2[n,k] ]
log_p is claimed as
vector[I] log_p[I,K,L] for some other matrix operation purposes.
I tried to do some Multiple Indexing tricks like
target += sum ( log_p[Y1[1:N,k], k, group[1:N], Y2[1:N,k] ] )
But it doesn’t work.
(2)[array of matrix]
In line 73, I have
> for(l in 1:L)
>beta_mean[l]=mu_0+lambda_beta[l] + one_i_col * mu_beta+ eta_beta* one_k_row;
where beta is claimed as
But I was not able to do
>beta_mean[1:L]=mu_0+lambda_beta[1:L] + one_i_col * mu_beta+ eta_beta* one_k_row;
(3) normal array
In line 106 I have
for(l in 1:L)
for(i in 1:5)
beta[l,i] ~ normal(beta_mean[l,i], sigma_beta );
beta[l] is a matrix; beta_mean[l,i] is a vector, so it is already vectorized. Though in the ideal case I may want something like
beta ~ normal(beta_mean, sigma_beta ); , that is matrix~normal ( matrix, real).
Finally, as you can see in my code, I used non-centered version for most parameters (a~N(0,1)), I am not sure whether such practice is useful.
I have attached the model description, stan code, and data file. Run code below to execute rstan.
fit_demo=stan(file="transition_multi.stan", data=list(N=1700, K=12, I=5, L=488,Y1=view[,,1], Y2=view[,,2], age=age_split_vec,T_age=max(age_split_vec), gender=gender_split_vec, pid=pid_split_vec, marr=marstat_split_vec, educ=educ_split_vec, index_group=index_group),iter =10, chains=1 )
transition_multi.stan (5.2 KB)
model_description.pdf (80.8 KB)
RData file: https://www.dropbox.com/s/yiz4777k0uouqse/standata.RData?dl=0