R session crash for a multinomial model; Multiple Indexing in arrays

fitting-issues
performance

#1

Hi,
I am playing with an applied project that models the transition probability of a two-wave survey responses. The model is simple though the notation is a little bit complex. So I leave the model description in the attached pdf file.

It is a matrix-factorization type of model. Maybe because the data is too sparse (though the sample size is relatively small), or maybe because I have some stupid bugs in my stan code, but for onething it is extremely slow to fit it in rstan (considering the data size, and considering this is the simple version of the larger model that I want to fit):

Gradient evaluation took 0.045367 seconds

For another, I get the error for parallel chains

 > Error in unserialize(socklist[[n]]) : error reading from connection
 > In addition: Warning message:
 > In readLines(file, warn = TRUE) :
 > incomplete final line found on '/.....stan'

and occasionally followed with a R session crash, (which further prevents me from any debugging.) Basically it cannot return a non-empty stan-fit result.
It seems like a installation issue, but it isn’t, for I can run other stan codes without difficulty.

Platform info:

> sessionInfo()
>R version 3.4.2 (2017-09-28)
>Platform: x86_64-apple-darwin15.6.0 (64-bit)
>Running under: macOS High Sierra 10.13
>rstan (Version 2.16.2, packaged: 2017-07-03 09:24:58 UTC, GitRev: 2e1f913d3ca3)
>StanHeaders_2.16.0-1

And besides that, I am wondering how to make my Stan code more efficient, that is I have difficulty in vectorization in some situation. I still have some full loops in the code.
(1) [Multiple Indexing in arrays].
In line 112 of my stan code, I am trying to expressing some multinomial likelihood in a full loop,

for( k in 1:K)    for(n in 1:N)
    target +=  log_p[Y1[n,k], k, group[n],  Y2[n,k]  ]

while log_p is claimed as vector[I] log_p[I,K,L] for some other matrix operation purposes.
I tried to do some Multiple Indexing tricks like

target +=  sum ( log_p[Y1[1:N,k], k, group[1:N],  Y2[1:N,k]  ] )

But it doesn’t work.

(2)[array of matrix]
In line 73, I have

> for(l in 1:L)
>beta_mean[l]=mu_0+lambda_beta[l] + one_i_col * mu_beta+ eta_beta* one_k_row;

where beta is claimed as matrix[I,K] beta_mean[L];
But I was not able to do

>beta_mean[1:L]=mu_0+lambda_beta[1:L] + one_i_col * mu_beta+ eta_beta* one_k_row;

(3) normal array
In line 106 I have

   for(l in 1:L)
   for(i in 1:5)
        beta[l,i] ~ normal(beta_mean[l,i], sigma_beta );   

beta[l] is a matrix; beta_mean[l,i] is a vector, so it is already vectorized. Though in the ideal case I may want something like beta ~ normal(beta_mean, sigma_beta ); , that is matrix[]~normal ( matrix[], real).

Finally, as you can see in my code, I used non-centered version for most parameters (a~N(0,1)), I am not sure whether such practice is useful.

I have attached the model description, stan code, and data file. Run code below to execute rstan.

  load(file="standata.RData")
  fit_demo=stan(file="transition_multi.stan", data=list(N=1700, K=12, I=5, L=488,Y1=view[,,1], Y2=view[,,2], age=age_split_vec,T_age=max(age_split_vec), gender=gender_split_vec, pid=pid_split_vec, marr=marstat_split_vec,   educ=educ_split_vec, index_group=index_group),iter =10, chains=1 )

transition_multi.stan (5.2 KB)
model_description.pdf (80.8 KB)
RData file: https://www.dropbox.com/s/yiz4777k0uouqse/standata.RData?dl=0

Thanks,
Yuling


#2

target += sum ( log_p[Y1[1:N,k], k, group[1:N], Y2[1:N,k] ] )

I think the issue is when multiple of the index arguments are arrays, you get out high dimensional things. Hard to explain cleanly in a sentence, but like this (copied from the manual):

matrix[5,7] m;
...
row_vector[3] rv;
rv = m[4, 3:5]; // result is 1 x 3
...
vector[4] v;
v = m[2:5, 3]; // result is 3 x 1
...
matrix[3, 4] m2;
m2 = m[1:3, 2:5]; // result is 3 x 4

–

I think for now with things like:

for(l in 1:L)
   for(i in 1:5)
        beta[l,i] ~ normal(beta_mean[l,i], sigma_beta );   

You can do:

for(l in 1:L)
      to_vector(beta[l]) ~ normal(to_vector(beta_mean[l]), sigma_beta );

Maybe that’s a little less verbose at least?


#3

Thanks, it does help. (though I am still struggling with the error and R crash)
Updated Stan code:
transition_multi.stan (5.2 KB)


#4

Hi,
Some update:

  1. I found an embarrassing error in my code: I said I used non-centered parameterization, but in fact I didn’t. Now I have re-parameterized.

  2. R session crash/aborted seems to be R-studio thing. I run vanilla R (which I should have done earlier) and got some new error message before it crashed ( I only run 10 iteration and one chain):

[1] "Error: segfault from C stack overflow"                            "Warning message:"                                                
[3] "Lost warning messages"                                            "Error : C stack usage  140728099464424 is too close to the limit"
error occurred during calling the sampler; sampling not done

C stack overflow??? I can check my C stack in R:

> Cstack_info()
      size    current  direction eval_depth 
   7969177       4232          1          2 

I don’ believe my parameter space should be that large (140728099464424 bytes ???). The order of magnitude of my input data is 1700 * 12. The “largest” parameter in the parameter space is beta, having dimension 5*1700*488. From my understanding, it is not beyond stan’s ability / my laptop memory to handle.

PS. I searched “c stack overflow”+stan, and found someone posted similar error without a good solution for a related multinomial model 200 years ago (https://groups.google.com/forum/#!topic/stan-users/WcnfB1oGhEw). OK, I guess maybe multinomial itself is hard.

Yuling


#5

I think if you get a proper bug trace this comes down to R/Rcpp list implementation. Would be nice if I had time to run it down.


#6

Do you have a recursive function defined somewhere? This kind of error often shows up when you try to recurse more deeply and run out of stack (where local variables are stored—C doesn’t have efficient tail recursion like languages like Lisp).

Can you print in the Stan program as you go and see any output? That can help you figure out if if it’s crashing within or outside of Stan. Running in CmdStan would also help diagnose whether the problem is in RStan or in Stan itself.


#7

Hi Bob,

No, the model is simple. I don’t have any user-defined function there.

When I run rstan in rstudio, I get following error and cannot obtain any result:

>fit_demo=stan(file="transition_multi.stan", data=list(...) ,iter =10, chains=2 )
Iteration: 1 / 10 [ 10%]  (Warmup)
...
Iteration: 10 / 10 [100%]  (Sampling)

 Elapsed Time: 0.881468 seconds (Warm-up)
               0.643135 seconds (Sampling)
               1.5246 seconds (Total)

Error in unserialize(socklist[[n]]) : error reading from connection

Iteration: 2 / 10 [ 20%]  (Warmup)
...
Iteration: 10 / 10 [100%]  (Sampling)

 Elapsed Time: 9.94336 seconds (Warm-up)
               0.595152 seconds (Sampling)
               10.5385 seconds (Total)

By the way, I am able to run any other rstan code in parallel.

When I run one chain, it prints everything successfully:

...
Iteration: 10 / 10 [100%]  (Sampling)

 Elapsed Time: 0.71665 seconds (Warm-up)
               0.709265 seconds (Sampling)
               1.42592 seconds (Total)

but then crashes.

The good(?) news is there is no problem in cmdStan (though it is still slow so I have to optimize my code somewhere for I have some full loops ):

>./multi  sample num_samples=10  num_warmup=10  data file=multi.data.R
...
Gradient evaluation took 0.050591 seconds
1000 transitions using 10 leapfrog steps per transition would take 505.91 seconds.
Adjust your expectations accordingly!
WARNING: No variance estimation is
         performed for num_warmup < 20
Iteration:  1 / 20 [  5%]  (Warmup)
Iteration: 11 / 20 [ 55%]  (Sampling)
Iteration: 20 / 20 [100%]  (Sampling)

 Elapsed Time: 18.3331 seconds (Warm-up)
               2.9495 seconds (Sampling)
               21.2826 seconds (Total)

Something is indeed written in the output.csv.


#8

Hi,
Sorry for posting this topic again. The story is that I returned to this stan model after a long while. As expected, I got the same error. Fortunately, those errors disappear when I put some transformed parameters into the model block without any other changes. Another solution is to fit a subsample, shrinking the parameter numbers.

I am sure it is not (purely) a memory issue because I would get errors even when I run 10 iterations. From my understanding, the only difference between the variable defined in model block and those in transformed parameters is whether they get saved (all parameters are unconstrained. ) So I cannot see why such a change could make a difference. Is there a limit for the number of total number of parameters+ transformed parameters (number of variables in posterior draws) in stan or rstan?

The number of parameters is large, but probably not huge. The variable I changed from transformed block to model block is defined as,
model vector[5] log_p[5,12,180];
Maybe matrix of vectors is not an efficient form to claim? I defined it in this way for the benefit of matrix operation.


#9

If that’s the behavior, it’s almost certainly a memory issue in RStan.

Is this only an RStan error or does the same thing happen in CmdStan? The difference is that RStan allocates all the memory it needs for draws (and then I think takes some extra for copying), whereas CmdStan streams out the draws so they don’t take much memory. They both allocate the same memory dynamically for derivatives. So my guess is that you’re just blowing out storage. If you have a 5 x 5 x 12 x 180 data structure, that 55K entries at 8 bytes each, or 400K just to store one draw of this data structure. So 2 draws are 1MB, 2000 draws are 1GB, so 4 chains of 2000 draws saving warmup requires 4GB storage just for this one data structure.