Multivariate Poisson model having mysterious NaNs, cryptic errors

I’m trying to determine whether one of three treatments has an effect on the catch rate of a variety of taxa. My data have a hierarchical structure, where 3 sites were sampled once a year for 5 years; each site contained a large number of minnow traps that were organized into groups of three (“blocks”) where each trap in the block had a different treatment. There were three sizes of trap (which were consistent within a block). The captured animals were categorized by taxa and counted.

I’m trying to build the following model:

Y[taxa = i, block = j, treatment = k] ~ Poisson_log(eta[i,j,k])

eta[i,j,k] = theta_taxa[taxa = i] + theta_site[taxa = i, site|block = j] + theta_site_year[taxa = i, site_year|block = j] + theta_block[taxa = i,block = j] + beta_treatment[taxa = i, treatment = k] + beta_size[taxa = i, size|block = j]

All of the theta and beta parameters have a Normal(0,sigma_*) prior except for theta_block. To account for the possible interactions among taxa in very close proximity, theta_block ~ multi_normal_cholesky(0, L), where L is decomposed into a standard deviation vector and cholesky correlation matrix as is usual.

I’ve included a copy of the full model file below, the area of concern is below:

// parameter declarations
matrix[N_blocks, N_taxa] raw_theta_block; // normal(0,1)
cholesky_factor_corr[N_taxa] block_corr; // LKJ prior
vector<lower=0>[N_taxa] sigma_block; // half-t
...
// transformed variables
theta_block = raw_theta_block * diag_pre_multiply(sigma_block,  block_corr);

When I run this, I repeatedly get errors like this:

"Exception: multiply: B[14] is -nan, but must not be nan!  (in 'model918734c7da6_main_model' at line 50)" 

If I change the definition of theta_block to

theta_block = diag_post_multiply(raw_theta_block, sigma_block) * block_corr;

which should be mathematically equivalent, this error goes away, but I still receive NaNs when trying to call the Poisson Likelihood later in the model.

In both cases, the model goes through the entire sampling procedure (usually taking a fair amount of time) before returning with an error and claiming that sampling was not done. I haven’t seen this particular behavior before and am rather confused by it.

Does anyone have any advice for how to fix the problem?

main_model.stan (4.2 KB)
stan_error.log.txt (9.7 KB)
stan_data.txt (214.0 KB)

rstan SessionInfo:

R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 18

locale:
 [1] LC_CTYPE=en_US.UTF-8      
 [2] LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8      
 [8] LC_NAME=C                 
 [9] LC_ADDRESS=C              
[10] LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8
[12] LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils    
[5] datasets  methods   base     

other attached packages:
 [1] bindrcpp_0.2         rstan_2.16.2        
 [3] StanHeaders_2.16.0-1 dplyr_0.7.0         
 [5] purrr_0.2.2          readr_1.0.0         
 [7] tidyr_0.6.0          tibble_1.3.3        
 [9] ggplot2_2.2.1        tidyverse_1.0.0     

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.11     bindr_0.1       
 [3] magrittr_1.5     munsell_0.4.3   
 [5] colorspace_1.2-7 R6_2.2.2        
 [7] rlang_0.1.1      plyr_1.8.4      
 [9] tools_3.3.2      grid_3.3.2      
[11] gtable_0.2.0     lazyeval_0.2.0  
[13] assertthat_0.2.0 gridExtra_2.2.1 
[15] codetools_0.2-15 inline_0.3.14   
[17] glue_1.1.1       scales_0.4.1    
[19] stats4_3.3.2    

The sampler is certainly not going well, but the actual error is unrelated to those NaNs:

Exception: poisson_log_rng: Log rate parameter is 21.0356, but must be less than 20.7944  (in 'model9186157f5ed_main_model' at line 116)"
error occurred during calling the sampler; sampling not done

This is a well-known problem when trying to draw from a Poisson distribution in the generated quantities block. Basically, you have to catch when the log rate parameter is too large and then return the largest integer or something rather than drawing from the Poisson.

But you should fix your model first. I always recommend first doing such models with stan_glmer in the rstanarm R package or with brm in the brms R package to verify that the model is feasible with your data before spending a lot of time writing a Stan program that does the same thing.