ZINB model convergence issues when predicting shape parameter

  • Operating System: Windows 10, Intel Xeon core
  • brms Version: 2.10.0

Hello: this is a follow-up to my original post (Priors on predicted shape parameter(s) in mixed zinb model). There I received helpful feedback regarding the interpretation of the shape variance parameter in ZINB models. Here I’m looking for assistance in properly specifying the model.

I am attempting a zero-inflated negative binomial mixed-effect model, and I am also interested in predicting the shape (dispersion) parameter (so as to assess whether variance is different across years in my model). I realize variance parameters are tricky and can require a lot of data, but if there is a way, I would like to predict it.

I am predicting a count response, cars95 (annual reproductive success).
Predictors include:

Main effects:
  • age (discrete) - most central to my question
  • year (discrete) - most central to my question
  • mort (0/1) - sparse data, not my primary focus
  • bwev (0/1) - sparse data, not my primary focus
  • prevsire (0/1) - sparse data, not my primary focus
  • age x year interaction - most central to my question

Group-level effects:
  • (age|id), where id is a categorical identifier for individuals; this varying intercept and slope for age accounts for multiple observations of the same individual across years

Thus, the model I would like to fit is:

bf(cars95 ~ age*year + mort + bwev + prevsire + (age|id),
   shape ~ year)

This model fails to converge with default priors:

                  prior     class      coef group resp  dpar nlpar bound
1                               b                                       
2                               b       age                             
3                               b  age:year                             
4                               b     bwev1                             
5                               b     mort1                             
6                               b prevsire1                             
7                               b      year                             
8  student_t(3, -2, 10) Intercept                                       
9   student_t(3, 0, 10) Intercept                      shape            
10 lkj_corr_cholesky(1)         L                                       
11                              L              id                       
12  student_t(3, 0, 10)        sd                                       
13  student_t(3, 0, 10)        sd                      shape            
14                             sd              id                       
15                             sd       age    id                       
16                             sd Intercept    id                       
17                             sd            year      shape            
18                             sd Intercept  year      shape            
19           beta(1, 1)        zi                                                                  
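
For reference, the default priors for a given formula can be inspected before fitting with get_prior(); a minimal sketch using the formula above and the c95data object from the reproducible example at the end:

get_prior(bf(cars95 ~ age*year + mort + bwev + prevsire + (age|id),
             shape ~ year),
          data = c95data,
          family = zero_inflated_negbinomial())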

Sampling throws the following warnings:

Warning messages:
1: There were 861 divergent transitions after warmup. Increasing adapt_delta above 0.9 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup 
2: There were 59139 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded 
3: Examine the pairs() plot to diagnose sampling problems
 
4: The largest R-hat is 4.13, indicating chains have not mixed.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hat 
5: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess 
6: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess
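
For completeness, the sampler controls these warnings refer to are passed through the control argument; a sketch reusing the mod2_nopri object from the reproducible example at the end (the max_treedepth value here is arbitrary, and I am not claiming it resolves the underlying problem):

mod2_retry <- update(mod2_nopri,
                     control = list(adapt_delta = 0.99, max_treedepth = 15),
                     cores = 6)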

Note, however, that if I do not predict shape, the model converges fine. There is a lot of uncertainty around the shape parameter under the default prior, so here I tweaked the prior on shape from the default gamma(0.01, 0.01) to gamma(1, 1) (though I admittedly did not follow a rigorous method in choosing it).
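In brms syntax that tweak is a single statement, passed via the prior argument of brm() (this is what the prior line in the reproducible example at the end is intended to do):

set_prior("gamma(1, 1)", class = "shape")

The resulting fit (shape not predicted) is summarized below: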

Family: zero_inflated_negbinomial 
  Links: mu = log; shape = identity; zi = identity 
Formula: cars95 ~ age * year + mort + bwev + prevsire + (age | id) 
   Data: c95data (Number of observations: 220) 
Samples: 4 chains, each with iter = 15000; warmup = 5000; thin = 2;
         total post-warmup samples = 20000

Group-Level Effects: 
~id (Number of levels: 60) 
                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)          1.82      0.97     0.15     3.86 1.00     4798     9806
sd(age)                0.54      0.22     0.14     1.02 1.00     4365     6019
cor(Intercept,age)    -0.63      0.37    -0.96     0.52 1.00     6200     6898

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    -4.04      1.77    -7.93    -0.98 1.00     7970    13616
age           1.44      0.61     0.33     2.72 1.00     7962    14286
year          0.10      0.23    -0.35     0.58 1.00    14321    16159
mort1         0.58      0.90    -1.21     2.36 1.00    16313    17632
bwev1        -0.97      0.89    -2.81     0.70 1.00    16957    16902
prevsire1     0.07      0.50    -0.91     1.05 1.00     8872    14221
age:year     -0.15      0.08    -0.30    -0.00 1.00     8792    14912

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape     1.68      0.95     0.54     4.14 1.00     9684    14440
zi        0.12      0.09     0.01     0.34 1.00    14419    16585

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).

I’ve conducted loo tests and a number of posterior predictive checks on the converged model without a predicted shape. Many, but not all, check out. Problems that arise with this initial model are a p_loo value higher than the number of parameters in my model, and a somewhat poor-performing ppc_dens_overlay plot:

Loo output:

Computed from 20000 by 220 log-likelihood matrix

         Estimate   SE
elpd_loo   -172.6 18.5
p_loo        34.2  6.5
looic       345.1 37.1
------
Monte Carlo SE of elpd_loo is 0.4.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     197   89.5%   13        
 (0.5, 0.7]   (ok)        23   10.5%   121       
   (0.7, 1]   (bad)        0    0.0%   <NA>      
   (1, Inf)   (very bad)   0    0.0%   <NA>      

All Pareto k estimates are ok (k < 0.7).
See help('pareto-k-diagnostic') for details.

Posterior predictive checks (plots not reproduced here):

  • PPC - bars
  • PPC - ECDF overlay
  • PPC - stat (mean)
  • PPC - stat (mean and SD)
  • PPC - pp_check() density overlay

My questions for this post:

  1. Might including more informative priors for a model where I predict the shape variance parameter help convergence?
  2. If so, what priors should I assign - particularly for the shape-specific parameters (the b, sd, and Intercept parameters associated with dpar=shape)? A sketch of the kind of prior statements I mean follows this list.
  3. Do you have any other suggestions/thoughts to improve model convergence beyond improved prior specification?
  4. Would you agree with my assessment that the model NOT predicting shape (for which I’ve included the model output and posterior predictive checks above) ‘fits’ decently well - even considering the high p_loo value relative to my number of parameters?
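
To make question 2 concrete, the statements I have in mind look roughly like the sketch below; the particular distributions and scales are placeholders I made up, not choices I am confident in:

priors_shape <- c(
  set_prior("normal(0, 1)", class = "b", dpar = "shape"),          #year effect on the (log) shape
  set_prior("normal(0, 2)", class = "Intercept", dpar = "shape"),  #intercept of the shape model
  set_prior("student_t(3, 0, 2)", class = "sd", dpar = "shape")    #group-level SDs in the shape model
)

These would then be passed to brm() through its prior argument (prior = priors_shape).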

I’ve included a reproducible example together with a link to data:

library(dplyr)
library(brms)
library(ggeffects)
library(GGally)
library(rstantools)
library(rstanarm)
library(loo)
library(bayesplot)

c95data <- read.csv("ARS95KAc.csv")
c95data$id <- as.factor(c95data$id)
c95data$year <- as.integer(c95data$year) #year (2-8) as integer NOT categorical
c95data$mort <- as.factor(c95data$mort)
c95data$prevsire <- as.factor(c95data$prevsire)
c95data$bwev <- as.factor(c95data$bwev)

#MODELS
#NOT predicting shape
mod1 <- brm(cars95 ~ age*year + mort + bwev + prevsire + (age|id), 
     data=c95data,
     family=zero_inflated_negbinomial(), 
     prior=set_prior("gamma(1, 1)", class="shape"), #tweaked prior on shape (default is gamma(0.01, 0.01))
     iter=20000, warmup=5000, thin=2, chains=4,
     control = list(adapt_delta = 0.99), 
     cores=6)


##Predicting shape -- DOES NOT CONVERGE
mod2_nopri <- brm(bf(cars95 ~ age*year + mort + bwev + prevsire + (age|id), 
     shape ~ year), 
     data=c95data,
     family=zero_inflated_negbinomial(), 
     iter=20000, warmup=5000, thin=2, chains=4,
     control = list(adapt_delta = 0.99), 
     cores=6)

#LOO
loo1 <- loo(mod1, reloo=T, cores=6)

##PPCs
#posterior predictive checks
yrep <- posterior_predict(mod1)
y <- c95data$cars95
bayesplot::ppc_stat(y, yrep[1:50, ], stat="mean")
brms::pp_check(mod1)
brms::pp_check(mod1, type="stat_2d", stat=c("mean", "sd"), nsamples=30)
brms::pp_check(mod1, type="ecdf_overlay", discrete=T, nsamples=30)
brms::pp_check(mod1, type="bars", nsamples=30)
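
The ppc_dens_overlay plot mentioned above comes from the default pp_check(mod1) call; it can also be produced explicitly from the y and yrep objects defined above, e.g.:

bayesplot::ppc_dens_overlay(y, yrep[1:50, ])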

DATA: https://drive.google.com/file/d/1T51Cp0m8qHtsf2fZcpqchMQJo6dIdQn1/view?usp=sharing

Thank you in advance to everyone/anyone who gives my struggles and questions some time! It is so appreciated.
El