Zero-inflated negative binomial model with random-effects

Rodrigo · May 23, 2019, 12:27am

Operating System: macOS Mojave
brms Version: 2.8.0

Hello,

I am just starting using Stan and brms and appreciate if someone could help me with some issues. Basically, I have an overdispersed count dataset and would like to fit a zero-inflated negative binomial model with random-effects and many covariates as follows:

m1 <- brm(bf(CASES_IND_total ~offset(FOLLOW_UP_IND) + LOCALIZACAO + BORRIFACAO_14_15_16 + DISTRIBUICAO_MOSQUITEIROS + USO_MOSQUITEIRO_NOITE_ANTERIOR + NR_MOSQUITEIROS_CASA + IDADE + SEXO + HORARIO_DORMIR + HORAIO_ACORDAR + NR_MORADORES + PRESENCA_PAREDE + PRESENCA_FORRO + LOCAL_BANHO + FREQUENTA_SITIO_FORA_CIDADE + WEALTH_INDEX + (1|DOM)),
                            data=CENSO_complete, 
                            family = zero_inflated_negbinomial(),
                            prior = set_prior("lasso(1)"),
                            chains = 1,
                            cores = getOption("mc.cores",3L))

summary(m1)

 Family: zero_inflated_negbinomial 
  Links: mu = log; shape = identity; zi = identity 
Formula: CASES_IND_total ~ offset(FOLLOW_UP_IND) + LOCALIZACAO + BORRIFACAO_14_15_16 + DISTRIBUICAO_MOSQUITEIROS + USO_MOSQUITEIRO_NOITE_ANTERIOR + NR_MOSQUITEIROS_CASA + IDADE + SEXO + HORARIO_DORMIR + HORAIO_ACORDAR + NR_MORADORES + PRESENCA_PAREDE + PRESENCA_FORRO + PRESENCA_.JANELAS_FECHAM + PRESENCA_.TELA_.JANELAS + PRESENCA_TELA_PORTAS + LOCAL_BANHO + FREQUENTA_SITIO_FORA_CIDADE + WEALTH_INDEX + (1 | DOM) 
   Data: CENSO_complete (Number of observations: 8431) 
Samples: 1 chains, each with iter = 2000; warmup = 1000; thin = 1;
     total post-warmup samples = 1000

Group-Level Effects: 
~DOM (Number of levels: 2221) 
          Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept)     0.67      0.03     0.61     0.74        336 1.00

Population-Level Effects: 
                            Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept                          -3.26      0.39    -3.96    -2.43        600 1.00
LOCALIZACAO1                        0.45      0.05     0.34     0.55        894 1.00
BORRIFACAO_14_15_161                0.19      0.08     0.02     0.36        799 1.00
BORRIFACAO_14_15_162               -0.02      0.06    -0.14     0.09        912 1.00
DISTRIBUICAO_MOSQUITEIROS1          0.12      0.06     0.00     0.24       1035 1.00
DISTRIBUICAO_MOSQUITEIROS2         -0.06      0.06    -0.19     0.06        958 1.00
USO_MOSQUITEIRO_NOITE_ANTERIOR1     0.07      0.05    -0.02     0.17       1334 1.00
NR_MOSQUITEIROS_CASA1               0.01      0.05    -0.09     0.11        959 1.00
IDADE2                              0.23      0.08     0.07     0.39        805 1.00
IDADE3                              0.43      0.08     0.27     0.58        665 1.00
IDADE4                              0.33      0.09     0.14     0.51        769 1.00
IDADE5                             -0.20      0.11    -0.43     0.02        949 1.00
SEXO1                              -0.23      0.04    -0.31    -0.16       1756 1.00
HORARIO_DORMIR2                     0.02      0.04    -0.08     0.10       1255 1.00
HORARIO_DORMIR3                    -0.04      0.06    -0.16     0.08       1677 1.00
HORAIO_ACORDAR2                     0.01      0.06    -0.11     0.12       1212 1.00
HORAIO_ACORDAR3                     0.00      0.06    -0.13     0.13       1078 1.00
NR_MORADORES2                       0.10      0.05    -0.00     0.20       1149 1.00
PRESENCA_PAREDE1                   -0.43      0.34    -1.16     0.08        545 1.00
PRESENCA_FORRO1                    -0.07      0.06    -0.20     0.05       1098 1.00
PRESENCA_.JANELAS_FECHAM1          -0.04      0.17    -0.41     0.29       1249 1.00
PRESENCA_.TELA_.JANELAS1           -0.03      0.17    -0.38     0.31       1019 1.00
PRESENCA_TELA_PORTAS1               0.04      0.21    -0.38     0.55        878 1.00
LOCAL_BANHO1                       -0.15      0.06    -0.26    -0.03        929 1.00
FREQUENTA_SITIO_FORA_CIDADE1        0.05      0.05    -0.04     0.14       1179 1.00
WEALTH_INDEX2                      -0.02      0.05    -0.12     0.08       1125 1.00
WEALTH_INDEX3                      -0.03      0.07    -0.17     0.10       1117 1.00

Family Specific Parameters: 
  Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
shape     1.44      0.28     1.02     2.04        456 1.00
zi        0.12      0.05     0.03     0.22        489 1.00

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).

I am using LASSO to performs both variable selection and regularisation because I could not find a function to perform variable selection, such as Bayesian Model Averaging.

Question 1: How do people usually perform variable selection with brms?

After, I would like to check the distribution of random-effects. I assume that the random-effects are normally distributed with mean 0 and standard deviation given by the following distribution:

plot(m1, pars = "^sd_")

Rplot

Question 2: How to access the distribution of random effects? Is it possible to shrink the random-effects predictors toward their overall mean in brm? (Example: parameter lambda in [https://rdrr.io/cran/gamlss/man/random.html])

Next, I checked the ppc_loo_pit_qq plot and had, I assume, I very bad result:

m1.loo <- loo(m1, save_psis = TRUE, cores = 3)
psis1  <- m1.loo$psis_object
lw     <- weights(psis1)

y     <- CENSO_complete[,"CASES_IND_total"]
yrep  <- posterior_predict(m1)

ppc_loo_pit_qq(y, yrep, lw = lw, compare = "uniform")

Rplot01

Here I had the warning:

Warning message:
Found 44 observations with a pareto_k > 0.7 in model 'm1'. With this many problematic observations, it may be more appropriate to use 'kfold' with argument 'K = 10' to perform 10-fold cross-validation rather than LOO.

On the other hand, by checking the graphical posterior predictive (ppc_rootogram), I had a reasonable result, I assume:

ppc_rootogram(y, yrep[1:100,],prob = .95)

Rplot02

Question 3: Could someone help me with the interpretation of both graphs? It seems to me that they lead to different conclusions. Does brms performs Worm plots?

Next, I tried to performed a 10-fold cross validation as suggested on the above warning. I had the following error:

(kfold1 <- kfold(m1,K = 10))
Fitting model 1 out of 10
Fitting model 2 out of 10
Fitting model 3 out of 10
Fitting model 4 out of 10
Fitting model 5 out of 10
Fitting model 6 out of 10
Fitting model 7 out of 10
Fitting model 8 out of 10
Fitting model 9 out of 10
Fitting model 10 out of 10
Start sampling
Start sampling
Start sampling
Start sampling
Start sampling
Start sampling
Start sampling
Start sampling
Start sampling
Start sampling

 10-fold cross-validation

Error in data.frame(Estimate = x$elpd_kfold, SE = x$se_elpd_kfold, row.names = "elpd_kfold") : 
  'row.names' should specify one of the variables

Question 4: Any ideia what is wrong here?

I was using another package (gamlss) where I could satisfactorily fit the same zero-inflated negative binomial with random effects to my dataset. However, the package does not support LASSO. Worm and QQ plots bellow:

Thanks in advance!
PS: My dataset has 8431 observations.

paul.buerkner · May 24, 2019, 3:36am

Q1: The LASSO prior provides not good enough shrinkage and I would generally recommend not using it. Instead, the horseshoe prior usually has better shrinkage behavior. Our recommended procedure for variable selection would be to use the horseshoe prior and then perform projective predictions via the projpred package. However, the latter is not yet available for such complex models such as yours but we are currently working on generalizing projpred.

Q2: ranef or coef

Q3: Because of the 44 observations with khat > .7 I don’t think the LOO-PIT plot can be trusted. Perhaps @jonah has more insides on that.

Q4: Could you provide a minimal reproducible example for me?

Rodrigo · May 25, 2019, 3:54am

Thank you for your reply, Paul! Horseshoe prior seems to be better.

I am sharing the data here:

https://drive.google.com/drive/folders/1_1MEN0h-UN6uYpMNU8kE0LzKSitWX8sb?usp=sharing

One more question: I am checking the random-effects distribution and it seems to be not normal. My guess this is also contributing for a bad diagnostic plot. Any thoughts about it?

Rplot03

paul.buerkner · May 27, 2019, 7:27pm

The random effects distribution fulfills the purpose of a prior (it is a prior in fact) and it is often useful and reasonable to assume a normal distribution to apply shrinkage even if the true distribution might look somewhat different. I remember @richard_mcelreath having some interesting thoughts about this in his book “statistical rethinking”.

paul.buerkner · May 27, 2019, 7:51pm

Thanks also for providing a reproducible example!

Unfortunately, I cannot reproduce the error you see. Do you have the latest release versio of brms 2.9.0 or higher? One alternative explanation is that you had loaded rstanarm additionally to brms in the same session, which may cause problems with kfold (that are hopefully fixed soonish).

Topic		Replies	Views
Mathematical Notation for a zero inflated negative binomial model in brms Modeling math	13	1997	March 26, 2021
Model specification of a zero-inflated negative binomial random effect model using brms brms specification	4	854	July 2, 2021
Zero inflated beta binomial model under brms brms	1	1616	February 19, 2020
Zero_inflated_poisson with brms Modeling rstan , fitting-issues , brms	4	1601	June 14, 2021
Mathematical notation for a zero-inflated negative binomial nested model Modeling math , brms	0	342	January 24, 2024

Zero-inflated negative binomial model with random-effects

Related topics