Interpreting summary of hurdle_lognormal model

Dear list,

Below is the summary of a hurdle_lognormal model fit with brms:

Family: hurdle_lognormal 
  Links: mu = identity; sigma = identity; hu = logit 
Formula: ifelse(CutoffRep2 == 1, 0, CutoffRep2) ~ 1 + EarlyLate + (1 | Subj) 
         hu ~ 1 + EarlyLate + (1 | Subj)
   Data: NQ19 (Number of observations: 436) 
Samples: 4 chains, each with iter = 3000; warmup = 1500; thin = 1;
         total post-warmup samples = 6000

Group-Level Effects: 
~Subj (Number of levels: 155) 
                 Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept)        0.30      0.07     0.17     0.43       2130 1.00
sd(hu_Intercept)     0.36      0.27     0.01     1.00       1782 1.00

Population-Level Effects: 
                 Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept            4.83      0.06     4.70     4.95       6257 1.00
hu_Intercept        -1.99      0.20    -2.43    -1.64       4828 1.00
EarlyLatelate        0.59      0.09     0.40     0.77       6991 1.00
hu_EarlyLatelate    -1.14      0.43    -2.04    -0.34       5986 1.00

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma     0.84      0.03     0.78     0.91       4224 1.00

This makes good sense, but given the number of zeros in my data (43 out of 436 observations), I have difficulty interpreting the hurdle part of this model. The population-level effects are indeed in logit units, as per the specified link function. But what are the units of sd(hu_Intercept)? Am I correct in assuming that it is expressed as a proportion (ranging between 0 and 1 across Subj, and presumably not log-transformed), and that sd(hu_Intercept) is NOT in logit units?

Thanks in advance for your help and advice! Hugo Quené

  • Operating System: Mac OSX 10.14.4
  • brms Version: 2.8.0

Hi! :)

First of all, if I’m seeing this correctly, the parameters from the hurdle part are denoted by the hu_ prefix. The population-level effects with this hu_ prefix are indeed on the logit scale. The other population-level effects (without the prefix) are on the log scale (as per the log-normal).
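As a rough illustration of what these two scales mean, the point estimates can be back-transformed by hand. This is only a sketch: it plugs in the posterior means from the summary, ignores posterior uncertainty, and assumes that "early" is the reference level of EarlyLate (the coefficient name EarlyLatelate suggests so). The variable names are made up for the example.

```r
# Population-level point estimates copied from the summary above
b_int  <- 4.83;  b_late  <- 0.59    # log-normal part (log scale)
hu_int <- -1.99; hu_late <- -1.14   # hurdle part (logit scale)

# Hurdle part: the inverse logit gives the probability of a zero
# for a "typical" subject (subject-specific intercept equal to 0)
p_zero_early <- plogis(hu_int)
p_zero_late  <- plogis(hu_int + hu_late)

# Log-normal part: exp() of the linear predictor is the median
# of the non-zero responses for a typical subject
med_early <- exp(b_int)
med_late  <- exp(b_int + b_late)

print(round(c(early = p_zero_early, late = p_zero_late), 3))
print(round(c(early = med_early, late = med_late), 1))
```

The zero probabilities come out at roughly 0.12 (early) and 0.04 (late). Because of the subject-level intercepts, these are values for a typical subject, not averages over subjects.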

The group-level effects given in the output are the standard deviations of the subject-specific intercepts on the scale of the linear predictor.
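Written out, the model looks roughly like this. The notation is mine, not brms output: i indexes observations, j indexes subjects, and late is a 0/1 indicator for the "late" level, assuming "early" is the reference level.

```latex
\begin{aligned}
y_{ij} &\sim
  \begin{cases}
    0 & \text{with probability } hu_{ij},\\
    \operatorname{LogNormal}(\mu_{ij}, \sigma) & \text{with probability } 1 - hu_{ij},
  \end{cases}\\
\mu_{ij} &= \beta_0 + \beta_1\,\text{late}_{ij} + u_j, & u_j &\sim \operatorname{Normal}(0, \tau_{\mu}),\\
\operatorname{logit}(hu_{ij}) &= \gamma_0 + \gamma_1\,\text{late}_{ij} + v_j, & v_j &\sim \operatorname{Normal}(0, \tau_{hu}).
\end{aligned}
```

In the summary, Intercept and EarlyLatelate correspond to \beta_0 and \beta_1, hu_Intercept and hu_EarlyLatelate to \gamma_0 and \gamma_1, sd(Intercept) to \tau_{\mu}, and sd(hu_Intercept) to \tau_{hu}.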

So the sd(hu_Intercept) is the standard deviation (these are always positive) of the subject-specific intercepts in the hurdle part. These intercepts are on the logit scale, so the standard deviation is taken of intercepts on the logit scale… Which is kinda hard to interpret intuitively.

Likewise, the other group-level effect, sd(Intercept), is the standard deviation of the subject-specific intercepts in the log-normal part of the model. These intercepts are on the log scale, so the standard deviation is taken over subject-specific intercepts on the log scale.

To get some intuition, you can quickly plot these on the original scale in R (this is by no means a correct or sufficient analysis, but it helps to build intuition):

hist(plogis(-1.99 + rnorm(155, 0, 0.01))) # lower CI
hist(plogis(-1.99 + rnorm(155, 0, 0.36))) 
hist(plogis(-1.99 + rnorm(155, 0, 1.00))) # upper CI

or use some crazy values like hist(plogis(-1.99 + rnorm(100, 0, 10))).

You can also do this for the log-normal part, of course:

hist(exp(4.83 + rnorm(155, 0, 0.17))) # lower CI
hist(exp(4.83 + rnorm(155, 0, 0.30)))
hist(exp(4.83 + rnorm(155, 0, 0.43))) # upper CI
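If you prefer numbers to histograms, the same idea can be sketched without random draws by looking at hypothetical subjects 2 SDs below, at, and 2 SDs above the average intercept. Again, this plugs in point estimates only (for the reference condition) and the variable names are just for illustration.

```r
# Point estimates copied from the summary above
hu_int <- -1.99; hu_sd <- 0.36   # hurdle part (logit scale)
mu_int <- 4.83;  mu_sd <- 0.30   # log-normal part (log scale)

# Hypothetical subjects: 2 SDs below, at, and 2 SDs above the average
z <- c(-2, 0, 2)

# Probability of a zero for each of these subjects
p_zero <- plogis(hu_int + z * hu_sd)

# Median of the non-zero responses for each of these subjects
med_pos <- exp(mu_int + z * mu_sd)

# On the log scale an SD acts multiplicatively:
# a subject 1 SD above average has a median exp(mu_sd) times larger
ratio_1sd <- exp(mu_sd)

print(round(p_zero, 3))
print(round(med_pos, 1))
print(round(ratio_1sd, 2))
```

The last line shows why the log-scale SD is a bit easier to read than the logit-scale one: 0.30 on the log scale means a factor of about 1.35 per SD, whereas the effect of 0.36 on the logit scale depends on where on the probability scale you start.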

Hope this helps.

Dear Max_Mantei,
Many thanks for your reply, which was quite helpful indeed. Using plogis to plot the distribution did help to see what’s going on in the hurdle part. Thanks again, Hugo