Thank you very much for providing a reproducible example! The model summaries match; however, the conditional effects plots differ. How can that be?
I edited your example by adding a categorical X variable for plotting.
library(brms)
library(dplyr)
library(tidybayes)

set.seed(2020)
### create data according to the hurdle-lognormal distribution
# probability of a zero
pi <- 0.3
# lognormal mean
mu_log <- 2
sigma_log <- 0.2
# generate data
N <- 1000
y <- (1 - rbinom(N, 1, prob = pi)) * rlnorm(N, mu_log, sigma_log)
x <- rep(letters[1:2], each = 1, times = 500)
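As a quick sanity check on the simulated data, the standard lognormal mean formula exp(mu + sigma^2 / 2), and the same quantity scaled by the non-zero probability (1 - pi), can be compared with the empirical means:
# sanity check: theoretical vs. empirical means of the simulated data
exp(mu_log + sigma_log^2 / 2)             # mean of the positive (lognormal) part, ~7.5
(1 - pi) * exp(mu_log + sigma_log^2 / 2)  # overall hurdle-lognormal mean, ~5.3
mean(y[y > 0])                            # should be close to the first value
mean(y)                                   # should be close to the second value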
### fit models
# m1 = hurdle_lognormal
d <- data.frame(y = y, x = x)
m1 <- brms::brm(
  y ~ x, data = d,
  family = "hurdle_lognormal",
  cores = 4
)
summary(m1)
# m2 = lognormal
d_g_0 <- d %>% filter(y > 0)
m2 <- brms::brm(
  y ~ x, data = d_g_0,
  family = "lognormal",
  cores = 4
)
summary(m2)
# let's plot
conditional_effects(m1)
conditional_effects(m2)
The m1 (hurdle_lognormal) plot shows smaller y-axis values, while the m2 (lognormal) plot shows higher ones.
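To compare actual numbers rather than eyeballing the two plots, the data behind each plot can be pulled out of the returned object (as far as I can tell, conditional_effects() returns a named list of data frames with estimate__, lower__, and upper__ columns):
ce1 <- conditional_effects(m1)
ce2 <- conditional_effects(m2)
ce1$x  # estimates plotted for the hurdle_lognormal model
ce2$x  # estimates plotted for the lognormal model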
I played with the arguments of the conditional_effects() function; however, the plots always differed.
Then I tried add_fitted_draws() from tidybayes on the model posterior, which gave a similar result.
newdata <- expand.grid(x = c("a", "b"))
posterior <- add_fitted_draws(
  m1, newdata = newdata, re_formula = NULL,
  scale = "response", dpar = NULL, value = "y"
)
median_qi(posterior)
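For a side-by-side comparison, I ran the same call against the plain lognormal model m2 as well (see also the conclusions below):
posterior_m2 <- add_fitted_draws(
  m2, newdata = newdata, re_formula = NULL,
  scale = "response", dpar = NULL, value = "y"
)
median_qi(posterior_m2)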
Then I tried to manually calculate the y-variable estimates, and these match the conditional effects plot of the lognormal model (m2).
posterior2 <- posterior_samples(m1) %>%
  mutate(
    a = exp(b_Intercept + 1/2 * sigma^2),
    b = exp(b_Intercept + b_xb + 1/2 * sigma^2)
  ) %>%
  dplyr::select(a, b)
posterior_summary(posterior2) %>% as.data.frame()
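Out of curiosity, I also repeated the manual calculation with the hurdle part folded in, assuming the zero probability is the hu parameter reported by summary(m1); if the response scale of m1 corresponds to the overall mean rather than the mean of the positive part only, this scaled version might be what conditional_effects() is showing:
posterior3 <- posterior_samples(m1) %>%
  mutate(
    # overall mean = P(y > 0) * lognormal mean, assuming hu is the zero probability
    a = (1 - hu) * exp(b_Intercept + 1/2 * sigma^2),
    b = (1 - hu) * exp(b_Intercept + b_xb + 1/2 * sigma^2)
  ) %>%
  dplyr::select(a, b)
posterior_summary(posterior3) %>% as.data.frame()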
Conclusions
- Though I am not sure about my manual calculations, their results match the conditional effects plot of the lognormal model.
- In contrast, add_fitted_draws() and conditional_effects() give surprisingly small values for the hurdle_lognormal model (m1).
- I also tried add_fitted_draws() and the manual calculation on the lognormal model (m2), and both matched the conditional effects plot of m2!
- I ran the whole example with a continuous predictor instead and saw the same discrepancy.
Why does the hurdle_lognormal model give smaller y values? Or what am I doing wrong here?