Hurdle lognormal variance

Solomon · February 20, 2024, 8:08pm

Given the parameters \mu, \sigma, and hu, how does one compute the variance of the hurdle lognormal distribution? Based on this part of the brms GitHub, I know we can compute the mean for the hurdle lognormal as

\exp(\mu + \sigma^2 / 2) \cdot (1 - hu)

Here’s how that works in action.

library(tidyverse)

# define population parameters
mu <- 1
sigma <- 1
hu <- 0.25

# number of sample draws
n <- 1e6

# simulate
set.seed(1)

tibble(y = c(rep(0, times = n * hu),
             rlnorm(n = n * (1 - hu), meanlog = mu, sdlog = sigma))) %>% 
  # summarize
  summarise(sample_mean = mean(y),
            mean_by_formula = exp(mu + sigma^2 / 2) * (1 - hu))

# A tibble: 1 × 2
  sample_mean mean_by_formula
        <dbl>           <dbl>
1        3.36            3.36

But again, how does one compute the variance of the hurdle lognormal distribution?

jsocolar · February 20, 2024, 8:29pm

Think of the hurdle lognormal as a mixture of a lognormal and a distribution whose pdf is a delta function at zero. For the derivation of the variance of a mixture of one-dimensional distributions, see Mixture distribution - Wikipedia under the Moments heading.

Solomon · February 20, 2024, 8:33pm

Sadly, that looks to be over my head.

jsocolar · February 20, 2024, 8:37pm

I see why it looks that way, but I’ve seen you around enough to feel pretty confident that it’s not :)

Focus on the final line here, which is an expression for the variance of the mixture:
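
The image that followed is not reproduced here. As a sketch, and assuming it showed the standard mixture identity from that Wikipedia section, the variance of a mixture with weights w_i, component means m_i, and component variances v_i can be written as below. The symbols m_i and v_i are my own choice, used to avoid a clash with the lognormal's \mu and \sigma. Plugging in a point mass at zero (weight hu, mean 0, variance 0) and a lognormal (weight 1 - hu) gives a closed form:

```latex
\begin{aligned}
\operatorname{Var}(Y) &= \sum_i w_i \left( v_i + m_i^2 \right) - \Big( \sum_i w_i m_i \Big)^2 \\
m_2 &= \exp(\mu + \sigma^2 / 2), \qquad
v_2 = \left( \exp(\sigma^2) - 1 \right) \exp(2 \mu + \sigma^2) \\
v_2 + m_2^2 &= \exp(\sigma^2) \exp(2 \mu + \sigma^2) \\
\operatorname{Var}(Y) &= (1 - hu) \exp(\sigma^2) \exp(2 \mu + \sigma^2) - (1 - hu)^2 \exp(2 \mu + \sigma^2) \\
&= \left( \exp(\sigma^2) - (1 - hu) \right) \cdot (1 - hu) \cdot \exp(2 \mu + \sigma^2)
\end{aligned}
```

The zero component drops out of the first sum entirely, so only the lognormal's second moment and the squared mixture mean remain.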

Solomon · February 20, 2024, 8:56pm

From Geebo Samuel on the bird site (link), we learn the variance for the hurdle lognormal is

(\exp(\sigma^2) - (1 - hu)) \cdot (1 - hu) \cdot \exp(2 \mu + \sigma^2)

Here’s what that looks like in code.

library(tidyverse)

# define population parameters
mu <- 1
sigma <- 1
hu <- 0.25

# number of sample draws
n <- 1e6

# simulate
set.seed(1)

tibble(y = c(rep(0, times = n * hu),
             rlnorm(n = n * (1 - hu), meanlog = mu, sdlog = sigma))) %>% 
  # summarize
  summarise(sample_var = var(y),
            var_by_formula = (exp(sigma^2) - (1 - hu)) * (1 - hu) * exp(2 * mu + sigma^2))

# A tibble: 1 × 2
  sample_var var_by_formula
       <dbl>          <dbl>
1       29.4           29.7
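
As a cross-check that does not rely on simulation, here is a minimal sketch that builds the variance from the general mixture-variance identity (weighted second moments minus the squared mixture mean) and compares it with the closed form above. The helper names `hurdle_lognormal_mean()` and `hurdle_lognormal_var()` are hypothetical and are not part of brms.

```r
# Hypothetical helpers (not brms functions): moments of a hurdle lognormal,
# treated as a mixture of a point mass at zero (weight hu) and a lognormal
# (weight 1 - hu).
hurdle_lognormal_mean <- function(mu, sigma, hu) {
  (1 - hu) * exp(mu + sigma^2 / 2)
}

hurdle_lognormal_var <- function(mu, sigma, hu) {
  # mean and variance of the lognormal component
  m <- exp(mu + sigma^2 / 2)
  v <- (exp(sigma^2) - 1) * exp(2 * mu + sigma^2)
  # mixture variance: sum of w_i * (v_i + m_i^2) minus the squared mixture mean;
  # the point mass at zero contributes nothing to either term
  (1 - hu) * (v + m^2) - ((1 - hu) * m)^2
}

# same population parameters as in the simulation
mu <- 1
sigma <- 1
hu <- 0.25

# closed-form expression quoted in this post
closed_form <- (exp(sigma^2) - (1 - hu)) * (1 - hu) * exp(2 * mu + sigma^2)

# compare the two routes to the variance
all.equal(hurdle_lognormal_var(mu, sigma, hu), closed_form)
```

The two routes are algebraically identical, so the comparison should agree up to floating-point error. The small gap between the simulated and formula values above is plausibly sampling noise: the lognormal's heavy right tail makes the sample variance converge slowly, even with a million draws.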

Solomon · February 20, 2024, 9:45pm

For a nice scholarly reference, see Smith et al. (2014; https://doi.org/10.1002/sim.6263).
