I have data on the time a system was active in each state (e.g.,
2). The aggregate time distribution is plotted in Fig A. My goal is to model the time distribution for each state and the weight of each component (Fig. B).
The data was generated as:
library('tidyverse') y0 <- rlnorm(100, -1, 0.5) s0 <- rep(0, 100) y1 <- rlnorm(500, 2, 0.5) s1 <- rep(1, 500) y2 <- rlnorm(250, 1, 0.5) s2 <- rep(2, 250) d <- tibble(y = c(y0, y1, y2), s = c(s0, s1, s2)) %>% mutate(s = as.factor(s))
I used a log-normal regression to estimate the parameters of each state duration distribution as:
m <- brm(y ~ 0 + s, data=d, family = lognormal())
that was successful to retrieve the real parameters (
s0 ~ -1,
Family: lognormal Links: mu = identity; sigma = identity Formula: y ~ 0 + s Data: d (Number of observations: 850) Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1; total post-warmup samples = 4000 Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS s0 -0.99 0.05 -1.09 -0.89 1.00 4400 3006 s1 1.98 0.02 1.94 2.02 1.00 4886 3177 s2 1.00 0.03 0.94 1.05 1.00 4142 2551 Family Specific Parameters (in Fig. C each component is normalized as a PDF): Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS sigma 0.49 0.01 0.46 0.51 1.00 4618 3164 Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS and Tail_ESS are effective sample size measures, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).
Now, I would like to estimate the mix of the three components. In Fig. D I weighted each component by the proportion
w of data for each state I have in the dataset.
w <- d %>% group_by(s) %>% summarise(count = n()) %>% mutate(tot = sum(count)) %>% mutate(w = count/tot) # s count tot w # <fct> <int> <int> <dbl> # 1 0 100 850 0.118 # 2 1 500 850 0.588 # 3 2 250 850 0.294
Is this a good way to get the mix weight? How could I instead estimate this parameter from the data I have? I was thinking I could use a binomial regression like:
brm(count | trials(tot) ~ 0 + s, data = w, family = binomial(), file = 'mix.rds')
but I am not sure this is the way to go. Any ideas?