I have data on the time a system was active in each state (e.g., 0
, 1
, and 2
). The aggregate time distribution is plotted in Fig A. My goal is to model the time distribution for each state and the weight of each component (Fig. B).
The data was generated as:
library('tidyverse')
y0 <- rlnorm(100, -1, 0.5)
s0 <- rep(0, 100)
y1 <- rlnorm(500, 2, 0.5)
s1 <- rep(1, 500)
y2 <- rlnorm(250, 1, 0.5)
s2 <- rep(2, 250)
d <- tibble(y = c(y0, y1, y2),
s = c(s0, s1, s2)) %>%
mutate(s = as.factor(s))
I used a log-normal regression to estimate the parameters of each state duration distribution as:
m <- brm(y ~ 0 + s,
data=d,
family = lognormal())
that was successful to retrieve the real parameters (s0
~ -1, s1
~2, s2
~1):
Family: lognormal
Links: mu = identity; sigma = identity
Formula: y ~ 0 + s
Data: d (Number of observations: 850)
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup samples = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
s0 -0.99 0.05 -1.09 -0.89 1.00 4400 3006
s1 1.98 0.02 1.94 2.02 1.00 4886 3177
s2 1.00 0.03 0.94 1.05 1.00 4142 2551
Family Specific Parameters (in Fig. C each component is normalized as a PDF):
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.49 0.01 0.46 0.51 1.00 4618 3164
Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Now, I would like to estimate the mix of the three components. In Fig. D I weighted each component by the proportion w
of data for each state I have in the dataset.
w <- d %>%
group_by(s) %>%
summarise(count = n()) %>%
mutate(tot = sum(count)) %>%
mutate(w = count/tot)
# s count tot w
# <fct> <int> <int> <dbl>
# 1 0 100 850 0.118
# 2 1 500 850 0.588
# 3 2 250 850 0.294
Is this a good way to get the mix weight? How could I instead estimate this parameter from the data I have? I was thinking I could use a binomial regression like:
brm(count | trials(tot) ~ 0 + s, data = w,
family = binomial(), file = 'mix.rds')
but I am not sure this is the way to go. Any ideas?