Hello,
I am trying to work out whether it is possible to specify a Stan model for the following data-generating process directly, using the lognormal and Dirichlet distributions. The random variable itself is lognormally distributed, but we only observe it gradually, in yearly intervals. The proportions observed in each interval are assumed to follow a Dirichlet distribution, and for the more recent years some data points are not yet final because the later intervals have not been observed.
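Written out, the process I have in mind is roughly the following. The symbols are just my notation for this post and do not come from any package: $S_i$ is the total size of observation $i$ (one year/obs_id pair), $p_i$ is its vector of period proportions, and $y_{i,t}$ is the cumulative amount seen by period $t$. The numeric values are the ones used in the simulation code.

```latex
\begin{aligned}
S_i &\sim \operatorname{LogNormal}(\mu, \sigma), & \mu &= 5,\ \sigma = 3 \\
p_i = (p_{i,1}, p_{i,2}, p_{i,3}) &\sim \operatorname{Dirichlet}(\alpha), & \alpha &= 0.5 \cdot (0.2,\ 0.5,\ 0.3) \\
y_{i,t} &= S_i \sum_{k=1}^{t} p_{i,k}, & t &= 1, 2, 3
\end{aligned}
```

A value $y_{i,t}$ is only observed when $\text{year}_i + t - 1 \le 2022$, so for the latest years the final value $y_{i,3} = S_i$ is missing.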
R code to generate the data:
library(magrittr)  # provides the %>% pipe
library(gtools)    # assumed source of rdirichlet(); MCMCpack::rdirichlet() has the same interface

alpha0 <- 0.5
alphas <- c(0.2, 0.5, 0.3) * alpha0
meanlog <- 5
sdlog <- 3

# All combinations of year, observation, and period
df <- list(
  year = seq(2016, 2022, 1)
  , obs_id = seq_len(20)
  , period = seq_len(3)
) %>%
  purrr::cross_df()

# One lognormal total size per (year, obs_id)
size_df <- df %>%
  dplyr::select(year, obs_id) %>% dplyr::distinct() %>%
  dplyr::mutate(
    size = rlnorm(dplyr::n(), meanlog, sdlog),
    id = paste0(year, "_", obs_id)
  )

df1 <- df %>%
  dplyr::left_join(
    size_df
  ) %>%
  dplyr::arrange(year, obs_id)

# One Dirichlet draw of period proportions per id, reshaped to long format
prop_patterns <- dplyr::bind_cols(
  size_df %>% dplyr::select(id),
  rdirichlet(nrow(size_df), alpha = alphas) %>%
    tibble::as_tibble()
) %>%
  tidyr::gather(key = 'period', value = 'pct', dplyr::starts_with('V')) %>%
  dplyr::mutate(
    period = as.numeric(gsub('V', "", period))
  )

# Cumulative amount observed by each period
df2 <- df1 %>%
  dplyr::left_join(
    prop_patterns
  ) %>%
  dplyr::group_by(id) %>%
  dplyr::mutate(
    y = size * cumsum(pct)
  ) %>%
  dplyr::ungroup()

# Keep only the periods that have been observed by 2022
df3 <- df2 %>%
  dplyr::filter((year + period - 1) <= 2022) %>%
  dplyr::select(year, period, id, y)
I would appreciate any suggestions.
Thanks,
Allen