Hello,
I am fitting a negative-binomial GLMM to count data. Observed counts were collected at several sites, with multiple observations made on the same date, and I am using crossed random effects for site and date to model this non-independence. Each observed count has an associated "true count", and the ratio between the two is what I am modelling indirectly (the log observed count enters the model as an offset). Some variables associated with each observation are expected to affect the relationship between observed and true counts, and others the overdispersion of the counts. My model has the following simplified brms structure:
```r
brms::bf(true_count ~ offset(observed_count_log) + var1 + var2 + (1 | site) + (1 | date),
         shape ~ var3)
```
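To make the structure concrete, here is a minimal base-R simulation of the data-generating process this formula implies. It assumes brms's default log links for both the mean and `shape`, and that `observed_count_log` is `log(observed_count)`. All coefficient values, group counts, and the simulated data are made up for illustration; only the variable names come from the formula.

```r
set.seed(1)

# Hypothetical design: 5 sites, 8 dates, 200 observations (all values made up)
n <- 200
site <- sample(1:5, n, replace = TRUE)
date <- sample(1:8, n, replace = TRUE)
observed_count <- rpois(n, 20) + 1          # +1 keeps log() finite
observed_count_log <- log(observed_count)
var1 <- rnorm(n); var2 <- rnorm(n); var3 <- rnorm(n)

# Crossed random intercepts for site and date
site_eff <- rnorm(5, 0, 0.3)
date_eff <- rnorm(8, 0, 0.2)

# Linear predictor for log(mu); the offset has a fixed coefficient of 1,
# so mu / observed_count = exp(eta - observed_count_log) is the modelled ratio
eta <- observed_count_log + 0.1 + 0.4 * var1 - 0.2 * var2 +
  site_eff[site] + date_eff[date]
mu <- exp(eta)

# Distributional part: log(shape) is linear in var3
shape <- exp(1.5 + 0.5 * var3)

# Negative binomial with mean mu and variance mu + mu^2 / shape
true_count <- rnbinom(n, size = shape, mu = mu)

sim <- data.frame(true_count, observed_count_log, var1, var2, var3,
                  site = factor(site), date = factor(date))
head(sim)
```

In principle, a data frame like `sim` could be passed to `brm()` with the formula above and `family = negbinomial()` to check whether the made-up coefficients are recovered.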
The goal of this model is to produce posterior predictive distributions for the true count at a new site, given several observed counts from that site. I have multiple counts from several different dates for this site, and neither the site nor these observation dates appear in the training data. In this case I know that `true_count` is identical across all of these observations (unlike the training data, where `true_count` sometimes varied within a site), and I want to combine the information from all of the new observations into a single, precise estimate of that one value.
Right now, I am generating predictive draws for each new observation as follows:
```r
library(tidybayes)  # predicted_draws() comes from tidybayes
predicted_draws(brm_model, newdata = df_newdata, allow_new_levels = TRUE,
                sample_new_levels = "uncertainty")
```
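For context on what `sample_new_levels = "uncertainty"` does: as I read the brms documentation for `posterior_predict()`, each posterior draw for a new level is taken from the posterior draws of a randomly chosen existing level, so the set of draws reflects variation across existing levels. The base-R sketch below mimics that rule for one new site. The matrix of existing site effects is fabricated, using the same draw index for the chosen level is my assumption, and this is an illustration rather than brms's internal code.

```r
set.seed(2)

# Hypothetical posterior draws of site intercepts: 4000 draws x 5 existing sites
n_draws <- 4000
n_sites <- 5
site_draws <- matrix(
  rnorm(n_draws * n_sites,
        mean = rep(c(-0.4, -0.1, 0, 0.2, 0.5), each = n_draws), sd = 0.1),
  nrow = n_draws, ncol = n_sites
)

# For each draw, pick one existing site at random and copy its effect
# (assumption: the copied value comes from the same draw index)
chosen_site <- sample.int(n_sites, n_draws, replace = TRUE)
new_site_eff <- site_draws[cbind(seq_len(n_draws), chosen_site)]

# The new-site draws spread across the existing sites, so they reflect
# between-site variation rather than any single site's posterior
summary(new_site_eff)
```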
This gives a posterior predictive distribution for every new observation, and I want to combine these into a single distribution. My current idea is to average the paired draws (draws sharing the same posterior draw index) across observations made on the same date, because I do not expect those observations to be independent of one another. I would then estimate a probability density function from the averaged draws for each date. Since observations on different dates should be independent of one another, I was planning to multiply these per-date PDFs together to obtain a single posterior predictive distribution for the true count at this site.

Is this a valid approach, or can observations from different dates not be treated as independent? If I can't multiply the marginal PDFs, how can I combine information across all of these observations for prediction?
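The three steps of the plan described above (average paired draws within a date, estimate a density per date, multiply across dates) can be sketched in base R as follows. The data frame `pred` is a fabricated stand-in with the `.row`, `.draw`, and `.prediction` columns that `predicted_draws()` returns plus a hypothetical `date` column; the dates, sample sizes, and negative-binomial draws are all made up. The sketch shows the mechanics only and does not settle whether the multiplication is justified.

```r
set.seed(3)

# Hypothetical stand-in for predicted_draws() output: one row per
# (posterior draw, new observation); 3 new dates with 2 observations each
n_draws <- 1000
obs <- data.frame(.row = 1:6, date = c("d1", "d1", "d2", "d2", "d3", "d3"))
pred <- merge(data.frame(.draw = seq_len(n_draws)), obs)  # no shared columns: cross join
pred$.prediction <- rnbinom(nrow(pred), size = 5, mu = 50)

# Step 1: average paired draws (same .draw) across observations on each date
avg <- aggregate(.prediction ~ .draw + date, data = pred, FUN = mean)

# Step 2: kernel density estimate for each date on a shared grid
n_grid <- 512
grid_max <- max(avg$.prediction) * 1.5
grid <- seq(0, grid_max, length.out = n_grid)
dens <- sapply(split(avg$.prediction, avg$date), function(x) {
  density(x, from = 0, to = grid_max, n = n_grid)$y
})

# Step 3: multiply the per-date densities pointwise, then renormalise so
# the result integrates to 1 on the grid (simple Riemann sum)
combined <- apply(dens, 1, prod)
step <- grid[2] - grid[1]
combined <- combined / sum(combined * step)

# Mode of the combined distribution
grid[which.max(combined)]
```

Because the averaged draws are not integers, a kernel density is used here rather than a probability mass function; with integer-valued draws one could instead tabulate each date's draws on a common integer support and multiply those.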