Hello,
I need to compare the frequency of an event between two years. Those who experienced this event can themselves be divided in 3 categories, which also I want to compare between years.
My idea idea was to model this with the following (simplified) model:
Cases[1,2,3] ~ Binomial(Tot_cases, P_cases[1,2,3])
Tot_cases ~ Binomial(Tot_pop, P_tot_cases)
Cases[1] + Cases[2] + Cases[3] = Tot_cases
Cases[1,2,3] = alpha_cases + beta_cases*Year
Tot_cases = alpha_tot_cases + beta_tot_cases*Year
(omitting the priors for simplicity)
That is, costrain the sum of the cases count for the three categories to sum up to the total number of events which itself has uncertainty. Only Tot_pop is certain.
How would I convert this into a brms formula?
Thanks
1 Like
I think you would be well-served by taking a step back and thinking about the model a bit more.
If your Cases
variables are counts from mutually exclusive categories, a multinomial distribution would be a more appropriate model. Regarding Tot_cases
, this is a discrete latent (i.e. not observed) variable, and you can’t fit the model as is in Stan. You would need to marginalise this variable out of the model, either in closed form or by brute force using log_sum_exp
. I suggest you describe your data, problem and the questions you want to answer. This forum has many experienced modellers that can then help you achieve your goal.
1 Like
Yes, you are definitely right, I was trying to avoid using it by constraining the sum, but it was silly.
I wanted to avoid categorical() due to the posterior_predict output being individual predictions while I preferred group counts for compatibility with the rest of my pipeline. Then I discovered the multinomial() family which provide counts outputs, but I’m having some Stan problem with it (same as Multinomial family errors with cmdstanr backend · Issue #1033 · paul-buerkner/brms · GitHub). Anyway, these are solvable technical problems.
My idea was that in each MCMC iteration a Tot_cases value could be produced starting from a P_tot_cases proposal. Then the Cases likelihood be conditional on Tot_cases.
To go into details, the model was trying to do this:
In a child psychiatric clinic, you have a fixed number of inpatient visit slots for a certain disease group (eating disorders). First I wanted to evaluate whether the number of visits in a given period increased in 2020 compared to 2021. To be precise, the actual model uses a random intercept on the year, not a fixed effect, but I wanted to simplify since that’s not the point of my question.
Then the second question is: among the cases, did the relative distribution of sub-diagnoses changed? Or more precisely, did all the diagnoses increase equally or just some of them?
I could do this running two separate models, a binomial for the cases conditional on the visit slots (which I did) and a multinomial for the subdiagnoses conditional on the observed cases.
But I felt that doing so would be too optimistic for the second model, because I wouldn’t include the uncertainty of the first model into the first. Maybe I’m overthinking it.
I hope is clearer now.