Please also provide the following information in addition to your question:
- Operating System: Win 10
- brms Version: 2.1
Hi there,
I’m fitting a multinomial model (based on prior helpful advice). We have a set number of measurements (157) on subjects, with each measurement placing the subject in one of 10 states. Right now I’m using aggregated count data, much like the multinomial example from Paul here. I may eventually model each timepoint with a subject-level effect, but I’m starting somewhat simpler.
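(For context, the aggregation itself is just collapsing the 157 per-measurement state assignments into one row of counts per subject. The sketch below uses made-up object and column names, not my actual variables.)

```r
# Purely illustrative sketch of the aggregation step (all names are made up):
# 'long' has one row per measurement, with a subject id and the observed state.
set.seed(1)
long <- data.frame(
  subject = rep(1:3, each = 157),
  state   = factor(sample(paste0("s", 1:10), 3 * 157, replace = TRUE))
)
# Collapse to per-subject counts across the 10 states; each row sums to 157.
counts <- as.data.frame.matrix(table(long$subject, long$state))
rowSums(counts)  # all 157
```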
I’ve attached a snippet of the data as a reference in case there is some important difference between it and the dummy data:
Let’s make some dummy data as per Paul’s post, but set up to have different reference levels:
```r
N <- 15
dat <- data.frame(
  y1 = rbinom(N, 50, 0.9),
  y2 = rbinom(N, 150, 0.5),
  y3 = rbinom(N, 150, 0.2),
  x = rnorm(N)
)
dat$size <- with(dat, y1 + y2 + y3)
dat$ya <- with(dat, cbind(y1, y2, y3))
dat$yb <- with(dat, cbind(y2, y1, y3))
dat$yc <- with(dat, cbind(y3, y2, y1))
```
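The only difference between the three response matrices is the column order, which (as I understand it) is what determines the reference category, i.e. the first column:

```r
# Same counts in all three, only the column order differs; the first column
# is (as far as I understand) treated as the reference category by brms.
colnames(dat$ya)  # "y1" "y2" "y3"
colnames(dat$yb)  # "y2" "y1" "y3"
colnames(dat$yc)  # "y3" "y2" "y1"
```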
I noticed, when sampling from the prior, that my priors didn’t seem to affect the mean/median of the prior predictive samples the way I expected.
I thought this was because the priors weren’t being applied to the reference category, so I set up the data with different reference levels in order to fit separate prior predictive models:
```r
prior <- prior(normal(0, 2), "Intercept")

fit1 <- brm(bf(ya | trials(size) ~ 1), data = dat, family = multinomial(),
            prior = prior, sample_prior = "only")
fit2 <- brm(bf(yb | trials(size) ~ 1), data = dat, family = multinomial(),
            prior = prior, sample_prior = "only")
fit3 <- brm(bf(yc | trials(size) ~ 1), data = dat, family = multinomial(),
            prior = prior, sample_prior = "only")
```
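To check where that prior actually lands, I also looked at get_prior (my understanding is that the reference category gets no intercept of its own, so the normal(0, 2) should only apply to the non-reference categories):

```r
# List the parameters and default priors for this model; I expect "Intercept"
# entries only for the non-reference categories (e.g. muy2 and muy3 for fit1),
# and nothing for the reference column.
get_prior(bf(ya | trials(size) ~ 1), data = dat, family = multinomial())
```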
Here are some results:
```r
library(dplyr)
library(ggplot2)
library(tidybayes)
library(patchwork)

a <- dat %>%
  add_fitted_draws(fit1) %>%
  ggplot(aes(x = .value, y = .category)) +
  stat_pointintervalh(.width = c(.66, .95))
b <- dat %>%
  add_fitted_draws(fit2) %>%
  ggplot(aes(x = .value, y = .category)) +
  stat_pointintervalh(.width = c(.66, .95))
c <- dat %>%
  add_fitted_draws(fit3) %>%
  ggplot(aes(x = .value, y = .category)) +
  stat_pointintervalh(.width = c(.66, .95))

a / b / c + plot_annotation(tag_levels = "1", tag_prefix = "fit ")
```
The reference level is at the bottom each time, and always has a lower median and a narrower distribution. In my mind that makes sense, since the others are defined relative to the reference, so they would have greater variance?
However, in my actual data the distribution for the reference is often narrower, and has a lower median, than I would expect a priori. For instance, I have 157 timepoints at each of which a subject is assigned to one and only one state, so a priori I might expect a median of around 15-16 counts per state (157/10 ≈ 15.7), with values above 100 being less likely but certainly not impossible (someone could be in the same state all 157 times). But my priors for the test data I attached look like this (with the only prior being normal(0, 2) on "Intercept"):
That is not a reasonable prior on the reference category, and all of the categories are centered too low. I tried both increasing and decreasing the sd of the normal prior; while that changes the spread, they are all still centered rather low.
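To convince myself where the low centering comes from, here is a quick simulation of what I think brms is doing under the hood: softmax over the linear predictors, with the reference category's predictor fixed at 0 and the other intercepts drawn from the normal(0, 2) prior (please correct me if that's not the right mental model):

```r
# Rough prior-predictive sketch for my real case: 10 categories, 157 trials.
# Assumption: the reference category's linear predictor is fixed at 0 and the
# other 9 intercepts get independent normal(0, 2) priors.
set.seed(123)
n_draws <- 4000
eta <- cbind(0, matrix(rnorm(n_draws * 9, 0, 2), nrow = n_draws))  # column 1 = reference
p   <- exp(eta) / rowSums(exp(eta))                                # softmax
counts <- p * 157                                                  # implied expected counts

# Summarise the implied counts per category; for me column 1 (the reference)
# comes out both centered below 157/10 and narrower than the other categories,
# which matches the prior predictive plots above.
round(t(apply(counts, 2, quantile, probs = c(0.025, 0.5, 0.975))), 1)
```

If that mental model is right, it would also explain why changing the sd of the intercept prior mostly changes the spread without really shifting where the categories are centered.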
Questions
1. How can I specify a set of priors that will center the prior predictive intervals higher?
2. How can I increase the width of the distribution on the reference level, since the current one is far too narrow?
3. Given that the reference behaves so differently, is it typical for investigators to run the model with various references and then average over the models?
Thanks so much
Hugo
EDIT: forgot to attach test_subset.csv (396 Bytes)