Inversion of effect direction in multivariate count model

Hello, I am troubled by a strange, albeit not really significant, inversion of effect direction in a simple model. I am comparing 3 count quantities between years: two are upper-bounded and modelled with a binomial distribution, the third with a Poisson distribution. This is the model:

    fit <- brm(mvbf(
        bf(Amb | trials(Slot_DCA) ~ (1 | Year)) + binomial(),
        bf(nDCA | trials(Slot_nDCA) ~ (1 | Year)) + binomial(),
        bf(Rep ~ (1 | Year)) + poisson()
    ), data = data %>% filter(Period == Period[1]), # don't mind this
    chains = 4, cores = 8, file = fit_file, refresh = 0, iter = 10000,
    backend = 'cmdstanr', prior = c(
        prior(student_t(3, 0, 2), class = 'Intercept', resp = 'Amb'),
        prior(student_t(3, 0, 2), class = 'Intercept', resp = 'nDCA'),
        prior(student_t(3, 0, 1.5), class = 'Intercept', resp = 'Rep'),
        prior(student_t(3, 0, 2.5), class = 'sd', coef = 'Intercept', group = 'Year', resp = 'Amb'),
        prior(student_t(3, 0, 2.5), class = 'sd', coef = 'Intercept', group = 'Year', resp = 'nDCA'),
        prior(student_t(3, 0, 2.5), class = 'sd', coef = 'Intercept', group = 'Year', resp = 'Rep')
    ), adapt_delta = .9)

And this is the data:

For a particular period (shown in the filter above), the posterior distribution of Amb in 2020 is lower than in 2019, even though the data show the opposite: 10 cases (Amb column) out of 25 (Slot_DCA column) in 2019 versus 10 out of 17 in 2020:

The pattern persists across refits of the model, so it's not a random fluctuation.

Could it be multivariate influence between the outcomes? Notice the big difference between years for the nDCA quantity, which decreases strongly in 2020. Could this strong effect also drag Amb down in 2020?


Just to be clear: is the "posterior distribution" of the parameters obtained via posterior_predict or add_predicted_draws? Or is something else happening there?

What I think is going on is that posterior_predict takes the number of trials into account, so it (IMHO correctly) predicts roughly the same number of successes for both years despite 2019 having many more trials.

If you want to get the success rate per trial, then you should predict with a modified dataset, where Slot_DCA is set to 1 (or another suitable number, depending on your actual scientific question).
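A minimal sketch of that idea (the object names `fit` and `data` are placeholders matching the earlier code; adapt to your own objects):

    library(brms)
    library(dplyr)

    # One row per year, with the number of trials set to 1 so that
    # predictions are on the per-trial (probability) scale rather
    # than the raw count scale:
    newdata <- data %>%
        distinct(Year) %>%
        mutate(Slot_DCA = 1, Slot_nDCA = 1)

    # Expected per-trial success for the Amb response, by year:
    posterior_epred(fit, newdata = newdata, resp = 'Amb')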

Does that make sense?

Also note that the way the model is written, you are not sharing any information between the three responses, so you might as well fit them separately. But maybe this is just a simplified model for the discussion here, and the final model will have additional structure?

Good luck with your model!

Hi, I'm using posterior_epred to generate the results. As you correctly stated, the 2020 observation has a smaller denominator than the 2019 one. I would have expected the same average number of predicted successes, though, just with different variability; instead I get a lower point estimate in the second case.

If, as you say, there's no influence between the models, then my second hypothesis is an effect of the priors. I used zero-centred priors on the logit scale, i.e. 50% on the probability scale. This means the results for 2019 are pulled towards 25/2 = 12.5 successes and those for 2020 towards 17/2 = 8.5, that is, upwards and downwards respectively, as seen in the final results. Does this make sense?
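A quick back-of-the-envelope check of that reasoning (plain R, no model involved; the numbers are illustrative, not fitted values):

    # A zero-centred prior on the logit scale corresponds to 50% success:
    plogis(0)          # 0.5

    # Observed per-trial rates:
    10 / 25            # 0.40 in 2019
    10 / 17            # ~0.59 in 2020

    # Expected counts at the prior mean of 50%:
    25 * 0.5           # 12.5 vs 10 observed -> 2019 pulled upwards
    17 * 0.5           # 8.5  vs 10 observed -> 2020 pulled downwards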

PS: I thought that mvbf would create some sort of regularising linkage between the posteriors; isn't that so? How do I achieve that in brms? Anyway, using one model instead of three is also faster to compute (one compilation step) and easier to manage 😊.


Yes, that's a good observation I missed, and it most likely explains the remaining (small) difference. Further, you actually know the two values with somewhat different precision (25 trials give you more information than 17). Add the logit link and the truncation at the different numbers of trials, and the posteriors for the expected value can have different variance and different skew, making the mean a less useful quantity.

It can, but you need to build it explicitly, e.g. by allowing correlations between the varying effects, as in

    bf(Amb | trials(Slot_DCA) ~ (1 | p | Year)) + binomial(),
    bf(nDCA | trials(Slot_nDCA) ~ (1 | p | Year)) + binomial(),
    bf(Rep ~ (1 | Year)) + poisson()

would make the Year contributions to Amb and nDCA correlated while keeping the contribution to Rep separate. (p is an arbitrary string; the important part is that it is the same in both formulas.)
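Once such a model is fitted (say to an object `fit2`, a placeholder name), the estimated correlation between the two Year effects appears in the group-level part of the output:

    # The Year grouping now has a correlation term between the
    # Amb and nDCA intercepts, visible in both of these summaries:
    VarCorr(fit2)
    summary(fit2)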
