Model results and marginal_effects do not seem to match predictions

This is a fairly general question, but unfortunately I cannot share the data or results.

I have a large (~190,000 observations) longitudinal dataset with a hierarchical structure. The outcome is disease (yes/no). The hierarchical structure (not exactly true but will suffice for example) is state/zipcode/site. I have 16 time point measurements per site.

I have run the following model in ‘brms’:

model <- brm(outcome ~ 1 + time + (time|state/zipcode/site) + (1|grouping_factor2) +
               (1|grouping_factor3), data=data, prior=prior, family=bernoulli())

I used normal(0,1) priors on the sd for group level effects. The model runs fine. Posterior predictive checks look good. Results look as expected.

The population-level effect for “time” is negative on the log-odds scale. This is expected, as disease is anticipated to decline over time.
When I run marginal_effects(model), the plot shows a trend in proportion going down over time.

So far, so good.

Next, I generated predictions for every level in the model at time points over the next 4 years. Then I created colored maps for each year so that I could see the trend over time for each site as a change in color (change in proportion = change in gradient color). I did this by aggregating the predictions by zipcode, giving the mean proportion of disease for each zipcode.
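The aggregation step was essentially this (a sketch; preds, pred, and year are stand-ins for my actual prediction data frame and column names):

# mean predicted proportion of disease per zipcode and year, used for map coloring
zip_means <- aggregate(pred ~ zipcode + year, data = preds, FUN = mean)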

While the trend (on the maps) varies by zipcode as expected, overall the disease proportion trends up! This is clearly visible. It is also evident from simply taking the mean of the predicted proportions for each year.

What am I missing??? It looks to me like while some zipcodes might go up, the overall trend in the predictions should match the overall trend in the population level results and the marginal_effects() plot. Why would there be this mismatch?

As I understand it, my model has an overall effect for “time” and then offsets from this overall effect for the varying effects at each level. So, for example, to find the effect of time for a particular site you would add all of the slopes: the population-level time effect + the time offset for its state + the time offset for its zipcode + the time offset for the site itself. This is correct, yes?
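That is, writing b_time for the population-level slope and u for the group-level offsets, what I have in mind is:

slope(site) = b_time + u_state + u_zipcode + u_site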

Any help is appreciated. Thanks!

Hi jd_c,
Thanks for posting this!

The varying slopes models are quite complex interaction models. I would trust the posterior inferences you generated yourself. The function marginal_effects() for time is probably showing the trend with the group-level effects excluded, since by default it conditions only on the population-level effects, not on any particular state/zipcode/site or other grouping factor. For that average trend, yes, the trend may be downwards - but for individual groups it could be the other way round, since you’ve explicitly specified the varying slopes.
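If you want the plot to include the varying effects, you can change re_formula - the default, re_formula = NA, excludes all group-level effects (you may also need to pass specific grouping levels via the conditions argument):

marginal_effects(model, effects = "time", re_formula = NULL)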

Dilsher

Thanks for the reply, Dilsher.

I did more digging into this problem. I ran the same model in lme4.

Model summary results are almost identical between the lme4 and brms models (not surprising: even though I used regularizing priors in the brms model, the amount of data is very large, so the priors have little influence).

I extracted each level of the varying slopes (aka random slopes) from both the lme4 and brms models. I summed the slopes at each level to find the total varying slope for each site (the smallest level). In other words, to find the varying slope at a particular site you compute: slope offset for its state + slope offset for its zipcode + slope offset for the site. In brms, ~55% of sites have a negative varying slope; in lme4, ~53%. When I add the global slope (aka the fixed effect for time) to the varying slope to get the actual slope for each individual site: ~81% of slopes are negative in brms; ~82% in lme4. The distributions and values of these slopes are very similar between the two packages.
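The tallying on the brms side looked roughly like this (a sketch: I’m assuming brms names the nested grouping terms state:zipcode and state:zipcode:site and pastes level labels with “_” - check names(ranef(model)) and the rownames of each array on a real fit):

key <- unique(data[c("state", "zipcode", "site")])
key[] <- lapply(key, as.character)   # index arrays by level name, not factor code
re <- ranef(model)                   # list of [level, statistic, coefficient] arrays
key$slope <- fixef(model)["time", "Estimate"] +
  re$state[key$state, "Estimate", "time"] +
  re$`state:zipcode`[paste(key$state, key$zipcode, sep = "_"), "Estimate", "time"] +
  re$`state:zipcode:site`[paste(key$state, key$zipcode, key$site, sep = "_"),
                          "Estimate", "time"]
mean(key$slope < 0)   # proportion of sites whose total slope is negative (~81% here)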

In summary, the results are basically the same from the brms and the lme4 models. The results are also what I would expect and make sense.

Also, from the above information, I would expect that if you created a series of maps over time, then ~81-82% of sites would show a decrease in proportion of disease over time, and ~18-19% would show an increase. This too would make sense.

Where brms and lme4 differ drastically is when using the predict() function to try to create these maps.

I created a dataset that had a row for every combination of the levels in the varying effects.

I predicted the response for each level from both the lme4 and brms models, including all varying effects (random effects) in the predictions (the default in both packages).

I predicted responses for time=25 and time=55. Taking the difference gives me an idea of the direction of the trend (or I could make two maps and look at the change in color).
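In code, the comparison was along these lines (a sketch; fit_glmer is a placeholder name for the lme4 fit, and newdat is the combination dataset described above):

newdat <- unique(data[c("state", "zipcode", "site",
                        "grouping_factor2", "grouping_factor3")])
# brms: posterior-mean predicted probabilities, including all varying effects
p25 <- predict(model, newdata = transform(newdat, time = 25))[, "Estimate"]
p55 <- predict(model, newdata = transform(newdat, time = 55))[, "Estimate"]
mean(p55 < p25)   # proportion of sites trending down in brms
# lme4 analogue: point predictions on the probability scale
q25 <- predict(fit_glmer, newdata = transform(newdat, time = 25), type = "response")
q55 <- predict(fit_glmer, newdata = transform(newdat, time = 55), type = "response")
mean(q55 < q25)   # proportion of sites trending down in lme4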

What I find is that for the lme4 predictions, ~82% of sites trend down over time based on the difference in predictions (this matches almost exactly what I found above by adding the varying slopes). For the brms predictions, only ~10% of sites trend down (it’s almost like the inverse of what is expected…).

I am not sure what is going on with the brms predictions…they seem backwards. Everything matches in the two packages except the predictions…

Solution (I think):
I decided to use the brms model to make predictions manually to see what was going on, so I extracted all the posterior samples for the global effects and for the varying slopes and intercepts. For my example, I chose a site where lme4 and brms had similar model results (global and varying effects) but very different trends in predictions over time. I then used the model formula, adding the global effects to their offsets (varying effects), to get a posterior distribution for the prediction at that site at two values of the time variable (time=25 and time=55), and ran plogis() to convert from log-odds to probability.

When I took the means of these probability distributions, I arrived at the same values as when using the predict() function in brms. When I took the medians, I arrived at values very similar to the lme4 predictions.

Creating a histogram of the distributions at time=25 and time=55 revealed what was going on. At time=25 the posterior distribution was more or less Gaussian, with mean and median pretty similar to each other and to the lme4 prediction. At time=55, however, the distribution was heavily right-skewed, with most values near zero and a long right tail out to 1. So, going from time=25 to time=55, the mean of the distribution increased (pulled up by the long right tail) while the median decreased (most values sit near zero). This led to the discrepancy in the direction of the effect between lme4 and brms. If I use the predict() function with robust=TRUE, so that brms predicts the median, the results are very similar to those of lme4.
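The manual check looked roughly like this (a sketch; the r_* names below are placeholders for the intercept and slope offsets of the chosen site - in a real fit, posterior_samples() returns columns like r_state[TX,time], so check the column names on your own fit):

post <- posterior_samples(model)   # one row per posterior draw
eta <- function(t) with(post,
  (b_Intercept + r_state_int + r_zip_int + r_site_int) +
  (b_time + r_state_slope + r_zip_slope + r_site_slope) * t)
p25 <- plogis(eta(25))   # posterior distribution of the probability at time=25
p55 <- plogis(eta(55))   # ...and at time=55
c(mean(p25), mean(p55))       # matches predict(model, ...): posterior means
c(median(p25), median(p55))   # close to the lme4 point predictions
hist(p55)   # heavily right-skewed: the mean is pulled up, the median is not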
In summary, even though the model results (global and varying effects) were very, very similar between the lme4 and brms models, the majority of the predictions appeared to trend the opposite way, due to the difference between a point estimate and the mean of a posterior distribution.
To try to replicate the lme4 result from the posterior samples of the brms model, I took the mean of each effect, summed those means using the model formula (adding point estimates of each effect), and applied plogis() at the end. I got the same result as from lme4. So, applying plogis() to each posterior draw and then taking the mean (as one should) does not equal summing the posterior means of the effects and applying plogis() to the result: averaging and summing commute on the log-odds scale, but the nonlinear inverse link does not commute with averaging.
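A minimal standalone illustration of this, assuming nothing from my model beyond a wide posterior on the log-odds scale:

set.seed(1)
eta <- rnorm(1e5, mean = -4, sd = 3)   # hypothetical log-odds posterior
mean(plogis(eta))     # ~0.12: mean of the probabilities (what brms predict() reports)
plogis(mean(eta))     # ~0.018: probability at the mean log-odds (a point estimate, like lme4)
median(plogis(eta))   # ~0.018: medians pass through monotone transforms, matching lme4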
So, it appears that given the same model results, lme4 and brms don’t necessarily provide the same predictions due to the difference between obtaining point estimates for effects and posterior distributions.

Responding to my original post - I think the difference I have described above between the lme4 and brms predictions is also the reason why the output from marginal_effects() and the trend in the predictions (when predicting means) don’t seem to match. As described in a previous comment, around 80% of sites have a negative slope. However, when you predict() the mean for sites over time, most of the means go up, due to the long right tails in the distributions of the predicted probability at each site. Most of the sites’ medians go down.