This is a fairly general question, but unfortunately I cannot share the data or results.
I have a large (~190,000 observations) longitudinal dataset with a hierarchical structure. The outcome is disease (yes/no). The hierarchical structure (not exactly true but will suffice for example) is state/zipcode/site. I have 16 time point measurements per site.
I have run the following model in ‘brms’:
model <- brm(outcome ~ 1 + time + (time|state/zipcode/site) + (1|grouping_factor2) +
(1|grouping_factor3), data=data, prior=prior, family=bernoulli())
I used normal(0,1) priors on the sd for group level effects. The model runs fine. Posterior predictive checks look good. Results look as expected.
The population level effect for “time” is negative on the log-odds scale. This is expected as it is expected disease goes down over time.
When I run marginal_effects(model), the plot shows a trend in proportion going down over time.
So far, so good.
Next, I predicted new data for every level in the model for timepoints over the next 4 years. Then I created colored maps for each year so that I could see the effect of the trend over time for each site by seeing the change in color over time (change in proportion = change in gradient color). I did this by aggregating the predictions by zipcode so that I had the mean proportion of disease for each zipcode.
While the trend (on the maps) by zipcode varies as expected, overall the disease proportion trends up! This is clearly visible. It is also evident by simply taking the mean proportion for the predicted for each year.
What am I missing??? It looks to me like while some zipcodes might go up, the overall trend in the predictions should match the overall trend in the population level results and the marginal_effects() plot. Why would there be this mismatch?
As I understand it, my model has an overall effect for “time” and then offsets from this overall effect for the varying effects at each level. So for example, if you wanted to find the effect for time for a particular site you would add all of the slopes: time effect + time effect for state + time effect for zipcode + time effect for site. This is correct, yes?
Any help is appreciated. Thanks!