Conditional effects and summary estimate showing opposite results

Hi everyone,

This is my first time posting in the forums, so I apologize in advance for any formatting mistakes. I have my data and models stored in an .RData file, but it is 616 MB, so I can’t upload it; if anyone tells me where I should upload it, I will do so. Here’s my code: brms_forum.R (2.2 KB). EDIT: I am using R 4.1.0, brms 2.17.0, and cmdstanr 0.4.0.9000.

I am modeling participation in a citizen science project that studies disease vectors, based on a set of covariates at the census tract level: average household income, average presence of the species (mosquito), predominant type of land cover (residential, green/leisure/other), population density, weekend (dummy), month (dummy), and year (dummy).

In my analysis, I compare two methods of accounting for the spatial autocorrelation of my outcome variable (presence or absence of participation; Bernoulli with a logit link):

  1. Multilevel modeling with random intercepts for each census tract.
prior <- get_prior(presence ~ poly(inc, 2) + poly(bg_count, 2) + landcover + poly(popd, 2) + weekend + month + year + (1 | CUSEC), data = D, family = bernoulli(link = "logit"))

prior$prior[1] <- "normal(0, 5)"  # normal(0, 5) prior on all population-level effects

m4 <- brm(presence ~ poly(inc, 2) + poly(bg_count, 2) + landcover + poly(popd, 2) + weekend + month + year + (1 | CUSEC), data = D, prior = prior, family = bernoulli(link = "logit"), chains = n_chains, cores = n_chains, backend = "cmdstanr", threads = n_threads, iter = 8000)

  2. ICAR model with a queen-contiguity adjacency matrix among census tracts:
prior <- get_prior(presence ~ poly(inc, 2) + poly(bg_count, 2) + landcover + poly(popd, 2) + weekend + month + year + (1 | CUSEC) + car(queens, gr = CUSEC, type = "icar"), data = D, data2 = list(queens = queens), family = bernoulli(link = "logit"))

prior$prior[1] <- "normal(0, 5)"

car2 <- brm(presence ~ poly(inc, 2) + poly(bg_count, 2) + landcover + poly(popd, 2) + weekend + month + year + (1 | CUSEC) + car(queens, gr = CUSEC, type = "icar"), data = D, data2 = list(queens = queens), prior = prior, family = bernoulli(link = "logit"), chains = n_chains, cores = n_chains, backend = "cmdstanr", threads = n_threads, iter = 8000)

In my MLM model (m4), the effect of my main independent variable of interest (inc, the average household income of the census tract) is negative (-10.80, est. error 9.53), although the 95% credible interval crosses zero.

However, when I plot the predicted probability of my outcome as a function of this variable:

plot(conditional_effects(m4))

The effect has an inverted-U shape.

Note that, for the rest of the variables, the conditional effects go in the same direction as the estimates provided by the summary function. Is this an error?

The second question is about my ICAR model.
How should I interpret the correlation structure coefficient? I have been reading it as something analogous to an intraclass correlation, where the estimate tells you whether there are differences across your spatial units, but I am not entirely sure.

Thank you for your time,
Álvaro.

Just jumping in to say that I’m having the same issue (summary indicating a positive slope, conditional_effects indicating a negative slope) and would be interested in people’s thoughts on it. Thanks for posting!

OK, I’ve been inspecting the code behind conditional_effects, and I think I know what causes the mismatch in both my case and yours. Here’s my best guess:

Essentially, conditional_effects ignores group-level (random) effects by internally setting re_formula = NA in its call to posterior_epred. The curve shown in the resulting plot is therefore based on posterior draws that ignore (1 | CUSEC), whereas your summary estimates account for it.

EDIT: Forgot to say that you can set re_formula = NULL in conditional_effects to account for group-level effects (although something goes wrong when I try this with my model; still looking into it).
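
For example (a minimal sketch; effects = "inc" just restricts the output to that one predictor):

plot(conditional_effects(m4, effects = "inc"))                     # population-level only (default, re_formula = NA)
plot(conditional_effects(m4, effects = "inc", re_formula = NULL))  # also include group-level effects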

Hi Jocateme,

I had indeed not read that part of the documentation; this is the first time I have fit a multilevel model in brms.
That should be what is causing the difference; however, I cannot test it. When I do

 conditional_effects(m4, re_formula = NULL)

I get the error Error: NAs are not allowed in grouping variables., even though not a single one of my observations is missing its group. I checked this by subsetting my data to a new data frame that omits any row containing an NA in the grouping variable; it has the same number of observations.
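
In case it helps, this is roughly the check I ran (D and CUSEC as in my code above):

sum(is.na(D$CUSEC))                    # 0: no missing group assignments
nrow(D) == nrow(D[!is.na(D$CUSEC), ])  # TRUE: subsetting drops no rows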

Yep, same thing here. What happens is that conditional_effects draws samples from simulated data spanning the range of your x-axis variable (inc), while holding all other variables at fixed values (their means) and leaving the grouping variable as NA. When it then tries to draw samples with posterior_epred(..., re_formula = NULL), the function stops because of these NAs.

What I believe you have to do is tell conditional_effects which group level(s) you’re interested in assessing, via the conditions argument: conditional_effects(..., conditions = data.frame(CUSEC = [level(s) you want to look at])). See the sketch below.
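
Something like this (a sketch; "0101001" is a made-up placeholder, substitute the actual CUSEC level(s) you want):

conditional_effects(m4, effects = "inc", re_formula = NULL, conditions = data.frame(CUSEC = "0101001"))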


I haven’t looked at the issue in your main question, but the error with re_formula = NULL looks like a bug in brms 2.17.0 that is fixed in the development version, which you can install with remotes::install_github("paul-buerkner/brms").


After installing the development version as @joels suggested,

conditional_effects(m4, re_formula = NULL)

gives me wider credible-interval bands, but it still displays the opposite effect from the summary estimate.

poly(x, 2) (where x is any predictor variable in your model) by default creates orthogonal polynomial terms in x. The coefficients for each polynomial order will be different from those of raw polynomials and will no longer have the direct interpretation you’re probably expecting (interpretational and mathematical details here, here, and here).

Change the code to poly(x, 2, raw = TRUE) to get raw polynomials (that is, polynomials not transformed to be linearly independent). This won’t change the predictions or credible intervals, but the resulting regression coefficients will have the direct interpretation you’re expecting (although, since they are not linearly independent, the linear and quadratic terms can’t be interpreted separately; together they give the predicted change in the log-odds of presence per unit change in inc, which itself varies with the value of inc).
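
If you want to convince yourself of this quickly, a toy lm sketch (simulated data, nothing to do with your brms model) shows that the two parameterizations give different coefficients but identical fits:

set.seed(1)
x <- runif(100, 0, 10)
y <- 1 + 0.5 * x - 0.2 * x^2 + rnorm(100)
f_orth <- lm(y ~ poly(x, 2))              # orthogonal polynomials (default)
f_raw  <- lm(y ~ poly(x, 2, raw = TRUE))  # raw powers of x
coef(f_orth)                              # coefficients differ...
coef(f_raw)                               # ...from these
all.equal(fitted(f_orth), fitted(f_raw))  # TRUE: identical predictions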


Thank you for the references @joels, they are very helpful.

I initially used poly() to reduce the collinearity of my estimates, which also cut my model fitting time considerably (from 3-4 hours to about 15 minutes), though I wasn’t fully aware of the differences in interpretation between the two parameterizations.
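
In case it is useful for anyone else, here is a quick illustration of that collinearity reduction (toy data, not my actual covariates):

set.seed(42)
x <- rexp(500, rate = 0.1)   # a skewed, strictly positive predictor (like income)
cor(x, x^2)                  # raw linear and quadratic terms: close to 1
cor(poly(x, 2))[1, 2]        # orthogonal terms: essentially 0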