# Conditional effects and summary estimate showing opposite results

Hi everyone,

This is my first time posting in the forums, so I apologize in advance for any formatting mistakes. My data and models are stored in an .RData file, but it is 616 MB, so I can’t upload it here. If anyone tells me where I should upload it, I will do so. Here’s my code: brms_forum.R (2.2 KB). EDIT: I am using R 4.1.0, brms 2.17.0, and cmdstanr 0.4.0.9000.

I am modeling participation in a citizen science project that studies disease vectors, based on a set of covariates at the census-tract level: average household income, average presence of the species (mosquito), predominant type of land cover (residential, green/leisure/other), population density, weekend (dummy), month (dummy), and year (dummy).

In my analysis, I compare two methods of accounting for spatial autocorrelation in my outcome variable (presence or absence of participation; logit link):

1. Multilevel modeling with random intercepts for each census tract.
```r
prior <- get_prior(presence ~ poly(inc, 2) + poly(bg_count, 2) + landcover +
                     poly(popd, 2) + weekend + month + year + (1 | CUSEC),
                   data = D, family = bernoulli(link = "logit"))

prior$prior[1] <- "normal(0,5)"

m4 <- brm(presence ~ poly(inc, 2) + poly(bg_count, 2) + landcover +
            poly(popd, 2) + weekend + month + year + (1 | CUSEC),
          data = D, family = bernoulli(link = "logit"), prior = prior,
          chains = n_chains, cores = n_chains, backend = "cmdstanr",
          threads = threading(n_threads), iter = 8000)
```
2. An ICAR model with a queen-contiguity adjacency matrix among census tracts:
```r
prior <- get_prior(presence ~ poly(inc, 2) + poly(bg_count, 2) + landcover +
                     poly(popd, 2) + weekend + month + year + (1 | CUSEC) +
                     car(queens, gr = CUSEC, type = "icar"),
                   data = D, data2 = list(queens = queens),
                   family = bernoulli(link = "logit"))

prior$prior[1] <- "normal(0,5)"

car2 <- brm(presence ~ poly(inc, 2) + poly(bg_count, 2) + landcover +
              poly(popd, 2) + weekend + month + year + (1 | CUSEC) +
              car(queens, gr = CUSEC, type = "icar"),
            data = D, data2 = list(queens = queens),
            family = bernoulli(link = "logit"), prior = prior,
            chains = n_chains, cores = n_chains, backend = "cmdstanr",
            threads = threading(n_threads), iter = 8000)
```

In my MLM model (m4), the effect of my main independent variable of interest (inc, the average household income of the census tract) is negative (-10.80, est. error 9.53), although the 95% credible interval includes zero.

However, when I calculate the predicted probability of my outcome variable based on this variable:

```r
plot(conditional_effects(m4))
```

the effect has an inverted-U shape.

Note that, for the rest of the variables, the conditional effects go in the same direction as the estimates provided by the summary function. Is this an error?

The second question is about my ICAR model.
How should I interpret the correlation structure coefficient? I was reading it as a similar concept to intraclass correlation, where the estimate tells you whether there are differences across your spatial units, but I am not entirely sure.

Álvaro.

Just jumping in to say that I’m having the same issue (summary indicating a positive slope, `conditional_effects` indicating a negative slope) and would be interested in people’s thoughts about it. Thanks for posting!

Ok, I’ve been inspecting the code behind `conditional_effects` and I think I know what causes the mismatch both in my case and yours. Here’s my best guess:

Essentially, `conditional_effects` ignores group-level (random) effects by internally setting `re_formula = NA` when it calls `posterior_epred`. The curve shown in the resulting plot is therefore based on posterior draws that ignore `(1 | CUSEC)`, whereas your `summary` estimates account for it.
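A minimal sketch of that difference, assuming the fitted `m4` from above (the `effects` argument is only used here to restrict the plot to `inc`):

```r
library(brms)

# Default behaviour: re_formula = NA, i.e. (1 | CUSEC) is ignored and the
# curve reflects only the population-level (fixed) terms.
ce_pop <- conditional_effects(m4, effects = "inc")

# Equivalent draws obtained directly; note the re_formula = NA.
p_pop <- posterior_epred(m4, re_formula = NA)

# For comparison, draws that DO include the group-level terms:
p_full <- posterior_epred(m4, re_formula = NULL)
```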

EDIT: Forgot to say that you can set `re_formula = NULL` in `conditional_effects` in order to account for group-level effects (although something goes wrong when I try to do this with my model — still looking into it).

Hi Jocateme,

I had indeed not read that part of the documentation; this is the first time I have fit a multilevel model in brms.
That should be what is causing the difference. However, I cannot test it. When I do

```r
conditional_effects(m4, re_formula = NULL)
```

I get an error saying `Error: NAs are not allowed in grouping variables.`, even though not a single one of my observations is missing its group. I checked this by subsetting my data to a new data frame that omits any row containing an NA in my grouping variable; I get the same number of observations.

Yep, same thing here. What happens is that `conditional_effects` draws samples based on simulated data spanning the range of your predictor (`inc`) on the x-axis, while assuming fixed values (means) for all of the other variables and NA for the group-level effects. When it tries to draw samples with `posterior_epred(..., re_formula = NULL)`, the function stops because of these NAs.

What I believe you have to do is pass `conditions` the group level(s) you’re interested in assessing. You’d do that by running `conditional_effects(..., conditions = data.frame(CUSEC = ...))`, filling in the level(s) you want to look at.
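Something like the following, where the tract IDs are hypothetical placeholders for levels of your `CUSEC` variable:

```r
# Hypothetical CUSEC levels; replace with tract IDs from your own data.
cond <- data.frame(CUSEC = c("0101001001", "0101001002"))

# One conditional-effects curve per row of `cond`, now including (1 | CUSEC).
conditional_effects(m4, effects = "inc", re_formula = NULL, conditions = cond)
```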


I haven’t looked at the issue in your main question, but the error with `re_formula=NULL` looks like a bug in brms 2.17.0 that is fixed in the development version, which you can install with `remotes::install_github("paul-buerkner/brms")`.


After installing the development version as @joels suggested

```r
conditional_effects(m4, re_formula = NULL)
```

gives me wider credible interval “bands”, but it still displays the opposite effect from the summary estimate.

`poly(x, 2)` (where `x` is any predictor variable in your model) by default creates orthogonal polynomial terms in `x`. The coefficients for each polynomial order will be different from those of raw polynomials and will no longer have the direct interpretation you’re probably expecting (interpretational and mathematical details here, here, and here).

Change the code to `poly(x, 2, raw=TRUE)` to get raw polynomials (that is, not transformed to be linearly independent). This won’t change the predictions or credible intervals, but the resulting regression coefficients will have the direct interpretation you’re expecting. Since the raw terms are not linearly independent, the linear and quadratic coefficients can’t be interpreted separately; together they give the predicted change in the log-odds of `presence` per unit change in `inc` (and note that this change itself varies with the value of `inc`).
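A quick self-contained illustration with `lm()` on simulated data (not the thread’s model): the two parameterisations give different coefficients but identical fitted curves.

```r
set.seed(1)
x <- runif(100, 0, 10)
y <- 2 + 0.5 * x - 0.05 * x^2 + rnorm(100, sd = 0.1)

# Same quadratic model, two parameterisations:
fit_orth <- lm(y ~ poly(x, 2))              # orthogonal (default)
fit_raw  <- lm(y ~ poly(x, 2, raw = TRUE))  # raw polynomials

# The coefficients differ between the two fits...
coef(fit_orth)
coef(fit_raw)

# ...but the fitted values (and hence predictions) are identical:
all.equal(fitted(fit_orth), fitted(fit_raw))  # TRUE
```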


Thank you for the references @joels, they are very helpful.

I initially used `poly()` to reduce the collinearity of my estimates, which also reduced the time it took to fit my models considerably (from 3-4 hours to about 15 minutes), though I wasn’t fully aware of the differences in interpretation between the two parameterisations.