Marginal_smooth in a binomial model


#1

Please also provide the following information in addition to your question:

  • Operating System: windows 8.1
  • brms Version:2.7.0

Dear community, I am struggling with getting a good understanding of the model output and its calculation from brms.
I built a model to understand butterfly mortality through time (week), region, and investigate differences among species. Here is my model (convergence was fine):

summary(mod21)
Family: binomial
Links: mu = logit
Formula: ind_dead | trials(total_ind) ~ 0 + region_butterfly +year + s(week, by = butterfly, k = 4)

I am now interested in the smooth part of the model and would like to plot s(week, by = butterfly, k = 4), all other predictors, year and region_butterfly, conditionned on their mean (and not on the reference category, as they are categorical variables). I first extract the table of predicted values:

smooth_week<-marginal_smooths(mod21,robust=FALSE,method=“predict”,conditions = data.frame(year = NA,reg_butt=NA))[[1]]

the plot of this table is the following:

But to get a better feeling of these variations, I would like the estimates to be backtransformed to my initial scale. As the model is binomial, the estimates and confidence intervals returned from the marginal_smooth are on a logit scale, which I therefore transform (hoping that so far my understanding is correct):

smooth_week$mortality=inv_logit_scaled(smooth_week$estimate__)
smooth_week$inv_low=inv_logit_scaled(smooth_week$lower__)
smooth_week$inv_upp=inv_logit_scaled(smooth_week$upper__)

I now get the following plot:

here are my specific questions:

  1. when extracting the marginal_smooth, is the argument “conditions” necessary to get estimate on the average mean of the other predictors? I assumed this function works as the marginal_effect function. In my specific case I also wonder if I should specify region_butterfly=NA because I would like to get mortality per week and butterfly averaged on region (and not on region_butterfly).

  2. I am sceptical that the transformation (inv_logit_scaled) did what I initially intented to get, which is the average mortality. For example, I know from my data that the average mortality of A. levana is on average below 10%. While in the first plot my understanding is that what is plotted is the variations in mortality for each species relative to the mortality of that species at the average week (around 4-5) on a logit scale, I cannot make sence of what I plotted in the second figure.

  3. is my only option is to extract the posterior from the 4 chains and average the values following the categories I intend to plot?


#2
  1. and 2.: To see the smooth on the response scale, I would suggest to use marginal_effects and the conditions argument to condition on the desired values of the other variables. Just transforming the smooth part will likely yield strange results, for instance, because you did not include the intercept or because 0 is not a reasonable value for your other predictors.

  2. What would be your preferred option?


#3

What I exactly would wish to do is to plot for each butterfly variation in the prediction on mortality according to week. I tried to extract the smooth term from the marginal effect:

marginal_effects(mod21,“week:butterfly”,conditions = data.frame(year = NA,region_butterfly=NA),robust=FALSE)

but it returned the following message:

Using the median number of trials by default if not specified otherwise. Error in tcrossprod(b, X) : non-conformable arguments
Error: Something went wrong (see the error message above). Perhaps you transformed numeric variables to factors or vice versa within the model formula? If yes, please convert your variables beforehand. Or did you set a predictor variable to NA?

I explain this error by the fact that I set the region_butterfly to NA while I also have “butterfly” in the term I would like to estimate. However, when building the model, I concatenated the 2 predictor variables region and butterfly because I have a missing contrast region*butterfly.


#4

Yeah, the fact that you want predictions per butterfly and have butterfly in a separate variable as well (region_butterfly) is unforunate for predictions as marginal_effects can’t take “butterly” in “region_butterfly” into account. The error message is indeed the result of setting your predictors to NA.

An easy option would be to have only main effects of region and butterfly, but I guess that’s not accurate with respect your subject matter knowledge, right?


#5

no, that is not what I would like indeed.

Therefore, would that be an option (point 3) to extract the posterior draws from the model and pool them together according to butterfly and week. I mean that I would calculate the mean of the posterior draw for each species and week, and this for the estimates, the lower and the upper limits and then backtransform these values with inv_logit_scaled. I am a bit behind the mathematics of how brms process the data… but that would sound correct or this is not the way to proceed?


#6

I don’t think this would work as you propose to be honest.

Alternatively, you could specify region:butterfly in your model and set priors on the coefficients so that they are identified even though the corresponding contrasts don’t actually appear in the data. You just had to make sure that illustrative predictions in marginal_effects are based on a region, which has no such missing contrasts so that these missing contrasts won’t affect the plot.


#7

ok I will think about it. Thanks for you time anyway!