Logistic Regression with Interactions (ContinuousXFactors(>2Levels))

ReneTwo · February 22, 2019, 6:42pm

Hi there,

a general question (has coding and statistical aspects):

I have a model that looks like this:
m<-brm(DV~0+IV1IV2+IV3+IV1:IV2:IV3+(1+IV2IV3| subject), family=bernoulli(),…)

So its a logistic regression with:
IV1 - Factor (3 between subject conditions)
IV2 - Factor (2 within subject conditions)
IV3 - Continuous (centered; within subjects)
(currently dummy coded; IV2 and IV3 are fully crossed, and present in each condition of IV1.)

As you see this goes beyond the ‘simple’ main effect formulations which are usually presented here, and this comes with some problems in the Bayesian logistic regression framework I am struggling with.

What I am especially interested in are the slope (IV3) differences between the levels of IV1, by each level of IV2. For instance: on IV2(Level1)-> is the slope steeper in IV1(Level1) than in IV1(Level2); or statistically speaking: is their difference 0? or: are the posteriors of the slope-differences in the region of practical equivalence? or: what is the evidence (BF) for this?

Now the above model formula is based on what Paul once suggested somewhere else, and serves as a neat start, since it provides indeed all the six separate estimates for:
IV1_1:IV2_1:Slope
IV1_2:IV2_1:Slope
IV1_3:IV2_1:Slope
IV1_1:IV2_2:Slope
IV1_2:IV2_2:Slope
IV1_3:IV2_2:Slope

This is nice and convenient, however, this is a logistic regression… and the problem is, that there are no really meaningful priors, since the posterior of each of these Slope estimates depends on ‘what goes into the logit’ from the other model terms. Given that there is an effect of the slope(IV3), then if a lot variance is already explained by IV1 alone, then even large changes of the slope on the log-scale will hardly change the log-odds (or p(1|Model)). But if none of the other factors contribute to the prediction, then small changes on the slope might alter the prediction a lot. This then of course also depend on the type of data one got (e.g. lots of 1’s or lots of 0’s), and makes “a priori” prior scale considerations somewhat futile. Thus, depending on the “variance” in the binary data and other IV’s, the prior range of the IV3 estimates might to broader or narrower, respectively.
Although this might be a question I could ask, this serves as a mere example, of why I do not want to interpret the estimation results with Svage-Dickey density rations for prior vs. posterior, i.e. Bayes Factors with hypothesis(m, “IV1_1:IV2_1:Slope-IV2_1:IV2_1:Slope=0”), which depends on the arbitrary prior scale.

Now… a different approach: the newest GitHub version of brms allows to specify by=IV ‘random effects’ (population variance / hyper parameters). This comes quite handy here, because this actually gives a population variance estimate in each condition of IV1 like this:
m<-brm(DV~0+IV1IV2+IV3+IV1:IV2:IV3+(1+IV2IV3| gr(subject, by=IV1)), family=bernoulli(),…)

What I have done now, is first simplistic (and I think wrong) to be as clear as possible: First, I take a “standard” convention like: a small effect is a difference of d=.2 (Cohens d; standardized mean difference), and the region I do not want the slope-effect differences (IV3) in is d=-.1 to d=.1 (usually called ROPE, see Kruschke’s late work). Now the by-IV1 variance estimates actually can be pooled in a classical sense, like: sigma_pooled=sqrt((sd_Intercept_IV1_1^2+sd_Intercept_IV2_1)/2), which then could be used to make sense of d_low=-.1 and d_high=.1. Namely, by their multiplication (e.g. d_low*sigma_pooled), which scales the “convention” of the Null-region to the log-scale, for a particular effect comparison. Such a region then could be checked for whether the posterior HDI of the slope differences lie within or outside it.
So far so good, I wonder if there are any objections until this point. I guess, the first “wrong” thing about this would be, to use an inappropriate variance term for pooling. This might be related to the question to whether one would take “within” or “between” subject-variance to interpret effect sizes like d. This then further splits up to which type of variance should be included (random intercepts, and which random slopes). For instance, I could use only the variance hyper estimates of the random slopes for IV3; or one could use all other residual terms, including the intercepts. Are there any technical estimation issues with this? In the latter case, however, I can not think about how all of these variance terms should be “added” together, because I simply don’t now how their deeper relation in the model code is, in the first place…

As it looks like, again, the reasoning behind this might become quite complex in a Bayesian sense, although the simple question still breaks down to:
Is there a neat way to “standardize” the slope - difference estimate for a better comparison to some meaningful decision criterion, about how much evidence is there, or whether the effect is “large” or something like this. I wonder, whether everything could be so simple.

A next approach:
So that is why my actual main question is: Is there a way in brms to model the slope effects - by the IV2 levels, directly as a combination of the grandMean ± (d/2)*SD; where the difference estimate is a standardized (Cohen d like) effect (unitized prior), multiplied with a variance-on-the-log-scale estimate. In my imagination, this would solve all my problems :)

Paul previously suggested to me that contrast.coding might be able to give me something similar to this, but I can not see it, because it still would imply estimating/modeling the Slopes (IV3; as in the last model version) independently, but not as a result of d*SD. This would again imply to transform the estimates to the standard scale (as above) for determining a (hopefully) meaningful ROPE, with all the above questions and possible difficulties. It also would appear less elegant to me, because directly modeling d and then transforming it into the actual effects (on the models-logit-scale), might automatically take all the other variance estimates into account by definition (this is actually a question…), which may not lead to the above question of which variance term I can and should use to standardize the slope-difference effects. (This actually almost answers my own question: because the SD term here, of course, should model the random slope variance of IV3…).

OK. I stop here. I hope my goals are understandable and that there is an easy solution I do not see :)

Best, René

martinmodrak · February 26, 2019, 1:31pm

Hi,
your question is very long and honestly, I didn’t read it completely. It feels like you are asking about multiple things at once and it might be useful to break it down to multiple questions and/or cut away anything not directly important. Also note, that you can use backticks to format code as code and dollar signs to allow MathJax for pretty printing of formulas (e.g. $d^2$ becomes d^2) which will likely improve readibility of your question. (see https://commonmark.org/help/ for more)

A general advice for interpreting complex models is to create samples of predictions (via posterior_linpred or posterior_predict) and than check a statement for each prediction sample. For example, you could predict for the observed cases and for the cases where IV3 increases by 1. For each sample you record 1 when the response has increased and 0 otherwise. You can then average over samples to get the posterior probability the response increases when IV3 increases by 1.

ReneTwo · February 27, 2019, 11:01am

Hello,

thanks for the hint on the formatting. (I check this.)
Indeed, the question is long and detailed, because I would like to motivate a specific (not a general) answer ;))

The short version would be:
Is there a way in brms to model the slope effects (in a logistic model as defined above) directly as a combination of a grandMean ± (d/2)*SD; where (Cohen d like) estimate d, can be estimated with a unitized prior?
(For details, about why this way, and not indirectly, see above).

Best René

martinmodrak · February 27, 2019, 11:29am

My answer to the short question would be:
a) I think brms does not support using fitted sd as predictor, but @paul.buerkner could rule it out with 100% certainty, so I think you would need to use Stan proper to do that
b) I would be surprised if you couldn’t compute d from posterior samples of a model fitted with a traditional coefficient, albeit this would not let you put the desired prior on d. Working with the posterior samples is IMHO really the way to go as those implicitly capture all of the model structure.

paul.buerkner · February 27, 2019, 11:34am

You could try out whether the non-linear framework of brms allows you to achieve this parameterization. See vignette("brms_nonlinear") for details.

ReneTwo · February 27, 2019, 1:00pm

Thank you both,

@paul.buerkner
This looks promising, but complicated, too :)
Let’s start with a slightly simpler model with only one continuous predictor and one two-level factor (Two main effects + their interaction).

As a start: Do you think this could be working?
(Note: I assume one can have multilevel terms in this type of function, for which there is no example given here: https://cran.r-project.org/web/packages/brms/vignettes/brms_nonlinear.html)

 brm(
  bf(DV ~ 0   + inv_logit(i1+i2+i12), 
    i1~ IV1*beta1,
    i2~ IV2*beta2,
    i12~ IV2*(IVX1*(-sig*d/2)+IVX2*(sig*d/2)),
    nl = TRUE),
  data = data, family = bernoulli("identity"), 
  prior = c(
    prior(student(1,1, 1), nlpar = "beta1"),
    prior(student(1,1, 1), nlpar = "beta2"),
    prior(student(1,0, 1), nlpar = "d"),
    prior(student(1, .1,1), nlpar = c("sig"), lb = 0, ub = 5)
           )
  )

DV=binary outcome; IV1=categorical factor (2 levels); IV2=continous predictor
IVX1(additionally dummy coded)= 1 if IV1 level 1 present, 0 if absent
IVX2(additionally dummy coded)= 1 if IV1 level 2 present, 0 if absent

To me it looks like it could work. But… … the sig parameter actually needs to be linked to the parameter estimated as by-subject random slope for IV2 (presumably sd_subject__IV2), and this is the place where my knowledge ends :)

Is this possible? Maybe like this?:

IV2*(IVX1*(-sig*d/2)+IVX2*(+sig*d/2))+(1+ sig | subject)

or perhaps more like this?

i2~ IV2*beta2+(1+ beta2 | subject),
IV2*(IVX1*(-sd_subject__beta2*d/2)+IVX2*(+sd_subject__beta2*d/2))

Best, René

paul.buerkner · February 27, 2019, 5:31pm

It’s not going to work this way for several reasons. Perhaps vignette("brms_multilevel") will help you further as it provides a more theoretical intro to brms’ non-linear syntax.

For a start, I would remove all multilevel structure until you get the simpler model working. You may still add multilevel structure later on.

paul.buerkner · March 1, 2019, 7:20am

After looking at your model again, I would start with the following:

bf(DV ~ eta + i12, 
    eta ~ IV1 + IV2,
    nlf(i12 ~ IV2*(IVX1*(-sig*d/2)+IVX2*(sig*d/2))),
    sig ~ 1,
    d ~ 1,
    nl = TRUE)

eta just handles the first two (linear parts) using a standard R formula while i12 handles the non-linear part. You can then later go ahead and add random effects to the formulas of eta, sig and d.

ReneTwo · March 6, 2019, 2:21pm

I will try this, and post the progression as soon as I come to it. Many thanks, so far!

ReneTwo · March 13, 2019, 11:33am

Ok. Back to business :)
Trying your model Paul seems to work at a basic level and some seemingly reasonable estimates are made (reasonable in the sense of what the model thinks I want to do… :)).

But, there is a basic issue, which I could not answer myself yet. As explained in the multilevel vignette, the first part of the formula:

bf(DV ~ eta + i12,

is interpreted literally (e.g. * = multiplication), while the other subsequent terms e.g.

nlf(i12 ~ IV2*(IVX1*(-sig*d/2)+IVX2*(sig*d/2))),

are interpreted in model-language (*= main effects & interaction).

The way the logic of constructing the means with … dsigIVX1 … would work, however, would require a partially literal interpretation. It seems to me, that this is impossible.
Anyway, the model-interpretation actually works nonetheless because IVX1 and IVX2 go into the term as continuous predictors and 0 is simply 0. However, the two are still treated as two predictors, and this means, that the model is redundant since IVX1 = 1-IVX2. I am not sure whether this is an estimation problem, so far it looks like I simply get one redundant output (i.e. the estimated interaction for IVX1 mirrors that of IVX2). Oddly, if I put both variables as factors into the model, I also get an estimated interaction between IVX1 and IVX2 in marginal_effects() (which are still mutually exclusive/redundant). In deviation/frequentist models this would have caused several errors… Anyway, the question is, (how) can I tell the model the correct design/interpretation? Edit: Actually, these odd things might just be a matter of marginal_effects trying to interpret the model output. I don’t know.

Assuming that these issues are not really “severe” (or that they are manageable), the next step would be to put some meaning into sig (i.e. “like” individual random slope variance of IV2). I see in the vignette for multilevel models, one actually could predict deeper-level model parameters (e.g. the zero_inflation zi in zero inflation models).
My first attempts were putting random intercepts or slopes into the eta formula e.g. +(1|subject), and then accessing these estimates in the later formula with e.g. (sd_subject__eta_Intercept*d/2).
The message I get, is that sd_subject__eta_Intercept is not in the data, which makes sense… so I tried an approach similar to the zero inflated example (where zi~child)

nlf(...sig*d/2...)
sig~sd_subject__eta_Intercept,

as well as

nlf(...sig*d/2...)
sd_subject__eta_Intercept~sig,
sig~1

None of them words (error messages again: not in data, or parameters should not have underscores)
Is there something I miss?

Best, René

paul.buerkner · March 13, 2019, 12:03pm

To quick comments:

nlf() ensures a literal interpretation as well.
You cannot directly access model parameters such as sd_subject__eta_Intercept in brms via the non-linear syntax.

ReneTwo · March 13, 2019, 12:50pm

Thanks Paul.
Alright. Good to know Nr.2. :)
This basically means than I can stop doing/trying this now… unless… wait:

What about this:
… (no… deleted)… :)

Best, René

paul.buerkner · March 13, 2019, 1:38pm

The model you deleted actually looked not too unreasonable to me.

ReneTwo · March 13, 2019, 1:55pm

ok. Intresting. :))
So, let’s put it back on the table.

bf(DV ~ eta + i12, 
    eta ~ IV1 + IV2,
    nlf(i12 ~ IV2*(IVX1*(-d/2)+IVX2*(d/2))),
    d ~ 1+(1|subject),
    nl = TRUE)

The main idea was subdivide the estimates of individual variations (1|subject), and the estimated effect d. The goal would be to achieve some kind of standardization for d. Now I deleted this, because the estimated d simply is not “standardized” but only somewhat “cleansed”.
But I guess I could use the estimate for the intercept as variance-estimate in order to standardize the posterior - and - the prior.
Standardizing the prior is the most problematic thing I thought off, which would be circumvented if one could actively scale/unstandardize d in the formula.

Now… here is a most silly idea… would this be possible (… I mean, the model runs, but does this look meaningful when actually known what the function does with this?)
Edited:

bf(DV ~ eta + i12, 
    eta ~ IV1 + IV2,
    nlf(i12 ~ IV2*(IVX1*(-d/2)+IVX2*(d/2))),
    nlf(d~d1*d2+d1),
    d1 ~ 0+(1|subject),
    d2 ~ 1,
    nl = TRUE)

Ok…
I know this looks crazy now, and it is mathematically incorrect, but it should approximate the correct solution. That is,
the part:
d1*d2+d1, actually should be currentPopulationMeanSampleofthisSD(d1)*d2+d1
Which means, the average estimate of d2 will probably be close to the “true” estimate (because over the single samples of d1 the estimation- error will cancel somewhat out), but the single participant estimates will not be equivalent to the usual notion of a individual error. This may come with the problem that this formula inherently defines a correlation between an individual error estimate of d1 and the extent of the individual raw slope estimates: d1*d2.
So, although this may approximate to a meaningful average for d2 on a standard scale, probably no one will buy this :))
Another edit:
d1*d2+d1 should be fabs(d1)*d2+d1 otherwise later ±d/2 difference estimates would be canceled out by negative intercept samples.

Best, René

Topic		Replies	Views
Divergences with logistic multilevel model with binary predictor Modeling specification , divergences , brms	4	63	September 5, 2024
Correlating random slopes brms	2	1278	November 17, 2020
Extracting an effect with dummy coding brms	3	578	September 18, 2021
Brms custom formula for logistic regression brms logistic-regression	3	668	May 29, 2023
Multinomial logistic regression with categorical family (brms) Modeling multinomial-response , brms	5	55	May 2, 2025

Logistic Regression with Interactions (ContinuousXFactors(>2Levels))

Related topics