Bernoulli BRMS

Hello everyone,
(sorry in advance for my extensive message… i’m a beginner in stan and BRMS)
Here’s my problem.
I have a dependent variable, denoted REP, binary (1 or 0). I wish to predict the probability of answering 1 with a non-linear model such as, REP ~ inv_logit ((X/(1 + k*D)) - X2). with X and X2 two numerical variables , D a numerical variable and k : the free parameter to estimate.


I would like to estimate the parameter “k” according to the different experimental conditions I have realized.
Independent variable A: “DOMAIN” with two modalities (1: LOSS, 2: GAIN). Each subject is exposed to these 2 modalities of this independent variable.
Independent variable B: “STATUS” with two modalities (1: Control, 2: Threat). A participant is either “control” or “threat” but never both.

my data frame :

I would like to estimate the parameter “k” for each condition :

  1. k1_Control_GAIN,
  2. k2_Control_LOSS,
  3. k3_Threat_GAIN,
  4. k4_Threat_LOSS.

My hypothesis:
k1_Control_GAIN < k3_Threat_GAIN
k2_Control_LOSS < k4_Threat_LOSS

I have several ways to code this I think, the way I tried is this: (I’ve simplified the model here to make it more understandable)

main_model = bf(
  rep ~ inv_logit(  
 ((X / (1 + exp(k1 * dummy_ctrl * dummy_gain + k2 * dummy_gain * dummy_threat + k3 * dummy_loss * dummy_ctrl + k4 * dummy_loss * dummy_threat)*D ) - X2))
  ,
  k1 ~ ( 1| ID ),
  k2 ~ ( 1| ID ) ,
  k3 ~ ( 1| ID ),
  k4 ~ ( 1| ID ),  
  nl = TRUE)

with dummy_ctrl = 1 when the subject is in the “control” group and 0 otherwise
with dummy_threat = 1 when the subject is in the “threat” group and 0 otherwise
with dummy_GAIN= 1 when the subject is exposed to a trial of the “GAIN” condition and 0 otherwise
with dummy_LOSS= 1 when the subject is exposed to a trial of the “LOSS” condition and 0 otherwise

main_fit = brm(main_model,
               data = DF, family = bernoulli(link='identity'),
               prior = c(prior(normal(-5,3), nlpar = 'k1'),
                         prior(normal(-5,3), nlpar = 'k2'),
                         prior(normal(-5,3), nlpar = 'k3'),
                         prior(normal(-5,3), nlpar = 'k4')),
               cores = parallel::detectCores(),
               seed = "12345",
               inits = "random",
               chains = 4,
               iter = 4000,
               warmup = 2000,
               control = list(adapt_delta=0.99))

I don’t think my way is right.
I think it’s better to do it like this?

main_model = bf(
  rep ~ inv_logit(  ( (X / (1 + exp(k)*D ) - X2)) ,

  k ~ DOMAIN*STATUS + (1 + DOMAIN | ID),

  nl = TRUE)

Thank you in advance for your responses.

Sorry for not responding earlier. Did you manage to resolve this? Maybe @MegBallard is not busy and can help?

Also note that you can use triple backticks (```) to format blocks of code in your post. I’ve edited your post to this effect for you.

The main difference here is that In brms you don’t need to code dummy variables. The program will do that internally for data that is not numeric. This makes interpreting the results and extracting diagnostics of models easier to interpret.

I usually start with very basic model equation and then add to it if I feel like I have worked out the basic form.

If you know that your variables X1, X2, and D definitely have a non linear relationship, the second form is probably most appropriate. I successfully ran this model on a bit of fake data.

fit <- brm(bf(
       REP ~ inv_logit( ( (X / (1 + exp(k) * D ) - X2))),
       k ~ DOM * STAT,
       nl = TRUE), 
   family = bernoulli(link='identity'), 
   prior=prior(normal(-5,3), nlpar = 'k'),
   data=df)

A good approach in these more complicated models is to simulate some data that you know the relationships of and then build your model and see if you can get recover the parameters you fed into it.

I just spent a bunch of time making up a data set to test your model code on. If you provide a toy data set that helps when people want to help you correctly specify your model.

I have not done a model like this, so I’m curious if it correctly tracks the ID through the model to the response. I think there is a way to specify that responses in the linear part correspond to groups (ID) in the non-linear part. I’m not sure about it but I think I saw it somewhere along the way.

Check out the section in the brms documentation about formula “a terms” or “addition terms” that address parts of the LHS of the model formula.

Hope that was a little helpful. The simpler you make it for people to help you (provide some example data and code that will run with the data you provide), the better your chances of finding someone who will give it a try!

Cheers!

5 Likes

Hello, Meg,

thank you very much for your response.

In fact, I have already done a parameters recovery. And it seems that my first model works. I do recover my individual parameters for each individual in the gain and loss domain.
However, I have simulated noiseless data…

For the second model I propose, I don’t know… I don’t really understand the results I’m getting.

I will put my code in a next post. But probably next week because then I have to code an experiment that takes all my time and I have to check the code of my simulation…

I made my simulation on matlab because I’m more comfortable with it. But I will post both the simulated data set and the code if you want to play with it.

Moreover, I will test your proposition and tell you how it works.

cheers and many thanks again.
Vincent

Thank you Martin. I didn’t know backticks to format blocks of code.
I will try for my next post !