Brms model results change based on dummy variable included

Working with brms version 2.16.3 on windows 10.

I am trying to specify the following model in brms:

Decision ~ Normal(u,sd)
u = baseline + social information
baseline = 1 + Score
social information = (Behaviour * (1+Success)) * Social

Behaviour is a variable with 3 levels (Increase, Neutral or Decrease) and Success is a dummy variable 0/1. Success being 0 represents the Majority condition. Social is a dummy variable where 1 indicates that participants saw social information. IE. I need the effect of social information to be 0 if they did not see any social information. I use non linear syntax 0 + Behaviour because I want to index the behaviour variable, rather than dummy code it.

I have attempted to build the model using the following definition.

Model <- brm(data = d,
                  family = gaussian,
                  bf(Decision ~ a + (b * (1 + Success) * Social),
                  a ~ 1 + Score,
                  b ~ 0 + Behaviour,
                  nl = TRUE),
                  prior = c(prior(normal(0,4), nlpar = a),
                            prior(normal(0,4), nlpar = b),
                            prior(exponential(1), class = sigma)),
                  iter = 4000, warmup = 1000, chains = 4, cores = 4,
                  seed = 222)

Where stancode generates this:

// generated with brms 2.16.3
functions {
}
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  int<lower=1> K_a;  // number of population-level effects
  matrix[N, K_a] X_a;  // population-level design matrix
  int<lower=1> K_b;  // number of population-level effects
  matrix[N, K_b] X_b;  // population-level design matrix
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
}
parameters {
  vector[K_a] b_a;  // population-level effects
  vector[K_b] b_b;  // population-level effects
  real<lower=0> sigma;  // dispersion parameter
}
transformed parameters {
}
model {
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N] nlp_a = X_a * b_a;
    // initialize linear predictor term
    vector[N] nlp_b = X_b * b_b;
    // initialize non-linear predictor term
    vector[N] mu;
    for (n in 1:N) {
      // compute non-linear predictor values
      mu[n] = nlp_a[n] + nlp_b[n];
    }
    target += normal_lpdf(Y | mu, sigma);
  }
  // priors including constants
  target += normal_lpdf(b_a | 0, 4);
  target += normal_lpdf(b_b | 0, 4);
  target += exponential_lpdf(sigma | 1);
}
generated quantities {
}

The model runs without any issues. However, the model predictions change depending which of Success or Majority I include in the first line. ie.

bf(Decision ~ a + (b * (1 + Success) * Social),

Produces different results to:

bf(Decision ~ a + (b * (1 + Majority) * Social),

Which shouldn’t happen, and makes me think I am making some mistake in my specification of the model.

My questions are:

  1. Is there an obvious mistake in the way I am attempting to build this model?
  2. Is there an alternative way I can build a model that is equivalent to the definition I provide above?

I hope this is enough information for someone to offer some advice as I am unfortunately not able to just share the data in this case.

I really appreciate any help anyone can offer.

I’m not sure I understand the difference between Success and Majority. Is Majority=1 the same as Success=0?

Ah sorry, I should have explained it better.

Success and Majority are 2 conditions in the experiment. So one could dummy code either one and include it into the model to represent both conditions. So, in answer to your question, Success = 0 should be equivalent to Majority = 1 and vice versa.

In my definitions above, purely as a sense check I tried it both ways to check the results were the same. The fact that they weren’t is what confused/concerned me.

When you say the two models give different results, do you mean that the predicted value of Decision for a given observation is different in each model?

I haven’t checked it per observation (I could, if that is useful), but yes; mean Decision predictions are different between the two models. In fact, the predictions almost precisely mirror one another, it’s pretty weird.

I don’t have much experience with nonlinear models, so hopefully one of the experts will chime in and confirm whether I’m correct, but here’s what I think is happening: Unlike the linear model formula interface, the nonlinear formula interface is interpreted literally as a mathematical formula. So the formula in your model is:

Decision = a + b*(1+Success)*Social = a + b * Social + b * Success * Social

There are two cases:

Success=0 (and Majority=1): Decision = a + b*Social
Success=1 (and Majority=0): Decision = a + 2*b*Social

But when you substitute Majority for Success in the formula, the meaning of each case is reversed:

Decision = a + b*(1+Majority)*Social = a + b * Social + b * Majority * Social

Majority=0 (and Success=1): Decision = a + b*Social
Majority=1 (and Success=0): Decision = a + 2*b*Social

This would explain why you’re getting results that look like mirror images of each other when you compare the two models.

Hmm interesting. Okay, that does make some sense. I wasn’t aware that non linear syntax behaved in that way. As you say, would be great if somebody could confirm though.

And, crucially, explain how I can fix my model.

One thing I did try which didn’t work was computing social information within the non linear paramater portion. Like this:

brm(data = d, family = gaussian,
                  bf(Decision ~ a + b2,
                  a ~ 1 + Score,
                  b1 ~ 0 + Behaviour, 
                  b2 ~ b1 * (1 + Majority * Social),

But this throws errors as it doesn’t seem to like using b1 within the definition of b2.

Either way, thank you for your input.

I’m not sure I understand the actual model you want to fit. Can you write out the model in standard mathematical notation (i.e., make all the variables, parameters, and functional form explicit)?

Sorry for the delayed response. Here is the full notation of the model I want to fit:

Decision \sim Normal(\mu , \sigma)
\mu = Baseline + Social info
Baseline = \alpha + \beta_1Score
Social info = (\beta_2Behaviour *(1 + \beta_3Success)) * \beta_4 Social
\alpha \sim Normal(0,4)
\beta \sim Normal(0.2)
\sigma \sim Exponential(1)

In words: Social info is made up of a parameter associated with the behaviour observed multiplied by an additional effect of one of the conditions (Success or Majority). It is then all multiplied by Social, such that the effect of Social info is 0 when data is not from the Social condition.

Baseline is purely an intercept + an effect of Score. Decision (the outcome) is then the baseline + the social info effect.

Thank you for continuing to look at this for me, I really appreciate it.

I think the model as written can be recast as the following (linear) model:

\mu = \alpha + \beta_1Score + \beta_2Behavior \cdot Social + \beta_3Behavior \cdot Success \cdot Social

There are only three unique parameters in the model as written, the last two of which I’ve recast as \beta_2 and \beta_3. The linear model formula notation for this model would be:

Decision ~ Score + Behavior:Social + Behavior:Success:Social

Or, if the intercept is removed:

Decision ~ 0 + Score + Behavior:Social + Behavior:Success:Social

The model includes two of the four possible interactions of Behavior, Success, and Social, and also excludes their main effects. I just wanted to make sure this is what you intended. It seems unusual, but maybe subject knowledge suggests this structure.

Thank you for your reply and advice.

Yeah, the structure is a little unusual, but it makes sense in my specific context. It also seems that using a colon rather than * causes BRMS to index Behaviour rather than dummy coding it, so that is very good to know.

I still cannot get the original model to work though (the one that uses 1 + success etc.). It would still be great if somebody was able to explain how such a model could be built into brms (if it can). But I may very well end up using a simpler form instead.

I think perhaps more clarification could be used to understand what’s going on here.

Could you elaborate on what you mean here? Isn’t this the expected behavior: if we switch all 0s to be 1s and all 1s to be 0s in a variable (e.g., going from Success to Majority or vice versa), then I’d expected “mirrored” predictions. This would be consistent with @joels comment detailing the non-linear formula. Seems like perhaps your original model works as intended in that case, but I think that depends on what exactly you mean by the result mirroring one another

Ah yes, you’re right, my original description was not clear enough. I should have been more specific.

By mirrored, I am talking about the conclusion regarding which condition has the highest Decision. I’ll illustrate through an example. In each case, I am considering Behaviour = "Altruism.

If we include Success into the model. The model predicts a mean Decision of 3.37 when Success = 0 and 3.98 when Success = 1.

However, If I include Majority into the model, this pattern flips (at least, qualitatively). With Majority in the model: the model predicts a mean Decision of 3.35 when Majority = 0 and predicts 3.67 when Majority = 1.

So when I say “mirrored” this is what I mean. Considering that Success = 0 with Success in the model should be equivalent to Majority = 1 when Majority is in the model. I should say, this behaviour is not specific to brms (though, in brms it is more pronounced). I tried fitting the same model using rethinking() and JAGS and found the same behaviour.

I hope this is more clear? It’s extremely strange.

Do we need to marginalize the discrete predictors?

1 Like

I love to see an example of anyone pulling this off with brms.