Linear no-intercept model in brms interpretation

fritz · April 16, 2025, 12:37pm

Hi everyone. I’m new to stan and bayesian stats and have a rather stupid question about understanding the model definition in brms non-linear specification.
I’m trying to gage the influence of different predictors on a continouos output variable (ratings on an interval scale 0-100).
Since I was unsure about the difference between computing a gamma and scaling the data to 0-1, I instead mean centered the data on 0 and fit the following model on the original scale, with all factors sum coded so they also add to 0.

formula.univar.ov.Int = bf(
  overall_quality ~ 0 + sys + sex + age + consc + neur + open + agree + extra + pos + neg + exp + sys, 
  sys ~ 0 + system_id,
  sex ~ 0 + Sex,
  age ~ 0 + Age_Bins,
  consc ~ 0 + mo(conscientiousness),
  neur ~ 0 + mo(neuroticism),
  open ~ 0 + mo(openness),
  agree ~ 0 + mo(agreeableness),
  extra ~ 0 + mo(extraversion),
  pos ~ 0 + mo(positive_affect),
  neg ~ 0 + mo(negative_affect),
  exp ~ 0 + mo(experience)
  nl=TRUE)

priors.full.univar.ov.Int <- c(
  # Priors on fixed effects: assume effects ~ few units
  prior(normal(0, 5), class = "b", nlpar='age'),
  prior(normal(0, 5), class = "b", nlpar='sex'),
  prior(normal(0, 10), class = "b", nlpar='sys'),
  prior(normal(0, 5), class = "b", nlpar='consc'),
  prior(normal(0, 5), class = "b", nlpar='neur'),
  prior(normal(0, 5), class = "b", nlpar='open'),
  prior(normal(0, 5), class = "b", nlpar='agree'),
  prior(normal(0, 5), class = "b", nlpar='extra'),
  prior(normal(0, 5), class = "b", nlpar='pos'),
  prior(normal(0, 5), class = "b", nlpar='neg'),
  prior(normal(0, 5), class = "b", nlpar='exp'),
  # Residual standard deviation: weakly informative, matches the scale
  prior(exponential(1), class = "sigma")
)

This is following the brms explanation in [1] with regards to mcelreaths statistical rethinking.
My understanding is, that this does not compute the parameters with regards to a reference intercept for each of the factors, but instead computes the variance with regards to the grand mean (which in this case is 0 because I centered it)

Following this I should be able to interpret the displacement from 0 as actual effects, as for example in this HDI plot.

In general I’m asking if:
a) this interpretation is valid and I can safely interpret the differences from 0 as comparable effects between the predictors on the outcome rating (within the context of this model of course)?
b) this also holds for the monontonic predictors estimate through the simplex step function?

Solomon · April 16, 2025, 2:43pm

Hey @fritz. This looks like your first post, so welcome to the Stan forums!

Your model is rather complicated, which makes it challenging to follow along. In general, I recommend asking questions with the simplest versions of your model as possible, while still capturing your issue. In a similar way, I recommend you actually scale your model way back to include, say, one categorical variable and one continuous variable. Then see if you can make sense of the model parameters in that easier context. This is the kind of approach I use when debugging my own models, and it’s also the approach I used when writing that ebook.

As a tangent, I don’t recommend using the non-linear syntax I presented in that ebook. The only reason I used it was to present brms analogues to the models in McElreath’s 2nd edition. But personally, I find that syntax terrible, and I think it’s just begging for confusion. I recommend using the more conventional approach to nominal variables McElreath used in his 1st edition.

fritz · April 19, 2025, 10:46am

Thank you for the kind advice and apologies for the broad framing of my question.

After reading some more I think I have narrowed in on what confuses me in this context.

I ran some simulations using the same example given in the monotonic vignette and got the following results.

A) In a no intercept vs. with intercept model the first monotonic simplex parameter gets overestimated. Why is that? Is there a way to look at the design matrix used to fit the model? I saw the post and documentation about 0+ meaning not mean centering the predictors in the design matrix, but I don’t have a good intuition about what that means in the context of the monotonic function. In general I thought the 0+ syntax avoids computing the parameters with regards to a reference level of the categorical factor, but rather computes them with reference to the grand mean and the main problem in doing so is potential overparameterization. But now these simulations show different effect sizes for 1 of the parameters. I would have either expected the CI intervals to get bigger or the error estimate go up, but that does not seem to be the case.

income_options <- c("below_20", "20_to_40", "40_to_100", "greater_100")
income <- factor(sample(income_options, 100, TRUE),
                 levels = income_options, ordered = TRUE)
mean_ls <- c(30, 60, 70, 75)
ls <- mean_ls[income] + rnorm(100, sd = 7)
dat <- data.frame(income, ls)

fit1 <- brm(ls ~1 + mo(income), data = dat, seed = 123)


Formula: ls ~ 1 + mo(income) 
   Data: dat (Number of observations: 100) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    30.94      1.33    28.22    33.55 1.00     2862     2693
moincome     15.40      0.66    14.12    16.73 1.00     2757     2266

Monotonic Simplex Parameters:
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
moincome1[1]     0.63      0.04     0.56     0.71 1.00     3058     2049
moincome1[2]     0.17      0.04     0.08     0.25 1.00     3026     2253
moincome1[3]     0.20      0.04     0.12     0.28 1.00     3608     2327

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     6.86      0.50     5.95     7.91 1.00     3696     2570

fit2 <- brm(ls ~ 0 + mo(income), data = dat, seed = 123)


Formula: ls ~ 0 + mo(income) 
   Data: dat (Number of observations: 100) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
         Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
moincome    25.66      1.17    23.42    28.01 1.00     2393     2559

Monotonic Simplex Parameters:
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
moincome1[1]     0.78      0.06     0.67     0.89 1.00     2354     1963
moincome1[2]     0.11      0.05     0.01     0.22 1.00     3311     2100
moincome1[3]     0.12      0.05     0.02     0.22 1.00     2614     1995

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma    17.64      1.26    15.43    20.38 1.00     2545     2114

B) I also noted that the effect disappears when using the 0+ syntax in a non-linear model with an added categorical predictor. I don’t really understand how that makes a difference.

(Just the monotonic predictor in NL-syntax: larger effect is still there)

dat$x <- sample(c("a", "b", "c"), 100, TRUE)
fit7 <- brm(
 formula = bf(ls ~ 0 + inc,
              inc ~ 0 + mo(income),
              nl=TRUE),
 data = dat,
 seed = 123)


Formula: ls ~ inc 
         inc ~ 0 + mo(income)
   Data: dat (Number of observations: 100) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
inc_moincome    25.66      1.17    23.42    28.01 1.00     2393     2559

Monotonic Simplex Parameters:
                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
inc_moincome1[1]     0.78      0.06     0.67     0.89 1.00     2354     1963
inc_moincome1[2]     0.11      0.05     0.01     0.22 1.00     3311     2100
inc_moincome1[3]     0.12      0.05     0.02     0.22 1.00     2614     1995

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma    17.64      1.26    15.43    20.38 1.00     2545     2114

(monotonic predictor + categorical factor: Same results for monotonic predictor as with Intercept, despite specifying 0+ on both)

dat$x <- sample(c("a", "b", "c"), 100, TRUE)
fit6 <- brm(
 formula = bf(ls ~ 0 + inc + xvar,
              inc ~ 0 + mo(income),
              xvar ~ 0 + x,
              nl=TRUE),
 data = dat,
 seed = 123)


Formula: ls ~ 0 + inc + xvar 
         inc ~ 0 + mo(income)
         xvar ~ 0 + x
   Data: dat (Number of observations: 100) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
xvar_xa         29.44      1.58    26.37    32.52 1.00     2445     2512
xvar_xb         31.82      1.79    28.27    35.33 1.00     2339     2529
xvar_xc         32.02      1.66    28.74    35.25 1.00     2436     2726
inc_moincome    15.21      0.69    13.85    16.56 1.00     2064     2375

Monotonic Simplex Parameters:
                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
inc_moincome1[1]     0.63      0.04     0.56     0.71 1.00     2788     2603
inc_moincome1[2]     0.17      0.04     0.08     0.25 1.00     2885     2230
inc_moincome1[3]     0.20      0.04     0.12     0.28 1.00     3278     2419

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     6.84      0.51     5.92     7.94 1.00     3208     2614

Sorry if these questions are still too basic for this forum, but I am also still new to statistical modeling as a whole. I would also take pointers to a more appropriate forum if they are out of place here.

Solomon · April 19, 2025, 2:48pm

Hey @fritz, I don’t think your questions are too basic for this forum. There’s nothing basic about a monotonic predictor. I should have time to give your response a more thorough look in the next day or two.

Solomon · April 27, 2025, 10:29pm

Hey @fritz, I apologize for the delay.

When you’re treating a categorical variable in the conventional way, the 0 + ... syntax estimates each level of the categorical variable independently. However, the mo() approach appears to need a reference category. You can get a sense of this in the Introduction section of Paul’s vignette where he wrote

D is the number of categories (or highest integer in the data) minus 1, since we start counting categories from zero to simplify the notation.

In this case, when you use the 0 + ... syntax, you have effectively fixed the intercept for the first category to zero, rather than estimating it from the data. Therefore, it looks like you should not use the 0 + ... syntax when you use the mo() approach.

Topic		Replies	Views
Bernoulli BRMS Modeling	4	1330	February 25, 2020
Understanding brms random intercept output Modeling fitting-issues , specification , brms	2	438	August 28, 2023
Hierarchical random intercept model - Stan vs brms Modeling specification , brms	5	126	May 6, 2025
Intercepts while nl=TRUE and random effects brms	4	524	March 29, 2020
Interpretation of priors on nonlinear parameters brms	2	1111	May 21, 2019

Linear no-intercept model in brms interpretation

Related topics