Non-linear hierarchical modelling (logarithmic) problem

Hi everyone,

I’m trying to set up my first non-linear hierarchical model with the help of brms, but I’m not sure I’m doing it right. I would be grateful if anyone could point me in the right direction!


Study participants (“id”) had to give repeated estimates (“estimate”) about positions on a ruler ranging from 0 to 18,278. The actual positions are coded as “actual”. Dataset (dropbox link).


Individual estimates are more or less logarithmic.


I want to fit a non-linear model to this to determine the best fitting base of the logarithmic function. I did this as follows (pooled model):

data <- read.csv("sz_data_brms.csv")
prior1 <- prior(normal(1,2), class="b", nlpar = "b1", lb=1)
fit1 <- brm(bf(estimate ~ log(actual)/log(b1), b1 ~ 1, nl = TRUE), data = data, prior = prior1)

I set a lower bound (>1) on the prior of b1 to avoid negative values. As Stan does not have a logarithm with a freely specifiable base, I implemented it as log(x)/log(b), which equals log_b(x).
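A quick sanity check of that change-of-base identity in plain R (toy numbers, nothing from the dataset):

```r
# log_b(x) = log(x) / log(b): base-b logarithm via natural logs
x <- 1000
b <- 10
log(x) / log(b)   # 3, i.e. log10(1000)
stopifnot(abs(log(x) / log(b) - log(x, base = b)) < 1e-12)
```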

This worked fine (no fitting problems and reasonable estimate).

Now, I would like to allow the base of the logarithm to vary between participants. I set up the model like this:

prior2 <- prior(normal(1,2), class="b", nlpar = "b1", lb=1)
fit2 <- brm(bf(estimate ~ log(actual)/log(b1), b1 ~ 1 + (1|id), nl = TRUE), data = data, prior = prior2)

However, this model arrives at an unreasonable estimate, has problems with effective sample size, convergence, and divergent transitions, and takes a long time to fit, indicating that something is substantially wrong with it.

I’m sure I did something wrong and I tried to look up the model specification for non-linear models with bf() here, but I could not figure out where the error lies exactly. Would be thankful for any pointers!




Hi, sorry for not getting back to you earlier; the question is relevant and well written.

I should note that your model is almost equivalent to a linear model with log(actual) as a predictor: if estimate ~ log(actual)/log(b1), then also estimate ~ b_logactual * log(actual), where one can recover b1 = exp(1 / b_logactual) (the only difference is that priors on b_logactual act differently than priors on b1). But this is just a normal linear model!
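A small numeric check of that back-transformation (the value of b1 here is made up, not an estimate from the data):

```r
# If estimate = log(actual)/log(b1) and also estimate = b_logactual * log(actual),
# then b_logactual = 1/log(b1), so b1 = exp(1/b_logactual).
b1 <- 2.5
b_logactual <- 1 / log(b1)
stopifnot(abs(exp(1 / b_logactual) - b1) < 1e-12)  # recovers the original base
```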

So one can fit the data as:

fit_log1 <- brm(estimate ~ 0 + log(actual) + (1 | id), data = data)

This fit runs without issues for me.

(The 0 removes the intercept, which matches your original model, but that might not necessarily be a great idea.)

However, it is unclear whether Gaussian noise is appropriate here, as we are working on the log scale, so one might also want to try a lognormal family (but then you need to somehow handle the zeroes in the estimate column, possibly by replacing them with a small number).
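One way that zero handling could look; a sketch with made-up values, where eps = 0.5 is an arbitrary choice whose influence on the fit should be checked:

```r
# Shift exact zeroes to a small positive constant so lognormal() is applicable.
d <- data.frame(estimate = c(0, 120, 5400))
eps <- 0.5                      # arbitrary; try a few values and compare fits
d$estimate_pos <- ifelse(d$estimate == 0, eps, d$estimate)
stopifnot(all(d$estimate_pos > 0))
```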

Hope that helps at least a bit and good luck with your model!

1 Like


thanks for taking the time to help me!

The suggestion you made works and the logarithmic model is able to fit the data well. I’m also able to recover the best-fitting base, so that’s great! I slightly adjusted the code though because I need a fixed intercept (at 0) and a random slope for id:

fit_log2 <- brm(estimate ~ 0 + log(actual) + (0 + log(actual) | id), data = data)

In a next step, I wanted to compare the logarithmic model to a power-function model. I implemented it as follows and it worked well too:

prior_pow <- prior(normal(1,2), nlpar="b1")
fit_pow <- brm(bf(estimate ~ actual^(b1), b1 ~ (1|id), nl=TRUE), data=data, prior=prior_pow)

Am I correct in assuming that the specified power-function model has a fixed intercept (at 0) and random slope for id?

Additionally, I looked into families and link functions, but I’m not sure how to determine whether Gaussian noise is appropriate for the (log) model. For now, I have looked at the residuals and heteroscedasticity, and there is no apparent funnel, which I assume is a good thing (though there is more variance in the middle of the scale). Is it okay to conclude that the link functions are more or less appropriate based on this?

I also ran the model with family = "lognormal" (adding a constant to zeroes), but I struggle to interpret the resulting output (and heteroscedasticity plot). The estimate and the predictions seem unreasonable, and I am not sure what is happening to the residuals.

Finally, it might be relevant that my primary goal is not to fit the data as precisely as possible but to compare the predictions of different models based on theory (one approach would predict a logarithmic relationship, while the other would predict a power-function relationship, neither makes any explicit assumptions about the error distribution though).

Thanks a lot,


Looks like it does. Note that you can use make_stancode to check the generated Stan code whenever you are unsure what the model actually does (the code is not always completely clear, but quite often one can get useful info from reading it).

That looks like a good start. One approach that is often useful is posterior predictive checks (via pp_check), where, for example, a stat check with stat = sd (and possibly some grouping via stat_group) could be a good way to see if your model captures the variance well. For a (little) more background on this, see the Bayesian workflow preprint, especially section 6.1.
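The idea behind such an sd check, illustrated in plain R with synthetic draws (no brms fit involved): the observed statistic is compared against its distribution over replicated datasets.

```r
set.seed(1)
y_obs <- rnorm(200, mean = 0, sd = 2)                # stand-in for the observed outcome
y_rep <- matrix(rnorm(100 * 200, 0, 2), nrow = 100)  # 100 posterior-predictive-style replicates
sd_obs <- sd(y_obs)
sd_rep <- apply(y_rep, 1, sd)
# If the model captures the spread, sd_obs sits comfortably inside the sd_rep distribution
mean(sd_rep >= sd_obs)
```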

That’s good to bear in mind. Posterior predictive checks might actually be a good way to test which aspects of the data are fit well by either model: if one model passes a check and the other does not, it is an indication that, at least in the aspects the check is testing, one model is better.

Hope this is clear and the checks help you.



thanks again for your help!

According to the Stan code (via make_stancode), the specifications b1 ~ (1|id) and b1 ~ 1 + (1|id) are equivalent. I checked the code, and I believe it does what I want, which is great!

Thanks a lot for the pointer towards the preprint and posterior predictive checks! As you anticipated, the model supposed to fit worse (power) indeed shows worse performance in the checks. However, it seems like both models have shortcomings, with neither being able to handle the variance well. I played around with different link functions and noise distributions, but none of them seem better suited to the data, so I stayed with Gaussian noise for now.

I suspect that the noise is influenced by the upper and lower boundaries of the task (people cannot give an estimate below 0 or above the end of the scale, 18,278). The boundaries allow for more variance in estimates in the middle of the scale (an inverted U-shaped relationship). However, as far as I know, there isn’t an easy way to account for this in the model. Do you maybe know of any way to deal with this? Otherwise, I think I will stay with Gaussian noise, because as far as I know it should not systematically distort the model.




Yes, the intercept is implicit and will be added unless you explicitly remove it (e.g. by b1 ~ 0 + (1 | id)).

You can add truncation (that some values are impossible) via something like estimate | trunc(lb = 0, ub = 18278) ~ ... which should IMHO mimic the process well.



thanks again for your help!

I added truncation to the models, but it caused issues (divergent transitions, non-convergence, etc.). I suspect this might be due to some values lying exactly on the truncation boundary (discussed here). I tried to widen the bounds to avoid this, but the problem remained. Since I could not find a solution, I think I will go with Gaussian noise.

Thanks a lot,

