Please also provide the following information in addition to your question:

- Operating System: linux
- brms Version: 2.8.0

Is there good advice out there on how best to approach a fairly complex non-linear regression modelling task in brms, in simple steps that ensure convergence and good fits? For example:

- Iterative approaches for building up complexity from simple to complex models and priors, perhaps even fitting only subsets of the data first to set direction before fitting the entire data set?
- The use of simulation studies to inform model and prior formulations to ensure convergence and quality of fit?
- The number of samples needed to get convergence and good fits (as a function of model *complexity*)?

Let me explain:

Recently, I built a model for predicting non-linear curves as a function of physical and chemical factors:

`y ~ (t > tg) * Gm * (1 - exp(-k * (t - tg)))`

where t is time and the parameters tg, Gm, and k depend on CaCl2, pH, temperature, protein level, and enzyme dose.
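
To make the curve shape concrete, here is the model as a plain R function (illustrative only; the parameter values in the example call are round numbers in the range of my prior means, not fitted estimates):

```r
# The curve is 0 up to the lag time tg, then rises toward the asymptote Gm
# at rate k.
firmness_curve <- function(t, tg, Gm, k) {
  (t > tg) * Gm * (1 - exp(-k * (t - tg)))
}

# Example: before tg the curve is flat at 0; long after tg it approaches Gm.
firmness_curve(c(5, 15, 60), tg = 11, Gm = 260, k = 0.22)
```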

I had curves from a fractional factorial design of *46 experiments*.

My naive approach has been to:

- first fit each curve separately using brms nonlinear regression, estimating the parameters of each curve and their standard errors;
- then model each estimated parameter (propagating its standard error via the se() term) as a function of the experimental conditions using a linear model, e.g. `y | se(std.error, sigma = TRUE) ~ …`
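
In code, the two steps look roughly like this (a sketch only; `curve_data` and `parameter_estimates` are hypothetical data frames standing in for one experiment's time series and for the collected per-experiment estimates, and priors are omitted):

```r
library(brms)

# Step 1 (per experiment): fit the nonlinear curve with constant parameters,
# then record the posterior mean and SE of tg, Gm, and k for that experiment.
fit_step1 <- brm(
  bf(firmness ~ (minutes > tg) * Gm * (1 - exp(-k * (minutes - tg))),
     tg + Gm + k ~ 1, nl = TRUE),
  data = curve_data
)

# Step 2 (across experiments): regress each estimated parameter on the design
# factors; se() propagates its standard error, sigma = TRUE keeps a residual term.
fit_step2 <- brm(
  tg_hat | se(tg_se, sigma = TRUE) ~ (pH + temperature + dose + protein)^2,
  data = parameter_estimates
)
```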

This yielded good results when comparing model predictions to hold-out samples. I could stop here, but I feel I'm not getting optimal results compared with fitting everything in one go; for instance, the covariance structure between the parameters is lost.

However, when I try to do it all in one go:

```r
bform <- bf(
  firmness ~ (minutes > tg) * Gm * (1 - exp(-k * (minutes - tg))),
  tg ~ (pH + temperature + dose + protein)^2,
  Gm ~ (pH + temperature + dose + protein)^2,
  k  ~ (pH + temperature + dose + protein)^2,
  nl = TRUE
)
```

using priors informed by the first approach:

```r
bprior <- prior(normal(11, 3), nlpar = tg, coef = Intercept) +
  prior(normal(0, 5), nlpar = tg) +
  prior(normal(260, 25), nlpar = Gm, coef = Intercept) +
  prior(normal(0, 5), nlpar = Gm) +
  prior(normal(0.22, 0.05), nlpar = k, coef = Intercept) +
  prior(normal(0, 0.1), nlpar = k)
```

Stan fails to converge to a solution of similar quality to the piece-wise approach.

I then ran simulation studies, building up data sets from the findings of the piece-wise approach, to understand what was going on:

- For a given design in the physical and chemical parameters
- I calculate the value of each parameter for each experiment
- I use the calculated parameter to simulate a curve for each experiment
- I add noise to the curves
- I fit all parameters in one go
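
The data-generating part of these steps can be sketched as follows (the coefficient values and noise level below are placeholders, not my actual piece-wise estimates; I use a full 2^4 factorial here for simplicity):

```r
set.seed(1)

# Step 1: a design in the physical and chemical factors (coded -1/+1)
design <- expand.grid(pH = c(-1, 1), temperature = c(-1, 1),
                      dose = c(-1, 1), protein = c(-1, 1))

# Step 2: compute the "true" curve parameters per experiment
design$tg <- 11 + 1.5 * design$pH - 1.0 * design$dose
design$Gm <- 260 + 10 * design$protein
design$k  <- 0.22 + 0.03 * design$temperature

# Steps 3 + 4: simulate a noisy curve for each experiment on a time grid
minutes <- seq(0, 60, by = 2)
sim <- do.call(rbind, lapply(seq_len(nrow(design)), function(i) {
  p  <- design[i, ]
  mu <- (minutes > p$tg) * p$Gm * (1 - exp(-p$k * (minutes - p$tg)))
  data.frame(experiment = i, minutes = minutes,
             firmness = mu + rnorm(length(minutes), sd = 5),
             p[c("pH", "temperature", "dose", "protein")],
             row.names = NULL)
}))

# Step 5: fit all parameters in one go, passing `sim` to brm() with the
# full bf() formula shown earlier.
```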

From these few studies, the two-step approach appears much more efficient than fitting everything in one go: it takes *less time*, and I need fewer *experiments* to recover the underlying model. Is this a well-known result? Are there well-known tricks, other than tight priors, to guide the sampler? Or am I approaching it all wrong?

Any advice and heuristics on how to efficiently build complex non-linear models in brms will be highly appreciated!

Thanks

/Jannik