Non-linear power relationship model under a lognormal distribution - relationship to parameters

I am unfortunately very confused about the relationship between parameters and the user-supplied formula in a non-linear model under a lognormal distribution. I have read the Bürkner, 2017 paper “Bayesian Distributional Non-Linear Multilevel Modeling with the R Package brms” but I fear I may lack the statistical background to understand how the equations therein actually relate to a real-world example. I would be extremely grateful if someone could explain in simpleton-friendly terms, and/or point me towards some resources to help me get a better grounding. The brms package is fantastic, and its applications would be extremely useful for my work, so I am trying to get my head around it. Basically, I just want to get a mathematical representation of the equation using the model output.

For example, say I have this model (in what I hope is the correct syntax) for a power relationship a*x^b:

library(dplyr)

b <- c(2.12, 0.35)
x <- rnorm(500)
# rlnorm() is parameterised on the log scale, so the argument is meanlog, not mean
y <- rlnorm(500, meanlog = b[1] * x^b[2])
dat1 <- data.frame(x, y) %>% filter(x > 0)  # x^b[2] (and hence y) is NaN for x < 0

prior1 <- prior(normal(0, 1), nlpar = "b1", lb = 0) + 
  prior(normal(0, 1), nlpar = "b2", lb = 0) 

formula.fit.1.gaus <- brm(bf(y ~ b1 * x ^ b2, 
                        b1 ~ 1,
                        b2 ~ 1,
                        nl = TRUE),
                     data = dat1, 
                     prior = prior1, 
                     iter = 5000, 
                     warmup = 2000,
                     thin = 5, 
                     cores = 4,
                     family = gaussian(),
                     file = "formula.fit.1.gaussian",
                     sample_prior = "yes",
                     save_all_pars = T, 
                     seed = 1234)

plot(conditional_effects(formula.fit.1.gaus), points = TRUE)

Returning the output:

Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: y ~ b1 * x^b2 
         b1 ~ 1
         b2 ~ 1
   Data: dat1 (Number of observations: 262) 
Samples: 4 chains, each with iter = 5000; warmup = 2000; thin = 5;
         total post-warmup samples = 2400

Population-Level Effects: 
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
b1_Intercept     6.53      0.82     4.97     8.09 1.00     2352     2220
b2_Intercept     1.47      0.26     0.88     1.94 1.00     2101     2046

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma    12.87      0.62    11.69    14.17 1.00     2377     2333

Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

My understanding is that the estimates from the model output cannot simply be put back into the user-supplied formula, as each non-linear parameter is a placeholder for a linear equation? If so, how can I obtain values that can be used to summarise the model?

However, when I put the estimates into the power equation (b1*x^b2) and plot them ( curve(6.53*x^1.47, from = 0, to = 3.315768) ), at a glance the result appears fairly similar to the model described by plot(conditional_effects(formula.fit.1.gaus), points = TRUE). Can someone explain why this is? Is it just coincidence, can the estimates sometimes be used directly, or have I completely misunderstood?

Following this, if I fit the same model under a lognormal distribution:


prior1 <- prior(normal(0, 1), nlpar = "b1", lb = 0) + 
  prior(normal(0, 1), nlpar = "b2", lb = 0) 

formula.fit.1.log <- brm(bf(y ~ b1 * x ^ b2, 
                        b1 ~ 1,
                        b2 ~ 1,
                        nl = TRUE),
                     data = dat1, 
                     prior = prior1, 
                     iter = 5000, 
                     warmup = 2000,
                     thin = 5, 
                     cores = 4,
                     family = lognormal(),
                     #control = list(adapt_delta = 0.99,
                     #              max_treedepth = 12), 
                     file = "formula.fit.1.log",
                     sample_prior = "yes",
                     save_all_pars = T, 
                     seed = 1234)

with results

 Family: lognormal 
  Links: mu = identity; sigma = identity 
Formula: y ~ b1 * x^b2 
         b1 ~ 1
         b2 ~ 1
   Data: dat1 (Number of observations: 262) 
Samples: 4 chains, each with iter = 5000; warmup = 2000; thin = 5;
         total post-warmup samples = 2400

Population-Level Effects: 
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
b1_Intercept     2.11      0.07     1.97     2.24 1.00     2607     2353
b2_Intercept     0.44      0.05     0.34     0.54 1.00     2138     2292

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.97      0.04     0.89     1.06 1.00     2394     2290

Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

I understand I cannot simply back-transform the parameters using exp() - but how could I go about getting parameters from this model that can be placed back into the supplied formula as a mathematical approximation?

Again, I would appreciate any insight whatsoever!

The best general resources I can recommend are at Understanding basics of Bayesian statistics and modelling. In fact, I think brms is actually one of the hardest ways to get a good understanding of Bayesian modelling, because it hides so much from you (while at the same time letting you very quickly build a huge range of models). So, for a beginner trying to understand, I would recommend implementing at least a couple of models directly in Stan. This is IMHO doubly true for the non-linear syntax in brms, which is even less intuitive than the linear syntax.

Yes, that looks correct to me.

In this particular case, the linear predictor only contains the intercept, so estimates of b1_Intercept are directly estimates of b1 (that would not be true if the formulas for b1 and b2 were more complex). There is however a minor catch: since the Estimate column (the posterior mean) is computed separately for each parameter, it can happen that, despite each estimate being somewhat representative of the posterior for that single parameter, the pair of estimates is not a good representation of the joint posterior of the two parameters (because the posterior of the parameters can have some correlation structure).
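
To make the "placeholder" idea concrete (my notation, not anything official from brms): with b1 ~ 1, the linear predictor for b1 is just \eta_{b1} = b_{1,\text{Intercept}}, so b1 equals its intercept, while with something like b1 ~ 1 + x it would be \eta_{b1} = b_{1,\text{Intercept}} + b_{1,x} \cdot x, and b1 would vary with x.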

So the safe way is to always work with samples - pick a posterior sample, take the values of b1_Intercept and b2_Intercept, plug them into the equation, repeat - this will give you samples of fitted curves that will be representative, as in the sketch below.
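
For instance, a minimal sketch of that loop in R (assuming the Gaussian fit from above, and that the parameters are named b_b1_Intercept and b_b2_Intercept, which is the usual brms convention - check variables(draws) from the posterior package if unsure; the x grid is just an arbitrary choice):

library(posterior)

# one row per post-warmup draw, with columns b_b1_Intercept, b_b2_Intercept, sigma, ...
draws <- as_draws_df(formula.fit.1.gaus)
x_grid <- seq(0.01, 3.3, length.out = 100)

# one fitted curve b1 * x^b2 per posterior draw (rows = x values, columns = draws)
curves <- sapply(seq_len(nrow(draws)), function(i) {
  draws$b_b1_Intercept[i] * x_grid^draws$b_b2_Intercept[i]
})

# summarise across draws, e.g. posterior mean and 95% interval of the curve
curve_mean <- rowMeans(curves)
curve_ci <- apply(curves, 1, quantile, probs = c(0.025, 0.975))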

The simplest way is to use posterior_predict, posterior_epred or posterior_linpred to let brms calculate the predictions for you (which will apply exactly this logic and automatically handle even complex linear predictors).
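
For example (a sketch, again using the Gaussian fit from above):

newd <- data.frame(x = seq(0.01, 3.3, length.out = 100))
# matrix of expected values: one row per posterior draw, one column per row of newd
epred <- posterior_epred(formula.fit.1.gaus, newdata = newd)
colMeans(epred)  # posterior mean of the fitted curve at each x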

Using the lognormal family means that your predictor (b1 * x ^ b2) is a predictor for the logarithm of the observed values. So to make assumed “noise-free” predictions you would need to compute exp(b1 * x ^ b2) for each sample. If you are interested in the posterior mean of the predictions, you would also need to take into account the sigma parameter: the mean of a log-normal is not the “log-mean”, but rather \exp(\mu + \sigma^2/2) - see Log-normal distribution - Wikipedia. (This is what posterior_epred would do for you.)
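
Putting that together by hand for the lognormal fit (a sketch; x0 = 1.5 is just an arbitrary example value):

draws <- posterior::as_draws_df(formula.fit.1.log)
x0 <- 1.5

# the predictor is on the log scale under the lognormal family
mu <- draws$b_b1_Intercept * x0^draws$b_b2_Intercept
exp(mu)                            # "noise-free" predictions (the median of y at x0)
exp(mu + draws$sigma^2 / 2)        # draws of the posterior mean of y at x0
mean(exp(mu + draws$sigma^2 / 2))  # should match posterior_epred() at x = x0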

Hope that clarifies more than confuses!

(this stuff can be hard, so feel free to ask for additional clarifications)


Thanks for taking the time to write this great explanation. I now have a better understanding of what is happening. I am still a bit confused about actually extracting these parameters, and would be extremely grateful if you could further clarify the process.

In the current example, as you described, the estimates for b1 and b2 are pretty much direct estimates, since they are intercepts only. So using posterior_epred() (hopefully correctly), I can get the posterior mean for the estimate using:

# colMeans() averages over draws for each observation (all values are identical here,
# since b1 is intercept-only); tibble() needs the tibble package loaded
probs <- tibble(b1 = colMeans(posterior_epred(formula.fit.1.log, nlpar = "b1")))
apply(probs, 2, mean)

If, however, the supplied formula is more complex, for instance:

prior1 <- prior(normal(0, 10), nlpar = "b1", lb = 0) + 
  prior(normal(0, 1), nlpar = "b2", lb = 0) 

complex.power <- brm(bf(y ~ b1^b2, 
                        b1 ~ 1 + x,
                        b2 ~ 1,
                        nl = TRUE),
                     data = newdata, 
                     prior = prior1, 
                     iter = 5000, 
                     warmup = 2000,
                     thin = 5, 
                     cores = 4,
                     family = lognormal(),
                     sample_prior = "yes",
                     save_all_pars = T, 
                     seed = 1234)

giving:

 Family: lognormal 
  Links: mu = identity; sigma = identity 
Formula: y ~ b1^b2 
         b1 ~ 1 + x
         b2 ~ 1
   Data: newdata (Number of observations: 2619) 
Samples: 4 chains, each with iter = 5000; warmup = 2000; thin = 5;
         total post-warmup samples = 2400

Population-Level Effects: 
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
b1_Intercept     6.61      0.81     5.08     8.24 1.00     1741     1787
b1_x             0.98      0.42     0.42     2.07 1.00     1738     1974
b2_Intercept     0.57      0.06     0.46     0.70 1.00     1727     1942

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.56      0.01     0.54     0.57 1.00     2156     2105

Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Is it possible to get estimates for the separate components of b1 (“Intercept” and “x”)? I can only seem to find an option (nlpar) that gives the estimate for the non-linear parameter as a whole, which does not appear to describe the model well when substituted back into the formula, so I’m assuming I have missed something in its use.


Sorry, I dropped the ball on this.

Looks mostly correct to me (though I didn’t run the code). But I think you are once again fighting against brms, and your model might be more easily and transparently expressed in pure Stan.

I think a sensible approach might be to not use posterior_epred at all and just access the parameters from the underlying Stan fit - you may even take advantage of the new rvars format from the posterior package, which should make all the calculations much simpler, e.g.

fit <- brm( ..... )
samples <- posterior::as_draws_rvars(fit$fit)
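
Continuing from samples with your complex.power model (a sketch; the names follow the b_<nlpar>_<term> convention brms uses - check them with variables(samples) if unsure):

samples <- posterior::as_draws_rvars(complex.power$fit)

x0 <- 1.5  # an arbitrary example predictor value
# reconstruct the linear formula behind the non-linear parameter b1
b1 <- samples$b_b1_Intercept + samples$b_b1_x * x0
b2 <- samples$b_b2_Intercept  # b2 is intercept-only
# the supplied formula, as an rvar holding the full posterior at x0
mu <- b1^b2
# posterior mean of the lognormal outcome at x0, using exp(mu + sigma^2/2)
mean_y <- exp(mu + samples$sigma^2 / 2)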

Additionally, it might be useful to inspect the Stan code that brms generates (via make_stancode) - it is a bit of a mess, but it could still help you better understand what is going on.
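
For example (reusing the formula, data and priors from your last model):

make_stancode(bf(y ~ b1^b2, b1 ~ 1 + x, b2 ~ 1, nl = TRUE),
              data = newdata, family = lognormal(), prior = prior1)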


Fantastic - thanks for clarifying that for me; I will certainly do as you suggest. Again, I really appreciate you taking the time to elaborate on this. It makes so much difference in terms of understanding and accessibility to these extremely valuable tools for novices like myself.

Cheers!
