Lognormal regression and linear regression on a log-transformed variable lead to different results

I’m analyzing response time data. I was expecting that fitting a lognormal regression would be equivalent to log-transforming the outcome variable first and then fitting a linear regression. While I do get mostly identical results with both approaches, the population-level estimate for the main predictor variable (predictor) differs substantially. Can anyone explain to me how this can happen?

m1 <- brm(RT ~ 1 + predictor + session + (1 + predictor + session |subject), data=df, iter=6000, family=lognormal())

m2 <- brm(log(RT) ~ 1 + predictor + session + (1 + predictor + session |subject), data=df, iter=6000)
> summary(m1)
 Family: lognormal 
  Links: mu = identity; sigma = identity 
Formula: RT ~ 1 + predictor + session + (1 + predictor + session | subject) 
   Data: df (Number of observations: 16293) 
  Draws: 4 chains, each with iter = 3000; warmup = 0; thin = 1;
         total post-warmup draws = 12000

Group-Level Effects: 
~subject (Number of levels: 90) 
                               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)                      0.42      0.03     0.36     0.49 1.00     1232     2137
sd(predictor)                      0.02      0.01     0.00     0.03 1.00      902     3235
sd(session2)                       0.24      0.02     0.20     0.28 1.00     4096     7285
cor(Intercept,predictor)          -0.22      0.33    -0.83     0.53 1.00    13474     5731
cor(Intercept,session2)           -0.33      0.10    -0.51    -0.14 1.00     5050     6463
cor(predictor,session2)           -0.09      0.35    -0.78     0.64 1.02      236      265

Population-Level Effects: 
                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept           9.77      0.05     9.68     9.86 1.01      483      692
predictor          -0.03      0.01    -0.04    -0.02 1.00    14287     9032
session2           -0.20      0.03    -0.25    -0.15 1.00     2900     6087

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.49      0.00     0.49     0.50 1.00    22156     7907

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
> summary(m2)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: log(RT) ~ 1 + predictor + session + (1 + predictor + session | subject) 
   Data: df (Number of observations: 16293) 
  Draws: 4 chains, each with iter = 3000; warmup = 0; thin = 1;
         total post-warmup draws = 12000

Group-Level Effects: 
~subject (Number of levels: 90) 
                               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)                      0.42      0.03     0.36     0.49 1.00     1384     3430
sd(predictor)                      0.02      0.01     0.00     0.04 1.00     1146     1911
sd(session2)                       0.24      0.02     0.20     0.28 1.00     5117     7220
cor(Intercept,predictor)          -0.06      0.32    -0.70     0.64 1.00    11868     5376
cor(Intercept,session2)           -0.33      0.10    -0.51    -0.13 1.00     4826     7512
cor(predictor,session2)           -0.26      0.33    -0.84     0.49 1.02      246      337

Population-Level Effects: 
                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept           9.76      0.04     9.68     9.85 1.01      761     1604
predictor          -0.01      0.01    -0.02     0.00 1.00    13244     8814
session2           -0.20      0.03    -0.25    -0.14 1.00     4314     6346

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.50      0.00     0.49     0.50 1.00    20260     9036

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
  • Operating System: Windows
  • brms Version: 2.16.3

Curious indeed! I also see that the estimated correlations of the random effects are quite different, and that the effective sample sizes for those correlations are relatively small compared to the other parameters (and differently so for the two models). I wonder why that is.
@paul.buerkner

To figure this out, a reproducible minimal working example would be great!

@LucC @flo
Maybe something along these lines. However, I couldn’t see any difference between the two models (a quick numeric comparison follows the code below).

#setup data
set.seed(2022)  # so the simulated data are reproducible
n <- 1000
n_id <- 100
id <- rep(1:n_id, each=n/n_id)
x <- rnorm(n, 0, 1)

#correlated intercepts and slopes
sigma_a <- 0.5
sigma_b <- 0.1
rho <- (-0.3)
mu <- c(0,0)
sds <- c(sigma_a, sigma_b)
Rho <- matrix( c(1,rho,rho,1), nrow=2)
Sigma <- diag(sds) %*% Rho %*% diag(sds)

library(MASS)
vary_effects <- mvrnorm(n_id, mu, Sigma)

z_a <- vary_effects[,1]
z_b <- vary_effects[,2]

#generate the lognormal outcome
a <- 0.5
b <- 0.25
meanlog <- (a + z_a[id]) + (z_b[id] + b)*x
sdlog <- 0.25
y <- rlnorm(n=n, meanlog=meanlog, sdlog=sdlog)
d1 <- cbind.data.frame(y, x, id)
d1$id <- factor(d1$id)

#compare models
library(brms)
m1 <- brm(y ~ 1 + x + (1 + x|id), family=lognormal(), data=d1, cores=4)
m2 <- brm(log(y) ~ 1 + x + (1 + x|id), family=gaussian(), data=d1, cores=4)
m1
m2
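
If eyeballing the printed summaries is a bit coarse, the population- and group-level estimates from the two fits can also be put side by side (just a quick sanity check, nothing more):

#numeric comparison of the two fits
fixef(m1)    # population-level effects of the lognormal model
fixef(m2)    # population-level effects of the Gaussian model on log(y)
VarCorr(m1)  # group-level SDs and correlations
VarCorr(m2)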

Thanks, this is indeed a reproducible example of something, but unfortunately it doesn’t reproduce your problem. Would you be able to share (a sample of) the data with which the curious behaviour can be reproduced?

The OP would have to do that. I was simply trying to see whether a simple simulated example of lognormal data with varying slopes and intercepts would reproduce their problem (it doesn’t), and to provide some code that could be modified to match the OP’s problem more closely in a minimal working example.

It’s because the lognormal distribution has mean \exp(\mu + \sigma^2 / 2).
If you take the log, you will still have the “sigma” term. You can follow the advice and find more details here:
https://stats.stackexchange.com/questions/236577/how-can-i-fit-the-parameters-of-a-lognormal-distribution-knowing-the-sample-mean
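
As a quick illustration of that point (just a simulation with made-up numbers): the arithmetic mean of lognormal data involves the \sigma^2/2 term, while the mean of the log-transformed data recovers \mu directly.

#arithmetic mean vs. mean on the log scale for lognormal data
set.seed(1)
mu <- 1
sigma <- 0.5
y <- rlnorm(1e6, meanlog = mu, sdlog = sigma)

mean(y)                # close to exp(mu + sigma^2 / 2), about 3.08
exp(mu + sigma^2 / 2)
mean(log(y))           # close to mu = 1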

I’m not sure this explains what is going on. In both cases, the OP is modelling the geometric mean, not the arithmetic mean you are referring to. See also the minimal working example by @jd_c above; there, no discrepancies arise. So something unusual is going on with the OP’s data or data management.
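
To spell out why the two parameterisations should agree on \mu and \sigma: the lognormal density of y is just the normal density of \log(y) divided by y, and that Jacobian factor does not involve the parameters, so the two likelihoods lead to the same estimates. A quick numeric check (with arbitrary values):

#lognormal(y | mu, sigma) equals normal(log(y) | mu, sigma) / y
y <- 2.5; mu <- 0.3; sigma <- 0.7
dlnorm(y, meanlog = mu, sdlog = sigma)
dnorm(log(y), mean = mu, sd = sigma) / y
# both lines print the same value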

Thanks for your input everyone.

Upon creating a reproducible example, I realized that the two models seem to have been fitted to slightly different data sets… Not sure where this comes from, but that should explain the divergent results. Sorry for bothering everyone!
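
For anyone running into something similar, a quick check that would have caught this on my end is to compare the observations actually stored in the two fitted objects:

#compare the observations that each model actually used
nobs(m1)        # rows used by the lognormal fit
nobs(m2)        # rows used by the Gaussian fit on log(RT)
nrow(m1$data)   # data stored inside each brmsfit
nrow(m2$data)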
