# Lognormal regression and linear regression on log-transformed variable lead to different results

I'm analyzing response time data. I expected that a lognormal regression would be equivalent to log-transforming the outcome variable first and then running a linear regression. While I get mostly identical results with both approaches, the group-level effect of the main predictor variable, `predictor`, varies substantially. Can anyone explain to me how this can happen?
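The expectation can be checked at the level of a single observation. The lognormal log-density of `y` equals the normal log-density of `log(y)` minus `log(y)`, which is the Jacobian of the log transform. That extra term does not involve `mu` or `sigma`, so for fixed data the two likelihoods are proportional, and with identical priors the posteriors over the regression parameters should agree up to Monte Carlo error. The values of `y`, `mu` and `sigma` below are arbitrary illustration values, not the OP's data.

```r
# Arbitrary illustrative values (assumptions, not the OP's data)
set.seed(1)
y     <- rlnorm(5, meanlog = 9.8, sdlog = 0.5)
mu    <- 9.7
sigma <- 0.5

# Log-density of y under a lognormal(mu, sigma) model
ll_lognormal <- dlnorm(y, meanlog = mu, sdlog = sigma, log = TRUE)

# Log-density of log(y) under a normal(mu, sigma) model
ll_gaussian <- dnorm(log(y), mean = mu, sd = sigma, log = TRUE)

# The two differ exactly by -log(y), which does not depend on mu or sigma
jacobian_gap <- ll_lognormal - ll_gaussian
print(cbind(jacobian_gap, minus_log_y = -log(y)))
```

Changing `mu` or `sigma` leaves `jacobian_gap` unchanged, which is why the two approaches should not shift the parameter estimates.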

```r
library(brms)

m1 <- brm(RT ~ 1 + predictor + session + (1 + predictor + session | subject),
          data = df, iter = 6000, family = lognormal())

m2 <- brm(log(RT) ~ 1 + predictor + session + (1 + predictor + session | subject),
          data = df, iter = 6000)
```

```
> summary(m1)
 Family: lognormal
  Links: mu = identity; sigma = identity
Formula: RT ~ 1 + predictor + session + (1 + predictor + session | subject)
   Data: df (Number of observations: 16293)
  Draws: 4 chains, each with iter = 3000; warmup = 0; thin = 1;
         total post-warmup draws = 12000

Group-Level Effects:
~subject (Number of levels: 90)
                         Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)                0.42      0.03     0.36     0.49 1.00     1232     2137
sd(predictor)                0.02      0.01     0.00     0.03 1.00      902     3235
sd(session2)                 0.24      0.02     0.20     0.28 1.00     4096     7285
cor(Intercept,predictor)    -0.22      0.33    -0.83     0.53 1.00    13474     5731
cor(Intercept,session2)     -0.33      0.10    -0.51    -0.14 1.00     5050     6463
cor(predictor,session2)     -0.09      0.35    -0.78     0.64 1.02      236      265

Population-Level Effects:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     9.77      0.05     9.68     9.86 1.01      483      692
predictor    -0.03      0.01    -0.04    -0.02 1.00    14287     9032
session2     -0.20      0.03    -0.25    -0.15 1.00     2900     6087

Family Specific Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.49      0.00     0.49     0.50 1.00    22156     7907

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
```

```
> summary(m2)
 Family: gaussian
  Links: mu = identity; sigma = identity
Formula: log(RT) ~ 1 + predictor + session + (1 + predictor + session | subject)
   Data: df (Number of observations: 16293)
  Draws: 4 chains, each with iter = 3000; warmup = 0; thin = 1;
         total post-warmup draws = 12000

Group-Level Effects:
~subject (Number of levels: 90)
                         Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)                0.42      0.03     0.36     0.49 1.00     1384     3430
sd(predictor)                0.02      0.01     0.00     0.04 1.00     1146     1911
sd(session2)                 0.24      0.02     0.20     0.28 1.00     5117     7220
cor(Intercept,predictor)    -0.06      0.32    -0.70     0.64 1.00    11868     5376
cor(Intercept,session2)     -0.33      0.10    -0.51    -0.13 1.00     4826     7512
cor(predictor,session2)     -0.26      0.33    -0.84     0.49 1.02      246      337

Population-Level Effects:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     9.76      0.04     9.68     9.85 1.01      761     1604
predictor    -0.01      0.01    -0.02     0.00 1.00    13244     8814
session2     -0.20      0.03    -0.25    -0.14 1.00     4314     6346

Family Specific Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.50      0.00     0.49     0.50 1.00    20260     9036

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
```

• Operating System: Windows
• brms Version: 2.16.3

Curious indeed! I also see that the estimated correlations of the random effects are quite different, and that the effective sample sizes for some of those correlations are small compared with the other parameters (for example, cor(predictor,session2) has a Bulk_ESS of 236 in m1 and 246 in m2, with Rhat = 1.02). I wonder why that is.
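A rough back-of-the-envelope sketch of what those effective sample sizes imply, assuming the usual approximation that the Monte Carlo standard error (MCSE) of a posterior mean is about the posterior SD divided by the square root of the effective sample size. The numbers are copied from the cor(predictor,session2) rows of the two summaries above; `mcse()` is a hypothetical helper written only for this sketch.

```r
# Approximate MCSE of a posterior mean: posterior SD (Est.Error) / sqrt(Bulk_ESS)
mcse <- function(est_error, bulk_ess) est_error / sqrt(bulk_ess)

# cor(predictor,session2) rows from summary(m1) and summary(m2)
m1_cor <- c(estimate = -0.09, est_error = 0.35, bulk_ess = 236)
m2_cor <- c(estimate = -0.26, est_error = 0.33, bulk_ess = 246)

mcse_m1 <- mcse(m1_cor[["est_error"]], m1_cor[["bulk_ess"]])
mcse_m2 <- mcse(m2_cor[["est_error"]], m2_cor[["bulk_ess"]])

# Gap between the two point estimates, in units of the combined MCSE
gap   <- m1_cor[["estimate"]] - m2_cor[["estimate"]]
z_gap <- gap / sqrt(mcse_m1^2 + mcse_m2^2)

round(c(mcse_m1 = mcse_m1, mcse_m2 = mcse_m2, gap = gap, z_gap = z_gap), 3)
```

Treat this only as a rough scale: with Rhat = 1.02 and ESS in the low hundreds, the ESS estimates themselves are uncertain. Note also that the gap of 0.17 is about half of one posterior SD, so the two posteriors overlap heavily.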
@paul.buerkner

To figure this out, a reproducible minimal working example would be great!

@LucC @flo
Maybe something along these lines. However, I couldn't see any difference between the two models.

```r
library(MASS)  # for mvrnorm()
library(brms)

set.seed(2022)  # added so the simulation is reproducible

# set up data
n    <- 1000
n_id <- 100
id   <- rep(1:n_id, each = n / n_id)
x    <- rnorm(n, 0, 1)

# correlated varying intercepts and slopes
sigma_a <- 0.5   # SD of the varying intercepts
sigma_b <- 0.1   # SD of the varying slopes
rho     <- -0.3  # intercept-slope correlation
mu      <- c(0, 0)
sds     <- c(sigma_a, sigma_b)
Rho     <- matrix(c(1, rho, rho, 1), nrow = 2)
Sigma   <- diag(sds) %*% Rho %*% diag(sds)  # covariance matrix

vary_effects <- mvrnorm(n_id, mu, Sigma)
z_a <- vary_effects[, 1]
z_b <- vary_effects[, 2]

# generative outcome: lognormal with a linear predictor on the log scale
a       <- 0.5
b       <- 0.25
meanlog <- (a + z_a[id]) + (z_b[id] + b) * x
sdlog   <- 0.25
y       <- rlnorm(n = n, meanlog = meanlog, sdlog = sdlog)
d1      <- data.frame(y, x, id = factor(id))

# compare models
m1 <- brm(y ~ 1 + x + (1 + x | id), family = lognormal, data = d1, cores = 4)
m2 <- brm(log(y) ~ 1 + x + (1 + x | id), family = gaussian, data = d1, cores = 4)
m1
m2
```


Thanks, this is indeed a reproducible example of something, but unfortunately it doesn't reproduce your problem. Could you share (a sample of) the data with which the curious behaviour can be reproduced?

The OP would have to do that. I was simply trying to see whether a simple simulated example of lognormal data with varying slopes and intercepts would reproduce their problem (it doesn't), and to provide some code that could be modified to match the OP's problem more closely and reproduce it in a minimal working example.

It's because the lognormal distribution has the mean $\exp(\mu + \sigma^2 / 2)$. If you take the log, you will still have the "sigma" term. You can follow the advice and find more details here:
https://stats.stackexchange.com/questions/236577/how-can-i-fit-the-parameters-of-a-lognormal-distribution-knowing-the-sample-mean
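A quick simulation check of the mean formula, with arbitrary values for `mu` and `sigma` (assumptions, not taken from the thread): the arithmetic mean of lognormal draws approaches $\exp(\mu + \sigma^2/2)$, while the exponential of the mean of the logs (the geometric mean) approaches $\exp(\mu)$.

```r
set.seed(42)
mu    <- 1.0   # arbitrary meanlog for illustration
sigma <- 0.5   # arbitrary sdlog for illustration
y <- rlnorm(1e6, meanlog = mu, sdlog = sigma)

arith_mean <- mean(y)            # sample arithmetic mean
geo_mean   <- exp(mean(log(y)))  # sample geometric mean

# Compare each sample quantity with its theoretical value
c(arith_mean = arith_mean, theory_arith = exp(mu + sigma^2 / 2),
  geo_mean = geo_mean, theory_geo = exp(mu))
```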

I'm not sure this explains what is going on. In both cases, the OP is modelling the geometric mean, not the arithmetic mean that you are referring to. See also the minimal working example by @jd_c above: there, no discrepancies arise. So something unusual is going on with the OP's data or data management.
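A minimal maximum-likelihood sketch of that point, using simulated data with a single predictor. The data, the coefficients `b0` and `b1`, and the helper `negloglik()` are illustrative assumptions, not the OP's model (which also has group-level terms and is fit with priors in brms). Maximizing the lognormal likelihood of `y` and fitting ordinary least squares to `log(y)` should recover the same regression coefficients, because both model the mean of `log(y)`.

```r
set.seed(123)
n  <- 500
x  <- rnorm(n)
b0 <- 0.5; b1 <- 0.25                        # hypothetical true coefficients
y  <- rlnorm(n, meanlog = b0 + b1 * x, sdlog = 0.3)

# Linear regression on the log-transformed outcome
fit_lm <- lm(log(y) ~ x)

# Lognormal regression by direct maximum likelihood;
# sigma is optimized on the log scale to keep it positive
negloglik <- function(par) {
  -sum(dlnorm(y, meanlog = par[1] + par[2] * x, sdlog = exp(par[3]), log = TRUE))
}
fit_ml <- optim(c(0, 0, 0), negloglik, method = "BFGS")

# Intercept and slope from both fits, side by side
rbind(lm_on_log_y  = unname(coef(fit_lm)),
      lognormal_ml = fit_ml$par[1:2])
```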