Three-level nested linear model with lognormal likelihood

Hello all,

I’m trying to fit a three-level nested linear model. My data are confidential, so I can’t post them, but the response variable is continuous, strictly positive and heteroskedastic. There’s good reason to believe this response decreases linearly with some predictor, so I’d like to create a lognormal regression model.

I generated some data below to make a reproducible example. As an arbitrary example, say we’re measuring the time students spend on classwork each week over the course of a 15-week class, and modeling how this changes linearly with time (week). We collected data for 10 years, following 10 students each year. I’m assuming that the slope and mean intercept vary randomly across years, and that there is an individual-level deviation from the mean intercept. I’m attaching a plot of the first four years of simulated data to give you an idea of what that looks like.

I followed these two examples when writing my model. When I use a normal likelihood, the model runs fine and seems to recover the correct parameter values. However, when I use a lognormal likelihood, initialization fails with the message “Initialization between (-2, 2) failed after 100 attempts.” So, I think there is something I am not understanding about how the lognormal model is implemented. Any suggestions or resources would be appreciated!

Data generation


nyears<-10 #Data from 10 years

nsubj<-10 #10 individuals sampled per year

nweeks <- 15 #15 observations per individual each year

b0_mean <- 20 #Mean main intercept across years

b0_sd <- 3 #SD main intercept across years

b1_mean <- 0.5 #Mean slope across years

b1_sd <- 0.15 #SD slope across years

subj_sd <- 2.1 #SD of individual-level intercept

pop_sd <- 1.18 #Observation level SD

#Generate slopes and intercepts for each year
b0_k <- rnorm(nyears, mean = b0_mean, sd = b0_sd)

b1_k <- rnorm(nyears, mean = b1_mean, sd = b1_sd)

#Generate individual-level intercepts each year 
b0_jk <- rnorm(nyears*nsubj, mean = rep(b0_k, each = nsubj), sd = subj_sd)

b0_ijk <- rep(b0_jk, each = nweeks) #The intercept for each observation (for each year, individual, week)

#Generate the slope for each observation
b1_ijk <- rep(b1_k, each = nweeks*nsubj)

#Generate predictor - say week in class
week_ijk <- rep(rep(c(1:nweeks), nsubj), nyears)

range(b0_ijk - b1_ijk * week_ijk) #Make sure response is >0

response_ijk <- rlnorm(nweeks*nyears*nsubj, meanlog = log(b0_ijk - b1_ijk * week_ijk), sdlog = log(pop_sd))

subj_ijk <- rep(c(1:(nsubj*nyears)), each = nweeks) #The individual index

week_ijk <- rep(rep(c(1:nweeks), nsubj), nyears)

year_ijk <- rep(c(1:nyears), each = nweeks*nsubj)

data_df <- data.frame(year = year_ijk, week = week_ijk, subj = subj_ijk, response = response_ijk)

#truepars_df<-distinct(data.frame(b0=b0_ijk, b1=b1_ijk, subj=subj_ijk, year=year_ijk))

year_for_subj <- unique(data_df[c("subj","year")])[,"year"]

dat       <- list(n_obs         = nrow(data_df),
                  n_subj        = length(unique(data_df$subj)),
                  n_years       = length(unique(data_df$year)),
                  subj          = subj_ijk,
                  year          = year_ijk,
                  year_for_subj = year_for_subj,
                  y_ijk         = response_ijk,
                  x_ijk         = week_ijk)

Stan code

data {
  // Define variables in data
  // Number of level-1 observations
  int<lower=0> n_obs;
  // Number of level-2 clusters
  int<lower=0> n_subj;
  // Number of level-3 clusters
  int<lower=0> n_years;

  // Cluster IDs
  int<lower=1, upper=n_subj> subj[n_obs];
  int<lower=1, upper=n_years> year[n_obs];

  // Level 3 look up vector for level 2
  int<lower=1, upper=n_years> year_for_subj[n_subj];

  // Continuous outcome
  vector[n_obs] y_ijk;

  vector[n_obs] x_ijk;

parameters {
  // Population intercept and slope
  real beta_0;
  real beta_1;

  // Level-1 errors
  real<lower=0> sigma_e0;

  // Level-2 random effect
  vector[n_subj] u_0jk;
  real<lower=0> sigma_u0jk;

  // Level-3 random effects
  vector[n_years] u_0k;
  real<lower=0> sigma_u0k;

  vector[n_years] u_1k;
  real<lower=0> sigma_u1k;

transformed parameters  {

  // Individual mean
   real mu[n_obs];

  // Varying intercepts
  vector[n_subj] beta_0jk;
  vector[n_years] beta_0k;

  // Varying slope
  vector[n_years] beta_1k;

  // Varying intercepts definition
  // Level-3 with centered parametrization
  beta_0k = beta_0 + u_0k * sigma_u0k;
  // Level-2
  beta_0jk = beta_0k[year_for_subj] + u_0jk * sigma_u0jk;

  // Varying slope definition
  beta_1k = beta_1 + u_1k * sigma_u1k;

  // Individual mean
  for (i in 1:n_obs) {
    mu[i] = beta_0jk[subj[i]] - beta_1k[year[i]] * x_ijk[i];


model {
  // Prior part of Bayesian inference

  beta_0 ~ normal(0, 100);
  beta_1 ~ normal(0,10);
  sigma_e0 ~ normal(0, 10);
  sigma_u0k ~ normal(0,10);
  sigma_u0jk ~ normal(0,10);

  sigma_u1k ~ normal(0,10);
  // Random effects distribution
  u_0k  ~ normal(0, 1);
  u_0jk ~ normal(0, 1);
  u_1k  ~ normal(0, 1);
  // Likelihood
  //y_ijk ~ normal(mu, sigma_e0);//Normal liklihood works
  y_ijk ~ lognormal(log(mu), log(sigma_e0)); //lognormal doesn't work


Running the model

fileName <- "./three level lognormal model noncentered.stan"
stan_code <- readChar(fileName,$size)

resStan_nc <- stan(model_code = stan_code, data = dat,
                chains = 3, iter = 3000, warmup = 500, thin = 10)

The lognormal is a bit flaky. Just log the outcome, and anti-log the posterior predictions at the end.

Thanks for the quick reply. Won’t log-transforming the response result in the relationship no longer being linear?

So does the canonical link function in a lognormal model. And the identity link probably does not make sense.

Try using simply y_ijk ~ lognormal(mu, sigma_e0).

Also, your priors are pretty wide for a log-normal. Not saying, that this is unreasonable, but you might want to look into that.

If you use log(y_ijk) ~ normal (...), you have to add the Jacobian correction to the target.

Try using simply y_ijk ~ lognormal(mu, sigma_e0) .

Thanks for the suggestion, Max. I gave this a try and it runs, but doesn’t do a great job at recovering my original parameter values. I was under the impression that Stan takes the lognormal parameters on the log scale (i.e. if I tell Stan y has a lognormal(mu, sigma) distribution, that means log(y) has a normal(mu, sigma) distribution).

If I try a super simple lognormal regression model using the following Stan and R code, it works fine when the likelihood is y ~ lognormal(log(mu), log(sigma)), but I again get poor parameter estimates when I use y ~ lognormal(mu, sigma).

Stan code:

data {
  int N;
  vector[N] x;
  vector[N] y;
parameters {
  real m;
  real b;
  real sigma;
transformed parameters {
  real mu[N];

  for (i in 1:N) {
     mu[i] = b - m *x [i];

model {
  y ~ lognormal(log(mu), log(sigma)); //works
  //y ~ lognormal(mu, sigma); //runs but seems wrong

R code:

#Simple lognormal linear regression
x <- runif(1000, min = 0, max=10) #1000 data points

m <- 0.75 #slope

b <- 9 #intercept

Sigma <- 1.49 #sd

y <- rlnorm(meanlog = log(b-m*x), sdlog = log(Sigma), length(x))


# now try to fit in stan

fileName <- "simple lognormal regression.stan"
stan_code <- readChar(fileName,$size)
dat       <- list(N = length(x), 
                  x = x,
                  y = y)
res_sln <- stan(model_code = stan_code, data = dat,
                    chains = 2, iter = 2000, warmup = 200, thin = 10)

summary(res_sln, pars= c("m", "b", "sigma"))

For the y ~ lognormal(log(mu), log(sigma)) case I get

           mean      se_mean         sd      2.5%       25%       50%       75%    97.5%    n_eff
m     0.7650809 0.0009902037 0.01863989 0.7295261 0.7522724 0.7651809 0.7784772 0.798934 354.3544
b     9.0795265 0.0079684173 0.15040553 8.7797082 8.9788507 9.0846832 9.1872217 9.351898 356.2734
sigma 1.4630368 0.0006767977 0.01301990 1.4384374 1.4542604 1.4631127 1.4715226 1.488133 370.0813
m     0.9973385
b     0.9965439
sigma 1.0031996

but when using y ~ lognormal(mu, sigma) I get

           mean      se_mean          sd      2.5%       25%       50%       75%     97.5%    n_eff
m     0.1691531 0.0002635120 0.004455228 0.1596228 0.1662472 0.1692565 0.1717903 0.1777185 285.8505
b     2.3933957 0.0014531002 0.027240715 2.3384950 2.3760314 2.3926197 2.4122949 2.4463156 351.4357
sigma 0.3963783 0.0005368194 0.008724509 0.3780090 0.3915594 0.3957946 0.4019649 0.4146111 264.1348
m     0.9973603
b     1.0000772
sigma 1.0055670

AFAIK the meanlog and sdlog of rlnorm are defined on the log scale already, i.e. the defaults of meanlog = 0 and sdlog = 1 imply that \log y \sim \text{Normal}(0,1).

Did you really intend the parameters being on on a log-log scale? Judging from the example you gave in the first post, it doesn’t seem that way. So just leaving out the log() in meanlog and sdlog, so that you have something like meanlog = mu and sdlog = sigma, should give you the correct data generating process, and the specification I provided in my earlier post should work.

If you DID intend the log-log thing, then the bad new is that this is probably just very hard to fit…

I thought it was interesting to model exp(mu) rather than mu (as in the code above), because parameters would be on the median scale, am I right? The same thing with sd, implying the estimation of geometric standard deviation.

But I guess this way of doing imply constraints, and I would force mu to be positive in the code above. And logically, sigma > 1, but I don’t know if such hard constraints would make sampling easier.

Edit: typo

Well, if you have \log y \sim \text{Normal}(\mu,\sigma), then \exp(\mu) will be the median of y. So, if you model y ~ lognormal(mu, sigma) in Stan, then exp(mu) will be the median of y. You can also compute these things in the generated quantities block.

I definitely did not mean to log-log them! Which says to me there’s still something I don’t understand about implementing lognormal linear regression. I think part of my confusion stems from the fact that in the actual data I’m trying to model, there’s good reason to believe there’s a linear relationship between x and y. Thus, I generated my example data using meanlog = log(b-m*x), sdlog = log(Sigma) to generate data that looks similar (like the right panel of the attached plot). I’m not sure what the correct statistical terminology is for this, but I found these slides that seem to describe a different approach to the same kind of lognormal linear regression model I tried to approximate here.

nonlinear and linear.pdf (74.1 KB)

Initially I was very confused by the log-normal, but I think it’s Wikipedia article is quite nice. The log-normal distribution itself is a bit tricky, therefore it almost always makes sense to fit the normal distribution to the log of the dependent data (and apply the Jacobian correction). This is what Stan does under the hood if you use the lognormal. There is also a brief passage on this in the Stan manual (in the section about Jacobian iirc).

Your plot kind of shows what’s up. The left specification is correct, but you should plot x and log(y) to see what Stan is fitting. You will see your hypothesised linear relationship there.