Repeated Measures Regression Fitting

Hello, I am very new to Stan so I hope this question is appropriate for this forum.

Main Question: How might I adjust the model below to capture the data I’ve generated?

I have a repeated measures problem whereby each discrete level between 1:15 has 100 repeated measurements. I’m hoping to do a simple regression except the noise on each level is log-normally distributed. The data is generated with code below and is plotted:

x <- rep(1:15, each=100)
y <- 1 + 2*x + rlnorm(length(x),mean=0,sd=1)
plot(x, y)


Things I’ve Tried

  • I have tried the following snippet but I don’t believe this captures the log-normally distributed noise (which I believe would show up in the sampling of the intercept – that is, \alpha \sim \text{lognormal} or perhaps \sigma \sim \text{lognormal})
data {
  int<lower=0> N;  // Number of observations
  int<lower=0> K;
  matrix[N, K] X;  // Predictor matrix
  vector[N] y;     // Response variable
}

parameters {
  real alpha; 
  vector[K] beta;          // Coefficients
  real<lower=0> sigma;     // Standard deviation of the normal distribution
}

model {
  // Linear predictor
  vector[N] mu;

  mu = alpha + X * beta;

  // Likelihood
  y ~ lognormal(mu, sigma);
}

Running the model looked something like this:

test_data <- data.frame(x, y)
data <- list(
  N = nrow(test_data),
  K = ncol(test_data) - 1,
  X = as.matrix(test_data[,1]),  # Predictor variables
  y = test_data$y                # Response variable
)
stan_model <- stan_model(model_code = stan_model_code)
# Fit the model
fit <- sampling(stan_model, data = data, iter = 5000, chains = 4)
  • Adding a lognormal prior on \alpha but this had major sampling issues with many divergences.
  • Treating the 15 separate levels as random effects using (stan_lmer) but this didn’t seem to fit well for me either.

Final Thoughts

  • I know this is quite a simple modeling question but it would help me learn to understand how to adjust this model and keep learning beyond this.
  • I’m imagining many parallel lines that are along the general trend of the data but the distance between the lines for log-normally distributed.

Thanks for any insights! Glad to be part of the group (this is my first post).

Welcome to the Stan forums. As long as it’s about Stan it’s appropriate!

The trick is to write a Stan model that mirrors the data-generating process. The data generation you wrote down is the following.

x <- rep(1:15, each=100)
y <- 1 + 2*x + rlnorm(length(x),mean=0,sd=1)
plot(x, y)

This doesn’t match the model you wrote down, which is a standard lognormal regression. You can fix things so they match one of two ways.

  1. Use the following standard lognormal regression to generate data
x <- rep(1:15, each=100)
y <- rlnorm(length(x), mean=log(1 + 2 * x), sd=1)

The second line here is equivalent to the following:

y <- exp(rnorm(length(x), mean= 1 + 2 * x, sd = 1))

The result is error that’s proportional to the value of 1 + 2 * x.

  1. Or if you really did mean the first thing you wrote down, which is very unusual, you can code this in Stan using the likelihood
  y - (alpha + beta * x) ~ lognormal(0, sigma);

You won’t need a Jacobian adjustment because the transform is linear because x is data. You have to be careful with this approach to make sure the left-hand side remains positive.

Then you would hopefully recover alpha = 1, beta=2 and sigma=1.

In linear regressions, you can make the move you made without affecting the result. That is, we can write

y ~ normal(alpha + beta * x, sigma);

or equivalently

y - (alpha + beta * x) ~ normal(0, sigma);

The non-linearity of lognormal breaks the relationship.

1 Like

Thank you for the warm welcome, Bob! I learned so much just from this small example. I’ve also been working through your paper on Bayesian Workflow! I’d love to learn how to apply Bayesian analysis on complex systems modeling.

As for the example, I made the proposed change below. I appreciate the help!

...
model {
  intercept ~ normal(0, 5); // Prior for log of intercept
  slope ~ normal(0, 5);         // Prior for slope
  sigma ~ exponential(0.1);         // Prior for sigma
  // Likelihood
  y - (intercept + slope * x) ~ lognormal(0, sigma);
} 
generated quantities {
  array[N] real y_rep;
  for (n in 1 : N) {
    y_rep[n] = intercept + slope * x[n] + lognormal_rng(0, sigma);
  }
}
...
          mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
intercept 0.99       0 0.02 0.94 0.97 0.99 1.01  1.03  3210    1
slope     2.00       0 0.00 1.99 2.00 2.00 2.00  2.01  3598    1
sigma     0.95       0 0.03 0.89 0.93 0.95 0.98  1.02  3579    1