Lognormal model with only summary statistics

Sundar_Dorai-Raj · June 14, 2018, 4:56pm

I’m trying to fit a lognormal model (code below). This works just fine if I have a vector y. E.g. in R,

y <- exp(rnorm(1000))
stan.data <- list(y = y, n = length(y), alpha = 1, beta = 1)
stan.fit <- sampling(stanmodels::lognormal, data = stan.data)

where stanmodels::lognormal is defined as:

data {
  real<lower=0> alpha;
  real<lower=0> beta;
  int<lower=0> n;
  real<lower=0> y[n];
}

parameters {
  real mu;
  real<lower=0> sigmasq;
}

transformed parameters  {
  real<lower=0> sigma;
  sigma = sqrt(sigmasq);
}

model {
  sigmasq ~ inv_gamma(alpha, beta);
  mu ~ normal(0, 1000);
  y ~ lognormal(mu, sigma);
}

However, in my actual problem, I don’t have y. Instead I have mean(y), sd(y), and length(y). Is it possible to still use Stan to achieve the same fit as if I had y? I’m a novice to Stan, and I’ve looked through many examples and the official documentation, but can’t seem to find anything that mentions this problem.

Thanks in advance!

–sundar

bgoodri · June 14, 2018, 5:45pm

It is basically a missing data problem with constraints given by the sample mean and the sample standard deviation.

Sundar_Dorai-Raj · June 14, 2018, 5:55pm

Thanks for the reply! Should I simulate the data using these constraints? Then use my Stan code on the simulated data? Or is there a way to modify the Stan code to do this for me?

bgoodri · June 14, 2018, 6:32pm

I would declare the values in the parameters block to be a simplex of size n. Then use your sample mean and sample standard deviation to transform the simplex into something that can be described by a lognormal density. Make sure to adjust by the log of the absolute value of the determinant of the Jacobian of the transformation though.

Bob_Carpenter · June 15, 2018, 9:52pm

If you can get the mean and sd of log y rather than of y, this would be easy. Otherwise, the non-linearity makes the constraint in the missing data problem tricky to code whether you try to impute log y or y from sd(y) and mean(y) and n, where y > 0.

Also, those wide priors are not helping matters.

Sundar_Dorai-Raj · June 18, 2018, 5:47pm

Actually, for my actual problem, I have a choice of either mean(y) or mean(log(y)). Same for sd(log(y)). The value of y is guaranteed to be >0. This means I can use:

y <- exp(rnorm(1000))
stan.data <- list(logmean = mean(log(y)), logsd = sd(log(y)), n = length(n))

But I’m still unsure how to set up the model statement. I’m using Stan for the first time and still don’t have a full grasp of the language. Thank you for all your replies thus far!

–sundar

bgoodri · June 18, 2018, 6:33pm

Having mean(log(y)) and sd(log(y)) is more useful than having mean(y) and sd(y), but I have a hard time imagining how the former is possible and yet the underlying data are unavailable. In that case, the likelihood can be written as

which when you take the logarithm and write it in Stan comes out as

target += -0.5 * n * log(2 * pi() * sigma_squared)
        - 0.5 * (n - 1) / sigma_squared * s_squared
        - 0.5 * n / sigma_squared * square(theta - xbar);

Topic		Replies	Views
How to specify the mean and variance of a lognormal in a vectorized fashion? Modeling techniques	7	455	April 29, 2021
Three-level nested linear model with lognormal likelihood Modeling	10	1321	January 31, 2019
Log-normal parameter: can I use the mean in stan object? Modeling rstan	8	216	October 10, 2025
Repeated Measures Regression Fitting Modeling techniques	2	151	August 16, 2024
Priors on imputed values for Lognormal censored nonlinear model cause divergence Modeling techniques , fitting-issues , specification	3	516	May 8, 2023

Lognormal model with only summary statistics

Related topics