Lognormal model with only summary statistics

Sundar_Dorai-Raj · June 14, 2018, 4:56pm

I’m trying to fit a lognormal model (code below). This works just fine if I have a vector y. E.g. in R,

y <- exp(rnorm(1000))
stan.data <- list(y = y, n = length(y), alpha = 1, beta = 1)
stan.fit <- sampling(stanmodels::lognormal, data = stan.data)

where stanmodels::lognormal is defined as:

data {
  real<lower=0> alpha;
  real<lower=0> beta;
  int<lower=0> n;
  real<lower=0> y[n];
}

parameters {
  real mu;
  real<lower=0> sigmasq;
}

transformed parameters  {
  real<lower=0> sigma;
  sigma = sqrt(sigmasq);
}

model {
  sigmasq ~ inv_gamma(alpha, beta);
  mu ~ normal(0, 1000);
  y ~ lognormal(mu, sigma);
}

However, in my actual problem, I don’t have y. Instead I have mean(y), sd(y), and length(y). Is it possible to still use Stan to achieve the same fit as if I had y? I’m a novice to Stan, and I’ve looked through many examples and the official documentation, but can’t seem to find anything that mentions this problem.

Thanks in advance!

–sundar

bgoodri · June 14, 2018, 5:45pm

It is basically a missing data problem with constraints given by the sample mean and the sample standard deviation.

Sundar_Dorai-Raj · June 14, 2018, 5:55pm

Thanks for the reply! Should I simulate the data using these constraints? Then use my Stan code on the simulated data? Or is there a way to modify the Stan code to do this for me?

bgoodri · June 14, 2018, 6:32pm

I would declare the values in the parameters block to be a simplex of size n. Then use your sample mean and sample standard deviation to transform the simplex into something that can be described by a lognormal density. Make sure to adjust by the log of the absolute value of the determinant of the Jacobian of the transformation though.

Bob_Carpenter · June 15, 2018, 9:52pm

If you can get the mean and sd of log y rather than of y, this would be easy. Otherwise, the non-linearity makes the constraint in the missing data problem tricky to code whether you try to impute log y or y from sd(y) and mean(y) and n, where y > 0.
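To illustrate the non-linearity, here is a minimal R sketch that maps mean(y) and sd(y) to lognormal parameters by method of moments and compares the result with mean(log(y)) and sd(log(y)). Method of moments is swapped in here purely for illustration; it is not the imputation approach discussed in this thread, and the function name `lognormal_moments_to_params` is hypothetical. It relies on the standard lognormal identities E[y] = exp(mu + sigma^2 / 2) and Var[y] = (exp(sigma^2) - 1) * exp(2 * mu + sigma^2).

```r
# Hypothetical helper: invert the lognormal mean/sd formulas (method of moments).
# This is an illustrative assumption, not the approach proposed in the thread.
lognormal_moments_to_params <- function(m, s) {
  sigmasq <- log(1 + (s / m)^2)   # from Var[y] / E[y]^2 = exp(sigma^2) - 1
  mu <- log(m) - 0.5 * sigmasq    # from E[y] = exp(mu + sigma^2 / 2)
  c(mu = mu, sigma = sqrt(sigmasq))
}

set.seed(1)
y <- exp(rnorm(1000))

# Estimates that use only mean(y) and sd(y)
mom <- lognormal_moments_to_params(mean(y), sd(y))

# Estimates that use the log-scale summaries directly
log_scale <- c(mu = mean(log(y)), sigma = sd(log(y)))

print(rbind(mom, log_scale))
```

The two rows generally differ on any finite sample, because the mean and sd of y are not a one-to-one function of the mean and sd of log(y); they only agree at the population level.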

Also, those wide priors are not helping matters.

Sundar_Dorai-Raj · June 18, 2018, 5:47pm

Actually, in my real problem I can choose between mean(y) and mean(log(y)), and likewise between sd(y) and sd(log(y)). The value of y is guaranteed to be > 0. This means I can use:

y <- exp(rnorm(1000))
stan.data <- list(logmean = mean(log(y)), logsd = sd(log(y)), n = length(y))

But I’m still unsure how to set up the model statement. I’m using Stan for the first time and still don’t have a full grasp of the language. Thank you for all your replies thus far!

–sundar

bgoodri · June 18, 2018, 6:33pm

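Having mean(log(y)) and sd(log(y)) is more useful than having mean(y) and sd(y), but I have a hard time imagining how the former could be available while the underlying data are not. In that case, writing $\bar{x}$ for mean(log(y)) and $s$ for sd(log(y)), the likelihood can be written (up to a factor that does not depend on the parameters) as

$$L(\theta, \sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{(n - 1)\,s^2 + n\,(\theta - \bar{x})^2}{2\sigma^2}\right)$$

which when you take the logarithm and write it in Stan comes out as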

target += -0.5 * n * log(2 * pi() * sigma_squared)
        - 0.5 * (n - 1) / sigma_squared * s_squared
        - 0.5 * n / sigma_squared * square(theta - xbar);
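As a sanity check, the expression above can be compared in R against the full-data log-likelihood. The sketch below assumes that `theta` corresponds to `mu` in the original model, `sigma_squared` to `sigmasq`, `xbar` to `logmean`, and `s_squared` to `logsd^2` (with `sd()` using the n - 1 denominator). The names `loglik_summary` and `stan_code` are hypothetical, and the Stan program in the string is one possible way to combine this expression with the priors from the first post, not a confirmed final model.

```r
# Log-likelihood of log(y) ~ normal(mu, sqrt(sigmasq)) from summary statistics.
# Mirrors the Stan `target +=` expression term by term.
loglik_summary <- function(mu, sigmasq, n, logmean, logsd) {
  -0.5 * n * log(2 * pi * sigmasq) -
    0.5 * (n - 1) / sigmasq * logsd^2 -
    0.5 * n / sigmasq * (mu - logmean)^2
}

set.seed(1)
y <- exp(rnorm(1000))
n <- length(y)
logmean <- mean(log(y))
logsd <- sd(log(y))

# Arbitrary parameter values at which to compare the two computations
mu <- 0.3
sigmasq <- 1.7

full <- sum(dnorm(log(y), mean = mu, sd = sqrt(sigmasq), log = TRUE))
summ <- loglik_summary(mu, sigmasq, n, logmean, logsd)
print(all.equal(full, summ))

# A possible Stan program using only the summaries (priors copied from the
# original model; data names follow the stan.data list earlier in the thread).
stan_code <- "
data {
  int<lower=2> n;        // sample size
  real logmean;          // mean(log(y))
  real<lower=0> logsd;   // sd(log(y)), n - 1 denominator
  real<lower=0> alpha;
  real<lower=0> beta;
}
parameters {
  real mu;
  real<lower=0> sigmasq;
}
model {
  sigmasq ~ inv_gamma(alpha, beta);
  mu ~ normal(0, 1000);
  target += -0.5 * n * log(2 * pi() * sigmasq)
            - 0.5 * (n - 1) / sigmasq * square(logsd)
            - 0.5 * n / sigmasq * square(mu - logmean);
}
"

# With rstan installed, this could be fit along these lines (not run here):
# fit <- rstan::stan(model_code = stan_code,
#                    data = list(n = n, logmean = logmean, logsd = logsd,
#                                alpha = 1, beta = 1))
```

The lognormal density of y differs from the normal density of log(y) only by the term -sum(log(y)), which equals -n * logmean and does not involve mu or sigmasq, so dropping it should leave the posterior unchanged.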
