Lognormal model with only summary statistics

specification

#1

I’m trying to fit a lognormal model (code below). This works just fine if I have a vector y. E.g. in R,

y <- exp(rnorm(1000))
stan.data <- list(y = y, n = length(y), alpha = 1, beta = 1)
stan.fit <- sampling(stanmodels::lognormal, data = stan.data)

where stanmodels::lognormal is defined as:

data {
  real<lower=0> alpha;
  real<lower=0> beta;
  int<lower=0> n;
  real<lower=0> y[n];
}

parameters {
  real mu;
  real<lower=0> sigmasq;
}

transformed parameters  {
  real<lower=0> sigma;
  sigma = sqrt(sigmasq);
}

model {
  sigmasq ~ inv_gamma(alpha, beta);
  mu ~ normal(0, 1000);
  y ~ lognormal(mu, sigma);
}

However, in my actual problem, I don’t have y. Instead I have mean(y), sd(y), and length(y). Is it possible to still use Stan to achieve the same fit as if I had y? I’m a novice to Stan, and I’ve looked through many examples and the official documentation, but can’t seem to find anything that mentions this problem.

Thanks in advance!

–sundar


#2

It is basically a missing data problem with constraints given by the sample mean and the sample standard deviation.


#3

Thanks for the reply! Should I simulate the data using these constraints? Then use my Stan code on the simulated data? Or is there a way to modify the Stan code to do this for me?


#4

I would declare the values in the parameters block to be a simplex of size n. Then use your sample mean and sample standard deviation to transform the simplex into something that can be described by a lognormal density. Make sure to adjust by the log of the absolute value of the determinant of the Jacobian of the transformation though.


#5

If you can get the mean and sd of log y rather than of y, this would be easy. Otherwise, the non-linearity makes the constraint in the missing data problem tricky to code whether you try to impute log y or y from sd(y) and mean(y) and n, where y > 0.

Also, those wide priors are not helping matters.