Lognormal() sampling

#1

Hi,

I have n data points (sorry, I cannot share the data) whose log10 values are assumed to be normally distributed around log10(mu), with a measurement error that is normally distributed with mean 0 and standard deviation 1.
So for each data point the model reads as

log10(y_i) = log10(mu_i) + error

Is the following sampling statement for this model correct?

data {
  int<lower=0> n;
  vector<lower=0>[n] y;
}
model {
  // mu is defined elsewhere as a function of parameters
  y ~ lognormal(log10(mu), 1);
}

Or would it be like :

model {
    y ~ lognormal(log(mu), 1); 
} 

Generating data from the two sampling statements gave me very different results, so I am just checking which statement is correct.
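One way to see why the two statements disagree is to simulate from each and compare where the draws are centered. The following is a minimal Python sketch using only the standard library; the value mu = 100, the seed, and the number of draws are illustrative assumptions, not taken from the data. `random.lognormvariate(m, s)` takes the mean and standard deviation of the natural log of the draw, which is the same parameterization as Stan's lognormal().

```python
import math
import random
import statistics

random.seed(1)   # fixed seed so the sketch is reproducible
mu = 100.0       # hypothetical value, for illustration only
n_draws = 20000  # hypothetical number of simulated points

# Statement 1: log10(mu) used as the location of a natural-log-scale distribution
draws_log10 = [random.lognormvariate(math.log10(mu), 1.0) for _ in range(n_draws)]
# Statement 2: log(mu), the natural log, used as the location
draws_ln = [random.lognormvariate(math.log(mu), 1.0) for _ in range(n_draws)]

median_log10 = statistics.median(draws_log10)
median_ln = statistics.median(draws_ln)

# The median of a lognormal is exp(location)
print(median_log10, math.exp(math.log10(mu)))  # both close to exp(2), about 7.39
print(median_ln, mu)                           # both close to 100
```

With the log10(mu) location the draws are centered on exp(log10(mu)) rather than on mu, which would explain the very different ranges.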

Thanks

#2

If your data are normally distributed on the log_10 scale, they are also normally distributed on the log_e scale.

Therefore

data {
  int<lower=0> n;
  vector<lower=0>[n] y;
}
model {
    y ~ lognormal(mu, 1); 
} 

#3

Thanks @stemangiola.
I am still confused. y ~ lognormal(mean, sigma) is the distribution with mean and sigma on the log_e scale, right? So is it okay to use it to model data on the log10 scale? Would it affect the estimated parameters? Or would it be more appropriate to use the following:

data {
  int<lower=0> n;
  vector<lower=0>[n] y;
}
model {
  log10(y) ~ normal(log10(mu), 1);
}

And if using lognormal(mean, sigma) to model the data described above, the mean and sigma should be on the log_e scale, right? That is, mean = log_e(mu)?
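On whether the choice affects the estimated parameters, a quick numerical check may help. This is a sketch under stated assumptions: the observation y = 37 and the candidate mu values are made up, the two log-density helpers are written by hand for illustration, and the error standard deviation on the log10 scale is fixed at 1 as in the model above. It compares the log density of lognormal(log(mu), log(10)) for y with the log density of normal(log10(mu), 1) for log10(y).

```python
import math

def lognormal_lpdf(y, m, s):
    """Log density of y when ln(y) ~ Normal(m, s)."""
    return (-math.log(y) - math.log(s) - 0.5 * math.log(2 * math.pi)
            - (math.log(y) - m) ** 2 / (2 * s ** 2))

def normal_lpdf(x, m, s):
    """Log density of x ~ Normal(m, s)."""
    return -math.log(s) - 0.5 * math.log(2 * math.pi) - (x - m) ** 2 / (2 * s ** 2)

y = 37.0  # hypothetical observation
diffs = []
for mu in (1.0, 10.0, 250.0):  # hypothetical candidate values of mu
    on_ln_scale = lognormal_lpdf(y, math.log(mu), math.log(10))
    on_log10_scale = normal_lpdf(math.log10(y), math.log10(mu), 1.0)
    diffs.append(on_ln_scale - on_log10_scale)

# The difference is -ln(y) - ln(ln(10)) for every mu, so it does not depend on mu
print(diffs)
```

Because y is data, a term that does not involve mu should not change the inference about mu, so the two formulations would be expected to agree as long as the lognormal scale is log(10) rather than 1.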

Thanks

#4

Nope. Let \mu = E[X]. Then E[\ln X] \neq \ln(E[X]) = \log(\mu).

The point made in #2, that data normally distributed on the log_10 scale are also normally distributed on the log_e scale, stems from the fact that \ln x = \ln(10)\log_{10}(x). So if \log_{10} X is normally distributed, then \ln X will also be (and vice-versa), since the relationship is linear; linear transformations of Gaussian random variables are again Gaussian.
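Written out, the linear relationship carries both the location and the scale across. The symbols a and s below are introduced here only for illustration (the mean and standard deviation on the \log_{10} scale) and are not from the posts above:

```latex
\log_{10} X \sim \mathcal{N}(a,\, s^2)
\;\Longrightarrow\;
\ln X = \ln(10)\,\log_{10} X \sim \mathcal{N}\!\left(a \ln 10,\; (s \ln 10)^2\right)
```

So with s = 1 on the \log_{10} scale, the standard deviation on the \ln scale is \ln 10 \approx 2.303, not 1.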

I recommend you check out the Wikipedia page on the log-normal to understand the moments of the distribution. Its mean and standard deviation do not coincide with \mu and \sigma.
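As a rough numerical illustration of that last point, the sketch below draws from a lognormal and compares the sample mean and standard deviation with the standard closed forms exp(m + s^2/2) and sqrt((exp(s^2) - 1) exp(2m + s^2)). The names m and s (the log-scale location and scale), their values, and the seed are assumptions chosen for the example.

```python
import math
import random
import statistics

random.seed(2)    # fixed seed so the sketch is reproducible
m, s = 0.5, 0.4   # hypothetical log-scale location and scale
draws = [random.lognormvariate(m, s) for _ in range(50000)]

sample_mean = statistics.fmean(draws)
sample_sd = statistics.stdev(draws)

# Closed-form moments of the lognormal distribution
theory_mean = math.exp(m + s ** 2 / 2)
theory_sd = math.sqrt((math.exp(s ** 2) - 1) * math.exp(2 * m + s ** 2))

print(sample_mean, theory_mean, math.exp(m))  # the mean exceeds exp(m)
print(sample_sd, theory_sd, s)                # the sd is not s
```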

#5

Thanks @maxbiostat.
Reading through the reference, I agree that if the data are normally distributed on the log_10 scale, they are also normally distributed on the log_e scale. What confused me is whether the mean for the lognormal() distribution should be taken as log10(mu_i) or log(mu_i). I guess I did not clarify the model in my original post. The mean mu_i in the model:

log10(y_i) = log10(mu_i) + error

is a function of parameters. Simulating data using both lognormal(log10(mu_i), 1) and lognormal(log(mu_i), 1) produced two datasets with different ranges. The first resembled the observed data on the log_10 scale, and the latter resembled the observed data on the log_e scale. So I was wondering: if the sampling statement is lognormal(), should I be using log(mu_i) instead of log10(mu_i) as the mean?
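For what it is worth, a simulation can also check the scale argument, not just the mean. Assuming the error really has standard deviation 1 on the log_10 scale, the corresponding Stan statement would be y ~ lognormal(log(mu_i), log(10)). The Python sketch below (standard library only; mu = 100, the seed, the number of draws, and the helper `log10_summary` are hypothetical choices) draws from a lognormal with location log(mu) and two different scales, then summarizes the draws on the log_10 scale.

```python
import math
import random
import statistics

random.seed(3)   # fixed seed so the sketch is reproducible
mu = 100.0       # hypothetical value, for illustration only
n_draws = 20000  # hypothetical number of simulated points

def log10_summary(scale):
    """Draw from lognormal(log(mu), scale); return mean and sd of log10(draws)."""
    draws = [random.lognormvariate(math.log(mu), scale) for _ in range(n_draws)]
    logs = [math.log10(d) for d in draws]
    return statistics.fmean(logs), statistics.stdev(logs)

mean_a, sd_a = log10_summary(math.log(10))  # scale ln(10) on the natural-log scale
mean_b, sd_b = log10_summary(1.0)           # scale 1 on the natural-log scale

print(mean_a, sd_a)  # near log10(mu) = 2 and near 1
print(mean_b, sd_b)  # near 2 and near 1/ln(10), about 0.434
```

Both versions center log10(y) on log10(mu), but only the scale of log(10) reproduces a standard deviation of 1 on the log_10 scale.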

Thanks