I have n data points (sorry, I cannot share the data) that are assumed to be normally distributed on the log10 scale, with mean log10(mu) and a measurement error that is normally distributed with mean 0 and standard deviation 1.
So for each data point the model reads as
log10(y_i) = log10(mu_i) + error
Is the following sampling statement for this model correct?
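data {
  int<lower=1> n;        // number of observations
  vector<lower=0>[n] y;  // observations on the original (positive) scale
}
model {
  // mu is computed elsewhere as a function of the model parameters
  y ~ lognormal(log10(mu), 1);
}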
Or should it be:
model {
  y ~ lognormal(log(mu), 1);
}
Generating data from the two sampling statements gave me very different results, so I am just checking which statement is correct.
Thanks @stemangiola.
I am still confused. y ~ lognormal(mean, sigma) is the distribution with mean and sigma on the log_e scale, right? So is it okay to use it to model data that are normal on the log10 scale? Would it affect the estimated parameters? Or would it be more appropriate to use the following?
data {
  int<lower=1> n;        // number of observations
  vector<lower=0>[n] y;  // observations on the original (positive) scale
}
model {
  // mu is computed elsewhere as a function of the model parameters
  log10(y) ~ normal(log10(mu), 1);
}
And if I use lognormal(mean, sigma) to model the data described above, the mean and sigma should be on the log_e scale, right? That is, mean = log_e(mu)?
Nope. Let \mu = E[X]. Then E[\ln X] \neq \ln(E[X]) = \log(\mu).
The difference between your two simulations stems from the fact that \ln x = \ln(10)\log_{10}(x). So if \log_{10} X is normally distributed, then \ln X will be too (and vice versa), since the relationship is linear and linear transformations of Gaussian random variables are again Gaussian.
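To spell out the change of base for the model in the question, here is a short derivation. It is only a sketch, and it assumes the error term is \varepsilon_i \sim \mathcal{N}(0, 1) on the log10 scale, as stated in the original post:

```latex
\begin{align*}
\log_{10} y_i &= \log_{10}\mu_i + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, 1) \\
\ln y_i &= \ln(10)\,\log_{10} y_i = \ln\mu_i + \ln(10)\,\varepsilon_i \\
\ln y_i &\sim \mathcal{N}\bigl(\ln\mu_i,\ (\ln 10)^2\bigr)
\end{align*}
```

Under that assumption, the location on the natural-log scale is \ln\mu_i and the standard deviation is \ln(10) \approx 2.303 rather than 1. In Stan notation this would correspond to something like y ~ lognormal(log(mu), log(10)).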
I recommend you check out the Wikipedia page on the log-normal to understand the moments of the distribution. Its mean and standard deviation do not coincide with \mu and \sigma.
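A quick numerical check of that last point may help. This is a minimal sketch using only Python's standard library; the values m = 0 and s = 0.5 are illustrative assumptions, not taken from the thread. Here m and s play the role of the log-normal's \mu and \sigma, i.e. the location and scale of \ln X.

```python
import math
import random
import statistics

random.seed(1)

m, s = 0.0, 0.5  # location and scale of ln(X); illustrative values only
x = [random.lognormvariate(m, s) for _ in range(200_000)]

sample_mean = statistics.fmean(x)
theory_mean = math.exp(m + s**2 / 2)  # E[X] for a log-normal
median = math.exp(m)                  # exp(m) is the median, not the mean
mean_of_log = statistics.fmean([math.log(v) for v in x])  # estimates E[ln X] = m

print("sample mean of X :", sample_mean)
print("exp(m + s^2/2)   :", theory_mean)
print("exp(m) (median)  :", median)
print("mean of ln X     :", mean_of_log)
print("ln(sample mean)  :", math.log(sample_mean))
```

The sample mean should sit close to exp(m + s^2/2) rather than exp(m), and ln of the sample mean should exceed the mean of ln X, which is the inequality E[\ln X] \neq \ln(E[X]) in practice.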
Thanks @maxbiostat.
Reading through the reference, I agree that if the data are normally distributed on the log_10 scale, they are also normally distributed on the log_e scale. What confused me is whether the mean passed to lognormal() should be log10(mu_i) or log(mu_i). I guess I did not make the model clear in my original post. The mean mu_i in the model:
log10(y_i) = log10(mu_i) +error
is a function of parameters. Simulating data using both lognormal(log10(mu_i), 1) and lognormal(log(mu_i), 1) produced two datasets with different ranges. The first resembled the observed data on the log_10 scale, and the second resembled the observed data on the log_e scale. So I was wondering: if the sampling statement is lognormal(), should I be using log(mu_i) instead of log10(mu_i) as the mean?
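One way to check this kind of question numerically is to simulate from each candidate and look at log10(y). The sketch below is not based on the data in this thread: it assumes a single hypothetical value mu = 50 and uses Python's standard library, where random.lognormvariate(loc, scale) takes its parameters on the natural-log scale, like lognormal() does. The third candidate, with scale ln(10), is an addition for comparison and follows from \ln x = \ln(10)\log_{10}(x).

```python
import math
import random
import statistics

random.seed(2)

mu = 50.0            # hypothetical value of mu_i, for illustration only
n = 100_000          # number of simulated draws per candidate
LN10 = math.log(10)  # natural log of 10


def log10_summary(loc, scale):
    """Draw y ~ lognormal(loc, scale) on the natural-log scale and
    return the mean and standard deviation of log10(y)."""
    logs = [math.log10(random.lognormvariate(loc, scale)) for _ in range(n)]
    return statistics.fmean(logs), statistics.stdev(logs)


a = log10_summary(math.log10(mu), 1.0)  # lognormal(log10(mu), 1)
b = log10_summary(math.log(mu), 1.0)    # lognormal(log(mu), 1)
c = log10_summary(math.log(mu), LN10)   # lognormal(log(mu), log(10))

print("target: mean log10(y) =", math.log10(mu), ", sd log10(y) = 1")
for label, (mean_, sd_) in [("log10(mu), 1", a), ("log(mu), 1", b), ("log(mu), ln 10", c)]:
    print(f"{label:>15}: mean log10(y) = {mean_:.3f}, sd log10(y) = {sd_:.3f}")
```

Under these assumptions, only the third candidate should reproduce both the mean log10(mu) and the standard deviation 1 on the log10 scale. The first two both have a standard deviation of about 1/ln(10) ≈ 0.434 on the log10 scale, and the first is also centred at log10(mu)/ln(10) instead of log10(mu).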