Happy New Year to everyone.
I have a question about how to write the non-centered parametrization of a lognormal distribution in STAN.
Suppose I have a variable u ~ lognormal(mu, sigma_u), where mu and sigma_u are both > 0. -> (1)
I also understand that this can be written as exp(u) ~ normal(mu_normal, sigma_u_normal). -> (2)
So my question is: how can I write Eq. (1) and Eq. (2) as non-centered parameterizations in STAN?
For Eq. (1), can I write it as mu + u_raw*sigma_u, with both mu and u_raw having lognormal priors? I am not sure about it.
And for Eq. (2), can the non-centered version be written as exp(mu_normal) + exp(u_raw)*sigma_u_normal, with normal priors for mu_normal and u_raw?
Any guidance will be highly appreciated.
In Stan (note it’s a name, not an acronym, so the last letters don’t need to be capitalized, and often folks don’t bother even with the first letter), you can toggle between the centered and non-centered parameterizations of a lognormal prior via:
data {
  // K: number of identifiable-units-of-observation (IUOO)
  int K ;
  // N: number of observations total (must be at least one per IUOO)
  int N ;
  // which_K: index associating each observation with its IUOO
  int<lower=1,upper=K> which_K[N] ;
  // Y: vector of observations
  vector[N] Y ;
  // centered: binary toggle for intercept centered/non-centered parameterization
  int<lower=0,upper=1> centered ;
}
parameters {
  // mu: mean (across-IUOOs) of X
  real mu ;
  // sigma: sd (across-IUOOs) of X
  real<lower=0> sigma ;
  // X: mean (across observations) for each IUOO
  vector<
    offset = (centered ? 0 : mu)
    , multiplier = (centered ? 1 : sigma)
  >[K] X ;
}
model {
  mu ~ std_normal() ; // must be changed to reflect domain expertise
  sigma ~ std_normal() ; // must be changed to reflect domain expertise
  // hierarchical *log-normal* prior for X:
  X ~ lognormal( mu, sigma ) ;
  Y ~ normal( X[which_K], 1 ) ;
}
That is, you don’t really have to do anything special relative to the normal case other than express the prior on X as lognormal rather than normal.
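In case it helps to see what that declaration is doing: below is a minimal sketch of the manual non-centered parameterization that the offset/multiplier syntax is shorthand for, written for the simpler normal-prior case (the X_raw parameter and the whole program are my own illustration, not part of the model above):

data {
  // K: number of units, as above
  int K ;
}
parameters {
  real mu ;
  real<lower=0> sigma ;
  // raw, standardized parameter that the sampler actually explores
  vector[K] X_raw ;
}
transformed parameters {
  // shift and scale back to the natural location/scale,
  // i.e. X = offset + multiplier * X_raw
  vector[K] X = mu + sigma * X_raw ;
}
model {
  mu ~ std_normal() ;
  sigma ~ std_normal() ;
  // a standard-normal prior on the raw scale implies X ~ normal(mu, sigma)
  X_raw ~ std_normal() ;
}

The offset/multiplier declaration gives the same sampling geometry but lets the model block stay written on the natural scale; Stan applies the shift/scale (and the associated Jacobian adjustment) behind the scenes.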
Thanks Mike for both the correction on the name and the suggestion on non-centered parameterization.
Thanks for this thread and reply - most useful.
Could you actually explain the behavior of using <offset = xx, multiplier = xx>? In the case of a lognormally distributed variable, the offset and multiplier would be applied on the log scale, I assume? It’s a bit confusing, since you can’t see the underlying transformation. Also, won’t losing the <lower=0> on your lognormally distributed parameter in order to use offset and multiplier cause the sampler to suggest negative values for the parameter?
Ah, you should think of the y ~ lognormal() expression as being expanded behind the scenes to log(y) ~ normal(), and not any kind of expression involving y ~ exp(normal()).
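Concretely (this little check program is my own, not anything from the model above), the lognormal log-density is the normal log-density of log(y) plus the Jacobian of the log transform:

data {
  real<lower=0> y ;
  real mu ;
  real<lower=0> sigma ;
}
generated quantities {
  // these two should match for any valid inputs;
  // the -log(y) is the Jacobian of the log transform
  real lhs = lognormal_lpdf(y | mu, sigma) ;
  real rhs = normal_lpdf(log(y) | mu, sigma) - log(y) ;
}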
This is important (and this is a point that it took me forever to realize/understand) because y ~ distribution(...) is actually not expressing a structural relationship in the sense that you can exponentiate the right-hand side; instead, it is a shorthand for target += distribution_lupdf( y | ... ), which expresses that we want to compute the log-probability-density (the log of the height of the curve) at each value in y given the distribution and its parameters. Whatever value that yields (if y is a vector, the individual log-pdfs are computed and then summed) is added to a special variable called target, which, once all the prior log-pdfs and likelihood log-pdfs are added to it, reflects the posterior probability of the model for a given configuration of parameter values and the data (almost; to get that you have to use the _lpdf functions rather than _lupdf, but these differ by a constant for a given dataset, and since we’re usually only interested in the relative posterior probability across different parameter values for a given dataset, that constant can be left off for faster compute and no harm). Stan’s HMC-based sampler then explores different configurations of parameter values, obtaining for each the summed log-posterior-probability, and does fancy beyond-me stuff so that the history of visited values itself eventually serves as samples from the full posterior distribution.
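To make that equivalence concrete, here is a toy self-contained model (my own example, not Mike’s model) written entirely with explicit increments; the comments show the ~ statement each line is shorthand for:

data {
  int N ;
  vector[N] y ;
}
parameters {
  real mu ;
  real<lower=0> sigma ;
}
model {
  // equivalent to: y ~ normal(mu, sigma) ;
  target += normal_lupdf(y | mu, sigma) ;
  // equivalent to: mu ~ normal(0, 1) ;
  target += normal_lupdf(mu | 0, 1) ;
  // equivalent to: sigma ~ normal(0, 1) ; (half-normal given the lower bound)
  target += normal_lupdf(sigma | 0, 1) ;
}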
Most users (myself included!) see the ~ syntax and assume it’s expressing a sampling event of some sort. For simpler models this isn’t such a big deal, but with more complicated stuff involving things like transformations, mixtures, etc., it can be important to have a good grasp on the fact that it’s a shorthand for target += .
I really should redo my “Bayes by hand” lecture to illustrate these points (ex. using Stan-like target += syntax in R to do the computations for the single-parameter case after showing it by hand).