Non-centered parameterization of the lognormal distribution

Hello, All,
Happy New Year to everyone.

I have a question about how to write the non-centered parameterization of the lognormal distribution in STAN.

Suppose I have a variable u ~ lognormal(mu, sigma_u), where sigma_u > 0. --> (1)

I understand that this can also be written as exp(u) ~ normal(mu_normal, sigma_u_normal). --> (2)

So my question is: how can I write Eqs. (1) and (2) in a non-centered parameterization in STAN?

For Eq. (1), can I write it as mu + u_raw*sigma_u, with both mu and u_raw having lognormal priors? I am not sure about this.

And for Eq. (2), could the non-centered version be written as exp(mu_normal) + exp(u_raw)*sigma_u_normal, with normal priors for mu_normal and u_raw?

Any guidance will be highly appreciated.

cheers
Antony


In Stan (note it’s a name, not an acronym, so the last letters don’t need to be capitalized, and often folks don’t bother even with the first letter), you can select between the centered and non-centered parameterizations of a lognormal prior via:

data {
  // K: number of identifiable-units-of-observation (IUOO)
  int<lower=1> K;
  // N: number of observations total (must be at least one per IUOO)
  int<lower=K> N;
  // which_K: index associating each observation with its IUOO
  array[N] int<lower=1, upper=K> which_K;
  // Y: vector of observations
  vector[N] Y;
  // centered: binary toggle for centered (1) vs non-centered (0) parameterization of X
  int<lower=0, upper=1> centered;
}
parameters {
  // mu: mean (across IUOOs) of log(X)
  real mu;
  // sigma: sd (across IUOOs) of log(X)
  real<lower=0> sigma;
  // X: mean (across observations) for each IUOO
  vector<
    offset = (centered ? 0 : mu)
    , multiplier = (centered ? 1 : sigma)
  >[K] X;
}
model {
  // hyper-priors
  mu ~ std_normal(); // must be changed to reflect domain expertise
  sigma ~ std_normal(); // must be changed to reflect domain expertise
  // hierarchical *log-normal* prior for X:
  X ~ lognormal(mu, sigma);
  // likelihood (residual sd fixed at 1 for illustration):
  Y ~ normal(X[which_K], 1);
}

That is, you don’t really have to do anything special relative to the normal case, other than expressing X as lognormal rather than normal.
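In case it helps to see the mechanics: for the normal case, the offset/multiplier declaration is equivalent to the classic manual non-centered parameterization, which would look something like the sketch below (X_raw is a hypothetical name, not part of the model above):

parameters {
  real mu;
  real<lower=0> sigma;
  // X_raw: standardized deviates, one per IUOO
  vector[K] X_raw;
}
transformed parameters {
  // affine transform: X = offset + multiplier * X_raw
  vector[K] X = mu + sigma * X_raw;
}
model {
  X_raw ~ std_normal(); // implies X ~ normal(mu, sigma)
}

The offset/multiplier syntax just performs that affine transform for you behind the scenes (including the necessary Jacobian adjustment), which is why the model block can still state the distribution on X directly.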

Thanks, Mike, for both the correction on the name and the suggestion on the non-centered parameterization.

Cheers
Antony


Thanks for this thread and reply - most useful.

Could you actually explain the behavior of using <offset=xx, multiplier=xx>?

In the case of a lognormally distributed variable, the offset and multiplier would be applied on the log scale, I assume?

It’s a bit confusing, since you can’t see the exp operation.

Also, doesn’t losing <lower=0> on your lognormally distributed parameter in order to use offset and multiplier cause the sampler to suggest negative values for the parameter?

Correct.

Ah, you should think of the y ~ lognormal() expression as being expanded behind the scenes to log(y) ~ normal() (plus the Jacobian term for the log transform), and not as any kind of expression involving exp, like y ~ exp(normal()).
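To make that concrete: the lognormal log-density is exactly the normal log-density evaluated at log(y), plus the log-Jacobian of the log transform. A sketch for a single real y > 0 (for a vector y the final term becomes sum(log(y))):

model {
  target += lognormal_lpdf(y | mu, sigma);
  // ...adds exactly the same amount to target as:
  // target += normal_lpdf(log(y) | mu, sigma) - log(y);
}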

This is important (and it’s a point that took me forever to realize/understand) because y ~ distribution(...) is actually not expressing a structural relationship in the sense that you could exponentiate the right-hand side; instead it’s a shorthand for target += distribution_lupdf(y | ...), which expresses that we want to compute the log probability density (the log of the height of the curve) at each value in y, given the distribution and its parameters.

Whatever value that yields (if y is a vector, the individual log-pdfs are computed and then summed) is added to a special variable called target which, once all the prior log-pdfs and likelihood log-pdfs have been added to it, reflects the log posterior probability of the model for a given configuration of parameter values and the data. (Almost: to get that exactly you have to use the _lpdf functions rather than _lupdf, but the two differ only by a constant for a given dataset, and since we’re usually only interested in the relative posterior probability across different parameter values for a given dataset, that constant can be dropped for faster compute with no harm.)

Stan’s HMC-based sampler then explores different configurations of parameter values, obtaining for each the summed log posterior, and does fancy beyond-me stuff so that the history of visited values itself eventually serves as samples from the full posterior distribution.
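In code, then, the following two forms are interchangeable (up to that constant; y, mu, and sigma are hypothetical names here):

model {
  // the sampling-statement form:
  //   y ~ lognormal(mu, sigma);
  // is shorthand for the explicit increment (which drops constant terms):
  target += lognormal_lupdf(y | mu, sigma);
}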

Most users (myself included!) see the ~ syntax and assume it’s expressing a sampling event of some sort. For simpler models this isn’t such a big deal, but with more complicated stuff involving things like transformations, mixtures, etc., it can be important to have a good grasp on the fact that it’s a shorthand for target +=.
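And to connect this back to the offset/multiplier question above: if you do want the affine transform applied on the log scale (which also keeps the parameter strictly positive, addressing the negative-values concern), one option is to declare the log-scale quantity as the parameter and exponentiate it yourself. A sketch, replacing the corresponding blocks in the program above (log_X is a hypothetical name):

parameters {
  real mu;
  real<lower=0> sigma;
  // non-centering happens on the log scale:
  vector<offset=mu, multiplier=sigma>[K] log_X;
}
transformed parameters {
  vector<lower=0>[K] X = exp(log_X); // X is guaranteed positive
}
model {
  // the density is placed on the actual parameter log_X, so no Jacobian is needed:
  log_X ~ normal(mu, sigma); // implies X ~ lognormal(mu, sigma)
  // ...hyper-priors and likelihood as before
}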

I really should redo my “Bayes by hand” lecture to illustrate these points (e.g. using Stan-like target += syntax in R to do the computations for the single-parameter case after showing it by hand).
