Putting priors on transformed parameters


In a lot of situations it’s easier to put priors on parameter transformations than on the immediate model parameters. When one does so, one is always greeted with the Stan warning that one should include, “a target += statement with the log absolute determinant of the Jacobian of the transform.” Now I’ve read on other threads that this particular warning is sometimes warranted and sometimes not. My question is: when the warning is warranted, just HOW exactly does one augment the += statement? What does one write exactly? Here’s my example using logistic regression.

The following generates some dummy data in R where I’ve got one continuous (age) and one categorical (group) variable.

n <- 10; beta0 <- -1.6; beta1 <- 0.03; beta2 <- 1
age <- runif(n=n, min=18, max=60)
x2 <- runif(n=n, min=0, max=1)
pi <- exp(beta0 + beta1*age + beta2*x2) / (1 + exp(beta0 + beta1*age + beta2*x2))
y <- rbinom(n=n, size=1, prob=pi)
group <- as.integer(cut(x2, breaks=3))
mylist <- list(age=age,group=group,y=y)
mylist$N <- length(y)
mylist$N_group <- 3

And here’s the Stan code where I’m trying to model the Bernoulli outcome y. Note that I’m putting a somewhat loose prior on the immediate group parameter b but I’m putting rather strong priors on the three group proportions defined in the transformed parameters block. That is, I have a hard time thinking in terms of prior beliefs about parameters expressed inside a logistic transformation, let alone ones in concert with other variables inside a logistic model. It’s much easier to elicit priors regarding the summary proportions or shares. Bless Stan for the handy transformed parameters block that lets me do just that:

stan.code ='
  int<lower=1> N;
  int<lower=1> N_group;
  int<lower=0, upper=1> y[N];
  int group[N];
  real<lower=0, upper=100> age[N];
  real a;
  vector[N_group] b;
  real intercept;

transformed parameters {
  real<lower = 0, upper = 1> Group1Share;
  real<lower = 0, upper = 1> Group2Share;
  real<lower = 0, upper = 1> Group3Share;
  Group1Share = inv_logit(intercept + b[1]);
  Group2Share = inv_logit(intercept + b[2]);
  Group3Share = inv_logit(intercept + b[3]);

  vector[N] p;
  intercept ~ normal(0, 2);
  b ~ normal(0, 5);
  a ~ normal(0, 2);
  Group1Share ~ normal(.333, .1);
  Group2Share ~ normal(.333, .1);
  Group3Share ~ normal(.333, .1);
  for (i in 1:N) {
    y[i] ~ bernoulli_logit(intercept + a*age[i] + b[group[i]]);

Now I can fit this model and examine the results with:

model = stan(model_code=stan.code, data=mylist, iter=2000, thin=10, chains=4)
print(model, digits = 3)

Where Stan gives me the warning I’d expect:

Left-hand side of sampling statement (~) may contain a non-linear transform of a parameter or local variable.
If it does, you need to include a target += statement with the log absolute determinant of the Jacobian of the transform.
Left-hand-side of sampling statement

Now in my case, I suspect the warning is warranted. (Am I right?) Here’s the result of a pairs plot:

It’s clear in the panels towards the top-left that the three immediate group parameters (b1, b2, b3) are almost perfectly inversely correlated with the intercept parameter. I may not be saying this quite right, but it appears that there’s some underdetermination going on. There are an infinite number of combinations wherein these 4 parameters can produce the same 3 prior share beliefs. I think this is what the Stan warning is trying to tell me. And according to this deeper warning explanation, I may not be sampling from the posterior density that I expect. By contrast, the pairs plot for the three b-parameters doesn’t look at all like the above if I drop the transformed parameters altogether.

But then again, the three resulting transformed parameter estimates that come from this model do exactly what I expect. They’re all around .33 and reflect my prior beliefs. All the other parameters make entirely intuitive sense. And the traceplot shows no cause for alarm. If it weren’t for the strong parameter correlations, I’d think that the model is just fine.

So, my questions are… First, is my understanding indeed correct? Is the Stan warning in this case valid and am I not sampling the density I expect? Do I need to append my += target in some way to properly sample from the posterior density reflecting my prior beliefs about the group proportions? Second, if so, what exactly should I append? For example, if I re-write my model block to use the the following form (which right now produces the same results as above)…

  for (i in 1:N) {
    lps[i] = bernoulli_logit_lpmf(y[i] | (intercept + a*age[i] + b[group[i]]));
  target += sum(lps);

…Then what’s the syntax I should be adding to the target += statement above? Would I need to add the sum of the log derivatives of my transformed parameters?

Thanks in advance.


The right-hand side of the target += must be the logarithm of the absolute value of the determinant of the Jacobian matrix of the mapping from the parameters to the transformed parameters. In your case, the Jacobian matrix is diagonal, so you can write it the sum of the logarithms of the absolute values of the derivatives of the mapping from each parameter to the corresponding transformed parameter.

However, you are making things more difficult on yourself by putting normal priors on the Group[1-3]Share transformed parameters, because the normal distribution puts mass outside the (0,1) interval that the Group[1-3]Share transformed parameters are constrained to lie on. You could truncate, but let’s assume instead that you put a beta prior on the Group[1-3]Share transformed parameters. Then, note that the derivative of the inv_logit transformation is the standard logistic PDF and its logarithm is implemented by the logistic_lpdf function in Stan. So, you would have

target += logistic_lpdf(intercept + b | 0, 1);

You should be able to work this out for yourself in R:

foo <- function(x, shape1, shape2) dbeta(plogis(x), shape1, shape2) * dlogis(x)
integrate(foo, lower = -Inf, upper = Inf, shape1 = sqrt(2), shape2 = pi) # 1.00000


Thank you tons, Ben. Great suggestion on using beta priors instead of normal. I can easily replace the priors for the above transformed parameters as follows:

  Group1Share ~ beta(50, 100);
  Group2Share ~ beta(50, 100);
  Group3Share ~ beta(50, 100);

As for the next part, I may muff this up but let me try to re-phrase what you wrote to make sure I understand. If I’ve got prior beliefs about transformations involving the direct model parameters (e.g., b1, b2, b3), then I need to map or re-express those beliefs in a new function involving only the direct model parameters. I need to take the logged derivative of this new function and add it to the target log density of my model. In my case, Stan’s logistic_lpdf (equivalent to dlogis(..., log = TRUE) in R) is the correct new function. You wrote foo and integrate() as tools to help with the mapping.

Assuming I’ve not totally misunderstood the above, I find that a logistic_lpdf with shape and scale parameters -.69649 and .10, respectively, yields share estimates closely centered around .33 and much like beta(50, 100) and ought to work. If so, does the following model statement now correctly sample what I’m intending?

  vector[N] lps;
  vector[N] p;
  intercept ~ normal(0, 2);
  b ~ normal(0, 5);
  a ~ normal(0, 2);
  Group1Share ~ beta(50, 100);
  Group2Share ~ beta(50, 100);
  Group3Share ~ beta(50, 100);
  for (i in 1:N) {
    lps[i] = bernoulli_logit_lpmf(y[i] | (intercept + a*age[i] + b[group[i]]));
  target += sum(lps) +
    logistic_lpdf(intercept + b | -.69649, .10);

The above at least runs well and generates intuitive parameters. In fact, it almost runs too well. The new parameters are all very similar to those from my original model on top and the lp__ value is only slightly higher (i.e., slightly less negative). It almost makes it seem like the correction doesn’t do very much. But then again, this is a toy example with just 10 observations and/or I may have misunderstood your explanation.

Thanks in advance for any more guidance.


If you are transforming intercept + b[i] with inv_logit — which is equivalent to logistic_cdf(intercept + b[i] | 0, 1), then your log-Jacobian is the sum of three terms involving the derivative of its logarithm, which is logistic_lpdf(intercept + b | 0, 1).

Also, a beta prior with shapes 50 and 100 have a mean of 1/3 but way less dispersion than your original normal priors with mean 1/3 and standard deviation 1/10. A beta with shapes of 7 and 14 would be a lot closer.

Also, you can do the likelihood in one line

target += bernoulli_logit_lpmf(y | intercept + a * age + b[group]);


I’m sorry if I’m sounding dense here, Ben, but just to be sure…

  1. When you wrote, logistic_lpdf(intercept + b | 0, 1) that’s just an example, right? One needs to replace the shape=0 and scale=1 parameters with values mapped from one’s prior transformed beliefs, yes? In my case I wound up with logistic_lpdf(intercept + b | -.69649, .10). I chose these values because my prior belief on inv_logit(intercept + b) is beta(50,100) which has a mean and sd of.333 and .04. In addition, the values .333 and .04 are (very roughly) the mean and sd of the derivative of inv_logit with shape and scale parameters -.69649 and .10. …Or have I got the logic behind this mapping idea and the Jacobian all goofed up?

  2. In the model block one needs to include in the target accumulator both the likelihood AND the above lpdf, correct? So, in my case my modeling block needs to include not just what you last wrote above but should have the two lines:

    target += bernoulli_logit_lpmf(y | (intercept + a*age + b[group]));
    target += logistic_lpdf(intercept + b | -.69649, .10);

Thanks in advance.


There’s a chapter in the manual with examples on how to deal with Jacobians for non-linear transforms and when you’ll need them.


Yes, what you’re doing is reasonable for mapping. You don’t need a Jacobian if you generate an unconstrained parameter than constrain it. It’s when you put a distribution on a transformed parameter that you’ll need the Jacobian.

You want to include both prior and likelihood if that’s what you’re asking. And any terms for Jacobian corrections (but it looks like you won’t need any going this way around).