However, that results in the marginal distributions P(x) = 2x and P(y) = 2(1-y), not what was intended. I get the intended distribution by adding y ~ uniform(0, x); to the model block (or by defining an unscaled y with constant bounds and then scaling it by x). I conclude that the uniform prior needs to be stated explicitly for parameters whose bounds depend on other parameters, unlike for parameters whose bounds are fixed at the start of execution (whose priors are correctly understood implicitly, e.g. x here). I don't understand what happens when I don't specify the uniform explicitly, nor why this differs between parameters with constant and variable bounds. Is this documented somewhere? I couldn't find it. I guess it's something to do with the Jacobian, but I couldn't say what.
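(To pin down what I mean by "intended", here is a small R sketch of the generative process I have in mind, outside Stan; it is only an illustration of the description above:)

n <- 1e5
x <- runif(n)                    # intended: x ~ uniform(0, 1)
y <- runif(n, min = 0, max = x)  # intended: y | x ~ uniform(0, x)
# under this process the histogram of x is flat,
# whereas the model as written gives P(x) = 2x
hist(x)
hist(y)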
(Though the above explanation should be enough, I've elaborated on this issue, including plots, here.)
This doesn't directly address what is happening to the distribution of x. I am not sure myself.
But some time ago I had a problem where a parameter's bound was a function of other parameters (and thus, implicitly, so was its density), and I was having sampling issues. I tried standardizing that parameter so it did not depend on the others, then generated another variable in the transformed parameters block to account for the dependence. It improved my sampling.
In your case, I would try declaring both x and y in [0, 1], then assigning their product to a transformed parameter. I also suspect that the distribution of x will then be what you want.
Thanks, yes, I described exactly that approach as an alternative solution to adding y ~ uniform(0, x); at the link in the original post (though not in the post itself). I'm wondering whether the default behaviour, i.e. what happens when neither solution is used, is explained somewhere in the docs. If not, perhaps it should be, as it's a bit of a gotcha.
I think the problem stems from a misunderstanding about the "implicit" uniform prior. The fact is that Stan never directly imposes any implicit prior.
Recall that Stan only needs the posterior density up to a constant multiplier (and hence the log density only up to an additive constant). So when you have x ~ uniform(0,1); or target += uniform_lpdf(x | 0, 1);, it is equivalent to target += 0, which can be ignored (the term won't be 0 if the bounds are different, but as long as the bounds are constant the density is still constant, so you can ignore it).
However, the dependent uniform prior target += uniform_lpdf(y | 0, x) is equivalent to target += -log(x), which is obviously not a constant and hence cannot be ignored.
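To make that concrete, here is a tiny R illustration (using dunif as a stand-in for uniform_lpdf):

# fixed bounds: the log density is log(1 / (1 - 0)) = 0, a constant that can be dropped
dunif(0.3, min = 0, max = 1, log = TRUE)
# dependent bound: the log density is -log(x), which varies with x and cannot be dropped
x <- c(0.2, 0.5, 0.9)
dunif(0.1, min = 0, max = x, log = TRUE)  # -log(0.2), -log(0.5), -log(0.9)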
More generally, it is always good to remember that a Stan program is really just a function to compute the posterior density - nothing more, nothing less. So although it may sometimes feel like Stan does some sort of magic, it never actually does.
The implicit prior is the 2-dimensional uniform distribution on the triangle with vertices (0,0), (1,0), (1,1). Those marginal distributions are cross-sections of the triangle; I assume you estimated them from histograms. A scatter plot will look uniform.
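A quick R sketch (rejection sampling on the unit square, just to illustrate) showing that the uniform distribution on that triangle has exactly the marginals reported in the original post, P(x) = 2x and P(y) = 2(1 - y):

n <- 2e5
x <- runif(n)
y <- runif(n)
keep <- y < x  # keep only points inside the triangle 0 < y < x < 1
x_tri <- x[keep]
y_tri <- y[keep]
hist(x_tri, freq = FALSE, breaks = 50)
curve(2 * x, add = TRUE)        # marginal density of x is 2x
hist(y_tri, freq = FALSE, breaks = 50)
curve(2 * (1 - x), add = TRUE)  # marginal density of y is 2(1 - y)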
I'm having a moment - the histograms in the post are certainly estimates of the marginals of the "uniform over the triangle" distribution, but I'm missing how the model as specified results in that distribution. The model has both lower/upper bounds specified, so I'm assuming it uses inv-logit as the transform, but I can't get that to match my own, probably flawed, log posterior function. If you have a moment, can you see if I'm missing something obvious?:
library(ggplot2)
library(dplyr)   # filter(), mutate()
library(tibble)  # as_tibble()
res <- array(dim = c(1e5, 2))
# can match the histograms from the blog post
for (ii in seq_len(nrow(res))) {
  x <- sqrt(runif(n = 1)) # but whyyyy sqrt when jacobian is exp(y)
  y <- runif(n = 1, min = 0, max = x)
  res[ii, ] <- c(x, y)
}
# evidently uniform
res |>
  as_tibble(.name_repair = \(x) c("x", "y")) |>
  ggplot() +
  geom_hex(aes(x = x, y = y), bins = 57)
withr::with_par(new = list(mfrow = c(1, 2)), {
  hist(res[, 1])
  hist(res[, 2])
})
logit <- function(x) {
  log(x / (1 - x))
}
inv_logit <- function(v) {
  1 / (1 + exp(-v))
}
log_post_2 <- function(x, y) {
  inv_logit_x <- inv_logit(x)
  inv_logit_y <- inv_logit(y)
  res <- dunif(inv_logit_x, min = 0, max = 1, log = TRUE)
  res <- res + log(inv_logit_x) + log(1 - inv_logit_x)
  res <- res + dunif(x = x * inv_logit_y, min = 0, max = x, log = TRUE)
  res <- res + log(x) + log(inv_logit_y) + log(1 - inv_logit_y)
  return(res)
}
plot_grid <- tidyr::expand_grid(
  x = seq(from = 0, to = 1, length.out = 100),
  y = seq(from = 0, to = 1, length.out = 100)
) |>
  filter(y < x) |>
  mutate(lp2 = log_post_2(x, y))
# this should be uniform, but it isn't and I'm missing something.
ggplot(plot_grid) +
  geom_point(aes(x = x, y = y, colour = lp2))
The intended marginal for y comes from integrating out x: f_Y(y) = \int_y^1 f_{Y \mid X}(y \mid x) \, f_X(x) \, dx = \int_y^1 \frac{1}{x} \, dx = -\ln(y), where we know f_X = 1 and the limits of integration follow from y < x < 1.
Let's generate these marginals without the conditional distribution. Integrating the pdf for Y gives F_Y = y - y \ln(y), which is a nasty thing to invert for inverse transform sampling. However, our friend Wolfram Alpha dispatches it easily and shows that we can use the Lambert W function (which is actually built into Stan as well). The code in R for that is
library(lamW)  # lambertWm1() for the W_{-1} branch
res <- array(dim = c(1e5, 2))
for (ii in seq_len(nrow(res))) {
  x <- runif(n = 1)
  # inverse-CDF draw: solving F_Y(y) = y - y * log(y) = u gives y = exp(1 + W_{-1}(-u / e))
  y <- exp(1 + lambertWm1(-runif(n = 1, min = 0, max = 1) / exp(1)))
  res[ii, ] <- c(x, y)
}
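As a quick sanity check on those draws: the density corresponding to F_Y = y - y \ln(y) is f_Y(y) = -\ln(y), so the histogram of the second column should track that curve:

hist(res[, 2], freq = FALSE, breaks = 50)
curve(-log(x), from = 0.01, to = 1, add = TRUE)  # f_Y(y) = -log(y)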
This is illuminating (as ever) and I fully get that they are the correct (intended?) marginals.
My confusion is twofold in the other direction:
Why is sqrt the "correct" transformation for x in the naive Monte Carlo sampler to match the "uniform over the triangle" distribution implemented by the Stan program in the OP? I'm 99% sure the answer is the "distribution of a transformation of a random variable" formula, but I'm not seeing the transformation that yields the sqrt (more coffee for me, maybe).
What am I missing in log_post_2, from either log-Jacobian, that would make the final plot uniform over y < x?