However, that results in the marginal distributions P(x) = 2x and P(y) = 2(1-y), not what was intended. I get the intended distribution by adding y ~ uniform(0, x); to the model block (or by defining an unscaled y with constant bounds and then scaling it by x). I conclude that the uniform prior needs to be stated explicitly for parameters whose bounds depend on other parameters, unlike for parameters whose bounds are fixed at the start of execution (whose priors are correctly understood implicitly, e.g. x here). I don't understand what happens when I don't specify the uniform explicitly, nor why this differs between parameters with constant and variable bounds. Is this documented somewhere? I couldn't find it. I guess it's something to do with the Jacobian, but I couldn't say what.
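(To pin down what I mean by "intended", here is a small R sketch of the generative process I have in mind, outside Stan; it is only an illustration of the description above:)

n <- 1e5
x <- runif(n)                    # intended: x ~ uniform(0, 1)
y <- runif(n, min = 0, max = x)  # intended: y | x ~ uniform(0, x)
# under this process the histogram of x is flat,
# whereas the model as written gives P(x) = 2x
hist(x)
hist(y)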
(Though the above explanation should be enough, I've elaborated on this issue, including plots, here.)
This doesn't directly address what is happening to the distribution of x. I am not sure myself.
But some time ago I had a problem where a parameter's bound was a function of other parameters (and thus, implicitly, so was its density), and I was having sampling issues. I tried standardizing that parameter so it did not depend on the others, then generated another variable in the transformed parameters block to account for the dependence. It improved my sampling.
In your case, I would try declaring both x and y in [0, 1], then assigning their product to a transformed parameter. I also suspect that the distribution of x will then be what you want.
Thanks, yes, I described exactly that approach as an alternative solution to adding y ~ uniform(0, x); at the link in the original post (though not in the post itself). I'm wondering whether the default behaviour, i.e. what happens when neither solution is used, is explained somewhere in the docs. If not, perhaps it should be, as it's a bit of a gotcha.
I think the problem stems from a misunderstanding about the "implicit" uniform prior. The fact is that Stan never directly imposes any implicit prior.
Recall that Stan only needs the posterior density up to a constant multiplier (and hence the log density only up to an additive constant). So when you have x ~ uniform(0,1); or target += uniform_lpdf(x | 0, 1);, it is equivalent to target += 0, which can be ignored (the term won't be 0 if the bounds are different, but as long as the bounds are constant the density is still constant, so you can ignore it).
However, the dependent uniform prior target += uniform_lpdf(y | 0, x) is equivalent to target += -log(x), which is obviously not a constant and hence cannot be ignored.
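To make that concrete, here is a tiny R illustration (using dunif as a stand-in for uniform_lpdf):

# fixed bounds: the log density is log(1 / (1 - 0)) = 0, a constant that can be dropped
dunif(0.3, min = 0, max = 1, log = TRUE)
# dependent bound: the log density is -log(x), which varies with x and cannot be dropped
x <- c(0.2, 0.5, 0.9)
dunif(0.1, min = 0, max = x, log = TRUE)  # -log(0.2), -log(0.5), -log(0.9)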
More generally, it is always good to remember that a Stan program is really just a function to compute the posterior density - nothing more, nothing less. So although it may sometimes feel like Stan does some sort of magic, it never actually does.
The implicit prior is the 2-dimensional uniform distribution on the triangle with vertices (0,0), (1,0), (1,1). Those marginal distributions are cross-sections of the triangle; I assume you estimated them from histograms. A scatter plot will look uniform.
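A quick R sketch (rejection sampling on the unit square, just to illustrate) showing that the uniform distribution on that triangle has exactly the marginals reported in the original post, P(x) = 2x and P(y) = 2(1 - y):

n <- 2e5
x <- runif(n)
y <- runif(n)
keep <- y < x  # keep only points inside the triangle 0 < y < x < 1
x_tri <- x[keep]
y_tri <- y[keep]
hist(x_tri, freq = FALSE, breaks = 50)
curve(2 * x, add = TRUE)        # marginal density of x is 2x
hist(y_tri, freq = FALSE, breaks = 50)
curve(2 * (1 - x), add = TRUE)  # marginal density of y is 2(1 - y)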
I'm having a moment - the histograms in the post are certainly estimates of the marginals of the "uniform over the triangle" distribution, but I'm missing how the model as specified results in that distribution. The model has both lower/upper bounds specified, so I'm assuming it uses inv-logit as the transform, but I can't get that to match my own, probably flawed, log posterior function. If you have a moment, can you see if I'm missing something obvious?:
library(ggplot2)
library(dplyr)   # filter(), mutate()
library(tibble)  # as_tibble()
res <- array(dim = c(1e5, 2))
# can match the histograms from the blog post
for (ii in seq_len(nrow(res))) {
  x <- sqrt(runif(n = 1)) # but whyyyy sqrt when jacobian is exp(y)
  y <- runif(n = 1, min = 0, max = x)
  res[ii, ] <- c(x, y)
}
# evidently uniform
res |>
  as_tibble(.name_repair = \(x) c("x", "y")) |>
  ggplot() +
  geom_hex(aes(x = x, y = y), bins = 57)
withr::with_par(new = list(mfrow = c(1, 2)), {
  hist(res[, 1])
  hist(res[, 2])
})
logit <- function(x) {
  log(x / (1 - x))
}
inv_logit <- function(v) {
  1 / (1 + exp(-v))
}
log_post_2 <- function(x, y) {
  inv_logit_x <- inv_logit(x)
  inv_logit_y <- inv_logit(y)
  res <- dunif(inv_logit_x, min = 0, max = 1, log = TRUE)
  res <- res + log(inv_logit_x) + log(1 - inv_logit_x)
  res <- res + dunif(x = x * inv_logit_y, min = 0, max = x, log = TRUE)
  res <- res + log(x) + log(inv_logit_y) + log(1 - inv_logit_y)
  return(res)
}
plot_grid <- tidyr::expand_grid(
  x = seq(from = 0, to = 1, length.out = 100),
  y = seq(from = 0, to = 1, length.out = 100)
) |>
  filter(y < x) |>
  mutate(lp2 = log_post_2(x, y))
# this should be uniform, but it isn't and I'm missing something.
ggplot(plot_grid) +
  geom_point(aes(x = x, y = y, colour = lp2))
The intended marginal for y comes from integrating out x: f_Y(y) = \int_y^1 f_{Y \mid X}(y \mid x) \, f_X(x) \, dx = \int_y^1 \frac{1}{x} \, dx = -\ln(y), where we know f_X = 1 and the limits of integration follow from y < x < 1.
Let's generate these marginals without the conditional distribution. Integrating the pdf for Y gives F_Y = y - y \ln(y), which is a nasty thing to invert for inverse transform sampling. However, our friend Wolfram Alpha dispatches it easily and shows that we can use the Lambert W function (which is actually built into Stan as well). The code in R for that is
library(lamW)  # lambertWm1() for the W_{-1} branch
res <- array(dim = c(1e5, 2))
for (ii in seq_len(nrow(res))) {
  x <- runif(n = 1)
  # inverse-CDF draw: solving F_Y(y) = y - y * log(y) = u gives y = exp(1 + W_{-1}(-u / e))
  y <- exp(1 + lambertWm1(-runif(n = 1, min = 0, max = 1) / exp(1)))
  res[ii, ] <- c(x, y)
}
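As a quick sanity check on those draws: the density corresponding to F_Y = y - y \ln(y) is f_Y(y) = -\ln(y), so the histogram of the second column should track that curve:

hist(res[, 2], freq = FALSE, breaks = 50)
curve(-log(x), from = 0.01, to = 1, add = TRUE)  # f_Y(y) = -log(y)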
This is illuminating (as ever) and I fully get that they are the correct (intended?) marginals.
My confusion is twofold in the other direction:
Why is sqrt the "correct" transformation for x in the naive Monte Carlo sampler to match the "uniform over the triangle" distribution implemented by the Stan program in the OP? I'm 99% sure the answer is the "distribution of a transformation of a random variable" formula, but I'm not seeing the transformation that yields the sqrt (more coffee for me, maybe).
What am I missing in log_post_2, from either log-Jacobian, that would make the final plot uniform over y < x?