A problem I am studying has a bivariate latent variable whose prior is uniformly distributed on a regular hexagon. Challenges in obtaining a satisfactory sampling formulation for this variable have led me to some behavior that I find odd and confusing, and I would like some assistance/advice.
Background:
It is very challenging to come up with a sampling scheme for this variable that avoids discontinuities in both the prior density and its first derivative. After several approaches, I came up with the following.
- The hexagon is centered at the origin with a pair of opposite sides parallel to the x axis, a distance 2 apart; that pair of sides therefore intersects the y axis at (0, 1) and (0, -1). The hexagon's area is 2 \sqrt{3}, so the bivariate density over this region is uniformly 1 / (2 \sqrt{3}) and the total probability is unity.
- Find the marginal density of y by integrating out the x variable. This gives f_Y(y) = (2 - |y|) / 3 on [-1, 1], whose derivative is discontinuous at y = 0.
- However, its CDF should be one order smoother. I calculated it and inverted it to get the quantile function. This is a piecewise function whose branch depends on the uniform variate U_Y, but both the function and its first derivative are continuous over the whole range. Therefore the coordinate y can ostensibly be sampled in Stan by inverse-transform sampling: draw a standard uniform U_Y, then plug it into the quantile function to get a sample y.
- The conditional density of x | y is a complicated expression, but, conditional on y, it is a uniform distribution in x whose domain depends on y. I therefore expect difficulty when sampling in Stan.
- However, one can supposedly sample x by drawing U_X \sim \mathrm{Uniform}(-1/2, 1/2) and multiplying the resulting u_x by 2 (y + 2) / \sqrt{3} if y \le 0, or by 2 (2 - y) / \sqrt{3} if y > 0. (The 2s in the numerators arise because the uniform's max and min are only \pm 1/2.) Sampling U_X should be straightforward, but note that the scaling multiplier does not have a first derivative at y = 0.
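A quick numerical sketch of the continuity claims above, using the quantile function and the x scaling from the derivation (the check itself is mine, not part of the model code):

```python
import math

def q_y(u):
    """Quantile function of y from the inverted CDF (piecewise in u)."""
    if u <= 0.5:
        return math.sqrt(1.0 + 6.0 * u) - 2.0
    return 2.0 - math.sqrt(7.0 - 6.0 * u)

def dq_y(u):
    """Analytic first derivative of q_y."""
    if u <= 0.5:
        return 3.0 / math.sqrt(1.0 + 6.0 * u)
    return 3.0 / math.sqrt(7.0 - 6.0 * u)

def x_scale(y):
    """Multiplier applied to U_X ~ Uniform(-1/2, 1/2) to get x."""
    if y <= 0.0:
        return 2.0 * (y + 2.0) / math.sqrt(3.0)
    return 2.0 * (2.0 - y) / math.sqrt(3.0)

eps = 1e-9
# the quantile function and its derivative agree across the u = 1/2 seam
print(q_y(0.5), q_y(0.5 + eps))    # both ~ 0
print(dq_y(0.5), dq_y(0.5 + eps))  # both ~ 1.5
# the x scale is continuous at y = 0, but its slope jumps from +2/sqrt(3) to -2/sqrt(3)
print(x_scale(0.0), x_scale(eps))  # both ~ 4/sqrt(3) ~ 2.309
```

So y itself is C^1 in U_Y, and the remaining kink is confined to the x scaling at y = 0.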
While not perfect with respect to smooth derivatives, this seems pretty close. One point I like is that the only sampling statements needed have been constructed to add only a constant (actually zero) to the log likelihood, which is desirable for these latent variables. The alternative would be to use actual lpdf functions for both y and x | y, whose product would telescope to the constant density of the region; but those functions are clearly not suitable here.
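That telescoping can be checked directly: with the marginal f_Y(y) = (2 - |y|)/3 and the conditional x | y uniform over an interval of width 2(2 - |y|)/\sqrt{3}, their product is the constant hexagon density (a small sketch; the density formulas are read off the geometry above):

```python
import math

def f_y(y):
    """Marginal density of y: (2 - |y|)/3 on [-1, 1]."""
    return (2.0 - abs(y)) / 3.0

def f_x_given_y(y):
    """Conditional density of x | y: uniform over width 2*(2 - |y|)/sqrt(3)."""
    return math.sqrt(3.0) / (2.0 * (2.0 - abs(y)))

hex_density = 1.0 / (2.0 * math.sqrt(3.0))
for y in (-0.99, -0.5, 0.0, 0.25, 0.99):
    assert abs(f_y(y) * f_x_given_y(y) - hex_density) < 1e-12
print("f_Y(y) * f_{X|Y}(x|y) telescopes to", hex_density)  # 1/(2*sqrt(3)) ~ 0.2887
```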
Now for the actual issues I noticed!
The Stan code to perform this sampling is as follows. Notice it is stripped down as far as I could manage; in particular, there are no range declarations on any of the parameters or transformed parameters:
// HexUnif.stan
data {} // data
parameters {
  real uY; // latent uniform RV to plug into inverse CDF of y
  real uX; // centered standard uniform to be scaled to x
} // parameters
transformed parameters {
  real y;
  real x;
  // draw from inverted y CDF
  if (uY <= 0.5) y = sqrt(1.0 + 6.0 * uY) - 2.0;
  else y = 2.0 - sqrt(7.0 - 6.0 * uY);
  // multiply centered standard uniform RV to final range of x
  // (factor of 2 because uX spans only +/- 1/2)
  if (y <= 0.0) x = uX * 2.0 * (y + 2.0) / sqrt(3.0);
  else x = uX * 2.0 * (2.0 - y) / sqrt(3.0);
} // transformed parameters
model {
  // both uniforms have support of length 1 so should not contribute to target()
  target += uniform_lpdf(uY | 0.0, 1.0);
  target += uniform_lpdf(uX | -0.5, 0.5);
} // model
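Outside Stan, the same transform can be simulated directly to sanity-check that it lands uniformly on the hexagon (my own check: the four slanted sides satisfy \sqrt{3}|x| + |y| \le 2, and a uniform density on a regular hexagon has isotropic second moments, here 5/18):

```python
import math
import random

random.seed(0)
rt3 = math.sqrt(3.0)
n = 200_000
sum_x2 = sum_y2 = 0.0
for _ in range(n):
    uY = random.random()        # Uniform(0, 1)
    uX = random.random() - 0.5  # Uniform(-1/2, 1/2)
    # inverse-CDF sample of y
    y = math.sqrt(1.0 + 6.0 * uY) - 2.0 if uY <= 0.5 else 2.0 - math.sqrt(7.0 - 6.0 * uY)
    # scale uX to the conditional range of x (factor 2 because uX spans only 1/2 each way)
    x = uX * 2.0 * (2.0 - abs(y)) / rt3
    # every draw must lie inside the hexagon: flat sides at y = +/-1, slanted sides below
    assert abs(y) <= 1.0 + 1e-12
    assert rt3 * abs(x) + abs(y) <= 2.0 + 1e-9
    sum_x2 += x * x
    sum_y2 += y * y
print(sum_x2 / n, sum_y2 / n)  # both ~ 5/18 ~ 0.2778
```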
And it is run as follows (using cmdstanr in R, under RStudio):
HexUnif.mod <- cmdstan_model("stan/HexUnif.stan", compile = FALSE)
# HexUnif.mod[['check_syntax']]()
HexUnif.model <- HexUnif.mod[['compile']](pedantic = TRUE)
HexUnif.MCMC <- local({
  set.seed(1406686775)
  inits <- function() list(uY = rbeta(1L, shape1 = 20.0, shape2 = 20.0),
                           uX = rbeta(1L, shape1 = 20.0, shape2 = 20.0) - 0.5)
  out <- HexUnif.model[['sample']](
    data = list(),
    seed = 426361962,
    refresh = 1000,
    init = inits,
    output_dir = "<localdir>/HexUnif",
    sig_figs = 8,
    chains = 10L,
    parallel_chains = 1L,
    iter_warmup = 200,
    iter_sampling = 2000,
    max_treedepth = 10,
    adapt_delta = 0.80)
  out
})
[Output deleted] This had many problems…
> HexUnif.MCMC[['cmdstan_diagnose']]()
Processing csv files: <local files>
Checking sampler transitions treedepth.
880 of 20000 (4.40%) transitions hit the maximum treedepth limit of 10, or 2^10 leapfrog steps.
Trajectories that are prematurely terminated due to this limit will result in slow exploration.
For optimal performance, increase this limit.
Checking sampler transitions for divergences.
19120 of 20000 (95.60%) transitions ended with a divergence.
These divergent transitions indicate that HMC is not fully able to explore the posterior distribution.
Try increasing adapt delta closer to 1.
If this doesn't remove all divergences, try to reparameterize the model.
Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.
Effective sample size satisfactory.
Split R-hat values satisfactory all parameters.
Processing complete.
OK, not unexpected. Let's see about the log likelihood:
> HexUnif.MCMC[['summary']]()
# A tibble: 5 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 lp__ 0 0 0 0 0 0 NA NA NA
2 uY 0.489 0.482 0.288 0.370 0.0481 0.949 1.01 1374. 2103.
3 uX -0.00146 -0.00851 0.291 0.378 -0.446 0.457 1.01 1223. 2080.
4 y -0.0195 -0.0269 0.527 0.620 -0.865 0.857 1.01 1374. 2103.
5 x -0.00158 -0.00726 0.265 0.327 -0.413 0.422 1.01 1372. 3196.
Identically zero. So far so good. Well, not really: this sampling is very poor and untrustworthy.
Yet that is not the first attempt I made at running this sampling. I originally included range declarations on all variables [note: the following is also observed if the range declarations are omitted for the transformed parameters]. This is the Stan code:
// HexUnifLimits.stan
data {} // data
parameters {
  real<lower=0.0, upper=1.0> uY; // latent uniform RV to plug into inverse CDF of y
  real<lower=-0.5, upper=0.5> uX; // centered standard uniform to be scaled to x
} // parameters
transformed parameters {
  real<lower=-1.0, upper=1.0> y;
  real<lower=-2.0 / sqrt(3.0), upper=2.0 / sqrt(3.0)> x;
  // draw from inverted y CDF
  if (uY <= 0.5) y = sqrt(1.0 + 6.0 * uY) - 2.0;
  else y = 2.0 - sqrt(7.0 - 6.0 * uY);
  // multiply centered standard uniform RV to final range of x
  // (factor of 2 because uX spans only +/- 1/2)
  if (y <= 0.0) x = uX * 2.0 * (y + 2.0) / sqrt(3.0);
  else x = uX * 2.0 * (2.0 - y) / sqrt(3.0);
} // transformed parameters
model {
  // both uniforms have support of length 1 so should not contribute to target()
  target += uniform_lpdf(uY | 0.0, 1.0);
  target += uniform_lpdf(uX | -0.5, 0.5);
} // model
The identical R code as above was used to run this, apart from the names, the local output directory, and the random seeds. Here are the diagnostics:
> HexUnifLimits.MCMC[['cmdstan_diagnose']]()
Processing csv files: <local csvs>
Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.
Checking sampler transitions for divergences.
No divergent transitions found.
Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.
Effective sample size satisfactory.
Split R-hat values satisfactory all parameters.
Processing complete, no problems detected.
So no apparent problems, at least yet. Here is the fit summary:
> HexUnifLimits.MCMC[['summary']]()
# A tibble: 5 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 lp__ -3.99 -3.64 1.19 0.892 -6.39 -2.84 1.00 7630. 9569.
2 uY 0.503 0.504 0.290 0.370 0.0462 0.954 1.00 14041. 10540.
3 uX 0.000391 -0.000591 0.286 0.364 -0.446 0.448 1.00 13894. 11255.
4 y 0.00519 0.00655 0.530 0.619 -0.870 0.871 1.00 14041. 10540.
5 x 0.0000206 -0.000428 0.261 0.316 -0.416 0.416 1.00 14788. 13064.
As we saw from the diagnostics, the ESS stats look OK, and the rhats are unity up to rounding. But the log likelihood is now accumulating something, most certainly not zero.
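For what it's worth, the size of that accumulation looks consistent with the log Jacobian of Stan's lower/upper constraining transform: if I understand it correctly, for a bound pair (a, b) Stan samples on the unconstrained logit scale and adds log(b - a) + log(u) + log(1 - u) to target, where u is the parameter rescaled to (0, 1). Here b - a = 1 for both parameters, u is roughly uniform, and each log term has expectation -1, which would suggest a mean lp__ near -4 (observed: -3.99). A quick Monte Carlo sketch of that expectation (my speculation, not verified against Stan internals):

```python
import math
import random

random.seed(1)
n = 500_000
acc = 0.0
for _ in range(n):
    u = random.random()
    # per-parameter log Jacobian of the (0,1) logit transform; log(b - a) = 0 here
    acc += math.log(u) + math.log(1.0 - u)
print(acc / n)  # ~ -2 per parameter, so ~ -4 for the two constrained parameters
```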
Also, the ACF plots [not shown] show large autocorrelation in the draws, with an AC "half life" of about lag 5. But this is not totally unexpected, given that the scaling multiplier for x has a discontinuous derivative. I might expect the other program was worse (not checked).
Both of these methods seem to sample \{x, y\} correctly, albeit not always with great efficiency. Here are my questions:
- Why does the program without range-declaration limits give such terrible convergence, much, much worse than the one with limits?
- Why does the program with range-declaration limits accumulate log likelihood? The only two sampling statements in both programs are uniform lpdfs, both of whose distributions have support of length 1.
- Lastly, a more general question which others might find helpful: is there any material online (or elsewhere) on best practices for inverse sampling a variable whose prior has a discontinuous derivative?
I appreciate being able to sample, somewhat reasonably, from these somewhat sketchy priors. But now I worry that they will add spurious log likelihood to the full modeling problem and contaminate my results. Any help/clarity appreciated.
Thanks for taking the time to read - JZ
RStudio 2024.04.2+764 "Chocolate Cosmos" Release (e4392fc9ddc21961fd1d0efd47484b43f07a4177, 2024-06-05) for macOS
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) RStudio/2024.04.2+764 Chrome/120.0.6099.291 Electron/28.3.1 Safari/537.36, Quarto 1.5.56 (/Applications/quarto/bin/quarto)
> version
_
platform x86_64-apple-darwin20
arch x86_64
os darwin20
system x86_64, darwin20
status
major 4
minor 4.1
year 2024
month 06
day 14
svn rev 86737
language R
version.string R version 4.4.1 (2024-06-14)
nickname Race for Your Life
> packageVersion("cmdstanr")
[1] '0.8.1.9000'