Imagine I ask someone to allocate probability mass to discrete bins covering the possible realizations of a variable. For example, I ask them what's the probability that X \in (-1, 1], that X \in (1, 3], and so on. Imagine I also ask them what mean outcome they expect.

Let us simulate this scenario. A person believes the correct distribution is a standard normal, so when I ask them to allocate probability to the bins (-\infty, -1], (-1, 1], (1, +\infty), they give me the probabilities 0.159, 0.683, 0.159 (in R: `c(pnorm(-1), pnorm(1) - pnorm(-1), 1 - pnorm(1))`). They also give me a point estimate for the mean, 0. I now want to infer what plausible values of \sigma would lead to those bin probabilities.
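
To make this concrete, here is a minimal sketch of that elicitation in R (the variable names are mine, just for illustration):

```
# Upper bounds of the three bins and the probabilities a standard-normal
# "believer" would assign to them, plus their stated mean
bin_upper <- c(-1, 1, Inf)
elicited_probs <- diff(pnorm(c(-Inf, bin_upper)))  # 0.159 0.683 0.159
point_estimate <- 0
```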

A way I was thinking of writing this likelihood function is:

L = F(\text{UB} | \text{Point estimate}, \sigma)^{C} \cdot \big(1 - F(\text{UB} | \text{Point estimate}, \sigma)\big)^{1 - C}

where F is the normal CDF; UB is the upper bound of the interval (in the example, `c(-1, 1, Inf)`; the last bin is dropped because it is redundant and to avoid numerical issues); the point estimate is provided by the person; and C is the cumulative probability up to the bin's upper bound (0.159, 0.841, 1.000; in R: `pnorm(c(-1, 1, Inf))`).
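
For concreteness, a rough sketch of that log-likelihood in plain R (the function and argument names are mine; it mirrors the Stan model shown below):

```
# Log-likelihood of sigma given the elicited cumulative probabilities;
# the last, redundant bin is dropped as described above
loglik <- function(sigma, ub = c(-1, 1), cum_p = pnorm(c(-1, 1)), mu = 0) {
  sum(cum_p * pnorm(ub, mu, sigma, log.p = TRUE) +
        (1 - cum_p) * pnorm(ub, mu, sigma, lower.tail = FALSE, log.p = TRUE))
}
optimize(loglik, interval = c(0.01, 100), maximum = TRUE)  # maximum at sigma = 1
```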

The solution to the maximum likelihood problem is straightforward: we need a \sigma such that F(\text{UB} | \text{Point estimate}, \sigma) = C. Running Stan's optimizer (see below) finds it very quickly: \sigma = 1 for this example.
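
(With the point estimate at 0, each finite bound can even be inverted in closed form, \sigma = \text{UB} / \Phi^{-1}(C); a quick sanity check:)

```
# F(UB | 0, sigma) = C  <=>  sigma = UB / qnorm(C)
c(-1, 1) / qnorm(pnorm(c(-1, 1)))  # both bins give sigma = 1
```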

However, when running HMC the sampler struggles badly, exploring all the way out to infinity. I know I am not using priors, but even when I add one, the algorithm explores parameter values that are clearly unreasonable for this problem.

Below I show the code for this example. In the first approach, no prior is introduced. In the second I add a prior just to constrain the most extreme outcomes, but it is still not enough to stop the sampler from wandering far into unreasonable values: for example, the 75th percentile of sigma is ~13. If sigma were 13, the cumulative probabilities of the bins would look very different (0.469, 0.531, 1.000 instead of 0.159, 0.841, 1.000). In addition, as you can see in the chart, when I plot sigma against lp__, the maximum is clearly not close to 1, the solution to the maximum likelihood problem found by `rstan::optimizing`.
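
(The sigma = 13 figures above are easy to verify:)

```
round(pnorm(c(-1, 1, Inf), mean = 0, sd = 13), 3)
#> [1] 0.469 0.531 1.000
```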

So either I am doing something very wrong with the modelling and there is a better way to do this, or there is a numerical issue I am not aware of. Do you have any ideas?

Thank you for your time.

```
library(rstan)
#> Loading required package: StanHeaders
#>
#> rstan version 2.32.3 (Stan version 2.26.1)
#> For execution on a local, multicore CPU with excess RAM we recommend calling
#> options(mc.cores = parallel::detectCores()).
#> To avoid recompilation of unchanged Stan programs, we recommend calling
#> rstan_options(auto_write = TRUE)
#> For within-chain threading using `reduce_sum()` or `map_rect()` Stan functions,
#> change `threads_per_chain` option:
#> rstan_options(threads_per_chain = 1)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
data_to_pass_stan <-
  list(
    N = 2,
    point_estimate = c(0, 0),
    upper_bounds = c(-1, 1),
    cumulative_prob = pnorm(c(-1, 1))
  )
example_stan_dist <- rstan::stan_model("infer-distribution-simple.stan")
print(example_stan_dist)
#> S4 class stanmodel 'anon_model' coded as follows:
#> data {
#> int<lower=0> N;
#> vector[N] point_estimate;
#> vector[N] upper_bounds;
#> vector[N] cumulative_prob;
#> }
#> parameters {
#> real<lower=0> sigma;
#> }
#> model {
#> for (n in 1:N) {
#> target += (cumulative_prob[n]) * normal_lcdf(upper_bounds[n] | point_estimate[n], sigma) +
#> (1-cumulative_prob[n]) * normal_lccdf(upper_bounds[n] | point_estimate[n], sigma);
#> }
#> }
rstan::optimizing(example_stan_dist, data = data_to_pass_stan)
#> $par
#> sigma
#> 0.9999986
#>
#> $value
#> [1] -0.8748665
#>
#> $return_code
#> [1] 0
#>
#> $theta_tilde
#> sigma
#> [1,] 0.9999986
samples <-
  rstan::sampling(example_stan_dist, data = data_to_pass_stan)
#> Warning: There were 2027 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them.
#> Warning: Examine the pairs() plot to diagnose sampling problems
#> Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#bulk-ess
samples
#> Inference for Stan model: anon_model.
#> 4 chains, each with iter=2000; warmup=1000; thin=1;
#> post-warmup draws per chain=1000, total post-warmup draws=4000.
#>
#> mean se_mean sd 2.5% 25% 50%
#> sigma 9.300592e+307 NaN Inf 7.87289e+306 5.173977e+307 9.403159e+307
#> lp__ 7.074900e+02 0.05 0.86 7.05270e+02 7.071500e+02 7.077500e+02
#> 75% 97.5% n_eff Rhat
#> sigma 1.349518e+308 1.75899e+308 NaN NaN
#> lp__ 7.081100e+02 7.08370e+02 320 1.01
#>
#> Samples were drawn using NUTS(diag_e) at Fri Nov 10 10:04:25 2023.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split chains (at
#> convergence, Rhat=1).
example_stan_dist_prior <- rstan::stan_model("infer-distribution-simple-prior.stan")
print(example_stan_dist_prior)
#> S4 class stanmodel 'anon_model' coded as follows:
#> data {
#>   int<lower=0> N;
#>   vector[N] point_estimate;
#>   vector[N] upper_bounds;
#>   vector[N] cumulative_prob;
#> }
#> parameters {
#>   real<lower=0> sigma;
#> }
#> model {
#>   sigma ~ exponential(.1);
#>   for (n in 1:N) {
#>     target += (cumulative_prob[n]) * normal_lcdf(upper_bounds[n] | point_estimate[n], sigma) +
#>       (1-cumulative_prob[n]) * normal_lccdf(upper_bounds[n] | point_estimate[n], sigma);
#>   }
#> }
rstan::optimizing(example_stan_dist_prior, data = data_to_pass_stan)
#> $par
#> sigma
#> 0.9114252
#>
#> $value
#> [1] -0.9700811
#>
#> $return_code
#> [1] 0
#>
#> $theta_tilde
#> sigma
#> [1,] 0.9114252
samples_with_prior <-
  rstan::sampling(example_stan_dist_prior, data = data_to_pass_stan)
#> Warning: There were 31 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them.
#> Warning: Examine the pairs() plot to diagnose sampling problems
samples_with_prior
#> Inference for Stan model: anon_model.
#> 4 chains, each with iter=2000; warmup=1000; thin=1;
#> post-warmup draws per chain=1000, total post-warmup draws=4000.
#>
#> mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
#> sigma 9.17 0.27 9.27 0.59 2.57 6.00 12.97 35.10 1205 1
#> lp__ -0.41 0.01 0.56 -1.97 -0.57 -0.21 -0.03 0.02 1753 1
#>
#> Samples were drawn using NUTS(diag_e) at Fri Nov 10 10:04:26 2023.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split chains (at
#> convergence, Rhat=1).
library(tidybayes)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(ggplot2)
samples_with_prior %>%
  spread_draws(lp__, sigma) %>%
  ggplot(aes(x = sigma, y = lp__)) +
  geom_point() +
  scale_x_log10()
```
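
For what it's worth, the shape of that chart can be reproduced outside Stan. Because sigma is declared with `<lower=0>`, lp__ includes the Jacobian adjustment log(sigma) for the log transform. A sketch, reusing the `loglik()` helper defined earlier (that helper is my own sketch, not part of the reprex):

```
# Unnormalised log-density Stan samples in the no-prior model:
# the log-likelihood plus the log(sigma) Jacobian adjustment
sigma_grid <- 10^seq(-1, 4, length.out = 200)
lp <- sapply(sigma_grid, loglik) + log(sigma_grid)
plot(sigma_grid, lp, type = "l", log = "x",
     xlab = "sigma", ylab = "lp (up to a constant)")
```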

Created on 2023-11-10 with reprex v2.0.2