Optimizing() and sampling() over a disjoint domain / excluded point

What is the right way to go about optimizing and/or sampling if my variable is
real<lower=0> mu;
except I want to enforce that mu is never exactly 1 i.e. exclude a point on
this line.
I can imagine running twice, once for mu \in (0, 1) and once for mu \in (1, Inf),
and then selecting the model with the better lp__ output. But this looks clumsy,
and if you are in multi-dimensional space gets out of hand.

-Leon

To be clear: you want to sample, and if mu is 0.999999999999931 you’re OK with it, and if it’s 1.00000000021 you’re OK, but if it’s exactly 1.0 as a floating-point number you can’t accept it?

Anyway, if you really care about such things, simply sample, and then if you ever get exactly 1, throw away that sample. Hint: you will never get exactly 1, so it won’t matter, but this method will work.
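For what it’s worth, that post-hoc filter is a one-liner. Here is a hypothetical Python sketch, with `draws` standing in for your posterior draws of mu:

```python
# Hypothetical post-hoc filter over posterior draws of mu:
# discard any draw that is exactly 1.0 as a floating-point number.
draws = [0.999999999999931, 1.0, 1.00000000021]
kept = [mu for mu in draws if mu != 1.0]
print(kept)  # [0.999999999999931, 1.00000000021]
```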

When it comes to optimizing, you could maybe get 1.0 exactly as the exact optimum, but if this occurs, what do you want to do about it?

More to the point, I think you should revisit your model and consider why it is that this single point is excluded and what it means to you to be 1 + epsilon for epsilon a tiny number.

Yes correct interpretation.
The problem arises because 1 is an inflection point.
I optimize a Dirichlet/beta density, and for some data the optimizer fails because there is a discontinuity in the likelihood at 1.

Hi, Leon! I’m not clear on what you’re trying to do here. Can you write down your likelihood and prior?

Do you mean beta distribution rather than beta function (the function is the normalizer for the distribution)? For the beta distribution, values fall in (0, 1), so values greater than 1 don’t make sense; parameters are only constrained to be positive.
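As an aside, the function-versus-distribution distinction is easy to check numerically; here is a small sketch (my own, not from the thread) using the log-gamma identity B(a, b) = Γ(a)Γ(b)/Γ(a + b):

```python
import math

# The beta *function* B(a, b) is the normalizing constant of the beta
# *distribution*: Beta(x | a, b) = x^(a-1) * (1-x)^(b-1) / B(a, b).
def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

# Beta(1, 1) is the uniform distribution on (0, 1), so its normalizer is 1.
print(math.exp(log_beta(1.0, 1.0)))  # 1.0
```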

Here is one example, which is easy to reproduce, I guess due to the properties of the uniform prior. If I keep alpha and beta merely positive, many chains break on some data; if I constrain alpha and beta to be greater than 1 (convex PDF), all works well. My reasoning is that the PDF going from convex to concave at alpha = 1 creates an issue for sampling.
data {
     int<lower=2> J;          // number of coins
     int<lower=0> y[J];       // heads in respective coin trials
     int<lower=0> n[J];       // total in respective coin trials
}
parameters {
     real<lower=0> alpha;
     real<lower=0> beta;
}
transformed parameters {
    real<lower=0, upper=1> mu;
    mu = alpha / (alpha + beta);
}
model {
    alpha ~ uniform(0, 100);
    beta ~ uniform(0, 100);
    for (j in 1:J) {
        y[j] ~ beta_binomial(n[j], alpha, beta);
    }
}

Yes, when you have beta(a, b) [included implicitly in beta_binomial(n, a, b)], you’re going to have issues with a < 1 and b < 1 because they have very heavy tails when transformed to the unconstrained scale (it’s a logit transform).

I often find alpha > 100 or beta > 100 because they’re like prior counts; for example, I’m over 100 in the repeated binary trial case study (on the Stan website). When you have a hard constraint at the boundary, we use a logit transform again, and if mass piles up near the boundary, it’s going to be near plus or minus infinity on the unconstrained scale, which can lead to instability.
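To see the boundary effect numerically, here is a small Python sketch of the inverse-logit map used for interval-constrained parameters (a minimal implementation of my own, not Stan’s code):

```python
import math

def inv_logit(x):
    # numerically stable inverse logit: maps the unconstrained real line to (0, 1)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

# Mass near the boundary corresponds to huge unconstrained values, and
# extreme unconstrained values round to exactly 0 or 1 in double precision.
print(inv_logit(-800.0))  # 0.0
print(inv_logit(800.0))   # 1.0
```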

If you really want a uniform(0, 100) prior, then you need to constrain alpha and beta with <lower = 0, upper = 100>; otherwise, sampling will just reject values above 100 and become inefficient. We instead recommend no upper bound and a weakly informative prior (like a Pareto, exponential, gamma, or lognormal) on the unconstrained parameter.

I also find it easier to reparameterize, as described in the manual, using the parameters alpha / (alpha + beta), which is the prior mean, and alpha + beta, which is like the overall precision or prior count.
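In code, that reparameterization and its inverse are straightforward; a hypothetical Python sketch (the helper names are mine):

```python
# Map beta shape parameters (alpha, beta) to prior mean mu and
# prior count / precision kappa, and back.
def to_mean_count(alpha, beta):
    kappa = alpha + beta
    return alpha / kappa, kappa

def to_shapes(mu, kappa):
    alpha = kappa * mu
    return alpha, kappa - alpha

mu, kappa = to_mean_count(2.0, 6.0)
print(mu, kappa)             # 0.25 8.0
print(to_shapes(mu, kappa))  # (2.0, 6.0)
```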

You can write that beta-binomial more efficiently and reliably as

y ~ beta_binomial(n, alpha, beta);

Here is the reparameterized model, which I run like this:

fitB = sampling(modK, data = c("J", "y", "n"), chains = 10, iter = 4000, warmup = 3500, control = list(adapt_delta = .9), cores = 1, refresh = -1)

and get “398 divergent transitions after warmup”

By the way, for some reason the warnings only show up when there is more than one chain. Is that a bug or a feature?

data {
     int<lower=2> J;          // number of coins
     int<lower=0> y[J];       // heads in respective coin trials
     int<lower=0> n[J];       // total in respective coin trials
}
parameters {
     real<lower = 0, upper = 1> mu;
     real<lower = 0> kappa;
}
transformed parameters {
    real<lower=0> alpha;
    real<lower=0> beta;

    alpha = kappa * mu;
    beta = kappa - alpha;
}
model {
    mu ~ uniform(0, 1);
    kappa ~ uniform(1,1000);
    y ~ beta_binomial(n, alpha, beta);
}



Forgot to include a data instance

> n
[1] 347 396 116 303 345
> y
[1] 51 46 88 23 28
> J
[1] 5

The distribution on kappa isn’t consistent with its declaration. The lower and upper bounds need to match the uniform constraints. We don’t generally recommend uniform distributions constrained to intervals because they cause difficulty both statistically (with bias) and computationally (with heavy tails on the unconstrained space).

Thank you for catching this. I am experimenting further.
Any clue why the warnings only show up when there is more than one chain? Is that a bug or a feature?

Usually the warnings only print when you have a single chain.

RStan’s error reporting is different with one versus multiple chains due to infrastructure issues I don’t understand. It’s not a bug per se.

Unfortunately I keep getting numerical problems:

“Exception thrown at line 20: beta_binomial_log: First prior sample size parameter is 0, but must be 1
When a numerical problem occurs, the Hamiltonian proposal gets rejected.”

with this model

data { 
     int<lower=2> J;          // number of coins 
     int<lower=0> y[J];       // heads in respective coin trials
     int<lower=0> n[J];       // total in respective coin trials
}
parameters {
     real<lower = 0, upper = 1> mu;
     real<lower = 0> kappa;
}
transformed parameters {
    real<lower=0> alpha;
    real<lower=0> beta;

    alpha = kappa * mu;
    beta = kappa - alpha;
}
model {
    mu ~ uniform(0, 1); 
    kappa ~ exponential(0.05); 
    y ~ beta_binomial(n, alpha, beta);
}

Should I try parameterizing differently?

I believe this is telling you that the beta_binomial() distribution requires n >= 1.

I would read this as alpha being the first prior count parameter, which matches the code in beta_binomial_lpdf.hpp:

check_positive_finite(function,
                      "First prior sample size parameter", alpha);

So I’m not sure why you’d be getting that message.

If you’re vectorizing, all container sizes have to match, too, but that would report a different error.

Did any of that message get chopped off?

Nothing chopped off; I left out the standard

“When a numerical problem occurs, the Hamiltonian proposal gets rejected.
See http://mc-stan.org/misc/warnings.html#exception-hamiltonian-proposal-rejected
If the number in the ‘count’ column is small, do not ask about this message on stan-users.”

I just tried setting

int<lower=1> n[J];

it does not help …

-Leon

My guess is that your data isn’t coherent. Are you sure each y[j] <= n[j]? I ran your model with data consistent with the model,

> J = 2
> y = c(1, 3)
> n = c(2, 5)

and it works just fine,

       mean se_mean    sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
mu     0.55    0.00  0.16  0.23  0.43  0.55  0.66  0.84  2643    1
kappa 24.45    0.42 21.51  2.07  9.36 18.43 32.65 83.28  2609    1
alpha 13.39    0.27 13.01  0.93  4.64  9.55 17.66 47.71  2352    1
beta  11.07    0.21 10.80  0.72  3.67  7.72 14.68 40.46  2638    1
lp__  -5.34    0.03  1.01 -8.06 -5.72 -5.04 -4.62 -4.36  1593    1