Optimizing() and sampling() over disjoint domain/ excluded point


#1

What is the right way to go about optimizing and/or sampling if my variable is
real<lower=0> mu;
except I want to enforce that mu is never exactly 1 i.e. exclude a point on
this line.
I can imagine running twice, once for mu \in (0,1), once for mu \in (0,Inf)
and then selecting a model with better lp_ output. But this looks clumsy
and if you are in multi-dimensional space gets out of hand.

-Leon

#2

To be clear, you want to sample, and then if mu is 0.999999999999931 you’re ok with it, and if it’s 1.00000000021 you’re ok, but if it’s 1.0 exactly as a floating point number you can’t accept this?

Anyway, if you really care about such things, simply sample, and then if you ever get exactly 1 throw away that sample… hint. you will never get exactly 1 so it won’t matter, but this method will work.

When it comes to optimizing, you could maybe get 1.0 exactly as the exact optimum, but if this occurs, what do you want to do about it?

More to the point, I think you should revisit your model and consider why it is that this single point is excluded and what it means to you to be 1 + epsilon for epsilon a tiny number.


#3

Yes correct interpretation.
The problem arises because 1 is inflection point,
I optimize Dirichlet / Beta function and for some data the optimizer fails because there is discontinuity of likelihood at 1.


#4

Hi, Leon! I’m not clear on what you’re trying to do here. Can you write down your likelihood and prior?

Do you mean beta distribution rather than beta function (the function is the normalizer for the distribution)? For the beta distribution, values fall in (0, 1), so values greater than 1 don’t make sense; parameters are only constrained to be positive.


#5

Here is one example, which is easy to repeat, I guess due to the properties of the uniform prior. If I keep alpha and beta positive many chains break on some data, if I constrain to alpha and beta over 1 (convex PDF), all works well. My reasoning is that at alpha=1 the PDF going from convex to concave creates an issue for sampling.
data {
int<lower=2> J; // number of coins
int<lower=0> y[J]; // heads in respective coin trials
int<lower=0> n[J]; // total in respective coin trials
}
parameters {
real<lower=0> alpha;
real<lower=0> beta;
}
transformed parameters {
real<lower=0, upper=1> mu;
mu = (alpha ) / (alpha + beta );
}
model {
alpha ~ uniform(0.,100);
beta ~ uniform(0.,100);
for (j in 1:J) {
y[j] ~ beta_binomial(n[j], alpha, beta);
}
}


#6

Yes, when you have beta(a, b) [included implicitly in beta_binomial(n, a, b)], you’re going to have issues with a < 1 and b < 1 because they have very heavy tails when transformed to the unconstrained scale (it’s a logit transform).

I often find alpha > 100 or beta > 100 because they’re like prior counts; for example, I’m over 100 in the repeated binary trial case study (on Stan web site). When you have a hard constraint at the boundary, we use a logit transform again, and if mass piles up near the boundary, it’s going to be near plus or minus infinity on the unconstrained scale, which can lead to instability.

If you really want a uniform(0, 100) prior, then you need to constrain alpha and beta to have <lower = 0, upper = 100>; otherwise, it’ll just reject above 100 and lead to inefficient sampling. We instead recommned no upper bound and a weakly informative prior (like a Pareto, exponential, or gamma or lognormal) that’s also unconstrained.

I also find it easier to reparameterize, also as described in the manual, and use the parameters alpha / (alpha + beta) which is the prior mean and (alpha + beta), which is like the overall precision or prior count.

You can write that beta-binomial more efficiently and reliably as

y ~ beta_binomial(n, alpha, beta);

#7

Here is re-parametrized model, and I run it like this

fitB = sampling(modK, data=c(“J”,“y”,“n”), chains = 10, iter = 4000, warmup = 3500, control = list(adapt_delta = .9), cores = 1, refresh=-1)

and get “398 divergent transitions after warmup”

By the way for some reason the warnings only show up when there are more than one chain, is that a bug or a feature ?

data {
     int<lower=2> J;          // number of coins
     int<lower=0> y[J];       // heads in respective coin trials
     int<lower=0> n[J];       // total in respective coin trials
}
parameters {
     real<lower = 0, upper = 1> mu;
     real<lower = 0> kappa;
}
transformed parameters {
    real<lower=0> alpha;
    real<lower=0> beta;

    alpha = kappa * mu;
    beta = kappa - alpha;
}
model {
    mu ~ uniform(0, 1);
    kappa ~ uniform(1,1000);
    y ~ beta_binomial(n, alpha, beta);
}

image

[edit: escaped code]


#8

Forgot to include a data instance

n
[1] 347 396 116 303 345
y
[1] 51 46 88 23 28
J
[1] 5


#9

The distribution on kappa isn’t consistent with its declaration. The lower and upper bounds need to match the uniform constraints. We don’t generally recommend uniform distributions constrained to intervals because they cause difficulty both statistically (with bias) and computationally (with heavy tails on the unconstrained space).


#10

Thank you for catching this. I am experimenting further.
Any clue why the warnings only show up when there are more than one chain, is that a bug or a feature ?


#11

Usually the warnings only print when you have a single chain.

RStan’s error reporting is different with one versus multiple chains due to infrastructure issues I don’t understand. It’s not a bug per se.


#12

Unfortunately I keep getting numerical problems:

“Exception thrown at line 20: beta_binomial_log: First prior sample size parameter is 0, but must be 1
When a numerical problem occurs, the Hamiltonian proposal gets rejected.”

with this model

data { 
     int<lower=2> J;          // number of coins 
     int<lower=0> y[J];       // heads in respective coin trials
     int<lower=0> n[J];       // total in respective coin trials
}
parameters {
     real<lower = 0, upper = 1> mu;
     real<lower = 0> kappa;
}
transformed parameters {
    real<lower=0> alpha;
    real<lower=0> beta;

    alpha = kappa * mu;
    beta = kappa - alpha;
}
model {
    mu ~ uniform(0, 1); 
    kappa ~ exponential(0.05); 
    y ~ beta_binomial(n, alpha, beta);
}

should I try parametrizing differently ?


#13

I believe this is telling that the beta_binomial() distribution requires n >= 1.


#14

I would read this as alpha being the first prior count parameter, which matches the code in beta_binomial_lpdf.hpp:

check_positive_finite(function,
                      "First prior sample size parameter", alpha);

So I’m not sure why you’d be getting that mesage.

If you’re vectorizing, all container sizes have to match, too, but that would report a different error.

Did any of that message get chopped off?


#15

nothing chopped of, i left out the standard

“When a numerical problem occurs, the Hamiltonian proposal gets rejected.
See http://mc-stan.org/misc/warnings.html#exception-hamiltonian-proposal-rejected
If the number in the ‘count’ column is small, do not ask about this message on stan-users.”


#16

i just tried setting

int<lower=1> n[J];

it does not help …

-Leon


#17

My guess is that your data isn’t coherent. Are you sure each y[j] <= n[j]? I ran your model with data consistent with the model,

> J = 2
> y = c(1, 3)
> n = c(2, 5)

and it works just fine,

       mean se_mean    sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
mu     0.55    0.00  0.16  0.23  0.43  0.55  0.66  0.84  2643    1
kappa 24.45    0.42 21.51  2.07  9.36 18.43 32.65 83.28  2609    1
alpha 13.39    0.27 13.01  0.93  4.64  9.55 17.66 47.71  2352    1
beta  11.07    0.21 10.80  0.72  3.67  7.72 14.68 40.46  2638    1
lp__  -5.34    0.03  1.01 -8.06 -5.72 -5.04 -4.62 -4.36  1593    1