What is the right way to go about optimizing and/or sampling if my variable is
real<lower=0> mu;
except I want to enforce that mu is never exactly 1 i.e. exclude a point on
this line.
I can imagine running twice, once for mu \in (0, 1) and once for mu \in (1, Inf),
and then selecting the model with the better lp__ output. But this looks clumsy,
and in multi-dimensional space it gets out of hand.

To be clear, you want to sample, and then if mu is 0.999999999999931 you’re ok with it, and if it’s 1.00000000021 you’re ok, but if it’s 1.0 exactly as a floating point number you can’t accept this?

Anyway, if you really care about such things, simply sample, and then if you ever get exactly 1, throw away that sample. Hint: you will never get exactly 1, so it won’t matter, but this method will work.
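If you want the model itself to enforce this, a minimal sketch (assuming mu is declared as above; the message text is just illustrative) would use Stan’s reject statement inside the model block, which causes the proposal to be rejected:

model {
  // Guard against mu being exactly 1; in floating point
  // this branch is essentially never taken.
  if (mu == 1)
    reject("mu must not be exactly 1; got mu = ", mu);
  // ... rest of the model ...
}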

When it comes to optimizing, you could maybe get 1.0 exactly as the exact optimum, but if this occurs, what do you want to do about it?

More to the point, I think you should revisit your model and consider why this single point is excluded and what it means for mu to be 1 + epsilon for a tiny epsilon.

Yes correct interpretation.
The problem arises because 1 is an inflection point.
I optimize a Dirichlet/beta density, and for some data the optimizer fails because there is a discontinuity in the likelihood at 1.

Hi, Leon! I’m not clear on what you’re trying to do here. Can you write down your likelihood and prior?

Do you mean beta distribution rather than beta function (the function is the normalizer for the distribution)? For the beta distribution, values fall in (0, 1), so values greater than 1 don’t make sense; parameters are only constrained to be positive.

Here is one example, which is easy to reproduce, I guess due to the properties of the uniform prior. If I keep alpha and beta merely positive, many chains break on some data; if I constrain alpha and beta to be greater than 1 (convex PDF), everything works. My reasoning is that at alpha = 1 the PDF going from convex to concave creates an issue for sampling.
data {
  int<lower=2> J;     // number of coins
  int<lower=0> y[J];  // heads in respective coin trials
  int<lower=0> n[J];  // total in respective coin trials
}
parameters {
  real<lower=0> alpha;
  real<lower=0> beta;
}
transformed parameters {
  real<lower=0, upper=1> mu;
  mu = alpha / (alpha + beta);
}
model {
  alpha ~ uniform(0, 100);
  beta ~ uniform(0, 100);
  for (j in 1:J) {
    y[j] ~ beta_binomial(n[j], alpha, beta);
  }
}

Yes, when you have beta(a, b) [included implicitly in beta_binomial(n, a, b)], you’re going to have issues with a < 1 and b < 1 because they have very heavy tails when transformed to the unconstrained scale (it’s a logit transform).

I often find alpha > 100 or beta > 100 because they’re like prior counts; for example, I’m over 100 in the repeated binary trial case study (on the Stan web site). When you have a hard constraint at the boundary, we use a logit transform again, and if mass piles up near the boundary, it’s going to be near plus or minus infinity on the unconstrained scale, which can lead to instability.

If you really want a uniform(0, 100) prior, then you need to constrain alpha and beta with <lower=0, upper=100>; otherwise, the sampler will just reject values above 100, leading to inefficient sampling. We instead recommend no upper bound and a weakly informative prior (like a Pareto, exponential, gamma, or lognormal) that’s also unconstrained.
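As a minimal sketch of the unconstrained alternative (the lognormal scales here are illustrative choices, not recommendations from this thread):

parameters {
  real<lower=0> alpha;
  real<lower=0> beta;
}
model {
  // Weakly informative priors with no upper bound;
  // the specific scales are illustrative, not prescriptive.
  alpha ~ lognormal(0, 2);
  beta ~ lognormal(0, 2);
}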

I also find it easier to reparameterize, as described in the manual: use alpha / (alpha + beta), which is the prior mean, and alpha + beta, which is like the overall precision or prior count.

You can write that beta-binomial more efficiently and reliably as
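Presumably the intended snippet was something like the mu/kappa parameterization sketched below (reconstructed from the surrounding discussion, not the original post), using the vectorized beta_binomial:

parameters {
  real<lower=0, upper=1> mu;  // prior mean, alpha / (alpha + beta)
  real<lower=0> kappa;        // prior count, alpha + beta
}
model {
  // Priors on mu and kappa go here.
  y ~ beta_binomial(n, kappa * mu, kappa * (1 - mu));
}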

The distribution on kappa isn’t consistent with its declaration. The lower and upper bounds need to match the uniform constraints. We don’t generally recommend uniform distributions constrained to intervals because they cause difficulty both statistically (with bias) and computationally (with heavy tails on the unconstrained space).

Thank you for catching this. I am experimenting further.
Any clue why the warnings only show up when there is more than one chain? Is that a bug or a feature?

“Exception thrown at line 20: beta_binomial_log: First prior sample size parameter is 0, but must be 1
When a numerical problem occurs, the Hamiltonian proposal gets rejected.”

with this model

data {
  int<lower=2> J;     // number of coins
  int<lower=0> y[J];  // heads in respective coin trials
  int<lower=0> n[J];  // total in respective coin trials
}
parameters {
  real<lower=0, upper=1> mu;
  real<lower=0> kappa;
}
transformed parameters {
  real<lower=0> alpha;
  real<lower=0> beta;
  alpha = kappa * mu;
  beta = kappa - alpha;
}
model {
  mu ~ uniform(0, 1);
  kappa ~ exponential(0.05);
  y ~ beta_binomial(n, alpha, beta);
}