# Mode of beta distribution is displaced?

In short, running Stan’s joint optimization routine on a transformed beta seems to recover the wrong mode.
Following the “mode and concentration” parameterization on Wikipedia we can design beta priors with specific modes. Mapping this to another space using an appropriate Jacobian we get a program like

``````transformed data {
real<lower=0> conc = 10;
real<lower=0,upper=1> mode = 0.25;
}

parameters {
real a;
}

model {
Phi(a) ~ beta(mode*(conc-2) + 1, (1-mode)*(conc-2) + 1);
target += normal_lpdf(a | 0., 1.);
}
``````

The mode should be preserved under a monotonic transformation like `Phi`/`inv_Phi`, but if I run `model.optimizing` in Pystan I get a result that is off from what I expected. I have after some head-scratching managed to identify the proposed optimum as lying between the mean and the median of the density in the original space, but this doesn’t help much (and mean/median are not consistent under transformations).

Note that the samples are mapped back to the unit interval space on which the prior lives. There seems to be a consistent positive bias, but it seems to vanish if the mode is at `0.5`.

Am I doing something wrong or is there a problem with how I am using the optimization routine? Do I need to take extra steps to handle transformed densities?

Why are you using the Phi transformation on `a`? Wouldn’t it be easier to define

``````real<lower=0,upper=1> a;
``````

and then just

``a ~ beta(mode*(conc-2) + 1, (1-mode)*(conc-2) + 1)``

Sure, but in my particular case I am employing the transformation, so it is included here in case that’s where the error is at.

edit: I see now that the implementation could be done in reverse, applying `Phi` to the sample in e.g. a generated quantities block, so I guess the argument for me using the transformation in the `~` statement is that I avoid managing another auxiliary variable. It’s also convenient for maps lacking analytical inverses, though.

edit: without the transformation it appears to work fine, with mode and optimum matching.

Okay, I think my basic premise here is wrong. Because of the Jacobian correction, the mode actually does get shifted when you perform a transformation.

Ah now that I see you are using the mode, it makes sense. Indeed the mode is not equivariant under non-linear transformations. For instance, the mode of the normal distribution is `mu` while the mode of the log-normal distribution is `exp(mu - sigma^2)`.

1 Like

Yeah, that’s also the distribution I went back to, to verify :)

Here’s the math for it:
If Y\sim g^{-1}(X) and X\sim p_X then we have density

p_Y(Y)=p_X(g(Y))g'(Y)

where g' is the derivative of g. If we take the derivative of the whole density we get

p_X'(g(Y))(g'(Y)^2)+p_X(g(Y))g''(Y)

if we plug in the transformed mode of p_X, Y^*=g^{-1}(X^*), we are left with p_X(X^*)g''(Y^*).

I wrote a case study about reparameterization and optimization, but I’m not entirely happy with it (not that it’s wrong, but it’s just not clearly enough making the point).

This isn’t a Stan issue per se, it’s just a general issue with changes of variables and mode-based point estimates not having any statistical interpretation.