L-BFGS-B comment

Yes, please!

Aki

I created an issue:

  • Bob

The current approach without the Jacobian already includes this. The implicit uniform priors on the constrained parameters are already in the posterior p( theta | y ) as defined by the model.

Maybe an example will help. Consider the model:

parameter {
  real<lower=0> x;
}
model {
  x ~ exponential(1);
}

What is the MAP estimate of x? I suspect that you’re thinking that it should be at x=0, where the exponential distribution is maximized. However, if you include the Jacobian transformation, you’ll get x = 1.

The constrained distribution p( x ) is simple p(x) = exp(-x)
The unconstrained distribution is

q( x_unc ) = exp(-exp(x_unc)) * exp(x_unc) 
                 = exp(x_unc - exp(x_unc))

If you maximize p(x) wrt x, you get x = 0. If you maximize q(x_unc) wrt x_unc you get x_unc ~= 0 and x = exp(x_unc) ~= 1.

In Stan right now, the optimizer will (try to) give you a value of x=0. (I say try because it would require the optimizer getting to -infinity but that’s another issue.)

I don’t know where the idea came from that we’re not currently doing a valid MAP estimate, we most definitely are.

Perhaps we need to have a skype call to sort this out?

Marcus_Brubaker
October 13
The current approach without the Jacobian already includes this. The implicit uniform priors on the constrained parameters are already in the posterior p( theta | y ) as defined by the model.

Maybe an example will help. Consider the model:

parameter {
real<lower=0> x;
}
model {
x ~ exponential(1);
}

What is the MAP estimate of x? I suspect that you’re thinking that it should be at x=0, where the exponential distribution is maximized. However, if you include the Jacobian transformation, you’ll get x = 1.

The constrained distribution p( x ) is simple p(x) = exp(-x)
The unconstrained distribution is

q( x_unc ) = exp(-exp(x_unc)) * exp(x_unc)
= exp(x_unc - exp(x_unc))

If you maximize p(x) wrt x, you get x = 0. If you maximize q(x_unc) wrt x_unc you get x_unc ~= 0 and x = exp(x_unc) ~= 1.

In Stan right now, the optimizer will (try to) give you a value of x=0. (I say try because it would require the optimizer getting to -infinity but that’s another issue.)

I don’t know where the idea came from that we’re not currently doing a valid MAP estimate, we most definitely are.

I do. Aki requested was that the optimizer optimize the same density
as was being sampled. I thought that meant we would need to include
the Jacobian. My calculus is atrocious, so I’m almost certainly wrong
if there’s any doubt as to who’s confused.

The example is great. Thanks. I think this sorts it out. Let me
let it sink in and I’ll get back to you if I need further clarification.
I’ll close the issue in the meantime.

Thanks.

  • Bob