L-BFGS-B comment

avehtari · October 13, 2016, 3:42pm

Yes, please!

Aki

Bob_Carpenter · October 13, 2016, 5:29pm

I created an issue:

Bob

Marcus_Brubaker · October 13, 2016, 7:20pm

The current approach without the Jacobian already includes this. The implicit uniform priors on the constrained parameters are already in the posterior p( theta | y ) as defined by the model.

Maybe an example will help. Consider the model:

parameters {
  real<lower=0> x;
}
model {
  x ~ exponential(1);
}

What is the MAP estimate of x? I suspect that you’re thinking that it should be at x=0, where the exponential distribution is maximized. However, if you include the Jacobian transformation, you’ll get x = 1.

The constrained distribution p( x ) is simply p(x) = exp(-x).
The unconstrained distribution is

q( x_unc ) = exp(-exp(x_unc)) * exp(x_unc) 
                 = exp(x_unc - exp(x_unc))
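For reference, the extra exp(x_unc) factor is the Jacobian of the transform from the unconstrained to the constrained scale. A short sketch of that change of variables, assuming the lower bound is handled by x = exp(x_unc) as in the expressions above:

```latex
q(x_{\mathrm{unc}})
  = p\bigl(\exp(x_{\mathrm{unc}})\bigr)\,
    \left|\frac{d}{d x_{\mathrm{unc}}}\exp(x_{\mathrm{unc}})\right|
  = \exp\bigl(-\exp(x_{\mathrm{unc}})\bigr)\,\exp(x_{\mathrm{unc}})
```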

If you maximize p(x) wrt x, you get x = 0. If you maximize q(x_unc) wrt x_unc you get x_unc ~= 0 and x = exp(x_unc) ~= 1.
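The two modes can be checked numerically. The sketch below is only an illustration in plain Python, not Stan's optimizer; the helper names (`log_p`, `log_q`, `argmax_on_grid`) and the grid ranges are made up for this example.

```python
import math

def log_p(x):
    # Log density on the constrained scale: log p(x) = -x for x >= 0.
    return -x

def log_q(x_unc):
    # Log density on the unconstrained scale, including the
    # log-Jacobian term x_unc: log q(x_unc) = x_unc - exp(x_unc).
    return x_unc - math.exp(x_unc)

def argmax_on_grid(f, lo, hi, n=200001):
    # Brute-force search over an evenly spaced grid (hypothetical helper).
    step = (hi - lo) / (n - 1)
    best = max(range(n), key=lambda i: f(lo + i * step))
    return lo + best * step

# Mode without the Jacobian, searched directly over x in [0, 10].
x_mode = argmax_on_grid(log_p, 0.0, 10.0)

# Mode with the Jacobian, searched over x_unc in [-10, 10].
# Setting the derivative 1 - exp(x_unc) to zero gives x_unc = 0.
x_unc_mode = argmax_on_grid(log_q, -10.0, 10.0)

print("mode of p(x):           x =", x_mode)
print("mode of q(x_unc): x_unc =", x_unc_mode, " x =", math.exp(x_unc_mode))
```

The first search lands on the boundary x = 0, while the second lands near x_unc = 0, which maps back to x near 1.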

In Stan right now, the optimizer will (try to) give you a value of x=0. (I say try because it would require the optimizer to reach x_unc = -infinity on the unconstrained scale, but that’s another issue.)
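To see why x=0 is only approached and never reached, here is a rough sketch using plain gradient ascent on the unconstrained scale with no Jacobian term. This is not the L-BFGS algorithm itself; the step size, starting point, and iteration count are arbitrary choices for illustration.

```python
import math

def grad_log_p_unc(x_unc):
    # Gradient of log p(exp(x_unc)) = -exp(x_unc) with respect to x_unc.
    # No Jacobian term is included. The gradient is always negative.
    return -math.exp(x_unc)

x_unc = 0.0       # arbitrary starting point
step_size = 1.0   # arbitrary fixed step size
trace = [x_unc]

for _ in range(1000):
    x_unc += step_size * grad_log_p_unc(x_unc)
    trace.append(x_unc)

x = math.exp(x_unc)
print("x_unc after 1000 steps:", x_unc)
print("x = exp(x_unc):        ", x)
```

Every step moves x_unc further in the negative direction, so x keeps shrinking toward 0 but stays strictly positive at any finite iteration.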

I don’t know where the idea came from that we’re not currently doing a valid MAP estimate; we most definitely are.

Perhaps we need to have a Skype call to sort this out?

Bob_Carpenter · October 13, 2016, 7:47pm

Marcus_Brubaker
October 13

I don’t know where the idea came from that we’re not currently doing a valid MAP estimate; we most definitely are.

I do. What Aki requested was that the optimizer optimize the same density
as was being sampled. I thought that meant we would need to include
the Jacobian. My calculus is atrocious, so I’m almost certainly wrong
if there’s any doubt as to who’s confused.

The example is great. Thanks. I think this sorts it out. Let me
let it sink in and I’ll get back to you if I need further clarification.
I’ll close the issue in the meantime.

Thanks.

Bob
