This is my Stan file:
data {
  int<lower=0> N; // number of observations
  array[N] int<lower=0, upper=1> Y; // vector of binary observations
}
parameters {
  real<lower=0, upper=1> theta; // probability of success
}
model {
  theta ~ beta(1, 1); // prior
  Y ~ bernoulli(theta); // observation model / likelihood
}
My data and optimization:
library(cmdstanr)
model01 = cmdstan_model("bernoulli.stan") # compile the Stan model shown above
Y = c(rep(0, 9), 1)
data_list = list(N = length(Y), Y = Y)
model01$optimize(data = data_list)
produces the following result:
Initial log joint probability = -6.41748
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
6 -3.25083 0.0035622 5.48885e-05 1 1 9
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.2 seconds.
variable estimate
lp__ -3.25
theta 0.10
This gives the MLE solution.
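(As a sanity check: for k = 1 success in N = 10 trials, the closed-form MLE is

$$\hat\theta_{\mathrm{MLE}} = \arg\max_\theta \left[ k\log\theta + (N-k)\log(1-\theta) \right] = \frac{k}{N} = \frac{1}{10} = 0.1,$$

and $\log(0.1) + 9\log(0.9) \approx -3.25$ matches the reported lp__.)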
With the option jacobian = TRUE, the result is different:
model01$optimize(data = data_list, jacobian = TRUE)
Initial log joint probability = -10.9632
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
6 -5.40673 0.000314983 1.72e-06 1 1 9
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.3 seconds.
variable estimate
lp__ -5.41
theta 0.17
Now theta is 0.17, which is definitely not the MLE (the MLE is 0.1, as shown above). Apparently the two optimizations target different objectives because of the flag jacobian = TRUE.
Can anyone help me understand why this happens?
- What is the interpretation of the result from optimize(jacobian = TRUE)?
The mathematics behind this option would also be very helpful.
The explanation below, from the manual page, did not help me much:
In cmdstanr, the optimize() method is used for Maximum Likelihood Estimation (MLE) and Maximum A Posteriori (MAP) estimation. The jacobian = TRUE argument plays a crucial role when dealing with constrained parameters. When jacobian = TRUE, the optimization process includes the Jacobian adjustment for constrained variables. This adjustment is necessary because the transformation of variables can affect the density of the posterior distribution. Including the Jacobian ensures that the optimization accounts for this change of variables, leading to a more accurate MAP estimate.
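Here is my attempt at the math, in case my confusion lies there. If I read the docs correctly, Stan optimizes over the unconstrained parameter x = logit(theta), and jacobian = TRUE adds the log absolute derivative of the inverse transform,

$$\log\left|\frac{d\theta}{dx}\right| = \log\theta + \log(1-\theta),$$

to the objective. With the flat beta(1, 1) prior and 1 success in 10 trials, the adjusted objective would be

$$f(\theta) = \underbrace{\log\theta + 9\log(1-\theta)}_{\text{likelihood}} + \underbrace{\log\theta + \log(1-\theta)}_{\text{Jacobian}} = 2\log\theta + 10\log(1-\theta),$$

which is maximized at theta = 2/12 = 1/6 ≈ 0.17, exactly the value reported above (and f(1/6) ≈ -5.41 matches the reported lp__). A quick numerical check with base R's optimize() (not cmdstanr's method; the function name f is just for illustration) agrees:

# Jacobian-adjusted objective for a beta(1,1) prior and 1 success in 10 trials
f = function(theta) 2 * log(theta) + 10 * log(1 - theta)
optimize(f, interval = c(0, 1), maximum = TRUE)$maximum # ~0.1667

Is this the right way to think about what jacobian = TRUE does?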
In contrast, when the prior is not uniform, as shown below:
data {
  int<lower=0> N; // number of observations
  array[N] int<lower=0, upper=1> Y; // vector of binary observations
}
parameters {
  real<lower=0, upper=1> theta; // probability of success
}
model {
  theta ~ beta(10, 1); // prior
  Y ~ bernoulli(theta);
}
The two theta estimates from optimize() with and without jacobian = TRUE are the same (even though the lp__ values differ):
stanfile = "bernoulli.stan"
model = cmdstan_model(stanfile)
data_list = list(N=11, Y=c(rep(0,10), 1))
model$optimize(data = data_list)
Initial log joint probability = -13.9528
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
3 -13.8629 0.000429068 1.60942e-06 1 1 6
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.2 seconds.
variable estimate
lp__ -13.86
theta 0.50
model$optimize(data = data_list, jacobian = TRUE)
Initial log joint probability = -16.4593
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
4 -15.2492 0.000309459 1.13377e-08 1 1 7
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.2 seconds.
variable estimate
lp__ -15.25
theta 0.50
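Working through the same algebra (again assuming my reading of the Jacobian adjustment above is correct): without the adjustment, the objective is

$$\underbrace{9\log\theta}_{\text{beta}(10,1)\ \text{prior}} + \underbrace{\log\theta + 10\log(1-\theta)}_{\text{likelihood}} = 10\log\theta + 10\log(1-\theta),$$

with mode 0.5 (and 20 log 0.5 ≈ -13.86 matches lp__); with the adjustment it becomes 11 log θ + 11 log(1 − θ), whose mode is still 0.5 because the extra term is symmetric about 0.5 (and 22 log 0.5 ≈ -15.25 matches). So I suspect the agreement of the two theta estimates here is a coincidence of this particular prior/data combination rather than a general rule.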
Very curious! Thanks.