This is my Stan file:

```
data {
  int<lower=0> N;                   // number of observations
  array[N] int<lower=0, upper=1> Y; // vector of binary observations
}
parameters {
  real<lower=0, upper=1> theta; // probability of success
}
model {
  theta ~ beta(1, 1);   // prior
  Y ~ bernoulli(theta); // observation model / likelihood
}
```

My data and optimization call

```
Y=c(rep(0,9), 1)
data_list = list(N = length(Y), Y = Y)
model01$optimize(data = data_list)
```

produces the following result:

```
Initial log joint probability = -6.41748
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
6 -3.25083 0.0035622 5.48885e-05 1 1 9
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.2 seconds.
variable estimate
lp__ -3.25
theta 0.10
```

This gives the MLE solution.
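As a quick sanity check outside Stan (a Python one-liner, not part of the run above): for i.i.d. Bernoulli data the MLE of `theta` is just the sample mean, which matches the 0.10 reported.

```python
# MLE of a Bernoulli probability is the sample mean: 1 success in 10 trials.
Y = [0] * 9 + [1]
theta_mle = sum(Y) / len(Y)
print(theta_mle)  # 0.1
```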

With the option `jacobian = TRUE`, the result is different:

```
model01$optimize(data = data_list, jacobian = TRUE)
```

```
Initial log joint probability = -10.9632
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
6 -5.40673 0.000314983 1.72e-06 1 1 9
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.3 seconds.
variable estimate
lp__ -5.41
theta 0.17
```

Now `theta` is 0.17, which is definitely not the MLE (the MLE is 0.1, as shown above).

Apparently the two optimizations are not the same because of the flag `jacobian = TRUE`.

Can anyone help me understand why this happens?

- What is the interpretation of the result from `optimize(jacobian = TRUE)`?

An explanation of the mathematics behind this option would also be very helpful.

The following explanation from the manual page did not help me much:

In `cmdstanr`, the `optimize()` method is used for Maximum Likelihood Estimation (MLE) and Maximum A Posteriori (MAP) estimation. The `jacobian = TRUE` argument plays a crucial role when dealing with constrained parameters.

When `jacobian = TRUE`, the optimization process includes the Jacobian adjustment for constrained variables. This adjustment is necessary because the transformation of variables can affect the density of the posterior distribution. Including the Jacobian ensures that the optimization accounts for this change of variables, leading to a more accurate MAP estimate.

On the contrary, when the prior is not uniform, as in the model below:

```
data {
  int<lower=0> N;                   // number of observations
  array[N] int<lower=0, upper=1> Y; // vector of binary observations
}
parameters {
  real<lower=0, upper=1> theta; // probability of success
}
model {
  theta ~ beta(10, 1); // prior
  Y ~ bernoulli(theta);
}

the two `theta` estimates from `optimize()` with and without `jacobian = TRUE` are **the same** (even though the `lp__` values differ):

```
stanfile = "bernoulli.stan"
model = cmdstan_model(stanfile)
data_list = list(N=11, Y=c(rep(0,10), 1))
```

```
model$optimize(data = data_list)
```

```
Initial log joint probability = -13.9528
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
3 -13.8629 0.000429068 1.60942e-06 1 1 6
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.2 seconds.
variable estimate
lp__ -13.86
theta 0.50
```

```
model$optimize(data = data_list, jacobian = TRUE)
```

```
Initial log joint probability = -16.4593
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
4 -15.2492 0.000309459 1.13377e-08 1 1 7
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Finished in 0.2 seconds.
variable estimate
lp__ -15.25
theta 0.50
```

Very curious!

Thanks.
