I’ve recently started playing around with Stan and I had this idea of a model that models the maximum of a number of i.i.d. variables. I generated some data where each entry is the max of 100 i.i.d. exponential variables. I then set up the model below. It compiles and the sampling runs ok, but n is stuck at 1.0. Can anyone explain why this doesn’t work?
data {
int<lower=0> N;
real<lower=0> y[N];
}
parameters {
real<lower=0> mu;
real<lower=1> n;
}
model {
target += log(n) + (n-1)*exponential_lcdf(y | mu) + exponential_lpdf(y | mu);
}
Conceptually, n
is an integer, so I don’t think Stan is going to be able to get its posterior distribution. For large n, I think y
has a Gumbel distribution, but by assuming n
is large, you make it impossible to estimate when n
is finite.
This is true, but I don’t see why it would trip up Stan. Maximizing the same log likelihood function constructed with SciPy I can recover the parameters, yet Stan’s optimizer and sampler agree that n=1.
I just simulated data in R
mu <- 0.5
I <- 75
n <- 100
y = rep(NA, I)
for (i in 1:I) y[i] <- max(rexp(n, mu))
and then wrote the objective function out,
f <- function(y, n, mu) {
log(n) +
(n - 1) * sum(pexp(y, mu, log.p=TRUE)) +
sum(dexp(y, mu, log=TRUE))
}
and then trying it on some values, I get that 2 is the best integer value
$ for (n in 1:20) print(sprintf("%3d %6.2f", n, f(y, n, mu)), quote=FALSE)
[1] 1 -436.34
[1] 2 -436.33
[1] 3 -436.60
[1] 4 -436.99
[1] 5 -437.44
[1] 6 -437.94
[1] 7 -438.46
[1] 8 -439.01
[1] 9 -439.57
[1] 10 -440.14
[1] 11 -440.72
[1] 12 -441.31
[1] 13 -441.91
[1] 14 -442.51
[1] 15 -443.12
[1] 16 -443.73
[1] 17 -444.35
[1] 18 -444.97
[1] 19 -445.59
[1] 20 -446.22
@Bob_Carpenter Thanks for doing this and helping a novice. When I saw you write out the objective function I realized my embarrassing mistake. The objective function should naturally be
target += N*log(n) + (n-1)*exponential_lcdf(y | mu) + exponential_lpdf(y | mu);
I forgot to multiply log(n)
with the number of data points.