That’s a very good lead, thank you.
Here’s my code, which I have checked numerically to be consistent with yours, but which can be applied to any order statistic:
data {
  int<lower=0> N;
  array[N] int<lower=0> cell_values;
  int n;  // number of trials in the cell
  int k;  // which order statistic we observe (e.g. k = n for the max)
}
parameters {
  real<lower=0> lambda;
}
model {
  for (i in 1:N) {
    target += lgamma(n + 1) - lgamma(k) - lgamma(n - k + 1)
              + (k - 1) * poisson_lcdf(cell_values[i] | lambda)
              + (n - k) * poisson_lccdf(cell_values[i] | lambda)
              + poisson_lpmf(cell_values[i] | lambda);
  }
}
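For reference, the target increment is the log of the textbook order-statistic density, with F and f the Poisson CDF and PMF:

$$ f_{(k)}(x) = \frac{n!}{(k-1)!\,(n-k)!}\, F(x)^{k-1}\, \bigl(1 - F(x)\bigr)^{n-k}\, f(x) $$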
I have a problem though: the Bayesian solution is consistently biased.
Here’s a driver I use:
generative_model_4 <- function(n, k, lambda, N) {
  # Draw N samples of n Poisson(lambda) counts each, sort every sample,
  # and keep only the k-th order statistic of each one
  data <- rpois(n * N, lambda)
  dim(data) <- c(n, N)
  for (i in 1:N) {
    data[, i] <- sort(data[, i])
  }
  return(list(N = N, cell_values = data[k, ], n = n, k = k))
}
data <- generative_model_4(15, 15, 0.15, 100)
data
# Run the model
fit <- sampling(model, data = data, iter = 10000, chains = 1, seed = 12346)
fit
With output:
> fit
Inference for Stan model: anon_model.
1 chains, each with iter=1000; warmup=500; thin=1;
post-warmup draws per chain=500, total post-warmup draws=500.
         mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
lambda   0.25    0.00 0.02  0.21  0.23  0.25  0.26  0.29   150    1
lp__    27.77    0.05 0.73 25.50 27.64 28.03 28.22 28.28   198    1
Samples were drawn using NUTS(diag_e) at Wed Oct 4 16:27:59 2023.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
Clearly, the reconstructed lambda is higher than the actual one. It is not bad luck: the inferred lambda is consistently above the actual value for every input combination I have tried.
The situation is much worse for order statistics other than the max.
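For what it’s worth, a sketch of the kind of replication loop I mean (the number of replicates, iteration counts, and seed handling are arbitrary):

post_means <- sapply(1:20, function(s) {
  d <- generative_model_4(15, 15, 0.15, 100)
  f <- sampling(model, data = d, iter = 2000, chains = 1, seed = s, refresh = 0)
  mean(rstan::extract(f, "lambda")$lambda)
})
summary(post_means)  # in my runs, these all land above the true lambda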
What am I doing wrong?