Reviving this topic to ask a new label-switching question!
I implemented in Stan the Latent Class Model for Capture-Recapture described in this paper and implemented (without Stan) in the LCMCR package. The model includes a latent class
z \sim \text{Categorical}((1,...,K);(\pi_1,...,\pi_K))
which determines the list recording probabilities \lambda_{jz}.
y_j\ |\ z \sim \text{Bern}(\lambda_{jz}) \text{ independent across lists }j
where y_j = 1 if the individual is recorded in list j and y_j = 0 otherwise. The capture-recapture goal is to estimate how many individuals are never recorded on any list, and therefore the population total, N.
As recommended in the Stan User’s Guide, I marginalize out z.
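Concretely, marginalizing out z gives the cell probability for a capture pattern (y_1,\dots,y_J):

P(y_1,\dots,y_J) = \sum_{k=1}^{K} \pi_k \prod_{j=1}^{J} \lambda_{jk}^{y_j} (1-\lambda_{jk})^{1-y_j}

and, setting every y_j = 0, the probability of never being recorded on any list:

P(0,\dots,0) = \sum_{k=1}^{K} \pi_k \prod_{j=1}^{J} (1-\lambda_{jk}).

These are what the transformed parameters block below computes on the log scale with log_sum_exp.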
But without ordering the list recording probabilities \lambda_{jz}, there is a label-switching issue. For K = 3 latent classes, for example, these are the results, which show good Rhat for my quantity of interest (N, the total population size) and for lp__:
> fit$summary(c("N","pi","alpha","lp__"))
# A tibble: 6 x 10
  variable      mean   median      sd     mad       q5      q95  rhat ess_bulk ess_tail
  <chr>        <dbl>    <dbl>   <dbl>   <dbl>    <dbl>    <dbl> <dbl>    <dbl>    <dbl>
1 N          9079.    8915.   853.    753.     8010.    10710.   1.00    1213.    1771.
2 pi[1]         0.353    0.331   0.328   0.467    0.0185    0.704  1.74     6.09     118.
3 pi[2]         0.256    0.137   0.268   0.181    0.0185    0.692  2.10     5.25      28.3
4 pi[3]         0.391    0.303   0.168   0.0272   0.269     0.696  1.53     7.11      30.8
5 alpha         2.05     1.59    1.73    1.18     0.342     5.48   1.14    18.1       36.2
6 lp__     -11271.   -11271.     4.20    4.15  -11278.   -11265.   1.01    2550.     4510.
I think I would only need to order them for one list (e.g. list number 1, without loss of generality), so that the labels are assigned such that the first latent class is the least likely to appear in list 1, the second latent class is the second-least likely to appear in list 1, and so on.
Does that make sense? I'm not sure how to accomplish this syntactically. Here is my current model:
data {
  int<lower=1> J;                        // number of lists
  int<lower=1> C;                        // number of observed cells in the dataset, up to 2^J-1
  int list_indicators[C, J];             // indicators of being in lists
  vector<lower=0>[C] cell_count;         // cell count for each capture pattern
  int<lower=1> K;                        // number of latent classes
}
transformed data {
  real<lower=0> observed = sum(cell_count);
  int zeros[J] = rep_array(0, J);
}
parameters {
  vector<lower=0,upper=1>[J] lambda[K];  // list inclusion probabilities for each latent class
  vector<lower=0,upper=1>[K - 1] breaks; // break proportions for stick-breaking prior on pi
  real<lower=observed> N;
  real<lower=0> alpha;                   // stick-breaking prior parameter
}
transformed parameters {
  simplex[K] pi;                         // latent class mixing proportions
  vector[C] log_cell_probability;        // log cell probability for each observed capture pattern
  real log_unobserved_cell_probability;
  // https://mc-stan.org/docs/2_26/stan-users-guide/arithmetic-precision.html#underflow-and-the-log-scale
  vector[K] log_pi;                      // cache log calculation
  vector[K] lps_unobserved;
  { // https://discourse.mc-stan.org/t/better-way-of-modelling-stick-breaking-process/2530/2
    // see BDA3 p.548
    real sum_sticks_so_far = 0;
    pi[1] = breaks[1];
    sum_sticks_so_far = pi[1];
    for (k in 2:(K - 1)) {
      pi[k] = (1 - sum_sticks_so_far) * breaks[k];
      sum_sticks_so_far += pi[k];
    }
    pi[K] = 1 - sum_sticks_so_far;
  }
  log_pi = log(pi);                      // cache log calculation
  lps_unobserved = log_pi;
  for (c in 1:C) {
    vector[K] lps = log_pi;
    for (k in 1:K) {
      lps[k] += bernoulli_lpmf(list_indicators[c] | lambda[k]); // https://mc-stan.org/docs/2_26/functions-reference/vectorization.html#evaluating-vectorized-log-probability-functions
    }
    log_cell_probability[c] = log_sum_exp(lps);
  }
  for (k in 1:K) {
    lps_unobserved[k] += bernoulli_lpmf(zeros | lambda[k]);
  }
  log_unobserved_cell_probability = log_sum_exp(lps_unobserved);
}
model {
  target += lchoose(N, observed) + (N - observed) * log_unobserved_cell_probability + cell_count' * log_cell_probability;
  target += -log(N);                     // improper 1/N prior on the population size
  breaks ~ beta(1, alpha);
  alpha ~ gamma(0.25, 0.25);
}
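To make the ordering idea concrete, here is an untested sketch of what I have in mind (logit_lambda1 and lambda_rest are names I'm introducing): declare the logit-scale list-1 probabilities as an ordered vector, leave the remaining lists unconstrained, and reassemble lambda in transformed parameters. The other blocks would stay as above.

```stan
parameters {
  ordered[K] logit_lambda1;                     // logit-scale list-1 probabilities, increasing across classes
  vector<lower=0,upper=1>[J-1] lambda_rest[K];  // probabilities for lists 2..J, unordered
  // ... breaks, N, alpha unchanged
}
transformed parameters {
  vector<lower=0,upper=1>[J] lambda[K];
  for (k in 1:K) {
    lambda[k][1] = inv_logit(logit_lambda1[k]); // ordering on the logit scale carries over to the probability scale
    for (j in 2:J)
      lambda[k][j] = lambda_rest[k][j - 1];
  }
  // ... rest of the block unchanged, using the reassembled lambda
}
model {
  logit_lambda1 ~ logistic(0, 1); // marginally uniform on the probability scale, up to the ordering constraint
  // ... rest of the model unchanged
}
```

The logistic(0, 1) prior is my attempt to keep the implied prior on lambda[k][1] close to the uniform the original parameterization had; with the ordered constraint it effectively becomes a prior on the order statistics of K uniforms.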