Conditional posterior predictions for three-level occupancy model

So I thought I had it, but I’m having trouble with the model fits, and I’m not sure if I have missed something when implementing the model, or if there’s some fundamental inidentifiability in the data.

functions {
  real sample_detection_lpmf(int Y, int N, real p, real theta){
    return bernoulli_logit_lpmf(1 | theta) +
      binomial_logit_lpmf(Y | N, p);
  }
  
  real sample_nondetection_lpmf(int Y, int N, real p, real theta) {
    return log_sum_exp(
      sample_detection_lpmf(Y | N, theta, p), bernoulli_logit_lpmf(0 | theta));
  }
  
  real sample_occupancy_sum_lpmf(array[,] int Y, vector p, vector theta) {
    int K = size(Y);
    real temp_target = 0;
    
    for(k in 1:K) {
      if(Y[k, 1] > 0) {
        temp_target += sample_detection_lpmf(Y[k, 1] | Y[k, 2], p[k], theta[k]);
      } else {
        temp_target += sample_nondetection_lpmf(Y[k, 1] | Y[k, 2], p[k], theta[k]);
      }
    }
    
    return(temp_target);
  }
}
data {
  int N_sites;        // Number of sites
  int N_replicates;   // Number of replicates
  array[N_replicates, 2] int Y;  // Raw data [Replicate][Y, N]
  array[N_sites] int Q;
  
  // Different number of replicates per site, so the input replicates are passed in site
  // order - then we can access them per site by looping over the group sizes
  array[N_sites] int rep_group_sizes;

  // Site occupancy predictors
  int N_pred_occ;                    // Number of predictors
  matrix[N_sites, N_pred_occ] X_occ; // Predictor matrix for occupancy
  
  // Replicate occupancy predictors
  int N_pred_rep;
  matrix[N_replicates, N_pred_rep] X_rep;

  // Read abundance predictors
  int N_pred_reads;
  matrix[N_replicates, N_pred_reads] X_reads;
}
parameters {
  // Site occupancy parameters
  vector[N_pred_occ] b_occ;

  // Sample occupancy parameters
  vector[N_pred_rep] b_rep;

  // Read abundance parameters
  vector[N_pred_reads] b_reads;
}
model {
  // Priors
  // Site occupancy
  target += normal_lpdf(b_occ | 0, 3);

  // Sample occupancy
  target += normal_lpdf(b_rep | 0, 3);

  // Read abundance 
  target += normal_lpdf(b_reads | 0, 3);

  vector[N_replicates] p = X_reads * b_reads;
  vector[N_replicates] theta = X_rep * b_rep;
  vector[N_sites] psi = X_occ * b_occ;

  int start=1; //replicates are ordered by site - starting at replicate 1
  for(j in 1:N_sites) {
    int end = start + rep_group_sizes[j] - 1; // add number of reps for site j and subtract 1

    if(Q[j] == 0) {
      target += bernoulli_logit_lpmf(1 | psi[j]) + 
          sample_occupancy_sum_lpmf(Y[start:end,] | p[start:end], theta[start:end]);
    }
    if(Q[j] == 1) {
      target += log_sum_exp(bernoulli_logit_lpmf(1 | psi[j]) + 
          sample_occupancy_sum_lpmf(Y[start:end,] | p[start:end], theta[start:end]),
          bernoulli_logit_lpmf(0 | psi[j]));
    }
    start += rep_group_sizes[j]; // update start point for next site
  }
}
generated quantities {
  vector[N_replicates] p = X_reads * b_reads;
  vector[N_replicates] theta = X_rep * b_rep;
  vector[N_sites] psi = X_occ * b_occ;
}

Running the model for a single species, it seems to happily predict \psi for some range of predictors, but it struggles to properly identify \theta and p - for the former, it pretty much unilaterally predicts a posterior probability of occupancy close to zero, and for the latter, the probability of a read being classified as belonging to the species as close to one (x axes on the probability scale, y axes = predicted parameter for each site/replicate):

For reference, the species I tested this on is detected at 5 of 30 sites, and within sites with detections, is naively found in about 6 - 20% of samples at each site. Within each sample with detections, it is present at a naive rate of about 0.003 - 0.25% of sequencing reads returned. However, this is a problem present with all species - fitting the same model hierarchically for all species, all species appear to display this pattern.

I’m really not sure what I’m missing here - I have a suspicion that there’s something wrong with my likelihood, either my implementation in Stan or in the maths above, but I’ve been thinking about it all day and can’t quite figure out what it might be. However, the problem seems to be described pretty exactly by @jsocolar here: How to generate posterior predictive distribution for hierarchical occupancy model in Stan? - #4 by jsocolar. Anyone have any idea what I’ve missed here?