Divergence in fitting a mixture of censored and truncated data

Hello,

I’m having trouble estimating the parameters of a mixture whose components are left-truncated and right-censored, i.e. the observed signal is a mix of truncated and censored distributions.
To diagnose the model I generated some simulated data.

I can recover the parameters quite well from the mixed signal and the traceplots look fine, but I consistently get a lot of divergent transitions. I feel like I’m missing something; I have tried different priors and increasing adapt_delta, but nothing really helped to get rid of the divergences.

Generate data:

K <- 2
N <- 1000
TRUNC <- c(3, 0)
CENS <- 21
MIXING_PROBABILITIES <- c(0.7, 0.3)
DISTS <- list(
    z1 = list(mu = 10, sd = 4),
    z2 = list(mu = 2, sd = 2)
    )

y <- matrix(-Inf, nrow = N, ncol = 1)
for (n in 1:N) {
    # under which stressor does n die?
    z <- sample(1:K, 1, prob = MIXING_PROBABILITIES)
    while (y[n, 1] < TRUNC[z]) {
        y[n, 1] <- rnorm(1, DISTS[[z]]$mu, DISTS[[z]]$sd)
    }
}
y <- ifelse(y > CENS, CENS, y)

hist(y[, 1], breaks = 20)

PRIOR_DELAY <- matrix(
    c(7, 4,
      2, 4),
    byrow = TRUE, ncol = 2, nrow = 2)

standata <- list(
  # dataset dimensions
  N = N,
  K = K,

  # dataset
  y = as.vector(y),
  WEIGHTS = MIXING_PROBABILITIES,
  CENS = CENS,
  TRUNC = TRUNC,
  PRIOR_DELAY = PRIOR_DELAY
)
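
For completeness, this is roughly how I call the sampler (rstan; the file name is just a placeholder, and adapt_delta = 0.99 is one of the values I tried when raising it from the default 0.8):

library(rstan)

fit <- stan(
  file = "mixture_cens_trunc.stan",    # placeholder name for the model below
  data = standata,
  chains = 4,
  iter = 2000,
  control = list(adapt_delta = 0.99)   # raised from the default 0.8; divergences remain
)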

This is the corresponding Stan code. The generated quantities block reconstructs the censored/truncated signal for posterior predictive checks.
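
In words, the custom likelihood function below returns, for one observation y and component k (lower truncation bound lt = TRUNC[k], censoring point ul = CENS):

y < lt           : -Inf (impossible under the truncated component)
lt <= y < ul     : normal_lpdf(y | delay[k], sigma[k]) - normal_lccdf(lt | delay[k], sigma[k])
y >= ul (cens.)  : normal_lccdf(ul | delay[k], sigma[k]) - normal_lccdf(lt | delay[k], sigma[k])

and the model block combines these over components with log_sum_exp against the fixed log weights.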

functions {
  real normal_rcensed_ltrunc(real x, real alpha, real beta, real ul, real lt) {
    real mylp;
    if (x < lt) {
      mylp = negative_infinity();                // log probability is -inf if x falls below the lower truncation bound
    } else {
      if (x < ul) {
        mylp = normal_lpdf(x | alpha, beta);     // log density of x under the normal component
        mylp += -normal_lccdf(lt | alpha, beta); // rescale due to truncation (divide by the mass above lt)
      } else {
        mylp = normal_lccdf(ul | alpha, beta);   // censored observation: mass above the censoring limit
        mylp += -normal_lccdf(lt | alpha, beta); // rescale due to truncation
      }
    }
    }
    return mylp;
  }
}
data {
  int<lower=1> N;                // number of data points
  int<lower=1> K;
  real y[N];                     // observations (death counts at day)
  vector[K] WEIGHTS;
  real CENS;
  real TRUNC[K];
  matrix[K, 2] PRIOR_DELAY;
}
parameters {    
  // here parameters are drawn
  real<lower=0> delay[K];       // locations of mixture components
  real<lower=0> sigma[K];      // scales of mixture components
}
model {
  // priors on the mixture component parameters
  for (k in 1:K){
    delay[k] ~ normal(PRIOR_DELAY[k,1], PRIOR_DELAY[k,2]);
    sigma[k] ~ cauchy(0, 5);
  }

  for (n in 1:N) {                  // accumulate the per-observation mixture log likelihoods into the target
    vector[K] lps = log(WEIGHTS);   // fixed log mixing weights
    // evaluate each component's censored/truncated log density at y[n]
    for (k in 1:K) {
      lps[k] += normal_rcensed_ltrunc(y[n], delay[k], sigma[k], CENS, TRUNC[k]);
    }
    target += log_sum_exp(lps);
  }
}
generated quantities {
  matrix[N,K] yk_predict;
  vector[N] y_predict;
  int choice[N];
  // posterior predictions
  for (n in 1:N){
    for(k in 1:K){
      yk_predict[n,k] = negative_infinity();
      // truncation: sample again until the draw falls above the lower truncation bound
      // (the loop always runs at least once because yk_predict is initialised to -Inf)
      while(yk_predict[n,k] < TRUNC[k]){
        yk_predict[n,k] = normal_rng(delay[k], sigma[k]);
      }
    }
    choice[n] = categorical_rng(WEIGHTS); 
    y_predict[n] = yk_predict[n,choice[n]];  
    // right-censoring: set to the censoring limit if the draw exceeds it
    if(y_predict[n] > CENS) y_predict[n] = CENS;
  }
}
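
(In case it matters how I’m counting them: I pull the divergent transitions from the sampler diagnostics, e.g. like this, with fit being the fitted object from the call above.)

sampler_params <- get_sampler_params(fit, inc_warmup = FALSE)
sum(sapply(sampler_params, function(x) sum(x[, "divergent__"])))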

I would be extremely happy about any leads or ideas on how to explore this problem further and get rid of the divergences.