Model doesn't recover the simulated distribution

I need to use probabilistic modeling to understand the underlying behavior of a system that generates a lot of data. Each time the process runs, the unknown random variable Y is sampled and compared with a second known (but not constant) value x, and one of three outcomes occurs: 0) Y > x; 1) Y <= x and no additional information is given; or 2) Y <= x and we learn the value of Y. For now, I am studying the abilities and limitations of the model on simulated data (the real world is a bit messier than the description above).

I use a small R script to generate 1 million draws of Y from a Normal(15, 3) distribution (arbitrary, but near where I expect my real data to be). I generate x from a Uniform(2, 20) distribution (this produces about 70% case 0, but reality is more like 90-99% case 0). When Y <= x, I randomly select between cases 1 and 2, with about 20% going to case 2. I bin x and Y by flooring to 0.01, and group the data by x (and also Y in case 2) and case.
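In case it helps, here is a minimal sketch of the generator translated from R to Python (the seed, variable names, and exact binning details are illustrative, not my actual script):

```python
import numpy as np

rng = np.random.default_rng(1)  # illustrative seed
N = 1_000_000

# unknown variable Y ~ Normal(15, 3), known comparison value x ~ Uniform(2, 20)
y = rng.normal(15, 3, N)
x = rng.uniform(2, 20, N)

# case 0: Y > x; otherwise split Y <= x randomly, ~20% into case 2 (Y revealed)
case = np.where(y > x, 0, np.where(rng.random(N) < 0.2, 2, 1))

# bin x and Y by flooring to 0.01 before grouping into counts
xb = np.floor(x * 100) / 100
yb = np.floor(y * 100) / 100

print((case == 0).mean())  # fraction of case 0 -- about 0.7, as in the post
```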

When I run the model, Y is close but not perfectly recovered, with mean 15.40 and SD 3.13. Am I mis-specifying something here?

```stan
data {
  int<lower=1> N0;
  vector[N0] CNT_0;
  vector[N0] X_0;
  int<lower=1> N1;
  vector[N1] CNT_1;
  vector[N1] X_1;
  int<lower=1> N2;
  vector[N2] CNT_2;
  vector[N2] X_2;
  vector[N2] Y_2;
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  // priors: improper (flat)
  // case 0: p(Y > X)
  for (n in 1:N0)
    target += CNT_0[n] * normal_lccdf(X_0[n] | mu, sigma);
  // case 1: p(Y <= X)
  for (n in 1:N1)
    target += CNT_1[n] * normal_lcdf(X_1[n] | mu, sigma);
  // case 2: p(Y = Y_2 | Y <= X)
  for (n in 1:N2)
    target += CNT_2[n] * (normal_lpdf(Y_2[n] | mu, sigma) - normal_lcdf(X_2[n] | mu, sigma));
}
generated quantities {
  real sim_Y = normal_rng(mu, sigma);
}
```
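Written out, the log likelihood the model block increments is

$$
\log L(\mu,\sigma)=\sum_{n=1}^{N_0}\mathrm{CNT}_{0,n}\log\big[1-\Phi_{\mu,\sigma}(X_{0,n})\big]+\sum_{n=1}^{N_1}\mathrm{CNT}_{1,n}\log\Phi_{\mu,\sigma}(X_{1,n})+\sum_{n=1}^{N_2}\mathrm{CNT}_{2,n}\big[\log\varphi_{\mu,\sigma}(Y_{2,n})-\log\Phi_{\mu,\sigma}(X_{2,n})\big],
$$

where $\Phi_{\mu,\sigma}$ and $\varphi_{\mu,\sigma}$ are the Normal(mu, sigma) CDF and density, so case 2 is treated as a Normal truncated above at X_2.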

The model converges (Rhat 1.00-1.01 on all parameters), but:
- mean(mu) is 15.35
- mean(sigma) is 3.05
- mean(sim_Y) is 15.40 with sd(sim_Y) 3.13

vs. the data simulation script, whose actual draws of Y from N(15, 3) have mean 15.0008 and SD 2.997.

Is this just a limit of identifiability, or have I misspecified something in the model? I have looked at modelling the binning, but including those components doesn't seem to fix the problem: the model still does not accurately identify the unknown distribution.

Interestingly, it seems to work better when I leave cases 0 and 1 out of the model. As long as there is enough data from case 2 (the minimum seems to be around 100,000 raw data points), case 2 alone gives good estimates. I am pleased with this result, but I still have a question: why would adding the data from cases 0 and 1 bias the model?
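For reference, the case-2-only variant is just the model above with the first two loops dropped, i.e. a truncated-normal likelihood for the revealed Y values only:

```stan
model {
  // case 2 only: Y_2 ~ Normal(mu, sigma) truncated above at X_2
  for (n in 1:N2)
    target += CNT_2[n] * (normal_lpdf(Y_2[n] | mu, sigma) - normal_lcdf(X_2[n] | mu, sigma));
}
```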