Itâ€™s not surprising why the mixture and adjusted models fit poorly, since theyâ€™re unrelated to your original data generation. As to the puzzle of why the unadjusted model fits wellâ€¦ My original hunch (which I lost while in the weeds exploring your original question) was that

`u ~ bernoulli(theta)`

leads to an unbiased estimate of `theta`

`y ~ bernoulli((1 - psi) * theta)`

leads to an unbiased estimate of `psi`

, conditional on `theta`

being unbiased
`y ~ bernoulli((1 - psi) * mu)`

leads to an unbiased estimate of `mu`

, given `psi`

is unbiased

So that might lead to unbiased estimates overall, even if the unadjusted double-likelihood is conceptually and statistically odd. From that perspective, itâ€™s not surprising that youâ€™d get unbiased results.

But I bet youâ€™d also get results that do not have adequate coverage. Have you checked to see whether 95% of your (95%) credible intervals include the true values? The unadjusted model is as if you actually drew two `y`

values per `y`

. So an analogy would be if you ran a logistic regression and just doubled every observation. Youâ€™d still get unbiased results, but your intervals would be too narrow.

Edit: Iâ€™ve been running some simulations. From what Iâ€™m seeing, phi (and, to a lesser-extent, theta) donâ€™t have sufficient coverage when the sample size is â€śsmallâ€ť (say, 60 or fewer). But I wasnâ€™t seeing any immediate issues when using 1,000 observations. Other than sample size, I used the same parameters values as you specified, though used the `integrate`

function to find the value of `theta`

given the distribution of X.

```
inv_logit <- function(x){
1 / (1 + exp(-x))
}
mean_logit <- function(alpha, beta, X_mu = 0, X_sigma = 1){
# probability of y given X (inv_logit) weighted by likelihood of X (dnorm)
f <- function(x){
dnorm(x, X_mu, X_sigma) * inv_logit(alpha + beta * x)
}
# Integrate to find average probability of y given distribution of X
integrate(f, -Inf, Inf)
}
generate_data <- function(n = 1000, psi = 0.23, alpha = -0.5, beta = 0.01, X_mu = 0, X_sd = 1){
# Observations recorded without error (but without covariate data)
theta = mean_logit(alpha, beta, X_mu, X_sd)$value
u = rbinom(n, 1, theta)
# Observations recorded with covariate data (but also with error)
X = rnorm(n, X_mu, X_sd)
mu = inv_logit(alpha + beta * X)
y = rbinom(n, size = 1, prob = (1 - psi) * mu)
list(
params = list('n' = n,
'theta' = theta,
'psi' = psi,
'alpha' = alpha,
'beta' = beta,
'X_mu' = X_mu,
'X_sd' = X_sd),
data = list(
N = n,
y = y,
u = u,
X = X)
)
}
```