I am trying to implement a two-step estimator to account for my DV being discretized. In the first step, I need to estimate the mean av_y and st. dev. \sigma_y of the underlying normal distribution using the midpoints of the bins y_{mp} for N observations. I do that with something like this:
model {
ymp ~ normal( avy, sigmay);
avy ~ normal(muavy, sigmaavy);
sigmay ~ cauchy(musigmay, sigmasigmay);
// then some hierarichal priors for muavy and sigmaavy
}
Then I need to combine these estimates with the bin edges X, X_{+1} to create a new dependent variable
y_{EC} = av_y + \sigma_y \frac{\phi( (X_n - av_y) / \sigma_y ) - \phi( (X_n^{+1} - av_y) / \sigma_y ) } { \Phi( (X_n^{+1} - av_y) / \sigma_y ) - \Phi( (X_n - av_y) / \sigma_y ) }
and estimate the second-step parameters (say \alpha and \sigma_\zeta) that account for the uncertainty in the estimates of av_y and \sigma_y. Let’s say for simplicity that the second-stage model is y_{EC} \sim N(\alpha, \sigma_\zeta). What I have tried last is
transformed parameters{
real yEC[N];
for (n in 1:N) {
yEC[n] = avy + sigmay *
( exp(normal_lpdf((X[y[n] ] - avy)/sigmay | 0, 1)) -
exp(normal_lpdf((X[y[n]+1] - avy)/sigmay | 0, 1)) ) /
( exp(normal_lcdf((X[y[n]+1] - avy)/sigmay | 0, 1)) -
exp(normal_lcdf((X[y[n] ] - avy)/sigmay] | 0, 1)) )
}
model {
target += normal_lpdf( avy | muavy, sigmaavy);
target += normal_lpdf( sigmay | musigmay, sigmasigmay);
target += normal_lpdf( yEC | alpha , sigmazeta);
// priors for parameters alpha and sigmazeta
The true model is quite more complicated but I believe this code gets to the core of my question.
My problem is that the av_y estimated in the second stage have a mean way lower than the muavy and this problem does not happen if I use the original data to estimate the second step directly (pretending the bin midpoints are the actual data). I originally used one single program to estimate both steps, but got the bias. Then ran the first step, and processed the chains to compute muavy, sigmaavy, musigmay, sigmasigmay and passed as data to the second step. The bias survives.
I suspect that the likelihood of y_{EC} \sim N(\alpha, \sigma_\zeta) is pulling the mean of av_y towards zero. That is, I cannot fully separate the two models.
Thanks in advance.