Estimating mean length-at-birth from an unbalanced sample of unborn and born lengths

Thank you for your response, @hhau

Yes, sorry for that. I’ve changed this in the original post.

Yes, I was wondering that too, but I don’t think we have this sort of information.

Yes, I think this is the crux of the issue: I haven’t expressed a likelihood for either X_i or Y_i. In my generative model, the binary Y_i simply tells us whether \theta_i is less than or greater than X_i. This post seems to get at this idea.
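Written out under assumptions that are not stated above (that \theta_i is normally distributed with mean \mu and standard deviation \sigma, independently of X_i, and that Y_i = 1 codes a born individual, i.e. \theta_i < X_i), the generative model would reduce to:

```latex
\begin{aligned}
\Pr(Y_i = 1 \mid X_i) &= \Pr(\theta_i < X_i)
  = \Phi\!\left(\frac{X_i - \mu}{\sigma}\right)
  = \Phi(\beta_0 + \beta_1 X_i), \\
\beta_0 &= -\frac{\mu}{\sigma}, \qquad
\beta_1 = \frac{1}{\sigma}, \qquad
\mu = -\frac{\beta_0}{\beta_1}.
\end{aligned}
```

Here \Phi is the standard normal CDF, so under these assumptions the mean length-at-birth is the length at which the probability of being born equals 0.5.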

What I have described may boil down to a probit regression model. My main concern at this stage is with finding a way to balance the influence of the two samples. For any given length, the born cases are overrepresented and the unborn cases are underrepresented, so I don’t feel comfortable estimating a probability from the sheer relative numbers of cases. This imbalance causes a pretty clear downward bias in the estimated mean length-at-birth.
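To make the downward bias concrete, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration: the true mean and standard deviation, the uniform spread of observed lengths, the thinning of unborn cases to 20%, and the helper names `fit_probit` and `mean_length_at_birth`. It fits a plain probit by Fisher scoring, once on the full simulated population and once on the thinned sample.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()
TRUE_MU, TRUE_SIGMA = 50.0, 5.0  # hypothetical length-at-birth distribution


def fit_probit(x, y, n_iter=30):
    """Fit P(y=1 | x) = Phi(b0 + b1*x) by Fisher scoring; return (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        s0 = s1 = 0.0          # score vector
        i00 = i01 = i11 = 0.0  # Fisher information matrix (2x2, symmetric)
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi
            # Clip the probability away from 0 and 1 to avoid division by zero.
            p = min(max(STD_NORMAL.cdf(eta), 1e-10), 1 - 1e-10)
            d = STD_NORMAL.pdf(eta)
            v = p * (1 - p)
            r = (yi - p) * d / v
            w = d * d / v
            s0 += r
            s1 += r * xi
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
        # Scoring step: add (information matrix)^-1 times the score.
        det = i00 * i11 - i01 * i01
        b0 += (i11 * s0 - i01 * s1) / det
        b1 += (i00 * s1 - i01 * s0) / det
    return b0, b1


def mean_length_at_birth(data):
    """Length at which the fitted probability of being born is 0.5."""
    b0, b1 = fit_probit([x for x, _ in data], [y for _, y in data])
    return -b0 / b1


rng = random.Random(1)

# Simulated population: y = 1 (born) when the observed length exceeds theta.
population = []
for _ in range(4000):
    theta = rng.gauss(TRUE_MU, TRUE_SIGMA)
    x = rng.uniform(20.0, 80.0)
    population.append((x, 1 if x > theta else 0))

# Unbalanced sample: keep every born case but only about 20% of unborn cases.
sample = [(x, y) for x, y in population if y == 1 or rng.random() < 0.2]

mu_full = mean_length_at_birth(population)
mu_unbalanced = mean_length_at_birth(sample)
print(f"full population: {mu_full:.2f}, unbalanced sample: {mu_unbalanced:.2f}")
```

In this setup the estimate from the thinned sample should fall below the one from the full population, which is the direction of bias described above. How large the shift is depends entirely on the assumed thinning rate.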

The traditional, non-Bayesian approach to addressing imbalance is to weight each case by the inverse of its group's sample size. I tried this and it seemed to cause bias in the opposite direction. When I took the square root of the weights, the estimate seemed about right, but this choice felt a little arbitrary.
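One way to see what the different weightings do is to look at the share of total weight each group ends up carrying. The sketch below uses hypothetical sample sizes (900 born, 100 unborn) and a hypothetical helper `group_weight_share`; it only does the arithmetic and fits no model.

```python
n_born, n_unborn = 900, 100  # hypothetical sample sizes


def group_weight_share(n_a, n_b, power):
    """Share of total weight carried by group a when every case in a group
    of size n is given weight n ** (-power)."""
    total_a = n_a * n_a ** (-power)
    total_b = n_b * n_b ** (-power)
    return total_a / (total_a + total_b)


share_unweighted = group_weight_share(n_born, n_unborn, 0.0)  # weight 1 per case
share_inverse = group_weight_share(n_born, n_unborn, 1.0)     # weight 1/n per case
share_sqrt = group_weight_share(n_born, n_unborn, 0.5)        # weight 1/sqrt(n) per case

print(share_unweighted, share_inverse, share_sqrt)
```

With these numbers the born group carries 90% of the weight unweighted, 50% with inverse-size weights, and 75% with square-root weights. Inverse-size weights always force the two groups to equal total weight, which might over-correct if born and unborn individuals are not equally common in the population being sampled; the square-root version sits between the two extremes, but nothing here says it is the right compromise.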

Some other discussions on this here and here.

Perhaps what I really want is to find where the upper tail of the density of unborn lengths crosses the lower tail of the density of born lengths?
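As a rough sketch of that crossing-point idea, assuming each group of lengths is summarised by a fitted normal density (an assumption, as are the made-up lengths and the helper name `crossing_point`), the crossing between the two group means can be found by bisection:

```python
from statistics import NormalDist


def crossing_point(unborn_lengths, born_lengths, tol=1e-6):
    """Length between the two group means at which the fitted normal
    densities of unborn and born lengths are equal."""
    unborn = NormalDist.from_samples(unborn_lengths)
    born = NormalDist.from_samples(born_lengths)
    lo, hi = unborn.mean, born.mean
    # Bisection needs the unborn density to dominate at lo and the born
    # density to dominate at hi; otherwise there is no bracketed crossing.
    if not (lo < hi and unborn.pdf(lo) > born.pdf(lo) and born.pdf(hi) > unborn.pdf(hi)):
        raise ValueError("densities do not bracket a crossing between the means")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if unborn.pdf(mid) > born.pdf(mid):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical lengths, for illustration only.
unborn_lengths = [38, 41, 43, 44, 46, 47, 49, 51]
born_lengths = [48, 52, 55, 58, 61, 64, 70, 75]

x_cross = crossing_point(unborn_lengths, born_lengths)
print(f"densities cross at length {x_cross:.2f}")
```

Each density here integrates to one, so the group sizes play no role. Scaling the densities by the number of cases in each group would move the crossing point, which brings the imbalance question back in.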