For a species of animal, I have a dataset of some lengths of prenatal (unborn) individuals and some lengths of postnatal (born) individuals, and I would like to estimate the mean length-at-birth (and the variance among individuals, if possible).
It is much easier to get lengths from postnatal individuals, so the dataset is very unbalanced, with 31 prenatal lengths and >1000 postnatal lengths. Thus, I’m pretty sure that plain old logistic regression is giving me biased estimates of the average length-at-birth. I’ve played with weighting the cases but I have reason to believe that this is causing bias in the other direction.
I’ve come up with a generative model with some informative priors:
where y_i takes the value 1 for postnatal and 0 for prenatal; x_i is the length of an individual when it was measured; \theta_i is the length at which an individual is born; \delta_i is the difference of an individual’s birth length from the mean birth length; \mu is the average age at birth; \sigma is the standard deviation of individual birth lengths.
This seems to be doing what I want it to do. Individuals grow and then are born at some length, and these lengths differ somewhat among individuals.
get_xy <- function() {
sigma = abs(rnorm(1,10,10))
delta = rnorm(1,0,sigma)
mu = rnorm(1,170,2)
theta = mu + delta
length = 100:250
born = length >= theta
return(tibble(length,born))
}
plot(get_xy(), type = "l")
replicate(100, lines(get_xy()))
The data are structured as follows:
Rows: 1,306
Columns: 2
$ Y <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,~
$ X <dbl> 31.3, 42.5, 47.0, 69.0, 106.0, 168.0, 173.0, 176.0, 176.0, 165~
The trouble is I’m having trouble figuring out how to fit this model to data. Here’s what I have so far:
data{
int n; // sample size
int<lower=0,upper=1> Y[n]; // Born or unborn
vector[n] X; // lengths
}
parameters{
vector[n] delta ;
real<lower=0> sigma;
real mu;
}
model{
real theta[n] ; // length at birth (LAB)
delta ~ normal( 0 , sigma ) ; // difference from mean
mu ~ normal( 170 , 2 ) ; mean LAB
sigma ~ normal( 10 , 10 ) ; sd of LABs
theta = mu + delta ;
if ( Y[i] ) ... X[i] > theta[i] ... ;
}
}
Any assistance would be gratefully received. I’m a first-time poster, so I apologise if this is a bit rough.