Continuing the discussion from Priors on imputed values for Lognormal censored nonlinear model cause divergence:

I’m working with data that is collected by a spectrophotometer (basically, photons per sample), which can be used to estimate the density of certain bacteria in the sample.

There are some logical ways to estimate the limit of detection, based on calibration methods etc., but I am also curious about having Stan estimate these limits of detection from the data.

The problem I encounter when attempting such an estimation is that I do not have the data separated into `N_obs` and `N_cens` a priori, so the models in the Stan manual do not apply directly.

I think one of the problems is that I have multiple predictors linked together by a nonlinear function, so that each y has a corresponding expected value X. If that expected value falls below the detection limit, the observation should instead be random noise around the detection limit. That would correspond to the machine picking up small background signals but not being able to trace the actual density once it dips below the detection limit.

In the simplest terms, this is what I'm trying to do:

```
data {
  int N;
  vector[N] y;
  // predictor information
  vector[N] x1;
  vector[N] x2;
}
parameters {
  // mu is a global parameter related to predictors x1 and x2 by some function
  real mu;
  real<lower=0> sigma;
  real<upper=min(y)> L; // detection limit; note: the data vector is y, there is no y_obs here
}
transformed parameters {
  // expected value for each y based on mu and predictor info
  vector[N] X = function(mu, x1, x2);
}
model {
  L ~ ..;
  mu ~ ..;
  sigma ~ ..;
  for (i in 1:N) {
    if (X[i] > L)
      target += normal_lpdf(y[i] | X[i], sigma);
    else
      target += lognormal_lcdf(L | X[i], sigma);
  }
}
```

Of course, this doesn’t work. I found out that pedantic mode returns this informative warning:

```
Warning in ... A control flow statement depends on parameter(s):
```

I really do wonder whether something like this can be fit in Stan, though. I can obviously simulate data from this generative model. In fact, I would also be interested to know whether the random noise around the detection limit can be modelled separately, something like `sigma` for the observed values and `sigma_cens` for the noise around the detection limit.
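One direction I have been considering is to get rid of the parameter-dependent `if` by marginalizing the branch into a smooth mixture, which would also give `sigma_cens` a natural place. This is only a sketch: the soft-threshold scale `tau`, the `inv_logit` weighting, the example priors, and the hard-coded nonlinear formula are all my assumptions, not something I know to be correct.

```
data {
  int<lower=1> N;
  vector[N] y;
  vector[N] x1;
  vector[N] x2;
  real<lower=0> tau; // smoothing scale for the soft threshold (assumed tuning constant)
}
parameters {
  real mu;
  real<lower=0> sigma;      // experimental noise on observed signal
  real<lower=0> sigma_cens; // noise around the detection limit
  real<upper=min(y)> L;     // detection limit
}
transformed parameters {
  // nonlinear expected value; here I hard-coded the formula from my simulation below
  vector[N] X = mu * x2 ./ (1 + x2 ./ exp(-x1));
}
model {
  // example priors only, to be replaced with something sensible
  L ~ normal(2, 0.5);
  mu ~ normal(0, 5);
  sigma ~ normal(0, 0.1);
  sigma_cens ~ normal(0, 0.1);
  for (i in 1:N) {
    // smooth weight replacing the hard if (X[i] > L) branch:
    // w -> 1 well above L (signal regime), w -> 0 well below L (noise-floor regime)
    real w = inv_logit((X[i] - L) / tau);
    target += log_mix(w,
                      normal_lpdf(y[i] | X[i], sigma),
                      normal_lpdf(y[i] | L, sigma_cens));
  }
}
```

Whether this smooth approximation keeps the posterior geometry well-behaved (and how sensitive it is to `tau`) is exactly the part I'm unsure about.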

Here is a reprex, written in R, based on the process I described:

```
require(tidyverse)

## true parameters for simulation
mu = 3; sigma = 0.01; sigma_cens = 0.005; L = 1.85
set.seed(2)

reprex_cens_dat = tibble(
  x1 = seq(0, 0.5, length.out = 1e2),      # first predictor
  x2 = rep(c(2, 10), each = 50),           # second predictor
  X  = mu * x2 / (1 + x2 / exp(-x1))) %>%  # formula involving mu and predictors
  mutate(
    X_cens = ifelse(X < L, rnorm(n(), L, sigma_cens), X), # expected value with censoring
    y = rnorm(1e2, mean = X_cens, sd = sigma))            # observed value with experimental noise

# visualize: red points are the actual expected values,
# black points are measured values with censoring around L
reprex_cens_dat %>% ggplot(aes(x1, y)) +
  geom_point() +
  geom_point(aes(y = X), color = 'red4', size = 0.5)
```
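For completeness, whatever Stan version ends up compiling, the simulated data above could be passed in with cmdstanr along these lines. The file name `censored_model.stan` and the `tau` value are hypothetical placeholders, and I have not verified this fit.

```
library(cmdstanr)

# assemble the data list matching the model's data block
stan_data <- list(
  N   = nrow(reprex_cens_dat),
  y   = reprex_cens_dat$y,
  x1  = reprex_cens_dat$x1,
  x2  = reprex_cens_dat$x2,
  tau = 0.01  # assumed smoothing scale, if the model uses one
)

mod <- cmdstan_model("censored_model.stan")  # hypothetical file name
fit <- mod$sample(data = stan_data, seed = 1, chains = 4, parallel_chains = 4)
fit$summary(c("mu", "sigma", "L"))
```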