In experiments on human cognition using reaction time (RT) as an outcome measure, we often have procedure-enforced censoring of observations: the experiment interrupts data collection if a given response is taking longer than, say, 1 s, yielding an NA for that trial. (There is also an implied lower bound at zero, but since RTs are usually right-skewed anyway, models often use log-normal error or a more elaborate latent process, such as Wiener diffusion, that yields a similarly skewed error distribution.)
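To make the censoring contribution concrete, here is a minimal Python sketch (data and parameter values are hypothetical; scipy's lognorm with s=sigma, scale=exp(mu) matches the log-normal(mu, sigma) parameterization):

```python
import numpy as np
from scipy import stats

# Hypothetical observed RTs (seconds) and censoring setup.
y_obs = np.array([0.35, 0.52, 0.48, 0.71, 0.63])
N_cens = 2      # trials interrupted at the deadline (recorded as NA)
U = 1.0         # deadline: responses slower than 1 s are censored

mu, sigma = -0.6, 0.4  # example log-normal parameters

# scipy's lognorm(s=sigma, scale=exp(mu)) is lognormal(mu, sigma).
dist = stats.lognorm(s=sigma, scale=np.exp(mu))

# Observed trials contribute their log-density; each censored trial
# contributes the log survival probability P(Y > U), i.e. the lccdf.
log_lik = dist.logpdf(y_obs).sum() + N_cens * dist.logsf(U)
print(log_lik)
```

The `N_cens * dist.logsf(U)` term is exactly what the `target += N_cens * lognormal_lccdf(U | mu, sigma);` line contributes in the Stan model below.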
So we might use a right-censored model along the lines of the following (adapted from the Stan manual v2.16, p. 188):
data {
  int<lower=0> N_obs;
  int<lower=0> N_cens;
  real y_obs[N_obs];
  real<lower=max(y_obs)> U;
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  y_obs ~ lognormal(mu, sigma);
  target += N_cens * lognormal_lccdf(U | mu, sigma);
}
In these experiments we also observe varying degrees of “contamination” responses, usually reflecting either fast guessing or slow responses following a brief lapse of attention. More elaborate models of these contamination sources become identifiable with lots of data, but a useful first approximation that merely “deals with” them when data are more moderate is to model responses as a mixture: a proportion p of log-normally distributed “non-contamination” responses and a proportion 1 − p of uniformly distributed “contamination” responses.
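As a sketch of that mixture density (parameter values hypothetical), the per-observation log-density combines the two components stably on the log scale:

```python
import numpy as np
from scipy import stats

U = 1.0                 # contaminants assumed uniform on (0, U)
mu, sigma = -0.6, 0.4   # example log-normal parameters
mix_prop = 0.95         # proportion of non-contamination responses

rt = stats.lognorm(s=sigma, scale=np.exp(mu))
unif_lpdf = -np.log(U)  # uniform(0, U) log-density, constant in y

def mix_lpdf(y):
    # log( p * LogNormal(y) + (1 - p) * Uniform(y) ),
    # computed without underflow via logaddexp
    return np.logaddexp(np.log(mix_prop) + rt.logpdf(y),
                        np.log1p(-mix_prop) + unif_lpdf)

print(mix_lpdf(np.array([0.05, 0.5, 0.9])))
```

Note the mixture is applied observation by observation; this is the detail the Stan version below has to respect as well.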
In this case, would the following be the appropriate expression of the mixture with censoring?
data {
  int<lower=0> N_obs;
  int<lower=0> N_cens;
  real y_obs[N_obs];
  real<lower=max(y_obs)> U;
}
transformed data {
  real uniflpdf;
  uniflpdf = uniform_lpdf(0 | 0, U); // lpdf of the contaminations (data-invariant)
}
parameters {
  real mu;
  real<lower=0> sigma;
  real<lower=0, upper=1> mix_prop;
}
model {
  // the mixture must be applied per observation; a vectorized
  // lognormal_lpdf(y_obs | ...) returns the *sum* over observations,
  // which would mix whole datasets rather than individual trials
  for (n in 1:N_obs) {
    target += log_sum_exp(
      log(mix_prop) + lognormal_lpdf(y_obs[n] | mu, sigma),
      log1m(mix_prop) + uniflpdf
    );
  }
  target += N_cens * lognormal_lccdf(U | mu, sigma);
}
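One thing a quick numpy simulation makes me wonder about (parameter values hypothetical): since the contaminants are uniform on (0, U), they can never exceed U, so the censoring probability implied by the mixture is mix_prop * P(LogNormal > U), not the lccdf alone. Should the censored-trials term above therefore also carry the log(mix_prop) weight?

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

U = 1.0
mu, sigma = -0.6, 0.4
mix_prop = 0.9

n = 200_000
is_rt = rng.random(n) < mix_prop  # which trials are non-contaminants
y = np.where(is_rt,
             rng.lognormal(mu, sigma, n),  # genuine RTs
             rng.uniform(0.0, U, n))       # contaminants, never > U

sim_p_cens = (y > U).mean()

# Analytically, only the log-normal component can exceed U:
rt = stats.lognorm(s=sigma, scale=np.exp(mu))
analytic = mix_prop * rt.sf(U)
print(sim_p_cens, analytic)
```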