Mixtures with censoring?

In experiments on human cognition using reaction time (RT) as an outcome measure, we often have a procedure-enforced censoring of observations such that the experiment interrupts data collection if a given response is taking longer than, say, 1s, yielding an NA as data for that instance. (There’s also an implied lower bound at zero, but where RTs are usually skewed anyway, models often use log-normal error or a more elaborate latent process like Wiener diffusion that yields a similarly skewed distribution of error.)

So we might use a right-censored model along the lines of the following (adapted from the manual v2.16, pg 188):

data {
    int<lower=0> N_obs;
    int<lower=0> N_cens;
    real y_obs[N_obs]; 
    real<lower=max(y_obs)> U;
}
parameters {
    real mu;
    real<lower=0> sigma;
}
model {
    y_obs ~ lognormal(mu, sigma);
    target += N_cens * lognormal_lccdf(U | mu, sigma);
}

In these experiments we also observe varying degrees of “contamination” responses, reflecting usually either fast guessing or slow responses after a brief inattentive period. There are more complicated models of these contamination sources that become identified with lots of data, but a useful first approximation to merely “deal with” them in cases of more moderate amounts of data is to model responses as a mixture with some proportion, p, of log-normally distributed “non-contamination” responses and 1-p proportion uniformly distributed “contamination” responses.

In this case, would the following be the appropriate expression of the mixture with censoring?

data {
    int<lower=0> N_obs;
    int<lower=0> N_cens;
    real y_obs[N_obs]; 
    real<lower=max(y_obs)> U; 
}
transformed_data{
    real uniflpdf;
    uniflpdf = uniform_lpdf(0|0,U); // lpdf of the contaminations (data-invariant)
}
parameters {
    real mu;
    real<lower=0> sigma;
    real<lower=0,upper=1> mix_prop ;
}
model {
    target += log_sum_exp(
          log(mix_prop) + lognormal_lpdf( y_obs | mu , sigma)
        , log1m(mix_prop) + uniflpdf
    ) ;
    target += N_cens * lognormal_lccdf(U | mu, sigma);
}
1 Like

You can just use

target += log_mix(mix_prop, lognormal_lpdf(y_obs | mu , sigma), uniflpdf);

and the declare-define lets you do

real uniflpdf = uniform_lpdf(0 | 0, U);

Otherwise, the model looks OK.

1 Like

Thanks!

Apologies for reviving a very old topic @mike-lawrence , but I’m currently working on closely related modelling approaches, and I’d like to discuss one aspect of your code to make sure I’m not missing something.

You modelled the censored observations as follows:

target += N_cens * lognormal_lccdf(U | mu, sigma);

However, should this not be adjusted for the probability mix_prop that the log-normal process is actually triggered? Perhaps something like the following?

target += N_cens * (log(mix_prop) + lognormal_lccdf(U | mu, sigma));