Mixture of AFT models and logit-regression


Hello everyone,
I am trying to fit a model where my data T can be written as follows:

if T > 0:   T | pars ~ AFT
if T < 0:  -T | pars ~ AFT

I am overcoming this sign issue in the following way:

T = D*delta + (delta-1)*A

where delta is 1 if T > 0 and 0 otherwise; consequently D = T when T > 0 and A = -T when T < 0.
In this way both D and A are positive variables, and I can fit an AFT model to each of them. Delta is a Bernoulli variable, so I am fitting a logit regression for it.
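A minimal Python sketch of the split described above, assuming T is never exactly 0 (the post does not say how that case is handled, so here it raises an error). The function names `split_signed_times` and `reconstruct` are hypothetical. It builds delta, D, and A, then checks that T = D*delta + (delta-1)*A recovers the original values, using 0 as a placeholder for whichever of D or A is unused for a given observation.

```python
def split_signed_times(times):
    """Split signed times T into (delta, D, A); assumes no T is exactly 0."""
    delta, d_obs, a_obs = [], [], []
    for t in times:
        if t == 0:
            # Assumption: a zero time is neither a delay nor an anticipation.
            raise ValueError("T == 0 is neither a delay nor an anticipation")
        if t > 0:
            delta.append(1)
            d_obs.append(t)       # D = T for delays
        else:
            delta.append(0)
            a_obs.append(-t)      # A = -T for anticipations, so A > 0
    return delta, d_obs, a_obs


def reconstruct(delta, d_obs, a_obs):
    """Rebuild T via T = D*delta + (delta - 1)*A, consuming D and A in order."""
    d_iter, a_iter = iter(d_obs), iter(a_obs)
    times = []
    for dl in delta:
        d = next(d_iter) if dl == 1 else 0.0   # placeholder when unused
        a = next(a_iter) if dl == 0 else 0.0   # placeholder when unused
        times.append(d * dl + (dl - 1) * a)
    return times


T = [2.5, -0.5, 4.0, -1.25, 3.0]
delta, D, A = split_signed_times(T)
print(delta)                          # [1, 0, 1, 0, 1]
print(D)                              # [2.5, 4.0, 3.0]
print(A)                              # [0.5, 1.25]
print(reconstruct(delta, D, A) == T)  # True
```

The positive D and A lists are what each AFT model would see, and delta is the 0/1 response for the logit regression.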

Now the problem in Stan: should I write the three models in the same Stan file? They seem quite independent from the point of view of posterior sampling, so maybe putting all three of them in one huge model would just make life harder for Stan? Or would it make no difference to sampling efficiency?

If I do write one big model, can I just use a sampling statement for each of D, A and delta, or would it be more efficient to add them with target += statements?

Thank you.


Why not just use absolute value?

abs(T) ~ AFT

Not sure what “AFT” is, but it doesn’t really matter given what you wrote down. I don’t understand why you’d introduce a Bernoulli variable here. Nor do I understand what you mean by three models. Are there different parameters and different log density functions?


Sorry, I was not very clear. My data are a kind of recurrent event: an event is supposed to occur after a given time (many times over), but sometimes I observe it early (an anticipation) and sometimes late (a delay). Delays are more frequent and larger than anticipations. So I fit two Accelerated Failure Time models (that is my AFT; sorry, I thought it was a common acronym) with different parameters and different numbers of covariates, and I also fit a logit regression for delta, which determines whether there is a delay or an anticipation.

So yes, the three models have different parameters. In my opinion the posterior of each model's parameters depends only on that model's data: the delay parameters depend only on the delay data, the delta parameters only on the 0/1 anticipation/delay indicator, and so on. So in a sense they are three different models, not linked to one another hierarchically.
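The independence argument can be made concrete with a small Python sketch of the joint log-likelihood. It assumes a log-normal AFT for both D and A and a single covariate per sub-model; the covariate values, parameter values, and function names are all hypothetical. The joint is a sum of three terms with no shared parameters, so changing only the anticipation parameters changes only the A term. With independent priors the posterior therefore factorizes into three pieces. In one Stan program the three terms would simply be three separate sampling statements (or target += increments) in the model block.

```python
import math


def log_inv_logit(eta):
    # Numerically stable log(1 / (1 + exp(-eta))).
    if eta >= 0:
        return -math.log1p(math.exp(-eta))
    return eta - math.log1p(math.exp(eta))


def logit_loglik(params, x, delta):
    """Bernoulli log-likelihood with P(delta = 1) = inv_logit(alpha + beta * x)."""
    alpha, beta = params
    return sum(
        log_inv_logit(alpha + beta * xi) if di == 1 else log_inv_logit(-(alpha + beta * xi))
        for xi, di in zip(x, delta)
    )


def lognormal_aft_loglik(params, x, t):
    """Assumed log-normal AFT: log(t) ~ Normal(mu0 + b * x, sigma), density on the t scale."""
    mu0, b, sigma = params
    total = 0.0
    for xi, ti in zip(x, t):
        z = (math.log(ti) - (mu0 + b * xi)) / sigma
        # Normal log density of log(t), plus the Jacobian term -log(t).
        total += -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi) - math.log(ti)
    return total


def joint_loglik(theta, data):
    # Three independent terms: each uses only its own parameters and its own data.
    return (
        logit_loglik(theta["delta"], data["x_delta"], data["delta"])
        + lognormal_aft_loglik(theta["D"], data["x_D"], data["D"])
        + lognormal_aft_loglik(theta["A"], data["x_A"], data["A"])
    )


# Hypothetical data: delta, D, A from T = [2.5, -0.5, 4.0, -1.25, 3.0].
data = {
    "delta": [1, 0, 1, 0, 1], "x_delta": [0.1, -0.3, 0.7, 0.0, 1.2],
    "D": [2.5, 4.0, 3.0], "x_D": [0.1, 0.7, 1.2],
    "A": [0.5, 1.25], "x_A": [-0.3, 0.0],
}

theta1 = {"delta": (0.5, 1.0), "D": (1.0, 0.2, 0.5), "A": (-0.5, 0.1, 0.8)}
theta2 = dict(theta1, A=(0.0, -0.4, 1.1))  # only the anticipation parameters change

lhs = joint_loglik(theta2, data) - joint_loglik(theta1, data)
rhs = (lognormal_aft_loglik(theta2["A"], data["x_A"], data["A"])
       - lognormal_aft_loglik(theta1["A"], data["x_A"], data["A"]))
print(math.isclose(lhs, rhs))  # True: the change is confined to the A term
```

Because nothing couples the terms, fitting the three models separately or jointly targets the same posterior; the choice is about convenience and run time rather than results.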


If they are all independent in the posterior, then fitting them separately will be more efficient. But if it works with them all together, that may be more convenient. It probably won’t make much of a difference.