I’m working with data that represents the time it takes to complete a task, t_i. Task i has a covariate x_i > 0 and a justifiable way to represent the time it takes to complete the task is t_i = \alpha + \beta x_i + \varepsilon where \alpha >0, \beta \ge 0 and \varepsilon > 0 is some noise term. In a sense, the quantity \alpha + \beta x_i represents the minimum time it takes to complete task i. \varepsilon is additional strictly positive noise (assuming something like \varepsilon | \nu, \sigma \sim \mathrm{LogNormal}(\nu, \sigma)) that can only have the effect of increasing the amount of time it takes to complete a task. My goal here is to perform inference on (\alpha, \beta, \varepsilon) but I’m having difficulty figuring out how to properly specify this model in Stan.

But I end up with beta being nan in some cases and other attempts have resulted in pretty extreme divergences if I can get the model to sample at all. I’ve tried myself into some mental knots trying to figure this out, so I’d appreciate any pointing in the right direction or an example of how one could fit a model of this sort.

Jut a note: assuming lognormal noise, I think this model should be estimable directly in brms via the shifted_lognormal family with distributional regression on the shift parameter. It might be instructive to look at the stancode generated there.

@Bob_Carpenter these are different models, right? For positive \alpha + \beta x_i, the OP’s model places no probability density between zero and \alpha + \beta x_i. AFAIK it’s for this reason that brms contains the shifted_lognormal family.

@Bob_Carpenter we did initially evaluate using the t_i \sim \mathrm{lognormal}(\alpha + \beta\cdot x_i, \sigma) model. It worked fairly well, but the point that @jsocolar noted was more or less why we wanted to try out the shifted model. In particular, we found the model put some mass in regions didn’t track with domain knowledge.

Also, from an inference perspective, we the \alpha + \beta\cdot x_i quantity to more interpretable in the shifted model.

That being said, when we fit it to our actual data there was enough uncertainty that the differences between the two models weren’t really that significant.