Inference of a lower bound and strictly positive noise

I’m working with data that represents the time it takes to complete a task, t_i. Task i has a covariate x_i > 0, and a justifiable way to represent its completion time is t_i = \alpha + \beta x_i + \varepsilon, where \alpha > 0, \beta \ge 0, and \varepsilon > 0 is a noise term. In a sense, the quantity \alpha + \beta x_i represents the minimum time it takes to complete task i, and \varepsilon is additional strictly positive noise (assuming something like \varepsilon \mid \nu, \sigma \sim \mathrm{LogNormal}(\nu, \sigma)) that can only increase the time it takes to complete a task. My goal is to perform inference on (\alpha, \beta, \varepsilon), but I’m having difficulty figuring out how to properly specify this model in Stan.

I’ve tried some crude approaches such as:

data {
  int<lower=1> N;
  vector<lower=1>[N] X;
  vector<lower=0>[N] T;
}
parameters {
  real<lower=0> alpha;
  real nu;
  real<lower=0> sigma;
}
transformed parameters {
  real<lower=0, upper=min((T - alpha) ./ X)> beta;
  vector<lower=0>[N] errors;

  for (n in 1:N) {
    errors[n] = T[n] - (alpha + beta*X[n]);
  }
}
model {
  alpha ~ normal(0, 2);
  beta ~ normal(0, 2);
  sigma ~ exponential(1);
  nu ~ normal(0, 1);

  errors ~ lognormal(nu, sigma);
}

But I end up with beta being NaN in some cases, and other attempts have resulted in pretty extreme divergences when I can get the model to sample at all. I’ve tied myself into some mental knots trying to figure this out, so I’d appreciate a pointer in the right direction or an example of how one could fit a model of this sort.


Hey, amas0, could you share the full code, including where T, N, X, and mu are read in or defined?

Edited the original post to include the data { } block; also mu was a typo and should have been nu, so I fixed that as well.

This might work:

parameters {
  real<lower=0, upper=min(T)> alpha;
  real<lower=0, upper=min((T - alpha) ./ X)> beta;
  real nu;
  real<lower=0> sigma;
}
transformed parameters {
  vector<lower=0>[N] errors = T - (alpha + beta*X);
}
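
For completeness, here is roughly what the full program could look like with those bounds folded back into the data and model blocks from the original post (just a sketch, keeping the priors from the first attempt):

data {
  int<lower=1> N;
  vector<lower=1>[N] X;
  vector<lower=0>[N] T;
}
parameters {
  // Bounding alpha by min(T) and beta by min((T - alpha) ./ X) keeps
  // alpha + beta * X[n] below every observed T[n].
  real<lower=0, upper=min(T)> alpha;
  real<lower=0, upper=min((T - alpha) ./ X)> beta;
  real nu;
  real<lower=0> sigma;
}
transformed parameters {
  // With both bounds in place, the residuals are guaranteed non-negative.
  vector<lower=0>[N] errors = T - (alpha + beta * X);
}
model {
  alpha ~ normal(0, 2);
  beta ~ normal(0, 2);
  sigma ~ exponential(1);
  nu ~ normal(0, 1);

  errors ~ lognormal(nu, sigma);
}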

This got me there, thanks! I assumed that only bounding beta, since it depends on alpha, would be sufficient, but that was my mistake.

Appreciate the quick response.

Just a note: assuming lognormal noise, I think this model should be estimable directly in brms via the shifted_lognormal family with distributional regression on the shift parameter. It might be instructive to look at the stancode generated there.


Why take

t_i = \alpha + \beta \cdot x_i + \epsilon, \quad \epsilon \sim \textrm{lognormal}(\nu, \sigma)

rather than the more traditional log-linear regression,

t_i \sim \textrm{lognormal}(\alpha + \beta \cdot x_i, \sigma)?
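
For reference, that log-linear alternative might look something like this in Stan (a minimal sketch; the priors here are just placeholders, not a recommendation):

data {
  int<lower=1> N;
  vector[N] X;
  vector<lower=0>[N] T;
}
parameters {
  real alpha;           // intercept on the log scale
  real beta;            // slope on the log scale
  real<lower=0> sigma;
}
model {
  alpha ~ normal(0, 2);
  beta ~ normal(0, 2);
  sigma ~ exponential(1);
  T ~ lognormal(alpha + beta * X, sigma);
}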

I would be surprised if BRMS used the OP’s formulation rather than standard loglinear regression.

The Wikipedia article isn’t very helpful here, but this is what I’m talking about: Log-linear model - Wikipedia

The previous post on this topic was never answered. :-(

@Bob_Carpenter these are different models, right? For positive \alpha + \beta x_i, the OP’s model places no probability density between zero and \alpha + \beta x_i. AFAIK it’s for this reason that brms contains the shifted_lognormal family.
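
Spelled out, the OP’s formulation implies a shifted lognormal density whose support starts at the linear predictor:

p(t_i \mid \alpha, \beta, \nu, \sigma) = \frac{1}{(t_i - \alpha - \beta x_i)\,\sigma\sqrt{2\pi}} \exp\left(-\frac{(\log(t_i - \alpha - \beta x_i) - \nu)^2}{2\sigma^2}\right), \quad t_i > \alpha + \beta x_i,

whereas the standard log-linear model has support on all of (0, \infty).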

@Bob_Carpenter we did initially evaluate the t_i \sim \mathrm{lognormal}(\alpha + \beta\cdot x_i, \sigma) model. It worked fairly well, but the point that @jsocolar noted was more or less why we wanted to try out the shifted model. In particular, we found the model put some mass in regions that didn’t track with domain knowledge.

Also, from an inference perspective, we found the \alpha + \beta\cdot x_i quantity to be more interpretable in the shifted model.

That being said, when we fit both models to our actual data, there was enough uncertainty that the differences between them weren’t really significant.