Truncated regression not working

Here is some toy data. I have a truncated regression that sets a lower bound on the outcome using y[i] ~ normal(mu, sigma/weights[i]) T[L[i],];

The code below fits the model. WITHOUT the truncation (dropping T[L[i],]) the fitted lines are in the correct direction, but WITH T[L[i],], which should set the lower bound on the outcome so that y >= -x, the fit is not correct.

Why is T[L[i],] doing that? If you look at
table(dat$y >= dat$BOUND) you will see that all the outcomes are above the bound.

Here is the code:

set.seed(456)
x = rep(seq(-10, 10, 1), each = 5)
y = rep(0, length(x))
## the outcome y has the relationship y >= -x:
## each y is drawn from a grid starting at -x, with a much wider spread when x >= 0
for (i in 1:length(x)) {
  y[i] = sample(seq(-x[i],
                    -x[i] + ifelse(x[i] < 0,
                                   sample(seq(1, 10, .1), 1),
                                   sample(seq(5, 40, .1), 1)),
                    1),
                1)
}
weights = sample(seq(1, 20, 1), length(x), replace = TRUE)
weights = weights / sum(weights)
groups = rep(letters[1:5], times = length(x) / 5)
dat = data.frame(x = x, y = y, weights = weights, group = groups, BOUND = -x)
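## (optional sanity check, not part of the fit): plot the simulated data
## against the known bound y = -x; every point should lie on or above the line
plot(dat$x, dat$y, pch = 16, col = as.integer(factor(dat$group)))
abline(a = 0, b = -1, lty = 2)  # the bound y = -x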

library(rstan)
head(dat)

dat$GROUP_ID = as.integer(factor(dat$group))  # numeric group index 1..5 (factor() so this works even if group is a character column)
dat$BOUND = -dat$x
table(dat$y >= dat$BOUND)   # every outcome is at or above its bound
quantile(dat$weights)
standat <- list(
  N = nrow(dat),
  y = dat$y,
  x = dat$x,
  GROUP_ID = dat$GROUP_ID,
  nGROUPS = 5,
  weights = dat$weights,
  L = dat$BOUND 
)


stanmodelcode = '
data {
  int<lower=1> N;
  int nGROUPS;
  real y[N];
  real x[N];
  int<lower=1, upper=nGROUPS> GROUP_ID[N];
  real L[N];            // known lower bound for each observation (-x)
  real weights[N];
}

parameters {
  real intercept;
  vector[nGROUPS] intercept_x2;   // group-level intercept offsets
  real beta;
  real<lower=0> sigma;
  real<lower=0> sigma_2;          // sd of the group intercepts
}

model {
  real mu;

  // priors
  intercept ~ normal(0, 10);
  intercept_x2 ~ normal(0, sigma_2);
  beta ~ normal(0, 10);
  sigma ~ normal(0, 10);
  sigma_2 ~ normal(0, 10);

  // likelihood: normal truncated from below at L[i]
  for (i in 1:N) {
    mu = intercept + intercept_x2[GROUP_ID[i]] + beta * x[i];
    y[i] ~ normal(mu, sigma / weights[i]) T[L[i], ];
  }
}
'

fit22 = stan(model_code=stanmodelcode, data=standat, iter=10000, warmup=2000, chains = 1)
fit22


post = rstan::extract(fit22)
get_post_pred = function(post, x, GROUP_ID){
  
  pred = with(post,   intercept +  intercept_x2[,GROUP_ID]  +   beta*x )
  return(pred)
}


head(dat)
head(post)
posterior_prediction_in_columns = sapply(1:length(dat$x), function(i) get_post_pred(post, dat$x[i], dat$GROUP_ID[i])  )
head(posterior_prediction_in_columns)
mean_column_prediction = apply(posterior_prediction_in_columns, 2, mean)
dat$BayesPrediction = mean_column_prediction
head(dat)

## show predictions on the current data set
library(ggplot2)
ggplot(data = dat, aes(x = x, y = y, color = group)) +
  geom_point(aes(size = weights)) +
  geom_line(data = dat, aes(x = x, y = BayesPrediction, color = group)) +
  geom_hline(yintercept = 0) +
  geom_vline(xintercept = 0)
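## check how many of the fitted means actually respect the known bound
mean(dat$BayesPrediction >= dat$BOUND)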

It looks to me like you probably want some kind of continuous hinge, see here:

I think the truncation should help, but for some reason it is fitting in the opposite direction.

I couldn’t quite follow the model here—where is the truncation coming from generatively? You don’t want to do things like try to help the sampler by adding post-hoc constraints (like using the lower and upper bound of the data as bounds for a mean parameter)—it will distort the estimates.

If you really do have the truncation generatively, then I’d suggest starting simpler and building up to this. It looks like the error is very different for x > 0.

I have an outcome (y) and a predictor (x), and y has a lower bound of -x, so y is always >= -x. The data-generating process is such that x is set by a user and then the outcome y can be anything >= -x, so the lower bound for y is known before the outcome is observed. The y outcome will also depend on other predictors that I am not including here, but I wanted to focus on this issue first. Wouldn't this be a case to use T[L[i],] to push the likelihood of the data above the bound?
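(My understanding is that T[L[i],] is equivalent to writing the truncation normalization out by hand, i.e. something like this in the model block:

for (i in 1:N) {
  mu = intercept + intercept_x2[GROUP_ID[i]] + beta * x[i];
  // normal log density, renormalized over y >= L[i]; this is what T[L[i],] does
  target += normal_lpdf(y[i] | mu, sigma / weights[i])
            - normal_lccdf(L[i] | mu, sigma / weights[i]);
}

so everything below the bound gets zero likelihood.)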

I can remove the T[L[i], ], but then the predicted value goes below the bound in many cases, and I was trying to improve the prediction by getting a better fit with T[L[i], ]. I also looked into a nonlinear multilevel model with brms, but doing something like

library(brms)
fit_brms = brm(y~ s(x) +(1|group), data = dat)
by_group = marginal_effects(fit_brms, conditions  = data.frame(group = dat$group) ,
                            re_formula = NULL, method = "predict")
plot(by_group, ncol = 5, points = TRUE)

where s(x) is a nonlinear spline fit on x, produces the same spline shape for all groups. The spline fit is not specific to the groups: the y-intercept differs by group, but the shape of the fit is the same. Do you know of any code or examples for a nonlinear multilevel model where the nonlinear fit is done at level 2 (per group)?
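For example, I am wondering whether something along these lines (just an untested sketch of what I mean by a group-specific shape, using the mgcv-style "by" smooth that brms supports) is the right direction:

library(brms)
dat$group = factor(dat$group)   # 'by' smooths need a factor
## sketch only: a separate smooth of x for each level of group, plus a group intercept
fit_brms_by_group = brm(y ~ s(x, by = group) + (1 | group), data = dat)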

Thank you.

I think your problem is that the linear model is not appropriate for your data; the truncation just makes this worse somehow. Your variance is much, much larger for positive x, and so I think you wind up with positive slopes because the model wants to put the prediction closer to those outlying values.

The reason I suggested a continuous hinge is that then you could have one slope to the left of x = 0 and one slope to the right of x = 0, with a continuous transition in between, thereby potentially modeling the behavior of your data better. Only once you have a reasonable model for your data will the truncation actually be something that helps.

Another alternative is that you have a regular old bug somewhere in your code, or your model fits very poorly and the output of Stan is meaningless. Have you checked your Rhat values?
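For example (assuming fit22 is your fitted object), something like

print(fit22)                          # Rhat and n_eff per parameter
summary(fit22)$summary[, "Rhat"]      # just the Rhat column
rstan::check_hmc_diagnostics(fit22)   # divergences, treedepth, energy warnings

would be a quick check.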

Rhats are 1. Do you have any examples of Stan code for a multilevel hinge regression with 3 or 4 predictors?

Suppose you think of the hinge value in terms of the x location where it's evaluated, slope 1 (s1), the location of the change x0, the rate of change r, and slope 2 (s2):

y ~ normal(hinge(x, s1, x0, r, s2), errs)

where maybe the scale of the errors also changes with x, so errs could be a vector or a function of x.

Then s1, x0, r, and s2 are all parameters that vary by group, so each is a vector of parameters with one entry per group.

See if that points you in the right direction. You can derive the expression for the hinge function based on the description in that blog post linked above.
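For example, one way to write that kind of hinge as a Stan function (just a sketch, not necessarily the exact form from the blog post) is with a softplus transition:

functions {
  // smooth hinge: slope s1 well below x0, slope s2 well above x0,
  // r > 0 controls how sharp the transition between the two slopes is
  real hinge(real x, real s1, real x0, real r, real s2) {
    return s1 * x + (s2 - s1) * log1p_exp(r * (x - x0)) / r;
  }
}

In the model you would then index group-specific vectors, e.g. y[i] ~ normal(hinge(x[i], s1[GROUP_ID[i]], x0[GROUP_ID[i]], r[GROUP_ID[i]], s2[GROUP_ID[i]]), ...), and let the error scale depend on x[i] as well if needed.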